cs.AI - 2023-09-18

Towards Effective Semantic OOD Detection in Unseen Domains: A Domain Generalization Perspective

  • paper_url: http://arxiv.org/abs/2309.10209
  • repo_url: None
  • paper_authors: Haoliang Wang, Chen Zhao, Yunhui Guo, Kai Jiang, Feng Chen
  • for: simultaneously addressing both covariate and semantic distributional shifts encountered in real-world testing environments
  • methods: introduces two regularization strategies: domain generalization regularization and OOD detection regularization
  • results: showcases superior OOD detection performance compared to conventional domain generalization approaches while maintaining comparable InD classification accuracy
    Abstract Two prevalent types of distributional shifts in machine learning are the covariate shift (as observed across different domains) and the semantic shift (as seen across different classes). Traditional OOD detection techniques typically address only one of these shifts. However, real-world testing environments often present a combination of both covariate and semantic shifts. In this study, we introduce a novel problem, semantic OOD detection across domains, which simultaneously addresses both distributional shifts. To this end, we introduce two regularization strategies: domain generalization regularization, which ensures semantic invariance across domains to counteract the covariate shift, and OOD detection regularization, designed to enhance OOD detection capabilities against the semantic shift through energy bounding. Through rigorous testing on three standard domain generalization benchmarks, our proposed framework showcases its superiority over conventional domain generalization approaches in terms of OOD detection performance. Moreover, it holds its ground by maintaining comparable InD classification accuracy.
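The OOD detection regularization mentioned above works through energy bounding. A minimal sketch of what such a regularizer can look like, assuming the common free-energy score E(x) = -logsumexp(logits) and squared-hinge margins (the margin values and the use of auxiliary OOD samples are assumptions, not details from the paper):

```python
import torch
import torch.nn.functional as F

def energy_score(logits):
    """Free-energy OOD score: E(x) = -logsumexp(logits); lower energy suggests in-distribution."""
    return -torch.logsumexp(logits, dim=1)

def energy_bounding_loss(logits_ind, logits_ood, m_in=-25.0, m_out=-7.0):
    """Squared-hinge bounds: push InD energies below m_in and OOD energies above m_out."""
    e_in = energy_score(logits_ind)
    e_out = energy_score(logits_ood)
    return (F.relu(e_in - m_in) ** 2).mean() + (F.relu(m_out - e_out) ** 2).mean()
```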

Stabilizing RLHF through Advantage Model and Selective Rehearsal

  • paper_url: http://arxiv.org/abs/2309.10202
  • repo_url: None
  • paper_authors: Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu
  • for: addressing instabilities in RLHF training, including reward hacking and catastrophic forgetting.
  • methods: proposes two innovations to stabilize RLHF training: 1) an Advantage Model, which directly models the advantage score and regulates score distributions across tasks to prevent reward hacking; 2) Selective Rehearsal, which strategically selects data for PPO training and knowledge rehearsal to mitigate catastrophic forgetting.
  • results: experimental analysis shows the proposed methods not only increase stability in RLHF training but also achieve higher reward scores and win rates.
    Abstract Large Language Models (LLMs) have revolutionized natural language processing, yet aligning these models with human values and preferences using RLHF remains a significant challenge. This challenge is characterized by various instabilities, such as reward hacking and catastrophic forgetting. In this technical report, we propose two innovations to stabilize RLHF training: 1) Advantage Model, which directly models advantage score i.e., extra reward compared to the expected rewards and regulates score distributions across tasks to prevent reward hacking. 2) Selective Rehearsal, which mitigates catastrophic forgetting by strategically selecting data for PPO training and knowledge rehearsing. Our experimental analysis on public and proprietary datasets reveals that the proposed methods not only increase stability in RLHF training but also achieve higher reward scores and win rates.
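A rough sketch of the advantage-scoring idea: subtract a per-task expected reward and rescale so that score distributions are comparable across tasks, which discourages reward hacking on tasks with inflated raw scores. The exact normalization used by the Advantage Model is not given in this digest, so per-task standardization is an assumption:

```python
import torch

def advantage_scores(scores, task_ids, num_tasks, eps=1e-6):
    """Convert raw reward-model scores into per-task advantage scores.

    scores:   (batch,) raw scores from the reward model.
    task_ids: (batch,) integer task identifier for each sample.
    """
    adv = torch.zeros_like(scores)
    for t in range(num_tasks):
        mask = task_ids == t
        if mask.any():
            mu = scores[mask].mean()                  # expected reward for this task
            sd = scores[mask].std(unbiased=False)     # spread of scores for this task
            adv[mask] = (scores[mask] - mu) / (sd + eps)
    return adv
```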

Graph-enabled Reinforcement Learning for Time Series Forecasting with Adaptive Intelligence

  • paper_url: http://arxiv.org/abs/2309.10186
  • repo_url: None
  • paper_authors: Thanveer Shaik, Xiaohui Tao, Haoran Xie, Lin Li, Jianming Yong, Yuefeng Li
  • for: proposing a novel approach that combines graph neural networks (GNNs) with reinforcement learning (RL) for time-series forecasting.
  • methods: uses GNNs to model the time-series data and couples them with RL for monitoring; GNNs naturally capture the graph structure in the data and yield better predictions for complex temporal structures such as healthcare, traffic, and weather forecasting.
  • results: the GraphRL model achieves higher accuracy and efficiency in time-series forecasting and monitoring than traditional deep learning models, and GNNs outperform RNNs and LSTMs for time-series prediction.
    Abstract Reinforcement learning is well known for its ability to model sequential tasks and learn latent data patterns adaptively. Deep learning models have been widely explored and adopted in regression and classification tasks. However, deep learning has its limitations such as the assumption of equally spaced and ordered data, and the lack of ability to incorporate graph structure in terms of time-series prediction. Graphical neural network (GNN) has the ability to overcome these challenges and capture the temporal dependencies in time-series data. In this study, we propose a novel approach for predicting time-series data using GNN and monitoring with Reinforcement Learning (RL). GNNs are able to explicitly incorporate the graph structure of the data into the model, allowing them to capture temporal dependencies in a more natural way. This approach allows for more accurate predictions in complex temporal structures, such as those found in healthcare, traffic and weather forecasting. We also fine-tune our GraphRL model using a Bayesian optimisation technique to further improve performance. The proposed framework outperforms the baseline models in time-series forecasting and monitoring. The contributions of this study include the introduction of a novel GraphRL framework for time-series prediction and the demonstration of the effectiveness of GNNs in comparison to traditional deep learning models such as RNNs and LSTMs. Overall, this study demonstrates the potential of GraphRL in providing accurate and efficient predictions in dynamic RL environments.

QoS-Aware Service Prediction and Orchestration in Cloud-Network Integrated Beyond 5G

  • paper_url: http://arxiv.org/abs/2309.10185
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Mohammad Farhoudi, Masoud Shokrnezhad, Tarik Taleb
  • for: addressing the ultra-low-latency communication and massive broadband connectivity that beyond-5G networks must provide for novel applications such as the Metaverse, along with the heightened service-continuity requirements caused by ever-fluctuating user populations.
  • methods: adopts the edge-cloud paradigm to harness cloud capacity and manage users in real time as they move across the network, while accounting for the limitations of edge-cloud networks, including jointly managed networking and computing resources, user dynamics, service start-up delay, and traffic load.
  • results: formulates service placement and resource allocation as a non-linear programming model that minimizes overall cost while improving latency, and proposes an RNN-empowered DDQL technique with a water-filling-based algorithm for service placement that accommodates user dynamics and service continuity; simulation results show the solution provides timely responses that maximize the network's potential and deliver scalable, efficient placement.
    Abstract Novel applications such as the Metaverse have highlighted the potential of beyond 5G networks, which necessitate ultra-low latency communications and massive broadband connections. Moreover, the burgeoning demand for such services with ever-fluctuating users has engendered a need for heightened service continuity consideration in B5G. To enable these services, the edge-cloud paradigm is a potential solution to harness cloud capacity and effectively manage users in real time as they move across the network. However, edge-cloud networks confront a multitude of limitations, including networking and computing resources that must be collectively managed to unlock their full potential. This paper addresses the joint problem of service placement and resource allocation in a network-cloud integrated environment while considering capacity constraints, dynamic users, and end-to-end delays. We present a non-linear programming model that formulates the optimization problem with the aiming objective of minimizing overall cost while enhancing latency. Next, to address the problem, we introduce a DDQL-based technique using RNNs to predict user behavior, empowered by a water-filling-based algorithm for service placement. The proposed framework adeptly accommodates the dynamic nature of users, the placement of services that mandate ultra-low latency in B5G, and service continuity when users migrate from one location to another. Simulation results show that our solution provides timely responses that optimize the network's potential, offering a scalable and efficient placement.

Positive and Risky Message Assessment for Music Products

  • paper_url: http://arxiv.org/abs/2309.10182
  • repo_url: None
  • paper_authors: Yigeng Zhang, Mahsa Shafaei, Fabio Gonzalez, Thamar Solorio
  • for: assessing positive and risky messages in music products.
  • methods: poses this as a novel research problem, establishes a multi-angle, multi-level benchmark for music content assessment, and then proposes an effective multi-task prediction model with ordinality enforcement to solve it.
  • results: the proposed method not only significantly outperforms strong task-specific counterparts but can also evaluate multiple aspects concurrently.
    Abstract In this work, we propose a novel research problem: assessing positive and risky messages from music products. We first establish a benchmark for multi-angle multi-level music content assessment and then present an effective multi-task prediction model with ordinality-enforcement to solve this problem. Our result shows the proposed method not only significantly outperforms strong task-specific counterparts but can concurrently evaluate multiple aspects.
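One standard way to enforce ordinality in a multi-task assessment head is to predict cumulative targets P(y > k) for each aspect. The sketch below assumes this formulation and hypothetical dimensions; it is not necessarily the authors' exact mechanism:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrdinalMultiTaskHead(nn.Module):
    """Shared encoding -> one head per assessed aspect, each predicting K-1 cumulative logits."""
    def __init__(self, hidden_dim, num_aspects, num_levels):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, num_levels - 1) for _ in range(num_aspects)]
        )

    def forward(self, h):
        # h: (batch, hidden_dim) shared representation of a music product
        return [head(h) for head in self.heads]   # list of (batch, K-1) cumulative logits

def ordinal_loss(cumulative_logits, labels, num_levels):
    """BCE against cumulative targets t_k = 1[y > k], which keeps predictions order-aware."""
    ks = torch.arange(num_levels - 1, device=labels.device)
    targets = (labels.unsqueeze(1) > ks).float()   # (batch, K-1)
    return F.binary_cross_entropy_with_logits(cumulative_logits, targets)
```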

Double Deep Q-Learning-based Path Selection and Service Placement for Latency-Sensitive Beyond 5G Applications

  • paper_url: http://arxiv.org/abs/2309.10180
  • repo_url: None
  • paper_authors: Masoud Shokrnezhad, Tarik Taleb, Patrizio Dazzi
  • for: This paper aims to solve the joint problem of communication and computing resource allocation in cloud-network integrated infrastructures to minimize total cost.
  • methods: The paper proposes two approaches based on the Branch & Bound and Water-Filling algorithms to solve the problem when the system is fully known, and a Double Deep Q-Learning (DDQL) architecture is designed for partially known systems.
  • results: Numerical simulations show that the proposed B&B-CCRA approach optimally solves the problem, while the WF-CCRA approach delivers near-optimal solutions in a substantially shorter time. Additionally, the DDQL-CCRA approach obtains near-optimal solutions in the absence of request-specific information.
    Abstract Nowadays, as the need for capacity continues to grow, entirely novel services are emerging. A solid cloud-network integrated infrastructure is necessary to supply these services in a real-time responsive, and scalable way. Due to their diverse characteristics and limited capacity, communication and computing resources must be collaboratively managed to unleash their full potential. Although several innovative methods have been proposed to orchestrate the resources, most ignored network resources or relaxed the network as a simple graph, focusing only on cloud resources. This paper fills the gap by studying the joint problem of communication and computing resource allocation, dubbed CCRA, including function placement and assignment, traffic prioritization, and path selection considering capacity constraints and quality requirements, to minimize total cost. We formulate the problem as a non-linear programming model and propose two approaches, dubbed B\&B-CCRA and WF-CCRA, based on the Branch \& Bound and Water-Filling algorithms to solve it when the system is fully known. Then, for partially known systems, a Double Deep Q-Learning (DDQL) architecture is designed. Numerical simulations show that B\&B-CCRA optimally solves the problem, whereas WF-CCRA delivers near-optimal solutions in a substantially shorter time. Furthermore, it is demonstrated that DDQL-CCRA obtains near-optimal solutions in the absence of request-specific information.
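The DDQL agent used here for partially known systems (and in several neighboring papers) rests on the Double Deep Q-Learning target, where the online network selects the next action and the target network evaluates it. A minimal sketch, with network shapes and hyperparameters as placeholders:

```python
import torch

def double_dqn_targets(rewards, next_states, dones, online_q, target_q, gamma=0.99):
    """Double DQN bootstrap target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        next_actions = online_q(next_states).argmax(dim=1, keepdim=True)        # action selection
        next_values = target_q(next_states).gather(1, next_actions).squeeze(1)  # action evaluation
        return rewards + gamma * (1.0 - dones) * next_values
```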

Self-Sustaining Multiple Access with Continual Deep Reinforcement Learning for Dynamic Metaverse Applications

  • paper_url: http://arxiv.org/abs/2309.10177
  • repo_url: None
  • paper_authors: Hamidreza Mazandarani, Masoud Shokrnezhad, Tarik Taleb, Richard Li
  • for: This paper aims to address the challenge of managing multiple access to the frequency spectrum in a dynamic and complex scenario of the Metaverse, with a focus on maximizing the throughput of intelligent agents in multi-channel environments.
  • methods: The proposed method is based on Double Deep Q-Learning (DDQL) empowered by Continual Learning (CL), which is designed to overcome the non-stationary situation and unknown environment.
  • results: The numerical simulations show that the CL-DDQL algorithm achieves significantly higher throughputs with a considerably shorter convergence time compared to other well-known methods, especially in highly dynamic scenarios.
    Abstract The Metaverse is a new paradigm that aims to create a virtual environment consisting of numerous worlds, each of which will offer a different set of services. To deal with such a dynamic and complex scenario, considering the stringent quality of service requirements aimed at the 6th generation of communication systems (6G), one potential approach is to adopt self-sustaining strategies, which can be realized by employing Adaptive Artificial Intelligence (Adaptive AI) where models are continually re-trained with new data and conditions. One aspect of self-sustainability is the management of multiple access to the frequency spectrum. Although several innovative methods have been proposed to address this challenge, mostly using Deep Reinforcement Learning (DRL), the problem of adapting agents to a non-stationary environment has not yet been precisely addressed. This paper fills in the gap in the current literature by investigating the problem of multiple access in multi-channel environments to maximize the throughput of the intelligent agent when the number of active User Equipments (UEs) may fluctuate over time. To solve the problem, a Double Deep Q-Learning (DDQL) technique empowered by Continual Learning (CL) is proposed to overcome the non-stationary situation, while the environment is unknown. Numerical simulations demonstrate that, compared to other well-known methods, the CL-DDQL algorithm achieves significantly higher throughputs with a considerably shorter convergence time in highly dynamic scenarios.

One ACT Play: Single Demonstration Behavior Cloning with Action Chunking Transformers

  • paper_url: http://arxiv.org/abs/2309.10175
  • repo_url: None
  • paper_authors: Abraham George, Amir Barati Farimani
  • for: learning from human demonstrations (behavior cloning) is a cornerstone of robot learning, yet most behavior cloning algorithms need many demonstrations, especially for tasks with a wide variety of initial conditions, whereas humans can complete a task after seeing it only once or twice; this work aims to emulate that ability by learning a task from a single human demonstration.
  • methods: augments the single demonstration with linear transforms to generate a set of trajectories covering a wide range of initial conditions, and develops a novel extension of the temporal ensembling method used during inference that incorporates the standard deviation of the action predictions, improving robustness to unforeseen changes in the environment.
  • results: trained on the augmented single demonstration, the behavior cloning agent successfully completes three block-manipulation tasks, and the method is more stable and reliable under environmental changes, yielding significant performance improvements.
    Abstract Learning from human demonstrations (behavior cloning) is a cornerstone of robot learning. However, most behavior cloning algorithms require a large number of demonstrations to learn a task, especially for general tasks that have a large variety of initial conditions. Humans, however, can learn to complete tasks, even complex ones, after only seeing one or two demonstrations. Our work seeks to emulate this ability, using behavior cloning to learn a task given only a single human demonstration. We achieve this goal by using linear transforms to augment the single demonstration, generating a set of trajectories for a wide range of initial conditions. With these demonstrations, we are able to train a behavior cloning agent to successfully complete three block manipulation tasks. Additionally, we developed a novel addition to the temporal ensembling method used by action chunking agents during inference. By incorporating the standard deviation of the action predictions into the ensembling method, our approach is more robust to unforeseen changes in the environment, resulting in significant performance improvements.
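A small sketch of the modified temporal ensembling step: action-chunking policies produce several (older and newer) predictions for the current timestep, which are combined with exponentially decaying weights; here the per-prediction standard deviation additionally down-weights uncertain predictions. The exact way the paper combines the standard deviation with the exponential weights is an assumption:

```python
import numpy as np

def ensemble_current_action(cached_preds, cached_stds, m=0.1, eps=1e-6):
    """Combine cached predictions of the current action from past inference steps.

    cached_preds: (k, action_dim) predictions made at the k most recent steps (newest last).
    cached_stds:  (k, action_dim) standard deviations of those predictions.
    """
    cached_preds = np.asarray(cached_preds)
    cached_stds = np.asarray(cached_stds)
    ages = np.arange(len(cached_preds))[::-1]                    # oldest prediction has largest age
    weights = np.exp(-m * ages)[:, None] / (cached_stds + eps)   # decay by age, shrink by uncertainty
    return (weights * cached_preds).sum(axis=0) / weights.sum(axis=0)
```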

Asynchronous Perception-Action-Communication with Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2309.10164
  • repo_url: None
  • paper_authors: Saurav Agarwal, Alejandro Ribeiro, Vijay Kumar
  • for: solving the collaborative decision-making problem in large robot swarms so that they achieve a common global objective.
  • methods: uses Graph Neural Networks (GNNs) to decide what information to share and which actions to take within the collaborative Perception-Action-Communication (PAC) loop.
  • results: proposes an asynchronous PAC framework in which decentralized GNNs compute navigation actions and generate communication messages, enabling collaborative navigation and coverage control for large robot swarms.
    Abstract Collaboration in large robot swarms to achieve a common global objective is a challenging problem in large environments due to limited sensing and communication capabilities. The robots must execute a Perception-Action-Communication (PAC) loop -- they perceive their local environment, communicate with other robots, and take actions in real time. A fundamental challenge in decentralized PAC systems is to decide what information to communicate with the neighboring robots and how to take actions while utilizing the information shared by the neighbors. Recently, this has been addressed using Graph Neural Networks (GNNs) for applications such as flocking and coverage control. Although conceptually, GNN policies are fully decentralized, the evaluation and deployment of such policies have primarily remained centralized or restrictively decentralized. Furthermore, existing frameworks assume sequential execution of perception and action inference, which is very restrictive in real-world applications. This paper proposes a framework for asynchronous PAC in robot swarms, where decentralized GNNs are used to compute navigation actions and generate messages for communication. In particular, we use aggregated GNNs, which enable the exchange of hidden layer information between robots for computational efficiency and decentralized inference of actions. Furthermore, the modules in the framework are asynchronous, allowing robots to perform sensing, extracting information, communication, action inference, and control execution at different frequencies. We demonstrate the effectiveness of GNNs executed in the proposed framework in navigating large robot swarms for collaborative coverage of large environments.

RadOnc-GPT: A Large Language Model for Radiation Oncology

  • paper_url: http://arxiv.org/abs/2309.10160
  • repo_url: None
  • paper_authors: Zhengliang Liu, Peilong Wang, Yiwei Li, Jason Holmes, Peng Shu, Lian Zhang, Chenbin Liu, Ninghao Liu, Dajiang Zhu, Xiang Li, Quanzheng Li, Samir H. Patel, Terence T. Sio, Tianming Liu, Wei Liu
  • for: studying the application of a large language model specialized for radiation oncology.
  • methods: specializes the model with advanced tuning methods, fine-tuning it on a large corpus of radiation oncology patient records and clinical notes from the Mayo Clinic; the model covers three key tasks: generating radiotherapy treatment regimens, determining optimal radiation modalities, and providing diagnostic descriptions/ICD codes based on patient diagnostic details.
  • results: compared with outputs from a general large language model, the impressions generated by RadOnc-GPT show significantly improved clarity, specificity, and clinical relevance, demonstrating that fine-tuning language models with domain-specific knowledge, as in RadOnc-GPT, can achieve transformational capabilities in highly specialized healthcare fields.
    Abstract This paper presents RadOnc-GPT, a large language model specialized for radiation oncology through advanced tuning methods. RadOnc-GPT was finetuned on a large dataset of radiation oncology patient records and clinical notes from the Mayo Clinic. The model employs instruction tuning on three key tasks - generating radiotherapy treatment regimens, determining optimal radiation modalities, and providing diagnostic descriptions/ICD codes based on patient diagnostic details. Evaluations conducted by having radiation oncologists compare RadOnc-GPT impressions to general large language model impressions showed that RadOnc-GPT generated outputs with significantly improved clarity, specificity, and clinical relevance. The study demonstrated the potential of using large language models fine-tuned using domain-specific knowledge like RadOnc-GPT to achieve transformational capabilities in highly specialized healthcare fields such as radiation oncology.

Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

  • paper_url: http://arxiv.org/abs/2309.10150
  • repo_url: https://github.com/lucidrains/q-transformer
  • paper_authors: Yevgen Chebotar, Quan Vuong, Alex Irpan, Karol Hausman, Fei Xia, Yao Lu, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singht, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, Sergey Levine
  • for: proposing a scalable offline reinforcement learning method for training multi-task policies from large offline datasets.
  • methods: uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal-difference backups; each action dimension is discretized and its Q-values are represented as separate tokens, so high-capacity sequence modeling techniques can be applied to Q-learning.
  • results: Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large, diverse real-world robotic manipulation task suite; detailed results and the project website are available at https://q-transformer.github.io.
    Abstract In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large diverse real-world robotic manipulation task suite. The project's website and videos can be found at https://q-transformer.github.io
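The per-dimension discretization that lets Q-Transformer treat each action dimension as a separate token can be sketched as follows (256 bins is an illustrative assumption; the bin count in the paper may differ). An action then becomes a short token sequence [bin_dim0, bin_dim1, ...] over which Q-values are predicted autoregressively:

```python
import numpy as np

def discretize_action(action, low, high, num_bins=256):
    """Map each continuous action dimension to an integer bin index (one token per dimension)."""
    frac = (np.clip(action, low, high) - low) / (high - low + 1e-8)
    return np.minimum((frac * num_bins).astype(int), num_bins - 1)

def undiscretize_action(tokens, low, high, num_bins=256):
    """Map bin indices back to continuous values at the bin centers."""
    return low + (tokens + 0.5) / num_bins * (high - low)
```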

Analysis of the Memorization and Generalization Capabilities of AI Agents: Are Continual Learners Robust?

  • paper_url: http://arxiv.org/abs/2309.10149
  • repo_url: None
  • paper_authors: Minsu Kim, Walid Saad
  • for: The paper is written for the practical deployment of continual learning (CL) applications, such as autonomous vehicles or robotics, in dynamic environments.
  • methods: The proposed CL framework uses a capacity-limited memory to save previously observed environmental information and mitigate forgetting issues. The algorithm samples data points from the memory to estimate the distribution of risks over environmental change and obtain robust predictors.
  • results: The proposed algorithm outperforms memory-based CL baselines across all environments while significantly improving the generalization performance on unseen target environments.
    Abstract In continual learning (CL), an AI agent (e.g., autonomous vehicles or robotics) learns from non-stationary data streams under dynamic environments. For the practical deployment of such applications, it is important to guarantee robustness to unseen environments while maintaining past experiences. In this paper, a novel CL framework is proposed to achieve robust generalization to dynamic environments while retaining past knowledge. The considered CL agent uses a capacity-limited memory to save previously observed environmental information to mitigate forgetting issues. Then, data points are sampled from the memory to estimate the distribution of risks over environmental change so as to obtain predictors that are robust with unseen changes. The generalization and memorization performance of the proposed framework are theoretically analyzed. This analysis showcases the tradeoff between memorization and generalization with the memory size. Experiments show that the proposed algorithm outperforms memory-based CL baselines across all environments while significantly improving the generalization performance on unseen target environments.
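A sketch of the memory mechanism: a capacity-limited buffer stores previously observed (input, label, environment) samples, and batches drawn from it give per-environment risk estimates from which a robust (here, worst-case) objective can be formed. Reservoir sampling and the worst-case surrogate are assumptions; the paper's estimator of the risk distribution is not spelled out in this digest:

```python
import random

class ReservoirMemory:
    """Capacity-limited memory; reservoir sampling keeps a roughly uniform subset of the stream."""
    def __init__(self, capacity):
        self.capacity, self.buffer, self.seen = capacity, [], 0

    def add(self, sample):                      # sample = (x, y, env_id)
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = sample

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def robust_risk(batch, loss_fn):
    """Group a sampled batch by environment and return the worst per-environment risk."""
    per_env = {}
    for x, y, env_id in batch:
        per_env.setdefault(env_id, []).append(loss_fn(x, y))
    return max(sum(losses) / len(losses) for losses in per_env.values())
```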

Human Gait Recognition using Deep Learning: A Comprehensive Review

  • paper_url: http://arxiv.org/abs/2309.10144
  • repo_url: None
  • paper_authors: Muhammad Imran Sharif, Mehwish Mehmood, Muhammad Irfan Sharif, Md Palash Uddin
  • for: provides an overview of gait recognition (GR) technology and analyzes the environmental elements and complications that could affect it, with a focus on deep learning (DL) techniques employed for human GR.
  • methods: examines existing DL techniques used in GR, including those that address challenges such as shifting lighting conditions, fluctuations in gait patterns, and ensuring uniform performance evaluation across different protocols.
  • results: aims to generate new research opportunities in GR by analyzing existing DL techniques and identifying potential areas for improvement.
    Abstract Gait recognition (GR) is a growing biometric modality used for person identification from a distance through visual cameras. GR provides a secure and reliable alternative to fingerprint and face recognition, as it is harder to distinguish between false and authentic signals. Furthermore, its resistance to spoofing makes GR suitable for all types of environments. With the rise of deep learning, steadily improving strides have been made in GR technology with promising results in various contexts. As video surveillance becomes more prevalent, new obstacles arise, such as ensuring uniform performance evaluation across different protocols, reliable recognition despite shifting lighting conditions, fluctuations in gait patterns, and protecting privacy.This survey aims to give an overview of GR and analyze the environmental elements and complications that could affect it in comparison to other biometric recognition systems. The primary goal is to examine the existing deep learning (DL) techniques employed for human GR that may generate new research opportunities.

Efficient Low-Rank GNN Defense Against Structural Attacks

  • paper_url: http://arxiv.org/abs/2309.10136
  • repo_url: None
  • paper_authors: Abdullah Alchihabi, Qing En, Yuhong Guo
  • for: proposing a method that effectively defends graph neural networks (GNNs) against structural attacks, improving GNN security.
  • methods: the method consists of two modules: a Coarse Low-Rank Estimation Module, which uses truncated SVD to initialize the low-rank adjacency matrix estimate and serves as the starting point for optimization, and a Fine-Grained Estimation Module, which jointly learns a low-rank, sparse graph structure together with the GNN model.
  • results: experiments show that ELR-GNN outperforms GNN defense methods from the literature while being highly efficient and easy to train.
    Abstract Graph Neural Networks (GNNs) have been shown to possess strong representation abilities over graph data. However, GNNs are vulnerable to adversarial attacks, and even minor perturbations to the graph structure can significantly degrade their performance. Existing methods either are ineffective against sophisticated attacks or require the optimization of dense adjacency matrices, which is time-consuming and prone to local minima. To remedy this problem, we propose an Efficient Low-Rank Graph Neural Network (ELR-GNN) defense method, which aims to learn low-rank and sparse graph structures for defending against adversarial attacks, ensuring effective defense with greater efficiency. Specifically, ELR-GNN consists of two modules: a Coarse Low-Rank Estimation Module and a Fine-Grained Estimation Module. The first module adopts the truncated Singular Value Decomposition (SVD) to initialize the low-rank adjacency matrix estimation, which serves as a starting point for optimizing the low-rank matrix. In the second module, the initial estimate is refined by jointly learning a low-rank sparse graph structure with the GNN model. Sparsity is incorporated into the learned low-rank adjacency matrix by pruning weak connections, which can reduce redundant data while maintaining valuable information. As a result, instead of using the dense adjacency matrix directly, ELR-GNN can learn a low-rank and sparse estimate of it in a simple, efficient and easy to optimize manner. The experimental results demonstrate that ELR-GNN outperforms the state-of-the-art GNN defense methods in the literature, in addition to being very efficient and easy to train.
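A minimal sketch of the coarse low-rank initialization: truncated SVD gives a rank-k reconstruction of the (possibly attacked) adjacency matrix, and weak entries are pruned to keep the estimate sparse. The rank and pruning fraction below are placeholders, and the fine-grained joint learning with the GNN is not shown:

```python
import numpy as np
from scipy.sparse.linalg import svds

def coarse_low_rank_adjacency(adj, rank=20, prune_frac=0.9):
    """Rank-`rank` reconstruction of the adjacency matrix, with the weakest entries zeroed."""
    u, s, vt = svds(adj.astype(float), k=rank)        # top-k singular triplets
    low_rank = u @ np.diag(s) @ vt
    thresh = np.quantile(np.abs(low_rank), prune_frac)
    low_rank[np.abs(low_rank) < thresh] = 0.0         # prune weak (likely adversarial) connections
    return low_rank
```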

GDM: Dual Mixup for Graph Classification with Limited Supervision

  • paper_url: http://arxiv.org/abs/2309.10134
  • repo_url: None
  • paper_authors: Abdullah Alchihabi, Yuhong Guo
  • for: improving the performance of graph neural networks (GNNs) on graph classification while reducing the number of labeled graph samples required.
  • methods: proposes a mixup-based graph augmentation method, Graph Dual Mixup (GDM), that exploits both the functional and structural information of graph instances to generate new labeled graph samples; GDM first learns structural embeddings of the graphs with a graph structural auto-encoder, applies mixup in the learned structural embedding space to generate new graph structures, and applies mixup directly to the input node features to generate functional node feature information for the new samples.
  • results: experiments show that the proposed method substantially improves GNN performance when labeled graph samples are scarce, while increasing the diversity and balanced difficulty of the generated samples.
    Abstract Graph Neural Networks (GNNs) require a large number of labeled graph samples to obtain good performance on the graph classification task. The performance of GNNs degrades significantly as the number of labeled graph samples decreases. To reduce the annotation cost, it is therefore important to develop graph augmentation methods that can generate new graph instances to increase the size and diversity of the limited set of available labeled graph samples. In this work, we propose a novel mixup-based graph augmentation method, Graph Dual Mixup (GDM), that leverages both functional and structural information of the graph instances to generate new labeled graph samples. GDM employs a graph structural auto-encoder to learn structural embeddings of the graph samples, and then applies mixup to the structural information of the graphs in the learned structural embedding space and generates new graph structures from the mixup structural embeddings. As for the functional information, GDM applies mixup directly to the input node features of the graph samples to generate functional node feature information for new mixup graph instances. Jointly, the generated input node features and graph structures yield new graph samples which can supplement the set of original labeled graphs. Furthermore, we propose two novel Balanced Graph Sampling methods to enhance the balanced difficulty and diversity for the generated graph samples. Experimental results on the benchmark datasets demonstrate that our proposed method substantially outperforms the state-of-the-art graph augmentation methods when the labeled graphs are scarce.
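A sketch of the dual mixup step: the same Beta-sampled coefficient mixes the input node features (functional side) and the auto-encoder's structural embeddings (structural side), and an inner-product decoder turns the mixed embeddings back into a graph structure. Aligning the two graphs to the same node count and the inner-product decoder are assumptions made for illustration:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def graph_dual_mixup(x1, x2, z1, z2, alpha=1.0):
    """x*: (n, d) node features of two labeled graphs; z*: (n, k) structural embeddings."""
    lam = np.random.beta(alpha, alpha)
    x_new = lam * x1 + (1.0 - lam) * x2                          # functional mixup on node features
    z_new = lam * z1 + (1.0 - lam) * z2                          # structural mixup in embedding space
    adj_new = (sigmoid(z_new @ z_new.T) > 0.5).astype(float)     # decode a new graph structure
    return x_new, adj_new, lam                                   # lam also mixes the two graph labels
```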

Adaptive Liquidity Provision in Uniswap V3 with Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.10129
  • repo_url: None
  • paper_authors: Haochen Zhang, Xi Chen, Lin F. Yang
  • for: addressing unresolved issues in decentralized finance (DeFi), such as capital efficiency and market risk.
  • methods: uses deep reinforcement learning (DRL) to adaptively adjust the liquidity price ranges so as to maximize profit and mitigate market risk, and neutralizes price-change risk by hedging the liquidity position with a rebalancing portfolio on a centralized futures exchange.
  • results: with a simulated profit-and-loss (PnL) benchmark, the method outperforms existing baselines on the ETH/USDC and ETH/USDT pools.
    Abstract Decentralized exchanges (DEXs) are a cornerstone of decentralized finance (DeFi), allowing users to trade cryptocurrencies without the need for third-party authorization. Investors are incentivized to deposit assets into liquidity pools, against which users can trade directly, while paying fees to liquidity providers (LPs). However, a number of unresolved issues related to capital efficiency and market risk hinder DeFi's further development. Uniswap V3, a leading and groundbreaking DEX project, addresses capital efficiency by enabling LPs to concentrate their liquidity within specific price ranges for deposited assets. Nevertheless, this approach exacerbates market risk, as LPs earn trading fees only when asset prices are within these predetermined brackets. To mitigate this issue, this paper introduces a deep reinforcement learning (DRL) solution designed to adaptively adjust these price ranges, maximizing profits and mitigating market risks. Our approach also neutralizes price-change risks by hedging the liquidity position through a rebalancing portfolio in a centralized futures exchange. The DRL policy aims to optimize trading fees earned by LPs against associated costs, such as gas fees and hedging expenses, which is referred to as loss-versus-rebalancing (LVR). Using simulations with a profit-and-loss (PnL) benchmark, our method demonstrates superior performance in ETH/USDC and ETH/USDT pools compared to existing baselines. We believe that this strategy not only offers investors a valuable asset management tool but also introduces a new incentive mechanism for DEX designers.

AR-TTA: A Simple Method for Real-World Continual Test-Time Adaptation

  • paper_url: http://arxiv.org/abs/2309.10109
  • repo_url: None
  • paper_authors: Damian Sójka, Sebastian Cygert, Bartłomiej Twardowski, Tomasz Trzciński
  • for: validating the effectiveness of test-time adaptation methods under realistic, real-world distribution shifts.
  • methods: enhances a self-training framework with a small memory buffer to increase model stability and performs dynamic adaptation based on the intensity of domain shift.
  • results: the proposed AR-TTA method outperforms existing approaches on both synthetic and more realistic benchmarks and remains robust across a variety of TTA scenarios.
    Abstract Test-time adaptation is a promising research direction that allows the source model to adapt itself to changes in data distribution without any supervision. Yet, current methods are usually evaluated on benchmarks that are only a simplification of real-world scenarios. Hence, we propose to validate test-time adaptation methods using the recently introduced datasets for autonomous driving, namely CLAD-C and SHIFT. We observe that current test-time adaptation methods struggle to effectively handle varying degrees of domain shift, often resulting in degraded performance that falls below that of the source model. We noticed that the root of the problem lies in the inability to preserve the knowledge of the source model and adapt to dynamically changing, temporally correlated data streams. Therefore, we enhance well-established self-training framework by incorporating a small memory buffer to increase model stability and at the same time perform dynamic adaptation based on the intensity of domain shift. The proposed method, named AR-TTA, outperforms existing approaches on both synthetic and more real-world benchmarks and shows robustness across a variety of TTA scenarios.

Reasoning about the Unseen for Efficient Outdoor Object Navigation

  • paper_url: http://arxiv.org/abs/2309.10103
  • repo_url: None
  • paper_authors: Quanting Xie, Tianyi Zhang, Kedi Xu, Matthew Johnson-Roberson, Yonatan Bisk
  • for: studying autonomous robot navigation in outdoor environments, which, unlike existing work on indoor navigation, covers a much broader range of real-world applications.
  • methods: introduces a new task, OUTDOOR, a new mechanism for Large Language Models (LLMs) to accurately hallucinate possible futures, and a new computationally aware success metric to push research forward in this more complex domain.
  • results: achieves impressive results on both a simulated drone and a physical quadruped in outdoor environments without any prior mapping; the proposed formalism outperforms naive LLM-based approaches.
    Abstract Robots should exist anywhere humans do: indoors, outdoors, and even unmapped environments. In contrast, the focus of recent advancements in Object Goal Navigation(OGN) has targeted navigating in indoor environments by leveraging spatial and semantic cues that do not generalize outdoors. While these contributions provide valuable insights into indoor scenarios, the broader spectrum of real-world robotic applications often extends to outdoor settings. As we transition to the vast and complex terrains of outdoor environments, new challenges emerge. Unlike the structured layouts found indoors, outdoor environments lack clear spatial delineations and are riddled with inherent semantic ambiguities. Despite this, humans navigate with ease because we can reason about the unseen. We introduce a new task OUTDOOR, a new mechanism for Large Language Models (LLMs) to accurately hallucinate possible futures, and a new computationally aware success metric for pushing research forward in this more complex domain. Additionally, we show impressive results on both a simulated drone and physical quadruped in outdoor environments. Our agent has no premapping and our formalism outperforms naive LLM-based approaches

Data Formulator: AI-powered Concept-driven Visualization Authoring

  • paper_url: http://arxiv.org/abs/2309.10094
  • repo_url: None
  • paper_authors: Chenglong Wang, John Thompson, Bongshin Lee
  • for: improving visualization authoring so that authors can quickly create high-quality charts.
  • methods: realizes a concept-binding visualization paradigm powered by an AI agent: authors first define the data concepts they plan to visualize using natural language or examples and then bind them to visual channels.
  • results: the AI agent automatically transforms the input data to surface these concepts and generate the desired visualizations, and provides feedback that helps authors inspect and understand the results.
    Abstract With most modern visualization tools, authors need to transform their data into tidy formats to create visualizations they want. Because this requires experience with programming or separate data processing tools, data transformation remains a barrier in visualization authoring. To address this challenge, we present a new visualization paradigm, concept binding, that separates high-level visualization intents and low-level data transformation steps, leveraging an AI agent. We realize this paradigm in Data Formulator, an interactive visualization authoring tool. With Data Formulator, authors first define data concepts they plan to visualize using natural languages or examples, and then bind them to visual channels. Data Formulator then dispatches its AI-agent to automatically transform the input data to surface these concepts and generate desired visualizations. When presenting the results (transformed table and output visualizations) from the AI agent, Data Formulator provides feedback to help authors inspect and understand them. A user study with 10 participants shows that participants could learn and use Data Formulator to create visualizations that involve challenging data transformations, and presents interesting future research directions.

Conformal Temporal Logic Planning using Large Language Models: Knowing When to Do What and When to Ask for Help

  • paper_url: http://arxiv.org/abs/2309.10092
  • repo_url: None
  • paper_authors: Jun Wang, Jiaming Tong, Kaiyuan Tan, Yevgeniy Vorobeychik, Yiannis Kantaros
  • for: proposing a new robot motion planning problem in which multiple high-level sub-tasks, expressed in natural language (NL), must be accomplished in a temporal and logical order.
  • methods: formally defines these missions with LTL over NL-based atomic propositions, in contrast to related planning approaches that define LTL tasks over atomic predicates capturing desired low-level system configurations; the goal is to design robot plans that satisfy such LTL-encoded tasks.
  • results: proposes HERACLEs, a hierarchical conformal natural language planner that integrates (i) automata theory to determine which NL-based sub-task should be accomplished next to make mission progress, (ii) Large Language Models to design robot plans satisfying those sub-tasks, and (iii) conformal prediction to reason probabilistically about the correctness of the designed plans and mission satisfaction and to determine whether external assistance is required; extensive comparative experiments are reported, and the project website is ltl-llm.github.io.
    Abstract This paper addresses a new motion planning problem for mobile robots tasked with accomplishing multiple high-level sub-tasks, expressed using natural language (NL), in a temporal and logical order. To formally define such missions, we leverage LTL defined over NL-based atomic predicates modeling the considered NL-based sub-tasks. This is contrast to related planning approaches that define LTL tasks over atomic predicates capturing desired low-level system configurations. Our goal is to design robot plans that satisfy LTL tasks defined over NL-based atomic propositions. A novel technical challenge arising in this setup lies in reasoning about correctness of a robot plan with respect to such LTL-encoded tasks. To address this problem, we propose HERACLEs, a hierarchical conformal natural language planner, that relies on a novel integration of existing tools that include (i) automata theory to determine the NL-specified sub-task the robot should accomplish next to make mission progress; (ii) Large Language Models to design robot plans satisfying these sub-tasks; and (iii) conformal prediction to reason probabilistically about correctness of the designed plans and mission satisfaction and to determine if external assistance is required. We provide extensive comparative experiments on mobile manipulation tasks. The project website is ltl-llm.github.io.
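The conformal-prediction component can be illustrated with a split-conformal sketch: a calibration set of nonconformity scores yields a threshold with roughly (1 - alpha) coverage, and the planner asks for help when the resulting set of admissible next sub-tasks is not a single confident choice. The ask-for-help rule below is an assumption consistent with the abstract, not the paper's exact criterion:

```python
import numpy as np

def conformal_threshold(calib_scores, alpha=0.1):
    """Split conformal prediction: threshold giving ~(1 - alpha) coverage on new samples."""
    n = len(calib_scores)
    q = min(np.ceil((n + 1) * (1.0 - alpha)) / n, 1.0)
    return np.quantile(calib_scores, q, method="higher")

def next_subtask_or_help(candidate_scores, threshold):
    """Return the admissible candidates; an empty or ambiguous set triggers a help request."""
    kept = [i for i, s in enumerate(candidate_scores) if s <= threshold]
    ask_for_help = len(kept) != 1
    return kept, ask_for_help
```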

Unified Coarse-to-Fine Alignment for Video-Text Retrieval

  • paper_url: http://arxiv.org/abs/2309.10091
  • repo_url: https://github.com/ziyang412/ucofia
  • paper_authors: Ziyang Wang, Yi-Lin Sung, Feng Cheng, Gedas Bertasius, Mohit Bansal
  • for: proposing a CLIP-based video-text retrieval model that addresses the difficulty of retrieving the correct video for a text query.
  • methods: unifies coarse-grained and fine-grained alignment and applies an Interactive Similarity Aggregation (ISA) module that weighs the importance of different visual features when aggregating cross-modal similarity.
  • results: outperforms previous CLIP-based methods on multiple video-text retrieval datasets, with improvements of 2.4%, 1.4%, and 1.3% in text-to-video retrieval R@1 on MSR-VTT, Activity-Net, and DiDeMo, respectively.
    Abstract The canonical approach to video-text retrieval leverages a coarse-grained or fine-grained alignment between visual and textual information. However, retrieving the correct video according to the text query is often challenging as it requires the ability to reason about both high-level (scene) and low-level (object) visual clues and how they relate to the text query. To this end, we propose a Unified Coarse-to-fine Alignment model, dubbed UCoFiA. Specifically, our model captures the cross-modal similarity information at different granularity levels. To alleviate the effect of irrelevant visual clues, we also apply an Interactive Similarity Aggregation module (ISA) to consider the importance of different visual features while aggregating the cross-modal similarity to obtain a similarity score for each granularity. Finally, we apply the Sinkhorn-Knopp algorithm to normalize the similarities of each level before summing them, alleviating over- and under-representation issues at different levels. By jointly considering the crossmodal similarity of different granularity, UCoFiA allows the effective unification of multi-grained alignments. Empirically, UCoFiA outperforms previous state-of-the-art CLIP-based methods on multiple video-text retrieval benchmarks, achieving 2.4%, 1.4% and 1.3% improvements in text-to-video retrieval R@1 on MSR-VTT, Activity-Net, and DiDeMo, respectively. Our code is publicly available at https://github.com/Ziyang412/UCoFiA.
    摘要 “ Traditional video-text retrieval methods often rely on coarse-grained or fine-grained alignment between visual and textual information. However, accurately retrieving the correct video based on a text query can be challenging, as it requires the ability to reason about both high-level (scene) and low-level (object) visual clues and how they relate to the text query. To address this challenge, we propose a Unified Coarse-to-fine Alignment model (UCoFiA). Our model captures cross-modal similarity information at different granularity levels and uses an Interactive Similarity Aggregation module (ISA) to consider the importance of different visual features when aggregating cross-modal similarity. Additionally, we use the Sinkhorn-Knopp algorithm to normalize the similarities of each level before summing them, alleviating over- and under-representation issues at different levels. By jointly considering the cross-modal similarity of different granularity, UCoFiA enables effective unification of multi-grained alignments. Empirical results show that UCoFiA outperforms previous state-of-the-art CLIP-based methods on multiple video-text retrieval benchmarks, achieving 2.4%, 1.4%, and 1.3% improvements in text-to-video retrieval R@1 on MSR-VTT, Activity-Net, and DiDeMo, respectively. Our code is publicly available at .”Note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you prefer Traditional Chinese, please let me know and I can provide the translation in that format instead.

HTEC: Human Transcription Error Correction

  • paper_url: http://arxiv.org/abs/2309.10089
  • repo_url: None
  • paper_authors: Hanbo Sun, Jian Gao, Xiaomin Wu, Anjie Fang, Cheng Cao, Zheng Du
  • for: improving the training of Automatic Speech Recognition (ASR) models by correcting errors in human transcriptions.
  • methods: proposes HTEC for Human Transcription Error Correction, consisting of two stages: Trans-Checker, an error detection model that predicts and masks erroneous words, and Trans-Filler, a sequence-to-sequence generative model that fills the masked positions; a holistic list of correction operations, including four novel operations for deletion errors, is also introduced, together with a variant of embeddings that incorporates phoneme information into the transformer input.
  • results: HTEC outperforms other methods by a large margin and surpasses human annotators by 2.2% to 4.5% in WER; deployed as a co-pilot for human annotators, it improves transcription quality by 15.1% without sacrificing transcription velocity.
    Abstract High-quality human transcription is essential for training and improving Automatic Speech Recognition (ASR) models. Recent study~\cite{libricrowd} has found that every 1% worse transcription Word Error Rate (WER) increases approximately 2% ASR WER by using the transcriptions to train ASR models. Transcription errors are inevitable for even highly-trained annotators. However, few studies have explored human transcription correction. Error correction methods for other problems, such as ASR error correction and grammatical error correction, do not perform sufficiently for this problem. Therefore, we propose HTEC for Human Transcription Error Correction. HTEC consists of two stages: Trans-Checker, an error detection model that predicts and masks erroneous words, and Trans-Filler, a sequence-to-sequence generative model that fills masked positions. We propose a holistic list of correction operations, including four novel operations handling deletion errors. We further propose a variant of embeddings that incorporates phoneme information into the input of the transformer. HTEC outperforms other methods by a large margin and surpasses human annotators by 2.2% to 4.5% in WER. Finally, we deployed HTEC to assist human annotators and showed HTEC is particularly effective as a co-pilot, which improves transcription quality by 15.1% without sacrificing transcription velocity.

GAME: Generalized deep learning model towards multimodal data integration for early screening of adolescent mental disorders

  • paper_url: http://arxiv.org/abs/2309.10077
  • repo_url: None
  • paper_authors: Zhicheng Du, Chenyao Jiang, Xi Yuan, Shiyao Zhai, Zhengyang Lei, Shuyue Ma, Yang Liu, Qihui Ye, Chufan Xiao, Qiming Huang, Ming Xu, Dongmei Yu, Peiwu Qin
  • for: 这个研究旨在提供一个可靠的多modal数据收集系统,以早期识别青少年的情绪障碍。
  • methods: 研究人员使用了一个Android应用程序,该应用程序包含了多个游戏和聊天纪录功能,并将其部署在一个便携式的机器人上,以萤幕3,783名中学生的情绪状态。
  • results: 研究人员发现,这个系统可以实时识别青少年的情绪障碍,并且具有73.34%-92.77%的准确率和71.32%-91.06%的F1分数。此外,研究人员发现每个感知modal都会在不同的情绪障碍中发挥不同的作用,这显示了这个系统的可解释性。
    Abstract The timely identification of mental disorders in adolescents is a global public health challenge.Single factor is difficult to detect the abnormality due to its complex and subtle nature. Additionally, the generalized multimodal Computer-Aided Screening (CAS) systems with interactive robots for adolescent mental disorders are not available. Here, we design an android application with mini-games and chat recording deployed in a portable robot to screen 3,783 middle school students and construct the multimodal screening dataset, including facial images, physiological signs, voice recordings, and textual transcripts.We develop a model called GAME (Generalized Model with Attention and Multimodal EmbraceNet) with novel attention mechanism that integrates cross-modal features into the model. GAME evaluates adolescent mental conditions with high accuracy (73.34%-92.77%) and F1-Score (71.32%-91.06%).We find each modality contributes dynamically to the mental disorders screening and comorbidities among various mental disorders, indicating the feasibility of explainable model. This study provides a system capable of acquiring multimodal information and constructs a generalized multimodal integration algorithm with novel attention mechanisms for the early screening of adolescent mental disorders.

Sex-based Disparities in Brain Aging: A Focus on Parkinson’s Disease

  • paper_url: http://arxiv.org/abs/2309.10069
  • repo_url: None
  • paper_authors: Iman Beheshti, Samuel Booth, Ji Hyun Ko
  • for: investigating the relationship between the brain-predicted age difference (brain-PAD) and sex in Parkinson's disease (PD) patients, and its implications for diagnosis and treatment.
  • methods: computes the T1-weighted MRI-driven brain-predicted age difference (brain-PAD) and uses linear regression models to examine the association between brain-PAD and clinical variables in PD, stratified by sex.
  • results: male PD patients exhibit a significantly higher brain-PAD than female PD patients; in the propensity-score-matched male group, brain-PAD is associated with a decline in general cognition, a worse degree of sleep behavior disorder, reduced visuospatial acuity, and caudate atrophy, whereas no such associations are observed in the female group.
    Abstract PD is linked to faster brain aging. Sex is recognized as an important factor in PD, such that males are twice as likely as females to have the disease and have more severe symptoms and a faster progression rate. Despite previous research, there remains a significant gap in understanding the function of sex in the process of brain aging in PD patients. The T1-weighted MRI-driven brain-predicted age difference was computed in a group of 373 PD patients from the PPMI database using a robust brain-age estimation framework that was trained on 949 healthy subjects. Linear regression models were used to investigate the association between brain-PAD and clinical variables in PD, stratified by sex. All female PD patients were used in the correlational analysis while the same number of males were selected based on propensity score matching method considering age, education level, age of symptom onset, and clinical symptom severity. Despite both patient groups being matched for demographics, motor and non-motor symptoms, it was observed that males with Parkinson's disease exhibited a significantly higher mean brain age-delta than their female counterparts . In the propensity score-matched PD male group, brain-PAD was found to be associated with a decline in general cognition, a worse degree of sleep behavior disorder, reduced visuospatial acuity, and caudate atrophy. Conversely, no significant links were observed between these factors and brain-PAD in the PD female group.
    摘要 帕金森病(PD)与更快的脑老化相关。性别被认为对PD有重要影响:男性患病风险是女性的两倍,症状更严重,病程进展更快。尽管已有研究,人们对性别在PD患者脑老化过程中的作用仍缺乏了解。本研究使用PPMI数据库中的373名PD患者,通过一种在949名健康受试者上训练的可靠脑龄估计框架,计算基于T1加权MRI的脑龄差值(brain-PAD),并按性别分组,使用线性回归模型分析brain-PAD与临床变量的关系。结果显示,男性PD患者的平均brain-PAD显著高于女性患者;在经倾向评分匹配的男性PD组中,brain-PAD与总体认知功能下降、更严重的睡眠行为障碍、视空间能力下降和尾状核萎缩相关,而在女性PD组中,这些因素与brain-PAD之间没有显著关系。
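As a concrete illustration of the analysis pipeline above, the sketch below computes brain-PAD as predicted minus chronological age and runs a sex-stratified linear regression against a clinical score. The synthetic data, the `moca_score` column, and the age adjustment are assumptions for illustration only, not the study's PPMI data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sex": rng.choice(["M", "F"], size=n),
    "age": rng.uniform(45, 80, size=n),
    "moca_score": rng.normal(26, 3, size=n),           # stand-in clinical variable
})
df["predicted_age"] = df["age"] + rng.normal(0, 5, size=n)  # toy brain-age predictions
df["brain_pad"] = df["predicted_age"] - df["age"]           # brain-predicted age difference

# Sex-stratified association between brain-PAD and the clinical score,
# adjusting for chronological age (the adjustment is an assumption).
for sex, group in df.groupby("sex"):
    fit = smf.ols("brain_pad ~ moca_score + age", data=group).fit()
    print(sex, round(fit.params["moca_score"], 3), round(fit.pvalues["moca_score"], 3))
```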

Automatic Personalized Impression Generation for PET Reports Using Large Language Models

  • paper_url: http://arxiv.org/abs/2309.10066
  • repo_url: https://github.com/xtie97/pet-report-summarization
  • paper_authors: Xin Tie, Muheon Shin, Ali Pirasteh, Nevein Ibrahim, Zachary Huemann, Sharon M. Castellino, Kara M. Kelly, John Garrett, Junjie Hu, Steve Y. Cho, Tyler J. Bradshaw
  • for: The paper aims to determine whether fine-tuned large language models (LLMs) can generate accurate, personalized impressions for whole-body PET reports.
  • methods: The paper uses a corpus of PET reports to train 12 language models with the teacher-forcing algorithm, using the report findings as input and the clinical impressions as reference. The models learn physician-specific reporting styles through an extra input token that encodes the reading physician's identity.
  • results: The fine-tuned PEGASUS model generates clinically acceptable impressions that are comparable in overall utility to those dictated by other physicians: 89% of the personalized impressions generated by PEGASUS were considered clinically acceptable, with a mean utility score of 4.08/5.
  • for: 本研究目的是判断是否可以使用微调后的大语言模型(LLM)生成准确、个性化的全身PET报告印象。
  • methods: 本研究使用PET报告语料库、采用教师强制算法训练12种语言模型,以报告所见作为输入,以临床印象作为参考。模型通过编码阅读医生身份的额外输入标记来学习医生特有的报告风格。
  • results: 研究发现,微调后的PEGASUS模型可生成临床可接受的个性化印象,与其他医生口述的印象总体效用相当;PEGASUS生成的个性化印象中,89%被评为临床可接受,平均效用分数为4.08/5。
    Abstract Purpose: To determine if fine-tuned large language models (LLMs) can generate accurate, personalized impressions for whole-body PET reports. Materials and Methods: Twelve language models were trained on a corpus of PET reports using the teacher-forcing algorithm, with the report findings as input and the clinical impressions as reference. An extra input token encodes the reading physician's identity, allowing models to learn physician-specific reporting styles. Our corpus comprised 37,370 retrospective PET reports collected from our institution between 2010 and 2022. To identify the best LLM, 30 evaluation metrics were benchmarked against quality scores from two nuclear medicine (NM) physicians, with the most aligned metrics selecting the model for expert evaluation. In a subset of data, model-generated impressions and original clinical impressions were assessed by three NM physicians according to 6 quality dimensions and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. Bootstrap resampling was used for statistical analysis. Results: Of all evaluation metrics, domain-adapted BARTScore and PEGASUSScore showed the highest Spearman's rho correlations (0.568 and 0.563) with physician preferences. Based on these metrics, the fine-tuned PEGASUS model was selected as the top LLM. When physicians reviewed PEGASUS-generated impressions in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08/5. Physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P=0.41). Conclusion: Personalized impressions generated by PEGASUS were clinically useful, highlighting its potential to expedite PET reporting.
    摘要 目的:检验微调后的大语言模型(LLM)是否可以为全身PET报告生成准确、个性化的印象。材料和方法:我们使用教师强制算法,以报告所见作为输入、以临床印象作为参考,训练了十二个语言模型。模型还接收一个编码阅读医生身份的额外输入标记,用于学习各医生的报告风格。我们的语料库包括2010年至2022年间本机构收集的37,370份回顾性PET报告。为了选出最佳LLM,我们将30个评估指标与两名核医学(NM)医生的质量评分进行对比,并用最一致的指标选择模型供专家评估。在一部分数据中,三名NM医生根据6个质量维度和一个总体效用分数(5分制)评估模型生成的印象和原始临床印象。每名医生审阅12份自己的报告和12份其他医生的报告。我们使用bootstrap重采样进行统计分析。结果:在所有评估指标中,领域适应的BARTScore和PEGASUSScore与医生偏好的Spearman相关系数最高(0.568和0.563)。基于这些指标,我们选择微调后的PEGASUS模型作为最佳LLM。当医生以自己的风格审阅PEGASUS生成的印象时,89%被评为临床可接受,平均效用分数为4.08/5。医生认为这些个性化印象的总体效用与其他医生口述的印象相当(4.03,P=0.41)。结论:PEGASUS生成的个性化印象具有临床价值,显示其具有加速PET报告流程的潜力。
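A minimal sketch of the personalization idea (a physician-identity token prepended to the findings, with the clinical impression as the teacher-forcing target) is shown below. The PEGASUS checkpoint name, token strings, and field handling are assumptions and not the authors' training code.

```python
from transformers import PegasusTokenizer, PegasusForConditionalGeneration

checkpoint = "google/pegasus-large"   # assumed base checkpoint; the paper fine-tunes PEGASUS
tokenizer = PegasusTokenizer.from_pretrained(checkpoint)
model = PegasusForConditionalGeneration.from_pretrained(checkpoint)

# One special token per reading physician (token strings are hypothetical).
physician_tokens = ["<physician_01>", "<physician_02>"]
tokenizer.add_special_tokens({"additional_special_tokens": physician_tokens})
model.resize_token_embeddings(len(tokenizer))

def encode_example(findings, impression, physician_token):
    """Prepend the physician token to the findings; the impression is the reference."""
    inputs = tokenizer(physician_token + " " + findings,
                       truncation=True, max_length=1024, return_tensors="pt")
    inputs["labels"] = tokenizer(impression, truncation=True, max_length=256,
                                 return_tensors="pt")["input_ids"]
    return inputs

batch = encode_example("FDG-avid mediastinal nodes, largest 2.1 cm ...",
                       "Findings compatible with nodal lymphoma involvement ...",
                       "<physician_01>")
loss = model(**batch).loss   # standard teacher-forcing cross-entropy for one example
```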

Toward collision-free trajectory for autonomous and pilot-controlled unmanned aerial vehicles

  • paper_url: http://arxiv.org/abs/2309.10064
  • repo_url: None
  • paper_authors: Kaya Kuru, John Michael Pinder, Benjamin Jon Watkinson, Darren Ansell, Keith Vinning, Lee Moore, Chris Gilbert, Aadithya Sujit, David Jones
  • for: The paper is written for the purpose of developing an advanced collision management methodology for unmanned aerial vehicles (UAVs) to avoid mid-air collisions (MACs) with manned aeroplanes.
  • methods: The paper uses electronic conspicuity (EC) information made available by PilotAware Ltd and a reactive geometric conflict detection and resolution (CDR) technique to determine and execute time-optimal evasive collision avoidance (CA) manoeuvres.
  • results: The proposed methodology is demonstrated to be successful in avoiding collisions while limiting the deviation from the original trajectory in highly dynamic aerospace environments, without requiring sophisticated sensors or prior training. Here are the three points in Simplified Chinese text:
  • for: 这篇论文是为了开发一种高级冲突管理方法,以避免无人飞行器(UAV)与人员飞机的中空冲突(MAC)。
  • methods: 这篇论文使用PilotAware Ltd提供的电子闪耀信息和一种反应几何冲突检测和解决技术,以确定和执行时间优化的避免冲突措施。
  • results: 提议的方法在高动态航空环境中成功地避免冲突,并限制偏离原轨迹的偏差。
    Abstract For drones, as safety-critical systems, there is an increasing need for onboard detect & avoid (DAA) technology i) to see, sense or detect conflicting traffic or imminent non-cooperative threats due to their high mobility with multiple degrees of freedom and the complexity of deployed unstructured environments, and subsequently ii) to take the appropriate actions to avoid collisions depending upon the level of autonomy. The safe and efficient integration of UAV traffic management (UTM) systems with air traffic management (ATM) systems, using intelligent autonomous approaches, is an emerging requirement where the number of diverse UAV applications is increasing on a large scale in dense air traffic environments for completing swarms of multiple complex missions flexibly and simultaneously. Significant progress over the past few years has been made in detecting UAVs present in aerospace, identifying them, and determining their existing flight path. This study makes greater use of electronic conspicuity (EC) information made available by PilotAware Ltd in developing an advanced collision management methodology -- Drone Aware Collision Management (DACM) -- capable of determining and executing a variety of time-optimal evasive collision avoidance (CA) manoeuvres using a reactive geometric conflict detection and resolution (CDR) technique. The merits of the DACM methodology have been demonstrated through extensive simulations and real-world field tests in avoiding mid-air collisions (MAC) between UAVs and manned aeroplanes. The results show that the proposed methodology can be employed successfully in avoiding collisions while limiting the deviation from the original trajectory in highly dynamic aerospace without requiring sophisticated sensors and prior training.
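The reactive geometric conflict detection and resolution step can be sketched as a closest-point-of-approach check on broadcast position and velocity, followed by a simple evasive manoeuvre. The snippet below is a toy illustration with assumed separation thresholds, not the DACM implementation.

```python
import numpy as np

def cpa(p_own, v_own, p_intruder, v_intruder):
    """Time to, and distance at, the closest point of approach (constant velocities)."""
    dp, dv = p_intruder - p_own, v_intruder - v_own
    denom = float(np.dot(dv, dv))
    t = 0.0 if denom < 1e-9 else max(0.0, -float(np.dot(dp, dv)) / denom)
    return t, float(np.linalg.norm(dp + dv * t))

def evasive_velocity(p_own, p_intruder, speed):
    """Steer perpendicular to the line of sight in the horizontal plane, keeping speed."""
    los = p_intruder - p_own
    perp = np.array([-los[1], los[0], 0.0])
    return perp / np.linalg.norm(perp) * speed

# Own UAV and an intruding aeroplane: positions in metres, velocities in m/s (toy values).
p_uav, v_uav = np.array([0.0, 0.0, 120.0]), np.array([20.0, 0.0, 0.0])
p_ac, v_ac = np.array([2000.0, 150.0, 120.0]), np.array([-60.0, 0.0, 0.0])

t_cpa, d_cpa = cpa(p_uav, v_uav, p_ac, v_ac)
if d_cpa < 150.0 and t_cpa < 60.0:      # separation minimum and look-ahead are assumed
    v_uav = evasive_velocity(p_uav, p_ac, np.linalg.norm(v_uav))
```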

Survey of Consciousness Theory from Computational Perspective

  • paper_url: http://arxiv.org/abs/2309.10063
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Zihan Ding, Xiaoxi Wei, Yidan Xu
  • for: 本研究目的是 bridge 不同学科的意识理论,以计算机科学的视角来解释人类意识现象。
  • methods: 本文使用了多种方法,包括信息理论、量子物理学、认知心理学、生物物理学和计算机科学等,以寻求解释意识现象的方法。
  • results: 本研究通过对多种意识理论的汇总和分析,提出了一种计算机科学的意识模型,并讨论了现有的意识评价指标和计算机模型是否具有意识性。
    Abstract Human consciousness has been a long-lasting mystery for centuries, while machine intelligence and consciousness is an arduous pursuit. Researchers have developed diverse theories for interpreting the consciousness phenomenon in human brains from different perspectives and levels. This paper surveys several main branches of consciousness theories originating from different subjects including information theory, quantum physics, cognitive psychology, physiology and computer science, with the aim of bridging these theories from a computational perspective. It also discusses the existing evaluation metrics of consciousness and possibility for current computational models to be conscious. Breaking the mystery of consciousness can be an essential step in building general artificial intelligence with computing machines.
    摘要 人类意识已经是历史上一个长期的谜团,而机器智能和意识的研究则是一项艰难的探索。研究人员已经提出了多种解释人类大脑意识现象的理论,从不同的角度和水平来看。本文将从计算机科学的视角 survey 这些主要分支的意识理论,并讨论现有的意识评价指标以及现有的计算机模型是否具备意识性。破解意识之谜可能是建立通用人工智能机器的重要一步。

General In-Hand Object Rotation with Vision and Touch

  • paper_url: http://arxiv.org/abs/2309.09979
  • repo_url: None
  • paper_authors: Haozhi Qi, Brent Yi, Sudharshan Suresh, Mike Lambeta, Yi Ma, Roberto Calandra, Jitendra Malik
  • for: 这个论文是为了解决触摸式物体旋转的问题,通过多种感知输入来实现。
  • methods: 这个系统使用了仿真训练,并在真实的涂抹和 proprioceptive 感知输入下进行了线性激活。它还使用了一种可视感知和抓取Transformer来融合多modal 感知输入。
  • results: 相比之前的方法,这个系统在实际应用中表现出了显著的性能提升,并且Visual和感觉感知的重要性得到了证明。
    Abstract We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. Our system is trained in simulation, where it has access to ground-truth object shapes and physical properties. Then we distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused via a visuotactile transformer, enabling online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and the importance of visual and tactile sensing.
    摘要 我们介绍RotateIt系统,可以通过多轴滚动来实现手指基于多轴滚动的物体旋转,利用多种感知输入。我们的系统在模拟环境中训练,有访问真实的物体形状和物理属性。然后我们将其逼真实的视觉感知和 proprioceptive 感知混合,通过视觉感知变换,在部署时进行在线的物体形状和物理属性的推断。我们显示了与先前方法相比有显著性能提升,以及视觉和感觉感知的重要性。

MindAgent: Emergent Gaming Interaction

  • paper_url: http://arxiv.org/abs/2309.09971
  • repo_url: None
  • paper_authors: Ran Gong, Qiuyuan Huang, Xiaojian Ma, Hoi Vo, Zane Durante, Yusuke Noda, Zilong Zheng, Song-Chun Zhu, Demetri Terzopoulos, Li Fei-Fei, Jianfeng Gao
  • for: 这paper是为了研究大语言模型在多智能体系统中的复杂计划和协调能力,以及如何在游戏中与人类NPC合作。
  • methods: 该paper使用了现有的游戏框架,并引入了一些新的技术,如需要理解协调者、与人类玩家合作via不完美化的指令,以及在几个shot提示下进行在Context learning。
  • results: 该paper introduce了一个新的游戏场景和benchmark,并通过了新的自动度量器CoS来评估多智能体协调效率。该paper的结果表明,大语言模型可以在游戏中协调多智能体,并且可以通过学习大语言资源来获得这些技能。
    Abstract Large Language Models (LLMs) have the capacity of performing complex scheduling in a multi-agent system and can coordinate these agents into completing sophisticated tasks that require extensive collaboration. However, despite the introduction of numerous gaming frameworks, the community has insufficient benchmarks towards building general multi-agents collaboration infrastructure that encompass both LLM and human-NPCs collaborations. In this work, we propose a novel infrastructure - MindAgent - to evaluate planning and coordination emergent capabilities for gaming interaction. In particular, our infrastructure leverages existing gaming framework, to i) require understanding of the coordinator for a multi-agent system, ii) collaborate with human players via un-finetuned proper instructions, and iii) establish an in-context learning on few-shot prompt with feedback. Furthermore, we introduce CUISINEWORLD, a new gaming scenario and related benchmark that dispatch a multi-agent collaboration efficiency and supervise multiple agents playing the game simultaneously. We conduct comprehensive evaluations with new auto-metric CoS for calculating the collaboration efficiency. Finally, our infrastructure can be deployed into real-world gaming scenarios in a customized VR version of CUISINEWORLD and adapted in existing broader Minecraft gaming domain. We hope our findings on LLMs and the new infrastructure for general-purpose scheduling and coordination can help shed light on how such skills can be obtained by learning from large language corpora.

How to Generate Popular Post Headlines on Social Media?

  • paper_url: http://arxiv.org/abs/2309.09949
  • repo_url: None
  • paper_authors: Zhouxiang Fang, Min Yu, Zhendong Fu, Boning Zhang, Xuanwen Huang, Xiaoqi Tang, Yang Yang
  • for: automatic generation of popular headlines on social media
  • methods: Multiple preference-Extractors with Bidirectional and Auto-Regressive Transformers (BART)
  • results: state-of-the-art performance compared with several advanced baselines, and ability to capture trends and personal styles.
    Abstract Posts, as important containers of user-generated content on social media, have tremendous social influence and commercial value. As an integral component of a post, the headline makes a decisive contribution to the post's popularity. However, the current mainstream approach to headline generation is still manual writing, which is unstable and requires extensive human effort. This drives us to explore a novel research question: can we automate the generation of popular headlines on social media? We collect more than 1 million posts of 42,447 celebrities from public data of Xiaohongshu, a well-known social media platform in China. We then conduct careful observations on the headlines of these posts. The results demonstrate that trends and personal styles are widespread in headlines on social media and contribute significantly to posts' popularity. Motivated by these insights, we present MEBART, which combines Multiple preference-Extractors with Bidirectional and Auto-Regressive Transformers (BART), capturing trends and personal styles to generate popular headlines on social media. We perform extensive experiments on real-world datasets and achieve state-of-the-art performance compared with several advanced baselines. In addition, ablation and case studies demonstrate that MEBART advances in capturing trends and personal styles.
    摘要 posts是社交媒体上重要的用户生成内容容器,具有巨大的社会影响力和商业价值。在这些posts中,标题具有决定性的贡献,促进post的受欢迎程度。然而,现有主流的标题生成方法仍然是人工写作,它是不稳定的,需要大量的人工劳动。这使我们感到需要解决这个问题:可以自动生成社交媒体上的受欢迎标题吗?我们收集了来自Xiaohongshu社交媒体平台的公共数据,包含42,447名明星的post,并进行了仔细的观察。观察结果表明,社交媒体上的标题中广泛存在趋势和个人风格,这些趋势和风格对post的受欢迎程度有重要的贡献。我们发现这些趋势和风格的存在,驱动我们开发MEBART模型,这是基于多个偏好抽取器和双向自适应变换器(BART)的模型,可以捕捉到社交媒体上的趋势和个人风格,生成受欢迎的标题。我们在实际数据上进行了广泛的实验,与多个先进基elines进行比较,并实现了当前领域的最佳性能。此外,我们还进行了ablation和案例研究,以证明MEBART模型在捕捉趋势和个人风格方面的进步。

What is a Fair Diffusion Model? Designing Generative Text-To-Image Models to Incorporate Various Worldviews

  • paper_url: http://arxiv.org/abs/2309.09944
  • repo_url: https://github.com/zoedesimone/diffusionworldviewer
  • paper_authors: Zoe De Simone, Angie Boggust, Arvind Satyanarayan, Ashia Wilson
  • for: The paper aims to enhance bias mitigation in generative text-to-image (GTI) models by introducing a tool called DiffusionWorldViewer to analyze and manipulate the models’ attitudes, values, stories, and expectations of the world.
  • methods: The tool uses an interactive interface deployed as a web-based GUI and Jupyter Notebook plugin to categorize existing demographics of GTI-generated images and provide interactive methods to align image demographics with user worldviews.
  • results: In a study with 13 GTI users, the tool was found to allow users to represent their varied viewpoints about what GTI outputs are fair, challenging current notions of fairness that assume a universal worldview.Here’s the same information in Simplified Chinese text:
  • for: 论文旨在提高生成文本到图像(GTI)模型中的偏见减轻,通过引入DiffusionWorldViewer工具分析和 manipulate GTI模型对世界的态度、价值观、故事和期望的影响。
  • methods: DiffusionWorldViewer使用了一个交互式的网页UI和Jupyter Notebook插件,将GTI生成图像中的存在人类划分为不同类别,并提供了互动方法来与用户的世界观点对齐。
  • results: 在13名GTI用户的研究中,DiffusionWorldViewer被发现可以让用户表达他们对GTI输出是否公正的多种视点,挑战当前假设一个统一的世界观点的假设。
    Abstract Generative text-to-image (GTI) models produce high-quality images from short textual descriptions and are widely used in academic and creative domains. However, GTI models frequently amplify biases from their training data, often producing prejudiced or stereotypical images. Yet, current bias mitigation strategies are limited and primarily focus on enforcing gender parity across occupations. To enhance GTI bias mitigation, we introduce DiffusionWorldViewer, a tool to analyze and manipulate GTI models' attitudes, values, stories, and expectations of the world that impact its generated images. Through an interactive interface deployed as a web-based GUI and Jupyter Notebook plugin, DiffusionWorldViewer categorizes existing demographics of GTI-generated images and provides interactive methods to align image demographics with user worldviews. In a study with 13 GTI users, we find that DiffusionWorldViewer allows users to represent their varied viewpoints about what GTI outputs are fair and, in doing so, challenges current notions of fairness that assume a universal worldview.
    摘要 生成文本到图像(GTI)模型可以根据简短的文本描述生成高质量的图像,但它们经常放大训练数据中的偏见,生成带有偏见或刻板印象的图像。然而,现有的偏见缓解策略较为有限,主要集中在强制实现各职业间的性别均衡上。为了加强GTI模型的偏见缓解,我们介绍了DiffusionWorldViewer,一种用于分析和调整GTI模型对世界的态度、价值观、叙事和期望的工具。通过以网页GUI和Jupyter Notebook插件形式部署的交互界面,DiffusionWorldViewer可以对GTI生成图像中已有的人口统计特征进行分类,并提供交互方法使图像的人口统计特征与用户的世界观保持一致。在对13名GTI用户的研究中,我们发现DiffusionWorldViewer允许用户表达他们对GTI输出是否公平的多种观点,并在此过程中挑战了当前假定存在统一世界观的公平性观念。

A Heterogeneous Graph-Based Multi-Task Learning for Fault Event Diagnosis in Smart Grid

  • paper_url: http://arxiv.org/abs/2309.09921
  • repo_url: None
  • paper_authors: Dibaloke Chanda, Nasim Yahya Soltani
  • for: 本文提出了一种基于多任务学习图神经网络(MTL-GNN)的精准故障诊断方法,用于检测、定位和分类故障,同时还提供了缺陷和电流估计。
  • methods: 本文使用了图神经网络(GNN)来学习分布系统的topological结构和特征学习,并通过消息传递机制实现feature learning。
  • results: 数值测试 validate了模型在各任务中的表现,并且提出了一种基于GNN的解释方法来 Identify key nodes in the distribution system,以便进行 informed sparse measurements。
    Abstract Precise and timely fault diagnosis is a prerequisite for a distribution system to ensure minimum downtime and maintain reliable operation. This necessitates access to a comprehensive procedure that can provide the grid operators with insightful information in the case of a fault event. In this paper, we propose a heterogeneous multi-task learning graph neural network (MTL-GNN) capable of detecting, locating and classifying faults in addition to providing an estimate of the fault resistance and current. Using a graph neural network (GNN) allows for learning the topological representation of the distribution system as well as feature learning through a message-passing scheme. We investigate the robustness of our proposed model using the IEEE-123 test feeder system. This work also proposes a novel GNN-based explainability method to identify key nodes in the distribution system which then facilitates informed sparse measurements. Numerical tests validate the performance of the model across all tasks.
    摘要 精准并及时的故障诊断是配电系统确保最小化停机时间和维持可靠运行的必要前提。这就需要一个全面的流程,在故障事件发生时为电网操作人员提供有价值的信息。在这篇论文中,我们提出一种异构多任务学习图神经网络(MTL-GNN),可以检测、定位和分类故障,同时提供故障电阻和电流的估计。使用图神经网络(GNN)可以学习配电系统的拓扑表示,并通过消息传递机制进行特征学习。我们基于IEEE-123测试馈线系统检验了所提模型的鲁棒性。此外,我们还提出了一种基于GNN的新型解释方法,用于识别配电系统中的关键节点,从而实现有针对性的稀疏测量。数值测试验证了模型在所有任务上的性能。
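A minimal sketch of a shared-encoder multi-task GNN along the lines described above is given below, with separate heads for detection, fault-type classification, per-node localization, and resistance/current regression. It uses PyTorch Geometric; layer sizes, head choices, and the toy feeder data are assumptions.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, global_mean_pool

class MTLFaultGNN(nn.Module):
    """Shared message-passing encoder with one head per diagnosis task."""

    def __init__(self, in_dim, hidden=64, num_fault_types=5):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.detect_head = nn.Linear(hidden, 2)               # fault / no fault (graph level)
        self.type_head = nn.Linear(hidden, num_fault_types)   # fault class (graph level)
        self.locate_head = nn.Linear(hidden, 1)                # per-node (bus) fault score
        self.regress_head = nn.Linear(hidden, 2)               # fault resistance and current

    def forward(self, x, edge_index, batch):
        h = torch.relu(self.conv1(x, edge_index))
        h = torch.relu(self.conv2(h, edge_index))
        g = global_mean_pool(h, batch)                         # one vector per grid snapshot
        return (self.detect_head(g), self.type_head(g),
                self.locate_head(h).squeeze(-1), self.regress_head(g))

x = torch.randn(123, 8)                        # 123 buses, 8 measurements each (toy data)
edge_index = torch.randint(0, 123, (2, 240))   # toy feeder topology
batch = torch.zeros(123, dtype=torch.long)     # a single snapshot
detect, fault_type, locate, resistance_current = MTLFaultGNN(in_dim=8)(x, edge_index, batch)
# The joint objective would be a weighted sum of the per-task losses; the weights are assumptions.
```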

Plug in the Safety Chip: Enforcing Constraints for LLM-driven Robot Agents

  • paper_url: http://arxiv.org/abs/2309.09919
  • repo_url: None
  • paper_authors: Ziyi Yang, Shreyas S. Raman, Ankit Shah, Stefanie Tellex
  • for: 这 paper 的目的是提出一种基于线性时间逻辑 (LTL) 的问题可读性安全约束模块,以便在协同环境中部署大语言模型 (LLM) 代理,并实现安全的操作。
  • methods: 本 paper 使用了 LLM 进行协同操作,并采用了 NL 时间约束编码、安全违反逻辑和解释、以及危险动作擦除等方法来保证安全性。
  • results: 实验结果表明,本系统可以坚持安全约束,并在复杂的安全约束下进行扩展,这 highlights 了它在实际应用中的潜在实用性。
    Abstract Recent advancements in large language models (LLMs) have enabled a new research domain, LLM agents, for solving robotics and planning tasks by leveraging the world knowledge and general reasoning abilities of LLMs obtained during pretraining. However, while considerable effort has been made to teach the robot the "dos," the "don'ts" received relatively less attention. We argue that, for any practical usage, it is as crucial to teach the robot the "don'ts": conveying explicit instructions about prohibited actions, assessing the robot's comprehension of these restrictions, and, most importantly, ensuring compliance. Moreover, verifiable safe operation is essential for deployments that satisfy worldwide standards such as ISO 61508, which defines standards for safely deploying robots in industrial factory environments worldwide. Aiming at deploying the LLM agents in a collaborative environment, we propose a queryable safety constraint module based on linear temporal logic (LTL) that simultaneously enables natural language (NL) to temporal constraints encoding, safety violation reasoning and explaining, and unsafe action pruning. To demonstrate the effectiveness of our system, we conducted experiments in VirtualHome environment and on a real robot. The experimental results show that our system strictly adheres to the safety constraints and scales well with complex safety constraints, highlighting its potential for practical utility.
    摘要 最近大语言模型(LLM)技术的发展开启了一个新的研究领域,即LLM代理,它利用预训练中获得的世界知识和通用推理能力来解决机器人和规划任务。然而,相比教会机器人"可以做什么","不可以做什么"得到的关注相对较少。我们认为,在实际应用中,教会机器人"不可以做什么"同样至关重要:明确传达禁止的行为,评估机器人对这些限制的理解,并且最重要的是,确保其遵守。此外,可验证的安全运行对于满足ISO 61508等国际标准(该标准定义了在全球工业工厂环境中安全部署机器人的要求)的部署来说必不可少。为了在协作环境中部署LLM代理,我们提出了一个基于线性时序逻辑(LTL)的可查询安全约束模块,该模块同时支持将自然语言(NL)编码为时序约束、对安全违规进行推理和解释,以及剔除不安全动作。为了证明系统的有效性,我们在VirtualHome环境和真实机器人上进行了实验。实验结果表明,我们的系统严格遵守安全约束,并能很好地扩展到复杂的安全约束,这说明了它的实际应用潜力。
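To make the constraint-checking idea concrete, the sketch below encodes two simple prohibitions, rejects plan steps that violate them, and returns an explanation that could be fed back to the LLM planner. Full LTL model checking is richer than this; the two constraint forms and all predicates here are simplified assumptions.

```python
# Two constraint templates: "never do X" and "do X only after Y has happened".
def never(pred, message):
    return lambda action, history: (not pred(action), message)

def only_after(pred, required, message):
    return lambda action, history: (not pred(action) or any(required(h) for h in history), message)

constraints = [
    never(lambda a: a == "enter(kitchen)", "the kitchen is off-limits"),
    only_after(lambda a: a == "grasp(knife)",
               lambda h: h == "put_on(gloves)",
               "gloves must be worn before handling the knife"),
]

def filter_action(action, history):
    """Accept or reject a proposed plan step, with a natural-language explanation."""
    for check in constraints:
        ok, message = check(action, history)
        if not ok:
            return False, f"rejected '{action}': {message}"   # feedback to the LLM planner
    return True, "accepted"

print(filter_action("grasp(knife)", ["walk_to(counter)"]))
print(filter_action("grasp(knife)", ["put_on(gloves)"]))
```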

Evaluation of Human-Understandability of Global Model Explanations using Decision Tree

  • paper_url: http://arxiv.org/abs/2309.09917
  • repo_url: None
  • paper_authors: Adarsa Sivaprasad, Ehud Reiter, Nava Tintarev, Nir Oren
  • for: The paper aims to improve the understandability and trustworthiness of AI models in healthcare applications, specifically for non-expert patients who may not have a strong background in AI or domain expertise.
  • methods: The authors use a decision tree model to generate both local and global explanations for patients identified as having a high risk of coronary heart disease. They test the effectiveness of these explanations with non-expert users and gather feedback to enhance the narrative global explanations.
  • results: The majority of participants prefer global explanations, while a smaller group prefers local explanations. The authors also find that task-based evaluations of mental models of these participants provide valuable feedback to enhance narrative global explanations, which can guide the design of health informatics systems that are both trustworthy and actionable.Here are the three points in Simplified Chinese text:
  • for: 这篇论文的目的是提高医疗应用中AI模型的可解释性和可信worthiness,特别是为非专业的患者,他们可能没有AI或领域专业知识。
  • methods: 作者使用决策树模型生成本地和全局解释,并对非专业用户进行测试,收集反馈以提高 narative全局解释。
  • results: 大多数参与者喜欢全局解释,而一个较小的组 prefer 本地解释。作者还发现,基于任务评估的受试者的心理模型提供了价值的反馈,以提高全局解释。
    Abstract In explainable artificial intelligence (XAI) research, the predominant focus has been on interpreting models for experts and practitioners. Model agnostic and local explanation approaches are deemed interpretable and sufficient in many applications. However, in domains like healthcare, where end users are patients without AI or domain expertise, there is an urgent need for model explanations that are more comprehensible and instil trust in the model's operations. We hypothesise that generating model explanations that are narrative, patient-specific and global(holistic of the model) would enable better understandability and enable decision-making. We test this using a decision tree model to generate both local and global explanations for patients identified as having a high risk of coronary heart disease. These explanations are presented to non-expert users. We find a strong individual preference for a specific type of explanation. The majority of participants prefer global explanations, while a smaller group prefers local explanations. A task based evaluation of mental models of these participants provide valuable feedback to enhance narrative global explanations. This, in turn, guides the design of health informatics systems that are both trustworthy and actionable.
    摘要 在可解释人工智能(XAI)研究中,主要的关注是为专家和实践者解释模型。但在医疗领域,因为患者没有人工智能或领域知识,有一个急需的需求是为模型提供更容易理解和带来信任的解释。我们提出的假设是,通过生成模型解释,使其更加易于理解,并且全面涵盖模型的运作。我们使用决策树模型生成本地和全局解释,并对非专业用户进行测试。我们发现,大多数参与者偏好全局解释,而一个较小的组 prefer local解释。这些参与者的任务基础评估表示,可以通过改进 narrative global解释来提高健康信息系统的可靠性和可行性。
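The two explanation granularities discussed above can be produced directly from a decision tree, as the sketch below shows: a global rule listing for the whole model and a local decision path for one patient. The dataset and feature names are stand-ins, not the study's clinical data.

```python
from sklearn.datasets import load_breast_cancer          # stand-in tabular dataset
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Global explanation: the full set of learned rules, which can then be rendered
# as a narrative ("patients with high X and low Y are usually high risk ...").
print(export_text(tree, feature_names=list(X.columns)))

# Local explanation: the path this particular patient follows through the tree.
patient = X.iloc[[0]]
for node in tree.decision_path(patient).indices:
    feat, thr = tree.tree_.feature[node], tree.tree_.threshold[node]
    if feat >= 0:  # internal (split) node
        side = "<=" if patient.iloc[0, feat] <= thr else ">"
        print(f"{X.columns[feat]} = {patient.iloc[0, feat]:.2f} {side} {thr:.2f}")
```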

Wait, That Feels Familiar: Learning to Extrapolate Human Preferences for Preference Aligned Path Planning

  • paper_url: http://arxiv.org/abs/2309.09912
  • repo_url: None
  • paper_authors: Haresh Karnan, Elvin Yang, Garrett Warnell, Joydeep Biswas, Peter Stone
  • for: 这paper的目的是解决自主移动任务中需要理解操作员指定的权重,以确保机器人的安全和任务完成。
  • methods: 这paper使用了 preference extrApolation for Terrain awarE Robot Navigation(PATERN)框架,该框架可以从机器人的感知数据中推断操作员对新地形的偏好。
  • results: 对比基eline方法,这paper的实验结果表明,PATERN可以强健地泛化到多种不同的地形和照明条件下,并能够导航在操作员的偏好驱动下。
    Abstract Autonomous mobility tasks such as last-mile delivery require reasoning about operator-indicated preferences over terrains on which the robot should navigate to ensure both robot safety and mission success. However, coping with out-of-distribution data from novel terrains or appearance changes due to lighting variations remains a fundamental problem in visual terrain-adaptive navigation. Existing solutions either require labor-intensive manual data recollection and labeling or use hand-coded reward functions that may not align with operator preferences. In this work, we posit that operator preferences for visually novel terrains, which the robot should adhere to, can often be extrapolated from established terrain references within the inertial, proprioceptive, and tactile domain. Leveraging this insight, we introduce Preference extrApolation for Terrain awarE Robot Navigation, PATERN, a novel framework for extrapolating operator terrain preferences for visual navigation. PATERN learns to map inertial, proprioceptive, and tactile measurements from the robot's observations to a representation space and performs a nearest-neighbor search in this space to estimate operator preferences over novel terrains. Through physical robot experiments in outdoor environments, we assess PATERN's capability to extrapolate preferences and generalize to novel terrains and challenging lighting conditions. Compared to baseline approaches, our findings indicate that PATERN robustly generalizes to diverse terrains and varied lighting conditions, while navigating in a preference-aligned manner.
    摘要 自主移动任务(如最后一英里配送)需要根据操作员指定的地形偏好进行导航,以确保机器人安全和任务成功。然而,如何应对来自新地形或光照变化带来的分布外数据,仍是视觉地形自适应导航中的一个基本问题。现有的解决方案要么需要劳动密集的手动数据收集和标注,要么使用手工编码的奖励函数,而这些奖励函数可能并不与操作员的偏好相一致。在这项工作中,我们认为操作员对视觉上新颖地形的偏好往往可以从惯性、本体感知和触觉域中已有的地形参考中外推出来。基于这一见解,我们提出了PATERN(Preference extrApolation for Terrain awarE Robot Navigation),一种用于外推操作员地形偏好以进行视觉导航的新框架。PATERN学习将机器人观测到的惯性、本体感知和触觉测量映射到一个表示空间,并在该空间中进行最近邻搜索,以估计操作员对新地形的偏好。通过在户外环境中的实体机器人实验,我们评估了PATERN外推偏好并泛化到新地形和具有挑战性光照条件的能力。与基线方法相比,我们的结果表明PATERN能够稳健地泛化到多种地形和不同光照条件,并以符合偏好的方式进行导航。
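The preference-extrapolation step can be sketched as a nearest-neighbour lookup in the learned representation space: known terrains carry operator preference scores, and a visually novel terrain inherits the average score of its closest non-visual neighbours. The encoder, embedding size, and preference scale below are assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
known_embeddings = rng.normal(size=(200, 32))      # embeddings of familiar terrain patches
known_preferences = rng.uniform(0, 1, size=200)    # operator utility per patch (1 = preferred)

index = NearestNeighbors(n_neighbors=5).fit(known_embeddings)

def extrapolate_preference(novel_embedding):
    """Average the operator preference of the closest familiar terrains."""
    dist, idx = index.kneighbors(novel_embedding[None, :])
    return float(known_preferences[idx[0]].mean())

# The extrapolated preference can then be turned into a traversal cost for the planner.
cost = 1.0 - extrapolate_preference(rng.normal(size=32))
```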

Evaluation of GPT-3 for Anti-Cancer Drug Sensitivity Prediction

  • paper_url: http://arxiv.org/abs/2309.10016
  • repo_url: None
  • paper_authors: Shaika Chowdhury, Sivaraman Rajaganapathy, Lichao Sun, James Cerhan, Nansu Zong
  • for: 这项研究探讨了使用GPT-3进行精准肿瘤治疗药物敏感性预测任务的可能性,使用结构化药物ogenomics数据,并评估了零批训练和精细调整方法的性能。
  • methods: 这项研究使用了GPT-3模型,并对结构化药物ogenomics数据进行分析和预测。
  • results: 研究发现,药物的SMILES表示和细胞系的基因突变特征可以预测药物响应。这些结果有望为精准肿瘤学做出贡献,帮助设计更有效的治疗方案。
    Abstract In this study, we investigated the potential of GPT-3 for the anti-cancer drug sensitivity prediction task using structured pharmacogenomics data across five tissue types and evaluated its performance under zero-shot prompting and fine-tuning paradigms. The drug's SMILES representation and the cell line's genomic mutation features were predictive of the drug response. The results from this study have the potential to pave the way for designing more efficient treatment protocols in precision oncology.
    摘要 在这项研究中,我们考察了GPT-3在抗癌药物敏感性预测任务中的潜力,使用涵盖五种组织类型的结构化药物基因组学数据,并在零样本提示和微调两种范式下评估其性能。药物的SMILES表示和细胞系的基因突变特征是药物响应的预测因素。这项研究的结果有望为在精准肿瘤学中设计更有效的治疗方案开辟道路。

The role of causality in explainable artificial intelligence

  • paper_url: http://arxiv.org/abs/2309.09901
  • repo_url: None
  • paper_authors: Gianluca Carloni, Andrea Berti, Sara Colantonio
  • for: 本文旨在探讨 causality 和 explainable artificial intelligence (XAI) 两个领域之间的关系,尤其是它们之间的联系是如何、以及如何利用这两个领域来建立信任于人工智能系统。
  • methods: 本文通过 investigate 文献来了解 causality 和 XAI 之间的关系,并 Identify 三个主要视角:首先,缺乏 causality 是当前 AI 和 XAI 方法的一个重要限制,并且“最佳”的解释的形式被 investigate ;第二是一种实用视角,认为 XAI 可以用于推动科学探索,通过识别可以实现的实验操作来推动科学探索;第三个视角支持 idea , causality 是 XAI 的Propedeutic 三种方式:利用 causality 概念支持或改进 XAI,利用 counterfactuals 来提供解释,并考虑 accessing causal model 作为解释。
  • results: 本文提供了一种统一的视角,把 causality 和 XAI 两个领域联系起来,并 highlight 了这两个领域之间的可能的链接和可能的限制。此外,本文还提供了 relevante 软件解决方案,用于自动化 causal 任务。
    Abstract Causality and eXplainable Artificial Intelligence (XAI) have developed as separate fields in computer science, even though the underlying concepts of causation and explanation share common ancient roots. This is further enforced by the lack of review works jointly covering these two fields. In this paper, we investigate the literature to try to understand how and to what extent causality and XAI are intertwined. More precisely, we seek to uncover what kinds of relationships exist between the two concepts and how one can benefit from them, for instance, in building trust in AI systems. As a result, three main perspectives are identified. In the first one, the lack of causality is seen as one of the major limitations of current AI and XAI approaches, and the "optimal" form of explanations is investigated. The second is a pragmatic perspective and considers XAI as a tool to foster scientific exploration for causal inquiry, via the identification of pursue-worthy experimental manipulations. Finally, the third perspective supports the idea that causality is propaedeutic to XAI in three possible manners: exploiting concepts borrowed from causality to support or improve XAI, utilizing counterfactuals for explainability, and considering accessing a causal model as explaining itself. To complement our analysis, we also provide relevant software solutions used to automate causal tasks. We believe our work provides a unified view of the two fields of causality and XAI by highlighting potential domain bridges and uncovering possible limitations.
    摘要 causality 和 explainable Artificial Intelligence (XAI) 在计算机科学中发展成为两个独立的领域,尽管它们的基本概念之间有共同的古老根基。这种情况进一步加剧了由lack of review works jointly covering these two fields所带来的分化。在这篇论文中,我们对文献进行调查,以便更好地理解causality 和 XAI 之间的关系。更具体地说,我们想要找出causality 和 XAI 之间的关系是什么样的,以及如何在建立人工智能系统的信任方面利用它们。结果,我们分析出了三个主要的视角:第一个视角认为现有的 AI 和 XAI 方法缺乏 causality 是一个重要的限制,并investigate the "optimal" form of explanations。第二个视角是一种实用的视角,认为 XAI 可以用于推动科学探索,通过识别可以实施的实验操作。最后,第三个视角支持idea that causality is propaedeutic to XAI in three possible manners:borrowing concepts from causality to support or improve XAI,utilizing counterfactuals for explainability,和considering accessing a causal model as explaining itself。为了补充我们的分析,我们还提供了用于自动化 causal 任务的相关软件解决方案。我们认为我们的工作提供了两个领域之间的统一视角,揭示了可能的领域桥梁,并揭示了可能的限制。

Towards Ontology Construction with Language Models

  • paper_url: http://arxiv.org/abs/2309.09898
  • repo_url: None
  • paper_authors: Maurice Funk, Simon Hosemann, Jean Christoph Jung, Carsten Lutz
  • for: 自动构建域层次结构
  • methods: 通过问题大语言模型
  • results: LLMs 可以帮助建立域层次结构
    Abstract We present a method for automatically constructing a concept hierarchy for a given domain by querying a large language model. We apply this method to various domains using OpenAI's GPT 3.5. Our experiments indicate that LLMs can be of considerable help for constructing concept hierarchies.
    摘要 我们提出了一种方法,通过查询大型语言模型(LLM)来自动构建给定领域的概念层次结构。我们使用OpenAI的GPT 3.5将该方法应用于多个领域。实验表明,LLM可以为构建概念层次结构提供相当大的帮助。

Context is Environment

  • paper_url: http://arxiv.org/abs/2309.09888
  • repo_url: https://github.com/apostrophecms/apostrophe
  • paper_authors: Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja
  • for: 提高领域泛化性能
  • methods: 使用增强上下文学习来减少杂 correlate和提高领域泛化性能
  • results: 经验和理论表明,ICRM算法可以在不知道标签的情况下,通过关注上下文来提高OD性能。
    Abstract Two lines of work are taking the central stage in AI research. On the one hand, the community is making increasing efforts to build models that discard spurious correlations and generalize better in novel test environments. Unfortunately, the bitter lesson so far is that no proposal convincingly outperforms a simple empirical risk minimization baseline. On the other hand, large language models (LLMs) have erupted as algorithms able to learn in-context, generalizing on-the-fly to eclectic contextual circumstances that users enforce by means of prompting. In this paper, we argue that context is environment, and posit that in-context learning holds the key to better domain generalization. Via extensive theory and experiments, we show that paying attention to context (unlabeled examples as they arrive) allows our proposed In-Context Risk Minimization (ICRM) algorithm to zoom in on the test environment risk minimizer, leading to significant out-of-distribution performance improvements. From all of this, two messages are worth taking home. Researchers in domain generalization should consider environment as context, and harness the adaptive power of in-context learning. Researchers in LLMs should consider context as environment, to better structure data towards generalization.

EGFE: End-to-end Grouping of Fragmented Elements in UI Designs with Multimodal Learning

  • paper_url: http://arxiv.org/abs/2309.09867
  • repo_url: https://github.com/test2975/egfe
  • paper_authors: Liuqing Chen, Yunnong Chen, Shuhong Xiao, Yaxuan Song, Lingyun Sun, Yankun Zhen, Tingting Zhou, Yanfang Chang
  • for: 快速化 Front-end 应用程序和 GUI 迭代开发
  • methods: 使用 Transformer Encoder 模型 UI 元素之间的关系,并使用多Modal 表示学习
  • results: 与基线方法相比,EGFE 方法在精确率(+29.75%)、召回率(+31.07%)和 F1 分数(+30.39%)上具有显著优势,并在实际软件工程应用中验证了其效果。
    Abstract When translating UI design prototypes to code in industry, automatically generating code from design prototypes can expedite the development of applications and GUI iterations. However, in design prototypes without strict design specifications, UI components may be composed of fragmented elements. Grouping these fragmented elements can greatly improve the readability and maintainability of the generated code. Current methods employ a two-stage strategy that introduces hand-crafted rules to group fragmented elements. Unfortunately, the performance of these methods is not satisfying due to visually overlapped and tiny UI elements. In this study, we propose EGFE, a novel method for automatically End-to-end Grouping Fragmented Elements via UI sequence prediction. To facilitate the UI understanding, we innovatively construct a Transformer encoder to model the relationship between the UI elements with multi-modal representation learning. The evaluation on a dataset of 4606 UI prototypes collected from professional UI designers shows that our method outperforms the state-of-the-art baselines in the precision (by 29.75\%), recall (by 31.07\%), and F1-score (by 30.39\%) at edit distance threshold of 4. In addition, we conduct an empirical study to assess the improvement of the generated front-end code. The results demonstrate the effectiveness of our method on a real software engineering application. Our end-to-end fragmented elements grouping method creates opportunities for improving UI-related software engineering tasks.
    摘要 当在产业中将用户界面设计原型转换成代码时,自动生成代码从设计原型可以大大提高应用程序和Graphical User Interface(GUI)的开发效率。然而,在没有严格的设计规范的情况下,用户界面组件可能会被分割成多个碎片。将这些碎片组合起来可以大大提高代码的可读性和维护性。现有的方法采用两阶段策略,通过手动制定规则来组合碎片元素。然而,这些方法的性能不 satisfactory,因为视觉上的重叠和小型用户界面元素。在本研究中,我们提出了一种 novel方法,名为 End-to-end Grouping Fragmented Elements(EGFE),通过用户界面序列预测来自动组合碎片元素。为了促进用户界面理解,我们创新地构建了一个 TransformerEncoder,用于模型用户界面元素之间的关系,并通过多modal表示学习来学习这些关系。我们对收集了4606个用户界面原型的数据集进行评估,结果显示,我们的方法在编辑距离阈值为4时,与state-of-the-art基线方法相比,提高了精度(29.75%),回归率(31.07%)和F1分数(30.39%)。此外,我们进行了一个实际的研究,以评估生成的前端代码的改进。结果表明,我们的方法在实际软件工程应用中具有效果。我们的末端碎片元素组合方法创造了对用户界面相关的软件工程任务的机会。

Learning Spatial and Temporal Hierarchies: Hierarchical Active Inference for navigation in Multi-Room Maze Environments

  • paper_url: http://arxiv.org/abs/2309.09864
  • repo_url: None
  • paper_authors: Daria de Tinguy, Toon Van de Maele, Tim Verbelen, Bart Dhoedt
  • for: 该论文旨在解决从像素级别观察到环境结构的推理挑战,提出了一种层次式活动推理模型。
  • methods: 该模型包括一个认知地图、一个allocentric和一个egocentric世界模型,结合了好奇的探索和目标带导的行为,从上下文、地点到动作的不同层次进行了理解和推理。
  • results: 该模型在房间结构小网格环境中实现了高效的探索和目标带导搜索。
    Abstract Cognitive maps play a crucial role in facilitating flexible behaviour by representing spatial and conceptual relationships within an environment. The ability to learn and infer the underlying structure of the environment is crucial for effective exploration and navigation. This paper introduces a hierarchical active inference model addressing the challenge of inferring structure in the world from pixel-based observations. We propose a three-layer hierarchical model consisting of a cognitive map, an allocentric, and an egocentric world model, combining curiosity-driven exploration with goal-oriented behaviour at the different levels of reasoning from context to place to motion. This allows for efficient exploration and goal-directed search in room-structured mini-grid environments.
    摘要 cognitive maps 扮演了灵活行为的关键角色,协助表示环境中的空间和概念关系。学习和推理环境的结构是有效探索和导航的关键。这篇论文提出了一种层次活动推理模型,解决从像素级别观察到世界结构的推理挑战。我们提议了一种三层层次模型,包括认知地图、allocentric 和 egocentric 世界模型,将curiosity-driven 探索与目标尝试结合在不同的理解层次上,从上下文到场景到运动。这种方法可以有效地在室内小型网格环境中进行探索和目标寻找。

SYNDICOM: Improving Conversational Commonsense with Error-Injection and Natural Language Feedback

  • paper_url: http://arxiv.org/abs/2309.10015
  • repo_url: https://github.com/hieu9955/ggggg
  • paper_authors: Christopher Richardson, Anirudh Sundar, Larry Heck
  • for: 提高对话响应的常识理解
  • methods: 使用知识图和自然语言Feedback(NLF)创建对话对话数据集,并使用两步验证过程:首先预测NLF,然后使用预测结果和无效响应生成对话。
  • results: 比ChatGPT提高53%的ROUGE1分数,人类评价者57%时喜欢SYNDICOM。
    Abstract Commonsense reasoning is a critical aspect of human communication. Despite recent advances in conversational AI driven by large language models, commonsense reasoning remains a challenging task. In this work, we introduce SYNDICOM - a method for improving commonsense in dialogue response generation. SYNDICOM consists of two components. The first component is a dataset composed of commonsense dialogues created from a knowledge graph and synthesized into natural language. This dataset includes both valid and invalid responses to dialogue contexts, along with natural language feedback (NLF) for the invalid responses. The second contribution is a two-step procedure: training a model to predict natural language feedback (NLF) for invalid responses, and then training a response generation model conditioned on the predicted NLF, the invalid response, and the dialogue. SYNDICOM is scalable and does not require reinforcement learning. Empirical results on three tasks are evaluated using a broad range of metrics. SYNDICOM achieves a relative improvement of 53% over ChatGPT on ROUGE1, and human evaluators prefer SYNDICOM over ChatGPT 57% of the time. We will publicly release the code and the full dataset.
    摘要 常识推理是人类交流的重要方面。尽管由大语言模型驱动的对话式AI最近取得了进展,常识推理仍然是一项具有挑战性的任务。在这项工作中,我们提出了SYNDICOM,一种改进对话回复生成中常识能力的方法。SYNDICOM包含两个部分:第一部分是一个由知识图谱构建并合成为自然语言的常识对话数据集,其中既包含对话上下文的有效回复,也包含无效回复及针对无效回复的自然语言反馈(NLF);第二部分是一个两步流程:先训练一个模型为无效回复预测自然语言反馈,再训练一个以预测的NLF、无效回复和对话为条件的回复生成模型。SYNDICOM具有可扩展性,且不需要强化学习。我们在三个任务上用多种指标进行了实证评估:SYNDICOM在ROUGE-1上相比ChatGPT取得了53%的相对提升,并且人工评估者在57%的情况下更偏好SYNDICOM。我们将公开发布代码和完整数据集。

CC-SGG: Corner Case Scenario Generation using Learned Scene Graphs

  • paper_url: http://arxiv.org/abs/2309.09844
  • repo_url: None
  • paper_authors: George Drayson, Efimia Panagiotaki, Daniel Omeiza, Lars Kunze
  • for: 增强自动驾驶车辆的安全性测试和验证
  • methods: 使用异构图 neural network 将常见驾驶场景转换为异常场景
  • results: 成功将常见驾驶场景转换为异常场景,实现89.9%的预测精度,并证明模型能够创造基eline方法所不能处理的特殊情况。
    Abstract Corner case scenarios are an essential tool for testing and validating the safety of autonomous vehicles (AVs). As these scenarios are often insufficiently present in naturalistic driving datasets, augmenting the data with synthetic corner cases greatly enhances the safe operation of AVs in unique situations. However, the generation of synthetic, yet realistic, corner cases poses a significant challenge. In this work, we introduce a novel approach based on Heterogeneous Graph Neural Networks (HGNNs) to transform regular driving scenarios into corner cases. To achieve this, we first generate concise representations of regular driving scenes as scene graphs, minimally manipulating their structure and properties. Our model then learns to perturb those graphs to generate corner cases using attention and triple embeddings. The input and perturbed graphs are then imported back into the simulation to generate corner case scenarios. Our model successfully learned to produce corner cases from input scene graphs, achieving 89.9% prediction accuracy on our testing dataset. We further validate the generated scenarios on baseline autonomous driving methods, demonstrating our model's ability to effectively create critical situations for the baselines.
    摘要 弯道情况场景是自动驾驶车辆(AV)的测试和验证安全工具。然而,这些情况场景通常不足于自然驾驶数据集中,因此通过增强数据集中的人工弯道情况场景可以大大提高AV的安全运行。然而,生成具有真实感的人工弯道情况场景是一项挑战。在这种情况下,我们提出了一种基于异种图 neural network(HGNN)的新方法,可以将常见驾驶场景转换成弯道情况场景。我们首先生成了常见驾驶场景的简洁表示,并将其结构和属性进行最小的修改。然后,我们的模型通过注意力和 triple embeddings 来对这些图进行扰动,以生成弯道情况场景。最后,我们将生成的弯道情况场景重新导入到模拟器中,以生成弯道场景。我们的模型成功地将输入场景图转换成弯道场景,测试集上的预测精度达89.9%。此外,我们还 validate了生成的场景,证明我们的模型可以生成对基eline autonomous driving方法 Critical Situation。

RECAP: Retrieval-Augmented Audio Captioning

  • paper_url: http://arxiv.org/abs/2309.09836
  • repo_url: None
  • paper_authors: Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, Ramani Duraiswami, Dinesh Manocha
  • for: 这个论文旨在提出一种新的音频captioning系统,即RECAP(REtrieval-Augmented Audio CAPtioning),它可以基于输入音频和其他相似音频从数据存储中检索类似的caption。
  • methods: 该方法使用CLAP音频-文本模型来检索相似的caption,然后使用GPT-2解码器和CLAP编码器之间的交叉关注层来conditioning audio дляcaption生成。
  • results: 实验表明,RECAP在本地设置中达到了竞争性性能,而在另外一个设置中具有显著的改进。此外,由于它可以在training-free的方式下使用大量的文本-caption-只datastore,因此RECAP可以caption novel audio事件和多个事件的组合音频。
    Abstract We present RECAP (REtrieval-Augmented Audio CAPtioning), a novel and effective audio captioning system that generates captions conditioned on an input audio and other captions similar to the audio retrieved from a datastore. Additionally, our proposed method can transfer to any domain without the need for any additional fine-tuning. To generate a caption for an audio sample, we leverage an audio-text model CLAP to retrieve captions similar to it from a replaceable datastore, which are then used to construct a prompt. Next, we feed this prompt to a GPT-2 decoder and introduce cross-attention layers between the CLAP encoder and GPT-2 to condition the audio for caption generation. Experiments on two benchmark datasets, Clotho and AudioCaps, show that RECAP achieves competitive performance in in-domain settings and significant improvements in out-of-domain settings. Additionally, due to its capability to exploit a large text-captions-only datastore in a training-free fashion, RECAP shows unique capabilities of captioning novel audio events never seen during training and compositional audios with multiple events. To promote research in this space, we also release 150,000+ new weakly labeled captions for AudioSet, AudioCaps, and Clotho.
    摘要 我们提出了RECAP(REtrieval-Augmented Audio CAPtioning),一种新的有效的音频描述系统,它根据输入音频和相似的音频从数据库中检索到类似的描述,并且不需要任何额外的调整。为生成音频描述,我们利用CLAP音频文本模型从可更换数据库中检索类似的描述,然后将它们用于构建提示。接着,我们将这个提示传递给GPT-2解码器,并在CLAP编码器和GPT-2之间添加交叉注意力层,以condition the audio for caption generation。实验表明,RECAP在预训练集和非预训练集上具有竞争力和显著提高的表现。此外,由于它可以在训练集中使用大量的文本描述只数据库,RECAP表现出了新领域的描述和多事件音频的特有能力。为促进这个领域的研究,我们还发布了150,000多个弱Label的描述数据,用于AudioSet、AudioCaps和Clotho等数据集。
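The retrieval-augmented prompting step can be sketched as follows: embed the query audio and a caption datastore in a shared space, take the most similar captions, and assemble them into a prompt for the caption decoder. The `embed_audio`/`embed_text` functions below are placeholders standing in for a CLAP-style audio-text encoder, not a specific library API.

```python
import numpy as np

def embed_audio(waveform):          # placeholder: returns a unit-norm embedding
    v = np.resize(np.asarray(waveform, dtype=float), 512)
    return v / (np.linalg.norm(v) + 1e-9)

def embed_text(caption):            # placeholder text encoder in the same space
    v = np.resize(np.array([hash(w) % 997 for w in caption.split()] + [1], dtype=float), 512)
    return v / (np.linalg.norm(v) + 1e-9)

datastore = ["a dog barks while cars pass by", "rain falls on a metal roof",
             "a crowd cheers at a stadium"]
caption_matrix = np.stack([embed_text(c) for c in datastore])

def build_prompt(waveform, k=2):
    """Retrieve the k most similar captions and fold them into a decoder prompt."""
    sims = caption_matrix @ embed_audio(waveform)            # cosine similarity (unit vectors)
    retrieved = [datastore[i] for i in np.argsort(-sims)[:k]]
    return "Similar sounds: " + "; ".join(retrieved) + ". Caption this audio:"

prompt = build_prompt(np.random.randn(16000))                # fed to the GPT-2 decoder
```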

Task Selection and Assignment for Multi-modal Multi-task Dialogue Act Classification with Non-stationary Multi-armed Bandits

  • paper_url: http://arxiv.org/abs/2309.09832
  • repo_url: None
  • paper_authors: Xiangheng He, Junjie Chen, Björn W. Schuller
  • for: 提高主任务性能,通过共同学习相关辅助任务。
  • methods: 非站立多臂扔投(MAB)与减少 Thompson 抽样(TS)使用 Gaussian 分布。
  • results: 在不同训练阶段,不同任务有不同的用处。我们的提议方法可以有效地确定任务用处,避免无用或害处任务,并在训练中进行任务分配。我们的方法与单任务和多任务基elines相比,在 UAR 和 F1 方面具有显著优势(p-value < 0.05)。进一步分析实验结果表明,对于数据不均衡问题的数据集,我们的方法具有更高的稳定性,可以获得稳定且良好的性能 для少数类。我们的方法超越当前状态的艺术模型。
    Abstract Multi-task learning (MTL) aims to improve the performance of a primary task by jointly learning with related auxiliary tasks. Traditional MTL methods select tasks randomly during training. However, both previous studies and our results suggest that such the random selection of tasks may not be helpful, and can even be harmful to performance. Therefore, new strategies for task selection and assignment in MTL need to be explored. This paper studies the multi-modal, multi-task dialogue act classification task, and proposes a method for selecting and assigning tasks based on non-stationary multi-armed bandits (MAB) with discounted Thompson Sampling (TS) using Gaussian priors. Our experimental results show that in different training stages, different tasks have different utility. Our proposed method can effectively identify the task utility, actively avoid useless or harmful tasks, and realise the task assignment during training. Our proposed method is significantly superior in terms of UAR and F1 to the single-task and multi-task baselines with p-values < 0.05. Further analysis of experiments indicates that for the dataset with the data imbalance problem, our proposed method has significantly higher stability and can obtain consistent and decent performance for minority classes. Our proposed method is superior to the current state-of-the-art model.
    摘要 多任务学习(MTL)的目标是通过与相关辅助任务的联合学习来提高主任务的性能。传统的MTL方法在训练过程中随机选择任务。然而,已有研究和我们的结果都表明,这种随机选择任务的方式可能没有帮助,甚至会损害性能。因此,需要探索新的MTL任务选择和分配策略。本文研究了多模态、多任务对话行为分类任务,并提出了一种基于非平稳多臂老虎机(MAB)和使用高斯先验的折扣 Thompson 采样(TS)的任务选择与分配方法。实验结果表明,在不同的训练阶段,不同任务具有不同的效用。我们提出的方法能够有效地识别任务效用,主动避免无用或有害的任务,并在训练过程中实现任务分配。与单任务和多任务基线相比,我们的方法在UAR和F1上具有显著优势(p值 < 0.05)。进一步的实验分析表明,对于存在数据不均衡问题的数据集,我们的方法具有更高的稳定性,并能为少数类获得稳定且良好的性能。我们的方法超过了当前最先进的模型。
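A minimal sketch of non-stationary task selection with discounted Thompson Sampling under Gaussian assumptions is shown below: each auxiliary task is an arm, its observed utility updates a discounted posterior, and the next task is the arm with the highest posterior sample. The discount factor, noise scale, and toy utility signal are assumptions.

```python
import numpy as np

class DiscountedGaussianTS:
    """Discounted Thompson Sampling with a Gaussian posterior per task (arm)."""

    def __init__(self, n_tasks, discount=0.95, obs_var=1.0):
        self.gamma, self.obs_var = discount, obs_var
        self.weight = np.zeros(n_tasks)        # discounted effective observation counts
        self.weighted_sum = np.zeros(n_tasks)  # discounted sum of observed utilities

    def select(self, rng):
        mean = self.weighted_sum / np.maximum(self.weight, 1e-9)
        mean = np.where(self.weight > 0, mean, 0.0)
        var = self.obs_var / (self.weight + 1.0)       # zero-mean Gaussian prior
        return int(np.argmax(rng.normal(mean, np.sqrt(var))))

    def update(self, task, utility):
        self.weight *= self.gamma                       # old evidence decays (non-stationarity)
        self.weighted_sum *= self.gamma
        self.weight[task] += 1.0
        self.weighted_sum[task] += utility

rng = np.random.default_rng(0)
bandit = DiscountedGaussianTS(n_tasks=4)
for step in range(100):
    task = bandit.select(rng)                           # auxiliary task trained this round
    utility = rng.normal(0.1 * task - 0.001 * step * task, 0.05)  # toy drifting utility
    bandit.update(task, utility)
```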

Clustering of Urban Traffic Patterns by K-Means and Dynamic Time Warping: Case Study

  • paper_url: http://arxiv.org/abs/2309.09830
  • repo_url: None
  • paper_authors: Sadegh Etemad, Raziyeh Mosayebi, Tadeh Alexani Khodavirdian, Elahe Dastan, Amir Salari Telmadarreh, Mohammadreza Jafari, Sepehr Rafiei
  • for: 这个论文的目的是提出一种基于K-Means和动态时间扭曲算法的时间序列划分方法,用于描述城市交通流量的pattern。
  • methods: 本论文使用的方法包括速度时间序列提取和K-Means算法,以及基于动态时间扭曲(DTW)的距离度量。
  • results: 实验结果表明,提议的方法可以准确地描述城市交通流量的pattern,并且可以提供有用的信息 для城市交通管理和规划。
    Abstract Clustering of urban traffic patterns is an essential task in many different areas of traffic management and planning. In this paper, two significant applications in the clustering of urban traffic patterns are described. The first application estimates the missing speed values using the speed of road segments with similar traffic patterns to colorify map tiles. The second one is the estimation of essential road segments for generating addresses for a local point on the map, using the similarity patterns of different road segments. The speed time series extracts the traffic pattern in different road segments. In this paper, we proposed the time series clustering algorithm based on K-Means and Dynamic Time Warping. The case study of our proposed algorithm is based on the Snapp application's driver speed time series data. The results of the two applications illustrate that the proposed method can extract similar urban traffic patterns.
    摘要 城市交通模式聚类是许多交通管理和规划领域的关键任务。本文描述了城市交通模式聚类的两个重要应用。第一个应用利用具有相似交通模式的路段速度来估计缺失的速度值,从而为地图瓦片着色;第二个应用利用不同路段的相似性模式,估计为地图上某一地点生成地址所需的关键路段。速度时间序列刻画了不同路段的交通模式。本文提出了基于K-Means和动态时间扭曲的时间序列聚类算法。案例研究基于Snapp应用的司机速度时间序列数据。两个应用的结果表明,所提方法能够提取相似的城市交通模式。
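The clustering step can be reproduced in a few lines with a DTW-based K-Means, as sketched below on synthetic daily speed profiles; the data, cluster count, and scaling are assumptions rather than the paper's Snapp dataset.

```python
import numpy as np
from tslearn.clustering import TimeSeriesKMeans
from tslearn.preprocessing import TimeSeriesScalerMeanVariance

rng = np.random.default_rng(0)
# 100 road segments x 96 time bins (15-minute speed averages over a day), toy data.
base = np.sin(np.linspace(0, 2 * np.pi, 96))
speeds = np.stack([40 + 10 * np.roll(base, rng.integers(0, 24)) + rng.normal(0, 2, 96)
                   for _ in range(100)])

X = TimeSeriesScalerMeanVariance().fit_transform(speeds)   # shape (100, 96, 1)
km = TimeSeriesKMeans(n_clusters=4, metric="dtw", random_state=0)
labels = km.fit_predict(X)

# Segments sharing a label have similar daily traffic patterns; their cluster centroid
# can be used to impute missing speeds or to pick representative segments for addressing.
print(np.bincount(labels))
```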

Efficient Avoidance of Vulnerabilities in Auto-completed Smart Contract Code Using Vulnerability-constrained Decoding

  • paper_url: http://arxiv.org/abs/2309.09826
  • repo_url: None
  • paper_authors: André Storhaug, Jingyue Li, Tianyuan Hu
  • for: 本研究旨在提高大型自然语言模型(LLM)在代码生成中的安全性,避免由这些模型生成的代码具有漏洞。
  • methods: 我们提出了一种新的漏洞约束生成方法,通过在代码生成过程中避免生成漏洞的代码来减少漏洞的出现。我们使用了一小组标注了漏洞代码的数据,使得模型在生成代码时包含漏洞标签。然后,在生成代码时,我们禁止模型生成这些标签,以避免生成漏洞代码。
  • results: 我们使用ETH的智能合约(SC)作为实验 case study,因为SC的安全性要求非常严格。我们先使用6亿参数的GPT-J模型进行了186397个SC的精度 fine-tuning,并将2217692个SC中的重复项去除。 fine-tuning需要一周以上的时间,使用了10个GPU。我们发现,我们的精度 fine-tuning可以生成SCs的平均BLEU分数为0.557。然而,大量的代码在自动完成后仍然存在漏洞。我们使用不同类型的漏洞代码的176个SC中的代码 перед漏洞行进行自动完成,发现超过70%的代码是不安全的。因此,我们进一步 fine-tuning了模型,使其能够避免这些漏洞。我们在另外941个漏洞SC中进行了精度 fine-tuning,并在代码生成过程中应用漏洞约束。 fine-tuning只需一个小时,使用了4个GPU。我们再次自动完成了176个SC,发现我们的方法可以将62%的代码标记为不安全,并避免生成67%的代码。
    Abstract Auto-completing code enables developers to speed up coding significantly. Recent advances in transformer-based large language model (LLM) technologies have been applied to code synthesis. However, studies show that many of such synthesized codes contain vulnerabilities. We propose a novel vulnerability-constrained decoding approach to reduce the amount of vulnerable code generated by such models. Using a small dataset of labeled vulnerable lines of code, we fine-tune an LLM to include vulnerability labels when generating code, acting as an embedded classifier. Then, during decoding, we deny the model to generate these labels to avoid generating vulnerable code. To evaluate the method, we chose to automatically complete Ethereum Blockchain smart contracts (SCs) as the case study due to the strict requirements of SC security. We first fine-tuned the 6-billion-parameter GPT-J model using 186,397 Ethereum SCs after removing the duplication from 2,217,692 SCs. The fine-tuning took more than one week using ten GPUs. The results showed that our fine-tuned model could synthesize SCs with an average BLEU (BiLingual Evaluation Understudy) score of 0.557. However, many codes in the auto-completed SCs were vulnerable. Using the code before the vulnerable line of 176 SCs containing different types of vulnerabilities to auto-complete the code, we found that more than 70% of the auto-completed codes were insecure. Thus, we further fine-tuned the model on other 941 vulnerable SCs containing the same types of vulnerabilities and applied vulnerability-constrained decoding. The fine-tuning took only one hour with four GPUs. We then auto-completed the 176 SCs again and found that our approach could identify 62% of the code to be generated as vulnerable and avoid generating 67% of them, indicating the approach could efficiently and effectively avoid vulnerabilities in the auto-completed code.
    摘要 自动补全代码可以大大提高开发者的编程速度。基于 Transformer 的大型语言模型(LLM)技术近来已被应用于代码生成。然而,研究表明,许多生成的代码含有漏洞。我们提出了一种新的漏洞约束解码方法,以减少此类模型生成的漏洞代码。我们使用一个标注了漏洞代码行的小数据集微调 LLM,使其在生成代码时输出漏洞标签,从而充当内置分类器;随后在解码时禁止模型生成这些标签,以避免生成漏洞代码。为评估该方法,我们选择以太坊区块链智能合约(SC)的自动补全作为案例研究,因为 SC 的安全要求非常严格。我们首先从 2,217,692 个以太坊 SC 中去除重复项,使用剩余的 186,397 个 SC 微调 60 亿参数的 GPT-J 模型。微调耗时一周以上,使用了十个 GPU。结果表明,微调后的模型生成 SC 的平均 BLEU 分数为 0.557。然而,自动补全的 SC 中仍有许多漏洞代码。我们取 176 个含有不同类型漏洞的 SC 中漏洞行之前的代码进行自动补全,发现超过 70% 的补全代码是不安全的。因此,我们进一步在另外 941 个含有同类漏洞的 SC 上微调模型,并应用漏洞约束解码。这次微调只需一个小时,使用四个 GPU。我们再次自动补全这 176 个 SC,发现我们的方法可以将 62% 的待生成代码识别为有漏洞,并避免生成其中的 67%,表明该方法能够高效且有效地避免自动补全代码中的漏洞。
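The core decoding trick above (fine-tune the model to emit vulnerability labels, then forbid those labels at generation time) can be sketched as a logit mask inside a greedy decoding loop. The label tokens below are hypothetical placeholders assumed to have been added to the tokenizer during fine-tuning; this is not the authors' released implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "EleutherAI/gpt-j-6B"        # placeholder; the paper fine-tunes GPT-J
LABEL_TOKENS = ["<VULN>", "</VULN>"]       # hypothetical labels assumed to exist after fine-tuning

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
banned_ids = [tokenizer.convert_tokens_to_ids(t) for t in LABEL_TOKENS]

@torch.no_grad()
def constrained_complete(prompt: str, max_new_tokens: int = 64) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[:, -1, :]
        logits[:, banned_ids] = float("-inf")          # never emit a vulnerability label
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy decoding for simplicity
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```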

Bias of AI-Generated Content: An Examination of News Produced by Large Language Models

  • paper_url: http://arxiv.org/abs/2309.09825
  • repo_url: None
  • paper_authors: Xiao Fang, Shangkun Che, Minjia Mao, Hongzhe Zhang, Ming Zhao, Xiaohang Zhao
  • for: 本研究旨在了解大语言模型(LLM)生成的偏见。
  • methods: 我们使用七个代表性的LLM生成新闻文章的标题作为提示,并评估这些LLM生成的媒体内容中的性别和种族偏见。
  • results: 我们发现所有受测LLM的生成内容都表现出显著的性别和种族偏见,其中女性和黑人受到明显的歧视。ChatGPT生成内容的偏见水平最低,而且它是唯一能够在收到偏见提示时拒绝生成内容的模型。
    Abstract Large language models (LLMs) have the potential to transform our lives and work through the content they generate, known as AI-Generated Content (AIGC). To harness this transformation, we need to understand the limitations of LLMs. Here, we investigate the bias of AIGC produced by seven representative LLMs, including ChatGPT and LLaMA. We collect news articles from The New York Times and Reuters, both known for their dedication to provide unbiased news. We then apply each examined LLM to generate news content with headlines of these news articles as prompts, and evaluate the gender and racial biases of the AIGC produced by the LLM by comparing the AIGC and the original news articles. We further analyze the gender bias of each LLM under biased prompts by adding gender-biased messages to prompts constructed from these news headlines. Our study reveals that the AIGC produced by each examined LLM demonstrates substantial gender and racial biases. Moreover, the AIGC generated by each LLM exhibits notable discrimination against females and individuals of the Black race. Among the LLMs, the AIGC generated by ChatGPT demonstrates the lowest level of bias, and ChatGPT is the sole model capable of declining content generation when provided with biased prompts.
    摘要 大型语言模型(LLM)有可能通过其生成的内容(即人工智能生成内容,AIGC)改变我们的生活和工作。为了利用这一变革,我们需要了解LLM的局限性。我们在此调查了包括ChatGPT和LLaMA在内的7个代表性LLM所生成AIGC的偏见。我们从以提供中立报道著称的纽约时报和路透社收集新闻文章,以其标题作为提示让每个受测LLM生成新闻内容,并通过将AIGC与原始新闻文章进行比较,评估其中的性别和种族偏见。我们还在由这些新闻标题构造的提示中加入带有性别偏见的信息,进一步分析每个LLM在偏见提示下的性别偏见。研究发现,每个受测LLM生成的AIGC都表现出显著的性别和种族偏见,并且对女性和黑人存在明显的歧视。在这些LLM中,ChatGPT生成的AIGC偏见水平最低,而且ChatGPT是唯一能够在收到偏见提示时拒绝生成内容的模型。

VisualProg Distiller: Learning to Fine-tune Non-differentiable Visual Programming Frameworks

  • paper_url: http://arxiv.org/abs/2309.09809
  • repo_url: None
  • paper_authors: Wentao Wan, Zeqing Wang, Nan Kang, Keze Wang, Zhiyu Shen, Liang Lin
  • for: 提高VisualProg的实用性和任务性能。
  • methods: 提出了一种基于教师模型的VisualProg Distiller方法,通过逐步填充和筛选知识来优化VisualProg子模块的表现,从而提高整体任务性能。
  • results: 经过广泛和全面的实验评估,提出的方法可以大幅提高VisualProg的性能,并超越所有比较方法。
    Abstract As an interpretable and universal neuro-symbolic paradigm based on Large Language Models, visual programming (VisualProg) can execute compositional visual tasks without training, but its performance is markedly inferior compared to task-specific supervised learning models. To increase its practicality, the performance of VisualProg on specific tasks needs to be improved. However, the non-differentiability of VisualProg limits the possibility of employing the fine-tuning strategy on specific tasks to achieve further improvements. In our analysis, we discovered that significant performance issues in VisualProg's execution originated from errors made by the sub-modules at corresponding visual sub-task steps. To address this, we propose ``VisualProg Distiller", a method of supplementing and distilling process knowledge to optimize the performance of each VisualProg sub-module on decoupled visual sub-tasks, thus enhancing the overall task performance. Specifically, we choose an end-to-end model that is well-performed on the given task as the teacher and further distill the knowledge of the teacher into the invoked visual sub-modules step-by-step based on the execution flow of the VisualProg-generated programs. In this way, our method is capable of facilitating the fine-tuning of the non-differentiable VisualProg frameworks effectively. Extensive and comprehensive experimental evaluations demonstrate that our method can achieve a substantial performance improvement of VisualProg, and outperforms all the compared state-of-the-art methods by large margins. Furthermore, to provide valuable process supervision for the GQA task, we construct a large-scale dataset by utilizing the distillation process of our method.
    摘要 为了提高VisualProg的实用性,需要提升其在特定任务上的性能。然而,VisualProg的不可微性限制了通过在特定任务上微调来进一步改进的可能性。在分析中,我们发现VisualProg执行过程中的显著性能问题,主要源自其子模块在相应视觉子任务步骤上产生的错误。为了解决这一问题,我们提出了“VisualProg Distiller”方法,通过补充并蒸馏过程知识来优化各个VisualProg子模块在解耦视觉子任务上的性能,从而提升整体任务性能。具体而言,我们选择在给定任务上表现良好的端到端模型作为教师,并按照VisualProg生成程序的执行流程,逐步将教师模型的知识蒸馏到被调用的视觉子模块中。这样,我们的方法能够有效地对不可微的VisualProg框架进行微调。广泛而全面的实验评估表明,该方法可以大幅提升VisualProg的性能,并以较大优势超越所有对比的最新方法。此外,为了为GQA任务提供有价值的过程监督,我们利用该方法的蒸馏过程构建了一个大规模数据集。

Efficient Concept Drift Handling for Batch Android Malware Detection Models

  • paper_url: http://arxiv.org/abs/2309.09807
  • repo_url: https://gitlab.com/serralba/concept_drift
  • paper_authors: Molina-Coronado B., Mori U., Mendiburu A., Miguel-Alonso J
  • for: This paper aims to address the challenge of maintaining the performance of static machine learning-based malware detectors in rapidly evolving Android app environments.
  • methods: The paper employs retraining techniques to maintain detector capabilities over time, and analyzes the effect of two aspects (frequency of retraining and data used for retraining) on efficiency and performance.
  • results: The experiments show that concept drift detection and sample selection mechanisms can be used to efficiently maintain the performance of static Android malware state-of-the-art detectors in changing environments.
    Abstract The rapidly evolving nature of Android apps poses a significant challenge to static batch machine learning algorithms employed in malware detection systems, as they quickly become obsolete. Despite this challenge, the existing literature pays limited attention to addressing this issue, with many advanced Android malware detection approaches, such as Drebin, DroidDet and MaMaDroid, relying on static models. In this work, we show how retraining techniques are able to maintain detector capabilities over time. Particularly, we analyze the effect of two aspects in the efficiency and performance of the detectors: 1) the frequency with which the models are retrained, and 2) the data used for retraining. In the first experiment, we compare periodic retraining with a more advanced concept drift detection method that triggers retraining only when necessary. In the second experiment, we analyze sampling methods to reduce the amount of data used to retrain models. Specifically, we compare fixed sized windows of recent data and state-of-the-art active learning methods that select those apps that help keep the training dataset small but diverse. Our experiments show that concept drift detection and sample selection mechanisms result in very efficient retraining strategies which can be successfully used to maintain the performance of the static Android malware state-of-the-art detectors in changing environments.
    摘要 “Android应用的快速演化带来了静态批处理机器学习算法在恶意软件检测系统中的挑战,这些算法很快就会过时。然而,现有的文献对此问题的解决方案很少,许多高级Android恶意软件检测方法,如Drebin、DroidDet和MaMaDroid,仍然采用静态模型。在这项工作中,我们展示了如何使用重新训练技术维护检测器的能力。特别是,我们分析了两个方面对检测器的效率和性能的影响:1)模型重新训练的频率,和2)重新训练使用的数据。在第一个实验中,我们比较了定期重新训练和更先进的概念漂移检测方法,该方法只在必要时触发重新训练。在第二个实验中,我们分析了采用不同大小的固定窗口和最新的活动学习方法来减少模型重新训练所需的数据量。我们的实验结果表明,概念漂移检测和样本选择机制可以实现非常高效的重新训练策略,可以成功地维护静态Android恶意软件检测器在变化环境中的性能。”
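A minimal sketch of the drift-triggered retraining idea discussed above: retrain a batch detector only when accuracy on the newest batch falls below a moving baseline, using a fixed-size window of recent samples. The classifier choice, thresholds and window size are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def stream_with_drift_handling(batches, window_size=5000, drop_threshold=0.05):
    """batches: list of (X, y) arrays arriving in chronological order."""
    X0, y0 = batches[0]
    clf = RandomForestClassifier(n_estimators=100).fit(X0, y0)
    window_X, window_y = [X0], [y0]
    baseline = None
    for X, y in batches[1:]:
        acc = accuracy_score(y, clf.predict(X))        # evaluate before reusing the batch
        window_X.append(X)
        window_y.append(y)
        baseline = acc if baseline is None else max(baseline, acc)
        if baseline - acc > drop_threshold:            # crude concept-drift signal
            Xw = np.vstack(window_X)[-window_size:]    # fixed-size window of recent apps
            yw = np.concatenate(window_y)[-window_size:]
            clf = RandomForestClassifier(n_estimators=100).fit(Xw, yw)
            baseline = None                            # reset the baseline after retraining
        yield acc
```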

Harnessing Collective Intelligence Under a Lack of Cultural Consensus

  • paper_url: http://arxiv.org/abs/2309.09787
  • repo_url: None
  • paper_authors: Necdet Gürkan, Jordan W. Suchow
  • for: This paper aims to provide a computational foundation for harnessing collective intelligence in the absence of cultural consensus.
  • methods: The paper introduces Infinite Deep Latent Construct Cultural Consensus Theory (iDLC-CCT), a nonparametric Bayesian model that extends Cultural Consensus Theory with a latent construct that maps between pretrained deep neural network embeddings of entities and the consensus beliefs regarding those entities.
  • results: The iDLC-CCT model better predicts the degree of consensus, generalizes well to out-of-sample entities, and is effective even with sparse data. An efficient hard-clustering variant of the iDLC-CCT is also introduced to improve scalability.
    Abstract Harnessing collective intelligence to drive effective decision-making and collaboration benefits from the ability to detect and characterize heterogeneity in consensus beliefs. This is particularly true in domains such as technology acceptance or leadership perception, where a consensus defines an intersubjective truth, leading to the possibility of multiple "ground truths" when subsets of respondents sustain mutually incompatible consensuses. Cultural Consensus Theory (CCT) provides a statistical framework for detecting and characterizing these divergent consensus beliefs. However, it is unworkable in modern applications because it lacks the ability to generalize across even highly similar beliefs, is ineffective with sparse data, and can leverage neither external knowledge bases nor learned machine representations. Here, we overcome these limitations through Infinite Deep Latent Construct Cultural Consensus Theory (iDLC-CCT), a nonparametric Bayesian model that extends CCT with a latent construct that maps between pretrained deep neural network embeddings of entities and the consensus beliefs regarding those entities among one or more subsets of respondents. We validate the method across domains including perceptions of risk sources, food healthiness, leadership, first impressions, and humor. We find that iDLC-CCT better predicts the degree of consensus, generalizes well to out-of-sample entities, and is effective even with sparse data. To improve scalability, we introduce an efficient hard-clustering variant of the iDLC-CCT using an algorithm derived from a small-variance asymptotic analysis of the model. The iDLC-CCT, therefore, provides a workable computational foundation for harnessing collective intelligence under a lack of cultural consensus and may potentially form the basis of consensus-aware information technologies.
    摘要 使用集体智能来驱动有效的决策和协作,可以利用检测和特征化多样性的共识信仰来获得优势。特别是在技术接受或领导 восприятие等领域,共识定义了间subjective truth,可能导致多个"真实"当 subsets of respondents sustain mutually incompatible consensuses。文化共识理论(CCT)提供了一个统计学方法来检测和特征化这些分歧的共识信仰。然而,它在现代应用中无法普及,因为它缺乏对类似信仰的普适化能力,不可靠于稀缺数据,并且无法利用外部知识库或学习机器表示。在这里,我们超越这些限制通过无穷深层 latent construct cultural consensus theory(iDLC-CCT),一种非 Parametric Bayesian 模型,它将 CCT 扩展到一个 latent construct,该 construct 将预训练的深度神经网络嵌入与共识信仰相关的一个或多个 respondents 之间的Entity的 consensus beliefs 进行映射。我们在风险来源、食品健康、领导、第一印象和幽默等领域进行验证,发现 iDLC-CCT 更好地预测度量共识,普适化能力强,稀缺数据效果良好。为了提高可扩展性,我们引入一种高效硬分 clustering 变体,使用基于小差强 asymptotic analysis 的模型。因此,iDLC-CCT 提供了在缺乏文化共识的情况下可行的计算基础,可能成为共识意识技术的基础。

How to Data in Datathons

  • paper_url: http://arxiv.org/abs/2309.09770
  • repo_url: https://github.com/YLee-ArtsCommission/Arts-Datathon
  • paper_authors: Carlos Mougan, Richard Plant, Clare Teng, Marya Bazzi, Alvaro Cabregas Ejea, Ryan Sze-Yin Chan, David Salvador Jasin, Martin Stoffel, Kirstie Jane Whitaker, Jules Manser
  • for: 该论文旨在为 datathon 组织者提供指南和最佳实践,帮助他们更好地处理数据相关的复杂问题。
  • methods: 该论文根据作者自己的经验和 >60 个合作机构的见解,提出了一套指南和建议,以帮助组织者在 datathon 中更好地管理数据。
  • results: 该论文通过应用自己的框架,对 10 个案例进行分析,以帮助组织者更好地理解和解决数据相关的问题。
    Abstract The rise of datathons, also known as data or data science hackathons, has provided a platform to collaborate, learn, and innovate in a short timeframe. Despite their significant potential benefits, organizations often struggle to effectively work with data due to a lack of clear guidelines and best practices for potential issues that might arise. Drawing on our own experiences and insights from organizing >80 datathon challenges with >60 partnership organizations since 2016, we provide guidelines and recommendations that serve as a resource for organizers to navigate the data-related complexities of datathons. We apply our proposed framework to 10 case studies.
    摘要 “数据马拉松”(也称数据或数据科学黑客松)的兴起,为在短时间内开展协作、学习和创新提供了平台。尽管其潜在收益显著,但由于缺乏针对可能出现问题的明确指南和最佳实践,组织方往往难以有效地处理数据。基于我们自2016年以来与60多家合作组织举办80多场数据马拉松挑战的经验和见解,我们提供了一套指南和建议,帮助组织者应对数据马拉松中与数据相关的复杂问题。我们还将所提出的框架应用于10个案例研究。

Looking through the past: better knowledge retention for generative replay in continual learning

  • paper_url: http://arxiv.org/abs/2309.10012
  • repo_url: https://github.com/valeriya-khan/looking-through-the-past
  • paper_authors: Valeriya Khan, Sebastian Cygert, Kamil Deja, Tomasz Trzciński, Bartłomiej Twardowski
  • for: The paper aims to improve generative replay in a continual learning setting so that it performs well on challenging scenarios, where current generative rehearsal methods are not powerful enough to generate complex data with a greater number of classes.
  • methods: The proposed method incorporates distillation in the latent space between the current and previous models to reduce feature drift, and uses latent matching for the reconstruction and original data to improve the alignment of generated features. Additionally, the method cycles generations through the previously trained model to make them closer to the original data.
  • results: The proposed method outperforms other generative replay methods in various scenarios.
    Abstract In this work, we improve the generative replay in a continual learning setting to perform well on challenging scenarios. Current generative rehearsal methods are usually benchmarked on small and simple datasets as they are not powerful enough to generate more complex data with a greater number of classes. We notice that in VAE-based generative replay, this could be attributed to the fact that the generated features are far from the original ones when mapped to the latent space. Therefore, we propose three modifications that allow the model to learn and generate complex data. More specifically, we incorporate the distillation in latent space between the current and previous models to reduce feature drift. Additionally, a latent matching for the reconstruction and original data is proposed to improve generated features alignment. Further, based on the observation that the reconstructions are better for preserving knowledge, we add the cycling of generations through the previously trained model to make them closer to the original data. Our method outperforms other generative replay methods in various scenarios. Code available at https://github.com/valeriya-khan/looking-through-the-past.
    摘要 在这项工作中,我们改进了持续学习中的生成重放方法,使其在具有挑战性的场景下也能表现良好。现有的生成重放方法通常只在小型、简单的数据集上评估,因为它们不足以生成类别更多、更复杂的数据。我们发现,在基于VAE的生成重放中,这可能是因为生成特征映射到潜在空间后与原始特征相距较远。因此,我们提出三项修改,使模型能够学习并生成复杂数据。具体而言,我们在当前模型与先前模型之间的潜在空间中引入知识蒸馏,以降低特征漂移;其次,我们提出对重建数据和原始数据进行潜在匹配,以改善生成特征的对齐;此外,基于重建更有利于保留知识这一观察,我们让生成样本循环通过先前训练的模型,使其更接近原始数据。我们的方法在多种场景下优于其他生成重放方法。代码见:https://github.com/valeriya-khan/looking-through-the-past。
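The latent-space distillation component described above can be sketched as an extra loss term on replayed samples that keeps the current encoder close to the frozen previous one. The encode/decode signatures and the weighting factor are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def replay_distillation_loss(current_vae, previous_vae, replayed_x, beta=1.0):
    """Assumes .encode(x) -> (mu, logvar) and .decode(z) -> reconstruction."""
    with torch.no_grad():
        mu_old, _ = previous_vae.encode(replayed_x)    # frozen model from the previous task
    mu_new, _ = current_vae.encode(replayed_x)
    distill = F.mse_loss(mu_new, mu_old)               # keep latents close to the old ones
    recon = F.mse_loss(current_vae.decode(mu_new), replayed_x)  # latent matching / reconstruction
    return recon + beta * distill
```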

Moving Object Detection and Tracking with 4D Radar Point Cloud

  • paper_url: http://arxiv.org/abs/2309.09737
  • repo_url: None
  • paper_authors: Zhijun Pan, Fangqiang Ding, Hantao Zhong, Chris Xiaoxuan Lu
  • for: 本文针对radar图像跟踪问题提出了一个新的解决方案,即RaTrack。
  • methods: RaTrack方法侧重于运动分割和聚类,并辅以运动估计模块,以增强跟踪精度。
  • results: 在View-of-Delft数据集上,RaTrack方法与现有方法相比,跟踪精度明显提高,表现出优于现有方法。
    Abstract Mobile autonomy relies on the precise perception of dynamic environments. Robustly tracking moving objects in 3D world thus plays a pivotal role for applications like trajectory prediction, obstacle avoidance, and path planning. While most current methods utilize LiDARs or cameras for Multiple Object Tracking (MOT), the capabilities of 4D imaging radars remain largely unexplored. Recognizing the challenges posed by radar noise and point sparsity in 4D radar data, we introduce RaTrack, an innovative solution tailored for radar-based tracking. Bypassing the typical reliance on specific object types and 3D bounding boxes, our method focuses on motion segmentation and clustering, enriched by a motion estimation module. Evaluated on the View-of-Delft dataset, RaTrack showcases superior tracking precision of moving objects, largely surpassing the performance of the state of the art.
    摘要 移动自主系统依赖于对动态环境的精确感知,因此在三维世界中稳健地跟踪运动目标,对轨迹预测、避障和路径规划等应用至关重要。目前多数多目标跟踪(MOT)方法依赖激光雷达或摄像头,而4D成像雷达的潜力仍在很大程度上未被发掘。针对4D雷达数据中的噪声和点云稀疏问题,我们提出了专为雷达跟踪设计的RaTrack方法。该方法不依赖特定目标类别和3D边界框,而是侧重于运动分割与聚类,并辅以运动估计模块。在View-of-Delft数据集上的评估表明,RaTrack对运动目标的跟踪精度显著优于现有最佳方法。

A Quantum Optimization Case Study for a Transport Robot Scheduling Problem

  • paper_url: http://arxiv.org/abs/2309.09736
  • repo_url: None
  • paper_authors: Dominik Leib, Tobias Seidel, Sven Jäger, Raoul Heese, Caitlin Isobel Jones, Abhishek Awasthi, Astrid Niederle, Michael Bortz
  • for: 这个研究是为了比较D-Wave的量子-类别混合框架、Fujitsu的量子静态逻辑器和Gurobi的状态当前的类别解决器在解决交通机器人分配问题的性能。这个问题来自实际的工业问题。
  • methods: 我们提供了三个不同的模型来解决这个问题,按照不同的设计哲学。我们在比较不同的模型和解决器组合的终端运行时间和解决质量方面进行了对比。
  • results: 我们发现了静态逻辑器和混合量子逻辑器在直接与Gurobi进行比较时有推荐的结果,并提供了一些机会。我们的研究提供了应用导向优化问题的工作流程和不同策略的评估,可以用于评估不同方法的优缺点。
    Abstract We present a comprehensive case study comparing the performance of D-Waves' quantum-classical hybrid framework, Fujitsu's quantum-inspired digital annealer, and Gurobi's state-of-the-art classical solver in solving a transport robot scheduling problem. This problem originates from an industrially relevant real-world scenario. We provide three different models for our problem following different design philosophies. In our benchmark, we focus on the solution quality and end-to-end runtime of different model and solver combinations. We find promising results for the digital annealer and some opportunities for the hybrid quantum annealer in direct comparison with Gurobi. Our study provides insights into the workflow for solving an application-oriented optimization problem with different strategies, and can be useful for evaluating the strengths and weaknesses of different approaches.
    摘要 我们给出了一项全面的案例研究,比较D-Wave的量子-经典混合框架、Fujitsu的量子启发数字退火器以及Gurobi最先进的经典求解器在求解运输机器人调度问题上的表现。该问题来自一个具有工业相关性的真实场景。我们依照不同的设计思路为该问题给出了三种模型。在基准对比中,我们重点考察不同模型与求解器组合的解质量和端到端运行时间。结果显示数字退火器取得了可喜的成绩,混合量子退火器在与Gurobi的直接对比中也展现出一定潜力。我们的研究为采用不同策略求解面向应用的优化问题提供了工作流程方面的启示,可用于评估不同方法的优缺点。

LLM4Jobs: Unsupervised occupation extraction and standardization leveraging Large Language Models

  • paper_url: http://arxiv.org/abs/2309.09708
  • repo_url: https://github.com/aida-ugent/skillgpt
  • paper_authors: Nan Li, Bo Kang, Tijl De Bie
  • for: 本研究旨在探讨使用大语言模型(LLM)实现自动化职业抽取和标准化,以便于职业推荐和劳动市场政策制定。
  • methods: 本研究提出了一种新的无监督方法——LLM4Jobs,利用大语言模型的自然语言理解和生成能力来实现职业编码。
  • results: 经过严格的实验评估,LLM4Jobs方法在不同的数据集和粒度上 consistently 超越了现有的无监督状况标准做法,展示其在多样化数据集和粒度上的可变性。
    Abstract Automated occupation extraction and standardization from free-text job postings and resumes are crucial for applications like job recommendation and labor market policy formation. This paper introduces LLM4Jobs, a novel unsupervised methodology that taps into the capabilities of large language models (LLMs) for occupation coding. LLM4Jobs uniquely harnesses both the natural language understanding and generation capacities of LLMs. Evaluated on rigorous experimentation on synthetic and real-world datasets, we demonstrate that LLM4Jobs consistently surpasses unsupervised state-of-the-art benchmarks, demonstrating its versatility across diverse datasets and granularities. As a side result of our work, we present both synthetic and real-world datasets, which may be instrumental for subsequent research in this domain. Overall, this investigation highlights the promise of contemporary LLMs for the intricate task of occupation extraction and standardization, laying the foundation for a robust and adaptable framework relevant to both research and industrial contexts.
    摘要 自动化职业抽取和标准化从自由文本职业广告和简历中提取职业信息是关键 для应用程序如职业推荐和劳动市场政策形成。这篇论文介绍了LLM4Jobs,一种新的无监督方法,利用大型自然语言模型(LLM)的自然语言理解和生成能力来实现职业编码。我们在尝试中rigorous experimentation on synthetic and real-world datasets,发现LLM4Jobs可以在多种数据集和粒度上具有优秀的表现,并且在不同的职业类别和粒度上具有很好的灵活性。此外,我们还提供了一些synthetic和实际 datasets,这些数据可能会对后续在这个领域的研究提供很好的参考。总之,这项研究展示了当今LLM的潜在能力在职业抽取和标准化中,为研究和工业上的应用提供了一个强大和灵活的基础。
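A hedged sketch of the two-stage workflow implied above: an LLM extracts raw occupation mentions, and each mention is standardized by nearest-neighbour search over embeddings of a reference taxonomy such as ESCO. The `call_llm` hook, the prompt and the sentence-transformers encoder are illustrative choices, not LLM4Jobs' actual components.

```python
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")      # illustrative embedding model

def extract_occupations(posting: str, call_llm) -> list:
    prompt = ("List the job titles or occupations mentioned in the text below, "
              "one per line and nothing else.\n\n" + posting)
    return [line.strip() for line in call_llm(prompt).splitlines() if line.strip()]

def standardize(mentions: list, taxonomy_labels: list) -> list:
    """Map each raw mention to the closest label of a reference taxonomy (e.g. ESCO)."""
    tax_emb = encoder.encode(taxonomy_labels, normalize_embeddings=True)
    men_emb = encoder.encode(mentions, normalize_embeddings=True)
    sims = men_emb @ tax_emb.T                         # cosine similarity on unit vectors
    return [taxonomy_labels[int(i)] for i in sims.argmax(axis=1)]
```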

Information based explanation methods for deep learning agents – with applications on large open-source chess models

  • paper_url: http://arxiv.org/abs/2309.09702
  • repo_url: https://github.com/patrik-ha/ii-map
  • paper_authors: Patrik Hammersborg, Inga Strümke
  • for: 这个研究旨在使用大型开源棋牌模型来实现透明AI(XAI)方法,以解释具有相似性表现的 alphaZero 模型。
  • methods: 这种 XAI 方法使用可视化解释,以便在棋牌游戏中解释模型的决策。它可以控制输入向模型传递的信息,从而提供精确的解释。
  • results: 研究人员使用这种 XAI 方法对标准 8x8 棋牌进行了应用,并取得了类似于 alphaZero 的性能。
    Abstract With large chess-playing neural network models like AlphaZero contesting the state of the art within the world of computerised chess, two challenges present themselves: The question of how to explain the domain knowledge internalised by such models, and the problem that such models are not made openly available. This work presents the re-implementation of the concept detection methodology applied to AlphaZero in McGrath et al. (2022), by using large, open-source chess models with comparable performance. We obtain results similar to those achieved on AlphaZero, while relying solely on open-source resources. We also present a novel explainable AI (XAI) method, which is guaranteed to highlight exhaustively and exclusively the information used by the explained model. This method generates visual explanations tailored to domains characterised by discrete input spaces, as is the case for chess. Our presented method has the desirable property of controlling the information flow between any input vector and the given model, which in turn provides strict guarantees regarding what information is used by the trained model during inference. We demonstrate the viability of our method by applying it to standard 8x8 chess, using large open-source chess models.

Securing Fixed Neural Network Steganography

  • paper_url: http://arxiv.org/abs/2309.09700
  • repo_url: None
  • paper_authors: Zicong Luo, Sheng Li, Guobiao Li, Zhenxing Qian, Xinpeng Zhang
  • for: 这个研究是为了提高图像隐藏技术的安全性和可见性。
  • methods: 研究使用固定神经网络(FNN)进行秘密嵌入和抽出,并通过生成钥匙控制的扰动来提高安全性。
  • results: 实验结果显示,提案的方案能够防止未授权者从隐藏图像中提取秘密,并且能够生成高品质的隐藏图像,特别是当FNN是一个用于普通学习任务的神经网络时。
    Abstract Image steganography is the art of concealing secret information in images in a way that is imperceptible to unauthorized parties. Recent advances show that is possible to use a fixed neural network (FNN) for secret embedding and extraction. Such fixed neural network steganography (FNNS) achieves high steganographic performance without training the networks, which could be more useful in real-world applications. However, the existing FNNS schemes are vulnerable in the sense that anyone can extract the secret from the stego-image. To deal with this issue, we propose a key-based FNNS scheme to improve the security of the FNNS, where we generate key-controlled perturbations from the FNN for data embedding. As such, only the receiver who possesses the key is able to correctly extract the secret from the stego-image using the FNN. In order to improve the visual quality and undetectability of the stego-image, we further propose an adaptive perturbation optimization strategy by taking the perturbation cost into account. Experimental results show that our proposed scheme is capable of preventing unauthorized secret extraction from the stego-images. Furthermore, our scheme is able to generate stego-images with higher visual quality than the state-of-the-art FNNS scheme, especially when the FNN is a neural network for ordinary learning tasks.
    摘要 Image 隐藏技术是隐藏秘密信息在图像中,以便只有授权方可以访问。现有研究表明,可以使用固定神经网络(FNN)进行秘密嵌入和抽取。称为固定神经网络隐藏技术(FNNS),这种技术可以在无需训练神经网络的情况下实现高度的隐藏性。然而,现有的FNNS方案具有一定的漏洞,即任何人都可以从隐藏图像中提取秘密。为了解决这个问题,我们提议使用密钥控制的FNNS方案,其中我们从FNN中生成密钥控制的扰动来用于数据嵌入。因此,只有持有密钥的接收方可以使用FNN correctly提取秘密从隐藏图像中。为了提高隐藏图像的视觉质量和不可察觉性,我们进一步提议一种适应性优化策略,其中考虑扰动成本。实验结果表明,我们的提议方案可以防止未经授权的秘密提取。此外,我们的方案还可以生成高质量的隐藏图像,特别是当FNN是一个用于常规学习任务的神经网络时。
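A toy sketch of the key-based fixed-network idea: the extractor network is rebuilt deterministically from a secret key, and the sender optimizes a small perturbation of the cover image so that this keyed network decodes the secret bits. Architecture, perturbation budget and optimizer settings are illustrative, and the key-controlled perturbation shown here is a simplification of the paper's scheme.

```python
import torch
import torch.nn as nn

def keyed_extractor(key: int, n_bits: int) -> nn.Module:
    """A fixed, never-trained CNN whose weights are reproducible only from the key."""
    torch.manual_seed(key)
    net = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(8), nn.Flatten(),
        nn.Linear(16 * 8 * 8, n_bits),
    )
    for p in net.parameters():
        p.requires_grad_(False)
    return net

def embed(cover: torch.Tensor, secret_bits: torch.Tensor, key: int,
          steps: int = 300, eps: float = 0.03) -> torch.Tensor:
    """cover: (1, 3, H, W) in [0, 1]; secret_bits: 0/1 tensor of length n_bits."""
    extractor = keyed_extractor(key, secret_bits.numel())
    delta = torch.zeros_like(cover, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=1e-2)
    for _ in range(steps):
        logits = extractor((cover + delta).clamp(0, 1)).flatten()
        loss = nn.functional.binary_cross_entropy_with_logits(logits, secret_bits.float())
        opt.zero_grad()
        loss.backward()
        opt.step()
        delta.data.clamp_(-eps, eps)                   # keep the stego image close to the cover
    return (cover + delta).clamp(0, 1).detach()

def extract(stego: torch.Tensor, key: int, n_bits: int) -> torch.Tensor:
    """Only a receiver holding the same key rebuilds the extractor that decodes correctly."""
    return (keyed_extractor(key, n_bits)(stego).flatten() > 0).long()
```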

Noise-Augmented Boruta: The Neural Network Perturbation Infusion with Boruta Feature Selection

  • paper_url: http://arxiv.org/abs/2309.09694
  • repo_url: None
  • paper_authors: Hassan Gharoun, Navid Yazdanjoe, Mohammad Sadegh Khorshidi, Amir H. Gandomi
  • for: 这篇论文旨在改进Boruta特征选择算法,以提升其特征选择能力和精度。
  • methods: 论文借鉴人工神经网络的扰动分析框架,对Boruta算法中的影子特征(shadow variable)加入噪声。
  • results: 在四个公开基准数据集上的实验结果显示,所提方法优于传统Boruta算法,验证了该改进的有效性与价值。
    Abstract With the surge in data generation, both vertically (i.e., volume of data) and horizontally (i.e., dimensionality), the burden of the curse of dimensionality has become increasingly palpable. Feature selection, a key facet of dimensionality reduction techniques, has advanced considerably to address this challenge. One such advancement is the Boruta feature selection algorithm, which successfully discerns meaningful features by contrasting them to their permutated counterparts known as shadow features. However, the significance of a feature is shaped more by the data's overall traits than by its intrinsic value, a sentiment echoed in the conventional Boruta algorithm where shadow features closely mimic the characteristics of the original ones. Building on this premise, this paper introduces an innovative approach to the Boruta feature selection algorithm by incorporating noise into the shadow variables. Drawing parallels from the perturbation analysis framework of artificial neural networks, this evolved version of the Boruta method is presented. Rigorous testing on four publicly available benchmark datasets revealed that this proposed technique outperforms the classic Boruta algorithm, underscoring its potential for enhanced, accurate feature selection.
    摘要 随着数据在纵向(数据量)和横向(维度)上的快速增长,“维度灾难”带来的负担日益明显。特征选择作为降维技术的关键环节,为应对这一挑战取得了长足进展。其中之一便是Boruta特征选择算法,它通过将特征与其置换副本(即影子特征)进行对比,成功地辨识出有意义的特征。然而,一个特征的重要性更多地取决于数据的整体特性而非其内在价值,这一点也体现在传统Boruta算法中:影子特征与原始特征的特性高度相似。基于这一前提,本文提出了一种改进的Boruta特征选择算法,即在影子变量中加入噪声。借鉴人工神经网络的扰动分析框架,我们给出了这种演进版的Boruta方法。在四个公开基准数据集上的严格测试表明,所提方法优于经典Boruta算法,展示了其在更准确的特征选择方面的潜力。
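One concrete way to read the proposal above is Boruta's usual shadow-feature test with Gaussian noise injected into the shadows; the sketch below shows a single such step. The noise scale, forest size and acceptance rule are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def noisy_boruta_step(X: np.ndarray, y: np.ndarray, noise_std: float = 0.1, seed: int = 0):
    """Return a boolean mask of features whose importance beats the best noisy shadow."""
    rng = np.random.default_rng(seed)
    shadows = X.copy()
    for j in range(shadows.shape[1]):
        shadows[:, j] = rng.permutation(shadows[:, j])          # per-column permutation
    shadows = shadows + rng.normal(0.0, noise_std * X.std(axis=0), size=X.shape)  # added noise
    rf = RandomForestClassifier(n_estimators=300, random_state=seed).fit(np.hstack([X, shadows]), y)
    imp = rf.feature_importances_
    real_imp, shadow_imp = imp[: X.shape[1]], imp[X.shape[1]:]
    return real_imp > shadow_imp.max()
```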

Ugly Ducklings or Swans: A Tiered Quadruplet Network with Patient-Specific Mining for Improved Skin Lesion Classification

  • paper_url: http://arxiv.org/abs/2309.09689
  • repo_url: None
  • paper_authors: Nathasha Naranpanawa, H. Peter Soyer, Adam Mothershaw, Gayan K. Kulatilleke, Zongyuan Ge, Brigid Betz-Stablein, Shekhar S. Chandra
  • for: 辅助诊断皮肤黑色素瘤(cutaneous melanoma),通过区分高度可疑皮肤病变与非可疑病变。
  • methods: 使用深度度量学习网络,在患者层面和病变层面两个层级学习病变特征;并引入患者特定的四元组挖掘方法和分层四元组网络,以学习更多的上下文信息。
  • results: 与传统分类器相比,所提方法在识别“丑小鸭”病变时的敏感度比基线ResNet18 CNN高54%,比朴素三元组网络高37%。度量空间中数据流形的可视化进一步表明,DMT-Quadruplet 能够在患者特定和患者无关两种情形下成功分类“丑小鸭”病变。
    Abstract An ugly duckling is an obviously different skin lesion from surrounding lesions of an individual, and the ugly duckling sign is a criterion used to aid in the diagnosis of cutaneous melanoma by differentiating between highly suspicious and benign lesions. However, the appearance of pigmented lesions, can change drastically from one patient to another, resulting in difficulties in visual separation of ugly ducklings. Hence, we propose DMT-Quadruplet - a deep metric learning network to learn lesion features at two tiers - patient-level and lesion-level. We introduce a patient-specific quadruplet mining approach together with a tiered quadruplet network, to drive the network to learn more contextual information both globally and locally between the two tiers. We further incorporate a dynamic margin within the patient-specific mining to allow more useful quadruplets to be mined within individuals. Comprehensive experiments show that our proposed method outperforms traditional classifiers, achieving 54% higher sensitivity than a baseline ResNet18 CNN and 37% higher than a naive triplet network in classifying ugly duckling lesions. Visualisation of the data manifold in the metric space further illustrates that DMT-Quadruplet is capable of classifying ugly duckling lesions in both patient-specific and patient-agnostic manner successfully.
    摘要 “丑小鸭”指个体身上与周围病变明显不同的皮肤病变,“丑小鸭征”是辅助诊断皮肤黑色素瘤的一项标准,用于区分高度可疑病变与良性病变。然而,色素性病变的外观在不同患者之间差异巨大,使得“丑小鸭”病变难以凭视觉分辨。为此,我们提出DMT-Quadruplet,一种在患者层面和病变层面两个层级学习病变特征的深度度量学习网络。我们引入患者特定的四元组挖掘方法,并与分层四元组网络相结合,促使网络在两个层级上学习更多的全局与局部上下文信息;同时在患者特定挖掘中引入动态边界(dynamic margin),以便在个体内挖掘出更多有用的四元组。综合实验表明,所提方法优于传统分类器:在分类“丑小鸭”病变时,其敏感度比基线ResNet18 CNN高54%,比朴素三元组网络高37%。度量空间中数据流形的可视化进一步说明,DMT-Quadruplet 能够在患者特定与患者无关两种情形下成功分类“丑小鸭”病变。
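A compact sketch of a two-tier quadruplet objective in the spirit of DMT-Quadruplet: the anchor is contrasted against a within-patient negative (lesion-level tier) and a cross-patient negative (patient-level tier) with separate margins. The mining strategy and the dynamic margin are simplified away; the margins and pairing semantics here are assumptions.

```python
import torch.nn.functional as F

def tiered_quadruplet_loss(anchor, positive, neg_same_patient, neg_other_patient,
                           margin_local: float = 0.3, margin_global: float = 0.6):
    d_pos = F.pairwise_distance(anchor, positive)
    d_local = F.pairwise_distance(anchor, neg_same_patient)      # lesion-level tier
    d_global = F.pairwise_distance(anchor, neg_other_patient)    # patient-level tier
    local_term = F.relu(d_pos - d_local + margin_local)
    global_term = F.relu(d_pos - d_global + margin_global)
    return (local_term + global_term).mean()
```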

Distributed course allocation with asymmetric friendships

  • paper_url: http://arxiv.org/abs/2309.09684
  • repo_url: None
  • paper_authors: Ilya Khakhiashvili, Lihi Dery, Tal Grinshpoun
  • for: 本研究旨在考虑学生之间的友谊关系,为学生分配课程 seats 提供一种分布式解决方案。
  • methods: 本文使用非对称分布式约束优化问题来模型问题,并开发了一种专门的算法。
  • results: Results show that our algorithm can obtain high utility for students while ensuring fairness and observing course seat capacity limitations.
    Abstract Students' decisions on whether to take a class are strongly affected by whether their friends plan to take the class with them. A student may prefer to be assigned to a course they likes less, just to be with their friends, rather than taking a more preferred class alone. It has been shown that taking classes with friends positively affects academic performance. Thus, academic institutes should prioritize friendship relations when assigning course seats. The introduction of friendship relations results in several non-trivial changes to current course allocation methods. This paper explores how course allocation mechanisms can account for friendships between students and provide a unique, distributed solution. In particular, we model the problem as an asymmetric distributed constraint optimization problem and develop a new dedicated algorithm. Our extensive evaluation includes both simulated data and data derived from a user study on 177 students' preferences over courses and friends. The results show that our algorithm obtains high utility for the students while keeping the solution fair and observing courses' seat capacity limitations.
    摘要 学生们决定选课的决定受到同学的决定影响很强。学生可能会偏好选择一门课程,即使它不是他们最喜欢的,只是为了与朋友一起学习。这已经证明了与朋友一起学习会提高学业表现。因此,学府应该在分配课程时考虑学生之间的友谊关系。在现有课程分配方法的基础上引入友谊关系会导致一些非常重要的变化。这篇论文探讨了如何考虑学生之间的友谊关系来分配课程,并提供了一种专门的算法。我们的评估包括仿真数据和177名学生对课程和朋友的偏好的用户研究数据。结果表明,我们的算法可以为学生提供高的用户价值,同时保证分配的解决方案公平、遵循课程坐席限制。

Single and Few-step Diffusion for Generative Speech Enhancement

  • paper_url: http://arxiv.org/abs/2309.09677
  • repo_url: None
  • paper_authors: Bunlong Lay, Jean-Marie Lemercier, Julius Richter, Timo Gerkmann
  • for: 提高Diffusion模型的推理速度和精度
  • methods: 采用两个阶段训练方法,首先使用普通的生成推理方法进行训练,然后使用预测损失来对推理结果进行修正
  • results: 使用这种两个阶段训练方法可以在60次函数评估中达到同等性能,并且在减少函数评估数量(NFEs)下仍然保持稳定性和超越基eline模型的性能。
    Abstract Diffusion models have shown promising results in speech enhancement, using a task-adapted diffusion process for the conditional generation of clean speech given a noisy mixture. However, at test time, the neural network used for score estimation is called multiple times to solve the iterative reverse process. This results in a slow inference process and causes discretization errors that accumulate over the sampling trajectory. In this paper, we address these limitations through a two-stage training approach. In the first stage, we train the diffusion model the usual way using the generative denoising score matching loss. In the second stage, we compute the enhanced signal by solving the reverse process and compare the resulting estimate to the clean speech target using a predictive loss. We show that using this second training stage enables achieving the same performance as the baseline model using only 5 function evaluations instead of 60 function evaluations. While the performance of usual generative diffusion algorithms drops dramatically when lowering the number of function evaluations (NFEs) to obtain single-step diffusion, we show that our proposed method keeps a steady performance and therefore largely outperforms the diffusion baseline in this setting and also generalizes better than its predictive counterpart.
    摘要 Diffusion 模型在听音提升中表现出了promising的结果,使用任务适应的扩散过程来 conditional generation 清晰的听音,给定噪音混合。然而,在测试时,用于分数估计的神经网络会被多次调用,以解决反向过程的迭代问题。这会导致慢速的推理过程和积累的精度错误。在这篇论文中,我们解决这些限制,通过两个阶段的训练方法。在第一阶段,我们使用传统的扩散模型训练方法,使用生成扩散分数匹配损失函数。在第二阶段,我们解决反向过程,并将结果与清晰听音目标进行比较,使用预测损失函数。我们显示,使用这个第二阶段训练方法可以达到与基准模型相同的性能,只需要5个功能评估次数,而不是60个。而通常的生成扩散算法在降低功能评估次数(NFEs)时,性能会降低很多,但我们的提议方法可以保持稳定的性能,因此大大超越了扩散基准模型。此外,我们还发现,我们的方法在这种设定下更好地适应和泛化。

Conditioning Latent-Space Clusters for Real-World Anomaly Classification

  • paper_url: http://arxiv.org/abs/2309.09676
  • repo_url: None
  • paper_authors: Daniel Bogdoll, Svetlana Pavlitska, Simon Klaus, J. Marius Zöllner
  • for: 本研究旨在提高自动驾驶领域中异常检测的精度和效果。
  • methods: 本研究使用Variational Autoencoder(VAE)来分类样本为正常数据或异常数据,并通过提供异常映射来进一步提高异常检测性能。
  • results: 研究结果表明,通过使用异常映射,VAE可以更好地分类异常数据,并且可以分离正常数据和异常数据到隔离的集群中,从而获得有意义的幂等表示。
    Abstract Anomalies in the domain of autonomous driving are a major hindrance to the large-scale deployment of autonomous vehicles. In this work, we focus on high-resolution camera data from urban scenes that include anomalies of various types and sizes. Based on a Variational Autoencoder, we condition its latent space to classify samples as either normal data or anomalies. In order to emphasize especially small anomalies, we perform experiments where we provide the VAE with a discrepancy map as an additional input, evaluating its impact on the detection performance. Our method separates normal data and anomalies into isolated clusters while still reconstructing high-quality images, leading to meaningful latent representations.
    摘要 自动驾驶领域中的异常情况是大规模部署自动驾驶车辆的主要障碍。在这项工作中,我们关注城市场景的高分辨率摄像头数据,其中包含类型和大小各异的异常。基于变分自编码器(VAE),我们对其潜在空间加以约束,以将样本分类为正常数据或异常。为了着重强调特别小的异常,我们在实验中向VAE额外输入差异图,并评估其对检测性能的影响。我们的方法在仍能重建高质量图像的同时,将正常数据与异常分离到相互独立的聚类中,从而获得有意义的潜在表示。
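A small sketch of how the discrepancy map can enter the model as an extra input channel of a VAE encoder whose latent space is then used for normal-vs-anomaly clustering. The architecture below is a placeholder, not the paper's network; a sample could then be scored, for instance, by the distance of its latent mean to the centre of the normal-data cluster.

```python
import torch
import torch.nn as nn

class DiscrepancyVAEEncoder(nn.Module):
    """Encoder taking RGB plus a one-channel discrepancy map (placeholder architecture)."""
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)

    def forward(self, rgb: torch.Tensor, discrepancy: torch.Tensor):
        x = torch.cat([rgb, discrepancy], dim=1)       # condition on the discrepancy map
        h = self.backbone(x)
        return self.mu(h), self.logvar(h)
```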

Neural Network-Based Rule Models With Truth Tables

  • paper_url: http://arxiv.org/abs/2309.09638
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Adrien Benamira, Tristan Guérand, Thomas Peyrin, Hans Soegeng
    for:这个论文主要针对的是理解机器学习模型做出决策的过程,特别是在安全敏感应用中。methods:这个研究使用了神经网络框架,并将神经网络转换成规则基型模型。这个框架被称为Truth Table rules(TT-rules),它基于Truth Table nets(TTnets),一种由形式验证而来的神经网络家族。results:研究表明,TT-rules可以在七个tabular数据集上达到等效或更高的性能,而且保持和性能之间的平衡。此外,TT-rules还可以适用于大型tabular数据集,包括两个实际的DNA数据集,它们具有超过20K的特征。最后,研究者还进行了一个详细的rule-based模型的调查,使用Adult数据集。
    Abstract Understanding the decision-making process of a machine/deep learning model is crucial, particularly in security-sensitive applications. In this study, we introduce a neural network framework that combines the global and exact interpretability properties of rule-based models with the high performance of deep neural networks. Our proposed framework, called $\textit{Truth Table rules}$ (TT-rules), is built upon $\textit{Truth Table nets}$ (TTnets), a family of deep neural networks initially developed for formal verification. By extracting the set of necessary and sufficient rules $\mathcal{R}$ from the trained TTnet model (global interpretability), yielding the same output as the TTnet (exact interpretability), TT-rules effectively transforms the neural network into a rule-based model. This rule-based model supports binary classification, multi-label classification, and regression tasks for tabular datasets. Furthermore, our TT-rules framework optimizes the rule set $\mathcal{R}$ into $\mathcal{R}_{opt}$ by reducing the number and size of the rules. To enhance model interpretation, we leverage Reduced Ordered Binary Decision Diagrams (ROBDDs) to visualize these rules effectively. After outlining the framework, we evaluate the performance of TT-rules on seven tabular datasets from finance, healthcare, and justice domains. We also compare the TT-rules framework to state-of-the-art rule-based methods. Our results demonstrate that TT-rules achieves equal or higher performance compared to other interpretable methods while maintaining a balance between performance and complexity. Notably, TT-rules presents the first accurate rule-based model capable of fitting large tabular datasets, including two real-life DNA datasets with over 20K features. Finally, we extensively investigate a rule-based model derived from TT-rules using the Adult dataset.
    摘要 理解机器学习模型的决策过程是非常重要,尤其在安全敏感应用中。在这种研究中,我们提出了一种神经网络框架,可以结合神经网络的高性能和规则型模型的全面和准确解释性。我们称之为“真实表达规则”(TT-rules)。基于规则型网络(TTnets),TT-rules 可以从训练完成的 TTnet 模型中提取必要和 suficient 规则集( $\mathcal{R}$),并且可以使这些规则集产生同样的输出,从而将神经网络转化成规则型模型。这种规则型模型支持二分类、多标签分类和回归任务。此外,我们还优化了规则集 $\mathcal{R}$ 为 $\mathcal{R}_{opt}$,以降低规则的数量和大小。为了增强模型解释,我们利用减少的binary decision diagram(ROBDDs)来可见地表示这些规则。在文章中,我们首先介绍了 TT-rules 框架,然后对七个标准化表格数据集进行评估。这些数据集来自于金融、医疗和正义领域。我们还与当前的解释性方法进行比较。我们的结果表明,TT-rules 可以与其他解释性方法相比,具有同等或更高的性能,同时保持性能和复杂度之间的平衡。尤其是,TT-rules 可以适用于大型表格数据集,包括两个实际的 DNA 数据集,它们具有超过 20K 的特征。最后,我们进行了一项详细的规则型模型研究,使用 Adult 数据集。

Designing a Hybrid Neural System to Learn Real-world Crack Segmentation from Fractal-based Simulation

  • paper_url: http://arxiv.org/abs/2309.09637
  • repo_url: None
  • paper_authors: Achref Jaziri, Martin Mundt, Andres Fernandez Rodriguez, Visvanathan Ramesh
  • for: 这篇论文的目的是提高计算机视觉系统对混凝土结构完整性的评估,特别是Robust crack segmentation。
  • methods: 这篇论文使用了高准确的裂纹图形生成器和相应的完全注释的裂纹数据集,并利用了点wise Mutual Information estimate和适应实例normalization来学习通用的表示。
  • results: 论文通过实验显示,该系统可以有效地处理实际世界中的裂纹分割任务,并且不同的设计选择是相互协力的。
    Abstract Identification of cracks is essential to assess the structural integrity of concrete infrastructure. However, robust crack segmentation remains a challenging task for computer vision systems due to the diverse appearance of concrete surfaces, variable lighting and weather conditions, and the overlapping of different defects. In particular recent data-driven methods struggle with the limited availability of data, the fine-grained and time-consuming nature of crack annotation, and face subsequent difficulty in generalizing to out-of-distribution samples. In this work, we move past these challenges in a two-fold way. We introduce a high-fidelity crack graphics simulator based on fractals and a corresponding fully-annotated crack dataset. We then complement the latter with a system that learns generalizable representations from simulation, by leveraging both a pointwise mutual information estimate along with adaptive instance normalization as inductive biases. Finally, we empirically highlight how different design choices are symbiotic in bridging the simulation to real gap, and ultimately demonstrate that our introduced system can effectively handle real-world crack segmentation.
    摘要 检测裂隙是评估混凝土基础设施结构完整性的关键。然而,由于混凝土表面的多样性、变化的照明和天气条件以及不同损害的重叠,计算机视觉系统中的裂隙分割仍然是一项挑战。特别是,最新的数据驱动方法在数据的有限性、细化和时间消耗的 crack 注释以及扩展到不同样本上的difficulty in generalizing。在这项工作中,我们通过两种方式突破这些挑战。首先,我们介绍了一个基于 fractional 的高精度裂隙图形生成器,并且提供了相应的完全注释的裂隙集合。其次,我们利用了 both pointwise mutual information estimate和 adaptive instance normalization作为 inductive biases,以学习普适的表示。最后,我们证明了不同的设计选择是协同的,并最终表明了我们引入的系统可以有效地处理实际世界中的裂隙分割。
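Adaptive instance normalization, mentioned above as one of the inductive biases bridging simulated and real crack imagery, can be written in a few lines. This is the standard AdaIN formulation; how it is wired into the paper's full system is left out.

```python
import torch

def adain(content: torch.Tensor, style: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """content, style: (N, C, H, W); re-normalize content with the style statistics."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean
```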

Gradpaint: Gradient-Guided Inpainting with Diffusion Models

  • paper_url: http://arxiv.org/abs/2309.09614
  • repo_url: None
  • paper_authors: Asya Grechka, Guillaume Couairon, Matthieu Cord
  • for: 图像填充任务中的图像生成(image inpainting)
  • methods: 使用 diffusion probabilistic models (DDPMs) 和自定义损失函数(custom loss)来导航生成过程,以实现图像生成的准确性和一致性。
  • results: 比起当前的方法, GradPaint 能够更好地考虑图像的一致性和自然性,提高了图像生成的质量。
    Abstract Denoising Diffusion Probabilistic Models (DDPMs) have recently achieved remarkable results in conditional and unconditional image generation. The pre-trained models can be adapted without further training to different downstream tasks, by guiding their iterative denoising process at inference time to satisfy additional constraints. For the specific task of image inpainting, the current guiding mechanism relies on copying-and-pasting the known regions from the input image at each denoising step. However, diffusion models are strongly conditioned by the initial random noise, and therefore struggle to harmonize predictions inside the inpainting mask with the real parts of the input image, often producing results with unnatural artifacts. Our method, dubbed GradPaint, steers the generation towards a globally coherent image. At each step in the denoising process, we leverage the model's "denoised image estimation" by calculating a custom loss measuring its coherence with the masked input image. Our guiding mechanism uses the gradient obtained from backpropagating this loss through the diffusion model itself. GradPaint generalizes well to diffusion models trained on various datasets, improving upon current state-of-the-art supervised and unsupervised methods.
    摘要 去噪扩散概率模型(DDPM)近期在条件与无条件图像生成中取得了卓越的成果。预训练模型无需进一步训练,即可在推理时通过引导其迭代去噪过程满足额外约束,从而适配不同的下游任务。对于图像修复(inpainting)任务,当前的引导机制是在每个去噪步骤中复制粘贴输入图像中的已知区域。然而,扩散模型受初始随机噪声的强烈制约,因此难以使修复掩码内的预测与输入图像的真实部分保持协调,常产生带有不自然伪影的结果。我们提出的GradPaint方法将生成过程引导向全局一致的图像。在去噪过程的每一步,我们利用模型的“去噪图像估计”,计算一个衡量其与掩码输入图像一致性的自定义损失;引导机制所用的梯度通过将该损失反向传播穿过扩散模型自身获得。GradPaint可以很好地泛化到在不同数据集上训练的扩散模型,优于当前最先进的有监督与无监督方法。
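The guidance mechanism described above can be sketched as one gradient-guided denoising step: compute the model's denoised estimate, measure its coherence with the known (unmasked) pixels, and backpropagate that loss through the diffusion model to nudge the current sample. The model handle, noise-schedule value and guidance scale below are placeholders, not the authors' implementation.

```python
import torch

def guided_step(x_t, t, eps_model, alpha_bar_t, known_image, mask, guidance_scale=1.0):
    """One guidance update; alpha_bar_t is a scalar tensor from the noise schedule."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    x0_hat = (x_t - (1 - alpha_bar_t).sqrt() * eps) / alpha_bar_t.sqrt()
    # coherence of the denoised estimate with the real content; mask == 1 marks known pixels
    loss = ((mask * (x0_hat - known_image)) ** 2).mean()
    grad = torch.autograd.grad(loss, x_t)[0]
    return (x_t - guidance_scale * grad).detach()      # then apply the usual DDPM transition
```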

Proposition from the Perspective of Chinese Language: A Chinese Proposition Classification Evaluation Benchmark

  • paper_url: http://arxiv.org/abs/2309.09602
  • repo_url: None
  • paper_authors: Conghui Niu, Mengyang Hu, Lin Bo, Xiaoli He, Dong Yu, Pengyuan Liu
  • for: 这篇论文主要研究了中文提案的分类和识别,以及提案的语言表达特征和逻辑意义。
  • methods: 该论文提出了明确和暗示提案的概念,并提出了一种多级分类系统,使用语言学和逻辑学方法进行提案分类。
  • results: 经过多种方法的评估,包括Rule-based方法、SVM、BERT、RoBERTA和ChatGPT等,研究发现现有模型对中文提案分类能力不足,尤其是跨领域传递性不佳。BERT表现较好,但缺乏跨领域传递性。ChatGPT表现不佳,但可以通过提供更多提案信息进行改进。
    Abstract Existing propositions often rely on logical constants for classification. Compared with Western languages that lean towards hypotaxis such as English, Chinese often relies on semantic or logical understanding rather than logical connectives in daily expressions, exhibiting the characteristics of parataxis. However, existing research has rarely paid attention to this issue. And accurately classifying these propositions is crucial for natural language understanding and reasoning. In this paper, we put forward the concepts of explicit and implicit propositions and propose a comprehensive multi-level proposition classification system based on linguistics and logic. Correspondingly, we create a large-scale Chinese proposition dataset PEACE from multiple domains, covering all categories related to propositions. To evaluate the Chinese proposition classification ability of existing models and explore their limitations, We conduct evaluations on PEACE using several different methods including the Rule-based method, SVM, BERT, RoBERTA, and ChatGPT. Results show the importance of properly modeling the semantic features of propositions. BERT has relatively good proposition classification capability, but lacks cross-domain transferability. ChatGPT performs poorly, but its classification ability can be improved by providing more proposition information. Many issues are still far from being resolved and require further study.
    摘要 现有的提案经常利用逻辑常量进行分类。相比西方语言,如英语,中文更加倾向于 semantic或逻辑理解而非逻辑连接在日常表达中,展现出复杂的parataxis特点。然而,现有的研究几乎没有关注这一点。正确地分类这些提案是自然语言理解和逻辑的关键。在这篇论文中,我们提出了explicit和implicit提案的概念,并提出了基于语言和逻辑的多级提案分类系统。与此同时,我们创建了来自多个领域的大规模中文提案数据集PEACE,覆盖所有与提案相关的类别。为了评估现有模型对中文提案分类的能力和其局限性,我们使用了多种方法,包括规则基本方法、SVM、BERT、RoBERTA和ChatGPT。结果表明,正确地表示提案的 semantic特征非常重要。BERT在提案分类能力方面表现较好,但缺乏跨领域传递性。ChatGPT表现不佳,但可以通过提供更多提案信息来提高其分类能力。许多问题仍然待解决,需要进一步研究。

Fabricator: An Open Source Toolkit for Generating Labeled Training Data with Teacher LLMs

  • paper_url: http://arxiv.org/abs/2309.09582
  • repo_url: None
  • paper_authors: Jonas Golde, Patrick Haller, Felix Hamborg, Julian Risch, Alan Akbik
  • for: 这个论文的目的是提出一种基于大语言模型(LLM)的数据生成方法,以解决NLプロセス中的数据预处理瓶颈。
  • methods: 这个论文使用的方法是通过向LLM提供任务描述,然后使用生成的数据来训练下游NLP模型。
  • results: 这个论文的结果表明,通过使用LLM进行数据生成,可以生成大量高质量的标注数据,从而降低NL模型的训练成本。同时,这个方法还可以支持多种下游NLP任务,如文本分类、问答和实体识别等。
    Abstract Most NLP tasks are modeled as supervised learning and thus require labeled training data to train effective models. However, manually producing such data at sufficient quality and quantity is known to be costly and time-intensive. Current research addresses this bottleneck by exploring a novel paradigm called zero-shot learning via dataset generation. Here, a powerful LLM is prompted with a task description to generate labeled data that can be used to train a downstream NLP model. For instance, an LLM might be prompted to "generate 500 movie reviews with positive overall sentiment, and another 500 with negative sentiment." The generated data could then be used to train a binary sentiment classifier, effectively leveraging an LLM as a teacher to a smaller student model. With this demo, we introduce Fabricator, an open-source Python toolkit for dataset generation. Fabricator implements common dataset generation workflows, supports a wide range of downstream NLP tasks (such as text classification, question answering, and entity recognition), and is integrated with well-known libraries to facilitate quick experimentation. With Fabricator, we aim to support researchers in conducting reproducible dataset generation experiments using LLMs and help practitioners apply this approach to train models for downstream tasks.
    摘要 我们引入了一个名为 Fabricator 的开源 Python 工具库,用于生成数据集。Fabricator 支持许多下游 NLP 任务(如文本分类、问题答案和实体识别),并与各种知名库集成,以便快速实验。我们希望通过 Fabricator 支持研究人员在使用 LLM 进行可重现的数据集生成实验,并帮助实践者使用这种方法训练下游任务中的模型。
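A minimal sketch of the dataset-generation paradigm the toolkit targets: prompt a teacher LLM with a task description and parse the returned labeled examples for downstream training. The `call_llm` hook and the prompt format are hypothetical; this is not Fabricator's actual API.

```python
import json

PROMPT = ("Generate {n} short movie reviews as a JSON list of objects with the fields "
          "'text' and 'label', where every label is '{label}'.")

def generate_labeled_data(call_llm, n_per_label=500, labels=("positive", "negative")):
    """call_llm: any function mapping a prompt string to the model's text response."""
    dataset = []
    for label in labels:
        raw = call_llm(PROMPT.format(n=n_per_label, label=label))
        dataset.extend(json.loads(raw))                # assumes the teacher returns valid JSON
    return dataset
```

The resulting list of {"text": ..., "label": ...} records can then be fed to any standard fine-tuning pipeline for a smaller student classifier.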

Heterogeneous Generative Knowledge Distillation with Masked Image Modeling

  • paper_url: http://arxiv.org/abs/2309.09571
  • repo_url: None
  • paper_authors: Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang
  • for: 这篇论文的目的是提出一种基于Masked Image Modeling(MIM)的异化深度学习知识传递(H-GKD)方法,实现将大型Transformer模型的知识转移到小型CNN模型中,以提高这些小型模型在 computationally resource-limited edge devices 上的表现。
  • methods: 这篇论文使用了一种基于UNet的学生网络,通过将 sparse convolution 加入学生网络中,使学生网络能够对教师模型的Visual Representation进行效果伪装。此外,这篇论文还使用了Masked Image Modeling(MIM)方法,将教师模型的Visual Representation转移到学生网络中,以实现知识传递。
  • results: 这篇论文的实验结果显示,H-GKD 方法可以对不同的模型和大小进行适应,在 ImageNet 1K dataset 上,H-GKD 方法可以从 Resnet50 (sparse) 的76.98% 提高到80.01%。
    Abstract Small CNN-based models usually require transferring knowledge from a large model before they are deployed in computationally resource-limited edge devices. Masked image modeling (MIM) methods achieve great success in various visual tasks but remain largely unexplored in knowledge distillation for heterogeneous deep models. The reason is mainly due to the significant discrepancy between the Transformer-based large model and the CNN-based small network. In this paper, we develop the first Heterogeneous Generative Knowledge Distillation (H-GKD) based on MIM, which can efficiently transfer knowledge from large Transformer models to small CNN-based models in a generative self-supervised fashion. Our method builds a bridge between Transformer-based models and CNNs by training a UNet-style student with sparse convolution, which can effectively mimic the visual representation inferred by a teacher over masked modeling. Our method is a simple yet effective learning paradigm to learn the visual representation and distribution of data from heterogeneous teacher models, which can be pre-trained using advanced generative methods. Extensive experiments show that it adapts well to various models and sizes, consistently achieving state-of-the-art performance in image classification, object detection, and semantic segmentation tasks. For example, in the Imagenet 1K dataset, H-GKD improves the accuracy of Resnet50 (sparse) from 76.98% to 80.01%.
    摘要 通常,小型CNN模型需要从大型模型中传输知识才能在计算资源有限的边缘设备中部署。Masked image modeling(MIM)方法在各种视觉任务中取得了很大成功,但在不同深度模型之间的知识传递方面尚未得到充分开发。这主要是因为大型Transformer模型和小型CNN模型之间存在很大的不同。在这篇论文中,我们开发了首个基于MIM的Heterogeneous Generative Knowledge Distillation(H-GKD),可以有效地将大型Transformer模型中的知识传递给小型CNN模型。我们的方法建立了Transformer模型和CNN之间的桥梁,通过训练一个带有散集 convolution的学生模型,可以有效地模仿教师模型在遮盲模型中的视觉表示。我们的方法是一种简单 yet有效的学习方法,可以从不同的教师模型中学习数据的视觉表示和分布。我们的实验表明,我们的方法可以适应不同的模型和大小,并在图像分类、物体检测和semantic segmentation任务中准确地达到领先性性表现。例如,在Imagenet 1K dataset中,H-GKD提高了Resnet50(散集)的准确率从76.98%提升到80.01%。

Causal-Story: Local Causal Attention Utilizing Parameter-Efficient Tuning For Visual Story Synthesis

  • paper_url: http://arxiv.org/abs/2309.09553
  • repo_url: None
  • paper_authors: Tianyi Song, Jiuxin Cao, Kun Wang, Bo Liu, Xiaofeng Zhang
  • for: 提高视觉故事生成的全局一致性
  • methods: 使用本地 causal 注意机制,考虑过去的caption、frame和当前caption之间的 causal 关系,对当前帧生成
  • results: 在 PororoSV 和 FlintstonesSV 数据集上获得了state-of-the-art FID 分数,生成的帧也更加出色地表现了视觉故事的整体一致性。
    Abstract The excellent text-to-image synthesis capability of diffusion models has driven progress in synthesizing coherent visual stories. The current state-of-the-art method combines the features of historical captions, historical frames, and the current captions as conditions for generating the current frame. However, this method treats each historical frame and caption as the same contribution. It connects them in order with equal weights, ignoring that not all historical conditions are associated with the generation of the current frame. To address this issue, we propose Causal-Story. This model incorporates a local causal attention mechanism that considers the causal relationship between previous captions, frames, and current captions. By assigning weights based on this relationship, Causal-Story generates the current frame, thereby improving the global consistency of story generation. We evaluated our model on the PororoSV and FlintstonesSV datasets and obtained state-of-the-art FID scores, and the generated frames also demonstrate better storytelling in visuals. The source code of Causal-Story can be obtained from https://github.com/styufo/Causal-Story.
    摘要 “ diffusion 模型的出色文本到图像合成能力,已经驱动了视觉故事的生成进步。当前state-of-the-art方法是将历史caption、历史框和当前caption作为生成当前帧的condition。但是,这种方法对每个历史框和caption都使用相同的权重,忽略了不同历史条件对当前帧生成的不同影响。为解决这个问题,我们提出了Causal-Story模型。这个模型包含了本地 causal 注意力机制,考虑了以前caption、frame和当前caption之间的 causal 关系。通过基于这种关系的权重分配,Causal-Story模型生成当前帧,从而提高了全局的故事生成一致性。我们在 PororoSV 和 FlintstonesSV 数据集上评估了我们的模型,并取得了state-of-the-art FID 分数,生成的图像也更好地表达了故事的视觉。Causal-Story 模型的源代码可以从 GitHub 上获取:https://github.com/styufo/Causal-Story。”

CB-Whisper: Contextual Biasing Whisper using TTS-based Keyword Spotting

  • paper_url: http://arxiv.org/abs/2309.09552
  • repo_url: None
  • paper_authors: Yuang Li, Yinglu Li, Min Zhang, Chang Su, Mengyao Piao, Xiaosong Qiao, Jiawei Yu, Miaomiao Ma, Yanqing Zhao, Hao Yang
  • for: 提高自动语音识别(ASR)系统对罕见名词的识别率,如人名、组织名称和技术术语等。
  • methods: 使用OpenAI的Whisper模型,首先通过关键词检测(KWS)模块匹配实体和语音示例的特征。
  • results: 在三个内部数据集和两个开源数据集上,包括英语、中文和code-switching场景,通过采用经过设计的口头形式提示,使Whisper模型的混合错误率(MER)和实体恢复率得到显著改进。
    Abstract End-to-end automatic speech recognition (ASR) systems often struggle to recognize rare name entities, such as personal names, organizations, or technical terms that are not frequently encountered in the training data. This paper presents Contextual Biasing Whisper (CB-Whisper), a novel ASR system based on OpenAI's Whisper model that performs keyword-spotting (KWS) before the decoder. The KWS module leverages text-to-speech (TTS) techniques and a convolutional neural network (CNN) classifier to match the features between the entities and the utterances. Experiments demonstrate that by incorporating predicted entities into a carefully designed spoken form prompt, the mixed-error-rate (MER) and entity recall of the Whisper model is significantly improved on three internal datasets and two open-sourced datasets that cover English-only, Chinese-only, and code-switching scenarios.
    摘要 通常的自动语音识别(ASR)系统经常遇到不寻常的名 Entity 识别问题,如人名、组织机构或技术术语,这些名 Entity 在训练数据中不充分出现。这篇论文提出了 Contextual Biasing Whisper(CB-Whisper),一种基于 OpenAI 的 Whisper 模型的 ASR 系统,其在 decoder 前使用 Keyword-spotting(KWS)模块。KWS 模块利用文本识别(TTS)技术和卷积神经网络(CNN)分类器将 Entity 与语音之间的特征进行匹配。实验表明,通过在精心设计的 spoken form 提示中包含预测的 Entity,可以显著提高 Whisper 模型的混合错误率(MER)和 Entity 回归率,在三个内部数据集和两个开源数据集中,这些数据集覆盖了英语、中文和代码混合enario。

Adaptive Reorganization of Neural Pathways for Continual Learning with Hybrid Spiking Neural Networks

  • paper_url: http://arxiv.org/abs/2309.09550
  • repo_url: None
  • paper_authors: Bing Han, Feifei Zhao, Wenxuan Pan, Zhaoya Zhao, Xianqi Li, Qingqun Kong, Yi Zeng
  • for: Develop a brain-inspired, self-organizing continual learning algorithm that lets artificial neural networks adapt effectively to an increasing number of tasks while maintaining performance and energy consumption.
  • methods: Use Self-Organizing Regulation networks (SOR-SNN) to reorganize a single, limited spiking neural network (SNN) into rich sparse neural pathways that efficiently handle incremental tasks.
  • results: Experiments show consistently superior performance, energy consumption, and memory capacity across a variety of continual learning tasks; the algorithm effectively integrates previously learned knowledge with information from the current task, achieving backward transfer, and the model can automatically allocate new pathways to recover forgotten knowledge, exhibiting self-repair.
    Abstract The human brain can self-organize rich and diverse sparse neural pathways to incrementally master hundreds of cognitive tasks. However, most existing continual learning algorithms for deep artificial and spiking neural networks are unable to adequately auto-regulate the limited resources in the network, which leads to a drop in performance and a rise in energy consumption as the number of tasks increases. In this paper, we propose a brain-inspired continual learning algorithm with adaptive reorganization of neural pathways, which employs Self-Organizing Regulation networks to reorganize the single and limited Spiking Neural Network (SOR-SNN) into rich sparse neural pathways to efficiently cope with incremental tasks. The proposed model demonstrates consistent superiority in performance, energy consumption, and memory capacity on diverse continual learning tasks ranging from child-like simple to complex tasks, as well as on generalized CIFAR100 and ImageNet datasets. In particular, the SOR-SNN model excels at learning more complex tasks as well as a larger number of tasks, and is able to integrate previously learned knowledge with information from the current task, showing backward transfer that benefits the old tasks. Meanwhile, the proposed model exhibits self-repair under irreversible damage and pruning: it can automatically allocate new pathways from the retained network to recover the memory of forgotten knowledge.

A performance characteristic curve for model evaluation: the application in information diffusion prediction

  • paper_url: http://arxiv.org/abs/2309.09537
  • repo_url: None
  • paper_authors: Wenjin Xie, Xiaomeng Wang, Radosław Michalski, Tao Jia
  • for: Predict the recipients of information diffusion on social media, with practical applications in marketing and social media.
  • methods: Use an information-entropy-based metric to quantify the randomness of diffusion data and identify an overall relationship between this randomness and a model's prediction accuracy.
  • results: Across different sequence lengths, system sizes, and randomness levels, the performance of prediction models follows a universal curve that can be used to evaluate them; this provides a systematic way to compare different prediction models and a new evaluation approach for future studies.
    Abstract The information diffusion prediction on social networks aims to predict future recipients of a message, with practical applications in marketing and social media. While different prediction models all claim to perform well, general frameworks for performance evaluation remain limited. Here, we aim to identify a performance characteristic curve for a model, which captures its performance on tasks of different complexity. We propose a metric based on information entropy to quantify the randomness in diffusion data, then identify a scaling pattern between the randomness and the prediction accuracy of the model. Data points in the patterns by different sequence lengths, system sizes, and randomness all collapse into a single curve, capturing a model's inherent capability of making correct predictions against increased uncertainty. Given that this curve has such important properties that it can be used to evaluate the model, we define it as the performance characteristic curve of the model. The validity of the curve is tested by three prediction models in the same family, reaching conclusions in line with existing studies. Also, the curve is successfully applied to evaluate two distinct models from the literature. Our work reveals a pattern underlying the data randomness and prediction accuracy. The performance characteristic curve provides a new way to systematically evaluate models' performance, and sheds light on future studies on other frameworks for model evaluation.
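As a rough illustration of the entropy-based randomness metric, the sketch below computes the Shannon entropy of toy diffusion cascades and relates it to hypothetical prediction accuracies; the cascade encoding and the averaging are assumptions, not the paper's exact estimator.

```python
# Illustrative sketch: quantify randomness of diffusion data with Shannon entropy
# and relate it to prediction accuracy. The cascade encoding is an assumption.
import numpy as np

def shannon_entropy(sequence):
    """Entropy (in bits) of the empirical distribution of recipients in a cascade."""
    values, counts = np.unique(sequence, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def randomness_score(cascades):
    """Average per-cascade entropy as a crude randomness measure of a dataset."""
    return float(np.mean([shannon_entropy(c) for c in cascades]))

# Toy usage: accuracy of some prediction model measured on datasets of varying randomness.
datasets = {
    "low_noise":  [[1, 1, 2, 2, 3], [4, 4, 5, 5, 5]],
    "high_noise": [[1, 7, 2, 9, 3], [4, 8, 5, 6, 0]],
}
accuracies = {"low_noise": 0.81, "high_noise": 0.42}  # hypothetical numbers

for name, cascades in datasets.items():
    h = randomness_score(cascades)
    print(f"{name}: randomness={h:.2f} bits, accuracy={accuracies[name]:.2f}")
# Plotting accuracy against randomness for many such datasets would trace out
# the performance characteristic curve described in the abstract.
```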

DFIL: Deepfake Incremental Learning by Exploiting Domain-invariant Forgery Clues

  • paper_url: http://arxiv.org/abs/2309.09526
  • repo_url: https://github.com/deepfakeil/dfil
  • paper_authors: Kun Pan, Yin Yifang, Yao Wei, Feng Lin, Zhongjie Ba, Zhenguang Liu, ZhiBo Wang, Lorenzo Cavallaro, Kui Ren
  • for: Counter the widespread dissemination and malicious use of deepfake images by improving the accuracy and adaptability of forgery detection models.
  • methods: Propose an incremental learning framework that continually learns from a small number of new samples to improve the generalization and adaptability of forgery detection models; learn a domain-invariant representation across different data distributions; and use a multi-perspective knowledge distillation approach to mitigate catastrophic forgetting.
  • results: Extensive experiments on four benchmark datasets achieve a new state-of-the-art average forgetting rate of 7.01 and average accuracy of 85.49.
    Abstract The malicious use and widespread dissemination of deepfake pose a significant crisis of trust. Current deepfake detection models can generally recognize forgery images by training on a large dataset. However, the accuracy of detection models degrades significantly on images generated by new deepfake methods due to the difference in data distribution. To tackle this issue, we present a novel incremental learning framework that improves the generalization of deepfake detection models by continual learning from a small number of new samples. To cope with different data distributions, we propose to learn a domain-invariant representation based on supervised contrastive learning, preventing overfit to the insufficient new data. To mitigate catastrophic forgetting, we regularize our model in both feature-level and label-level based on a multi-perspective knowledge distillation approach. Finally, we propose to select both central and hard representative samples to update the replay set, which is beneficial for both domain-invariant representation learning and rehearsal-based knowledge preserving. We conduct extensive experiments on four benchmark datasets, obtaining the new state-of-the-art average forgetting rate of 7.01 and average accuracy of 85.49 on FF++, DFDC-P, DFD, and CDF2. Our code is released at https://github.com/DeepFakeIL/DFIL.
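One building block named in the abstract is supervised contrastive learning for domain-invariant features. Below is a minimal SupCon-style loss in PyTorch, with the temperature and batching conventions as assumptions rather than the authors' exact formulation.

```python
# Minimal supervised contrastive loss (SupCon-style), assumed similar in spirit to
# the domain-invariant representation learning used by DFIL. Not the authors' code.
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """features: (N, d) embeddings; labels: (N,) real/fake (or finer-grained) labels."""
    features = F.normalize(features, dim=1)
    sim = features @ features.t() / temperature          # (N, N) similarity logits
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all other samples
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # average log-likelihood of positives for each anchor that has at least one positive
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    mean_log_prob_pos = (log_prob * pos_mask).sum(dim=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()

# Usage: embeddings from the detector backbone for a mixed batch of old and new-method samples.
emb = torch.randn(8, 128)
lbl = torch.tensor([0, 0, 1, 1, 1, 0, 1, 0])
print(supervised_contrastive_loss(emb, lbl).item())
```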

FedGKD: Unleashing the Power of Collaboration in Federated Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2309.09517
  • repo_url: None
  • paper_authors: Qiying Pan, Ruofan Wu, Tengfei Liu, Tianyi Zhang, Yifei Zhu, Weiqiang Wang
  • for: Propose a federated training framework for Graph Neural Networks (GNNs) and address graph heterogeneity issues in federated GNN systems.
  • methods: Use a novel client-side graph dataset distillation method to extract task features that better describe task relatedness, and introduce a new server-side aggregation mechanism that better exploits the global collaboration structure.
  • results: Experiments show that the FedGKD framework outperforms other methods on six real-world datasets, particularly on large-scale datasets.
    Abstract Federated training of Graph Neural Networks (GNN) has become popular in recent years due to its ability to perform graph-related tasks under data isolation scenarios while preserving data privacy. However, graph heterogeneity issues in federated GNN systems continue to pose challenges. Existing frameworks address the problem by representing local tasks using different statistics and relating them through a simple aggregation mechanism. However, these approaches suffer from limited efficiency from two aspects: low quality of task-relatedness quantification and inefficacy of exploiting the collaboration structure. To address these issues, we propose FedGKD, a novel federated GNN framework that utilizes a novel client-side graph dataset distillation method to extract task features that better describe task-relatedness, and introduces a novel server-side aggregation mechanism that is aware of the global collaboration structure. We conduct extensive experiments on six real-world datasets of different scales, demonstrating our framework's outperformance.

Pruning Large Language Models via Accuracy Predictor

  • paper_url: http://arxiv.org/abs/2309.09507
  • repo_url: None
  • paper_authors: Yupeng Ji, Yibo Cao, Jiucai Liu
  • for: Propose a new pruning approach for compressing large language models (LLMs) to make training, inference, and deployment more efficient.
  • methods: First build a training set containing a number of architecture-accuracy pairs, then use the resulting accuracy predictor to further optimize the search space and find the optimal model.
  • results: Experiments show the proposed method is effective and efficient: compared with the baseline, PPL on Wikitext2 and PTB drops by 9.48% and 5.76% respectively, and average accuracy on MMLU improves by 6.28%.
    Abstract Large language models (LLMs) containing tens of billions of parameters (or even more) have demonstrated impressive capabilities in various NLP tasks. However, the substantial model size poses challenges to training, inference, and deployment, making it necessary to compress the model. At present, most model compression for LLMs requires manual design of pruning features, which suffers from a complex optimization pipeline and difficulty in retaining the capabilities of certain parts of the model. Therefore, we propose a novel pruning approach: first, a training set of a certain number of architecture-accuracy pairs is established, and then a non-neural model is trained as an accuracy predictor. Using the accuracy predictor to further optimize the search space and guide the search, the optimal model can be selected automatically. Experiments show that our proposed approach is effective and efficient. Compared with the baseline, the perplexity (PPL) on Wikitext2 and PTB dropped by 9.48% and 5.76% respectively, and the average accuracy on MMLU increased by 6.28%.
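A rough sketch of the predictor-guided search described above, assuming scikit-learn is available: architectures are encoded as per-layer sparsity ratios (a made-up encoding), a gradient-boosted regressor stands in for the non-neural accuracy predictor, and random sampling stands in for the paper's search procedure.

```python
# Hypothetical sketch of predictor-guided pruning search; not the paper's implementation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n_layers = 32

def evaluate_pruned_model(sparsity_per_layer):
    """Placeholder for the expensive step: prune the LLM and measure its accuracy."""
    return 0.6 - 0.3 * float(np.mean(sparsity_per_layer) ** 2) + rng.normal(0, 0.01)

# 1) Build a small training set of (architecture, accuracy) pairs.
X = rng.uniform(0.0, 0.8, size=(64, n_layers))          # candidate sparsity profiles
y = np.array([evaluate_pruned_model(x) for x in X])     # measured accuracies

# 2) Fit a non-neural accuracy predictor.
predictor = GradientBoostingRegressor().fit(X, y)

# 3) Search a much larger candidate pool cheaply with the predictor.
candidates = rng.uniform(0.0, 0.8, size=(10000, n_layers))
target_sparsity = 0.5
feasible = candidates[np.abs(candidates.mean(axis=1) - target_sparsity) < 0.02]
best = feasible[np.argmax(predictor.predict(feasible))]
print("chosen mean sparsity:", best.mean(), "predicted accuracy:", predictor.predict([best])[0])
```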

PromptST: Prompt-Enhanced Spatio-Temporal Multi-Attribute Prediction

  • paper_url: http://arxiv.org/abs/2309.09500
  • repo_url: None
  • paper_authors: Zijian Zhang, Xiangyu Zhao, Qidong Liu, Chunxu Zhang, Qian Ma, Wanyu Wang, Hongwei Zhao, Yiqi Wang, Zitao Liu
  • for: This paper focuses on the problem of spatio-temporal multi-attribute prediction, a critical part of urban management. The authors aim to handle diverse spatio-temporal attributes simultaneously and improve prediction performance.
  • methods: The proposed method, PromptST, consists of a spatio-temporal transformer and a parameter-sharing training scheme, together with a spatio-temporal prompt tuning strategy that fits specific attributes in a lightweight manner.
  • results: Extensive experiments on real-world datasets show that PromptST achieves state-of-the-art performance in spatio-temporal multi-attribute prediction. The authors also show that PromptST transfers well to unseen spatio-temporal attributes, giving it promising application potential in urban computing.
    Abstract In the era of information explosion, spatio-temporal data mining serves as a critical part of urban management. Considering the various fields demanding attention, e.g., traffic state, human activity, and social events, predicting multiple spatio-temporal attributes simultaneously can alleviate regulatory pressure and foster smart city construction. However, current research cannot handle spatio-temporal multi-attribute prediction well due to the complex relationships between diverse attributes. The key challenge lies in how to address the common spatio-temporal patterns while tackling their distinctions. In this paper, we propose an effective solution for spatio-temporal multi-attribute prediction, PromptST. We devise a spatio-temporal transformer and a parameter-sharing training scheme to address the common knowledge among different spatio-temporal attributes. Then, we elaborate a spatio-temporal prompt tuning strategy to fit the specific attributes in a lightweight manner. Through the pretrain and prompt tuning phases, our PromptST is able to enhance the capture of specific spatio-temporal characteristics by prompting the backbone model to fit the specific target attribute while maintaining the learned common knowledge. Extensive experiments on real-world datasets verify that our PromptST attains state-of-the-art performance. Furthermore, we also show that PromptST has good transferability on unseen spatio-temporal attributes, which brings promising application potential in urban computing. The implementation code is available to ease reproducibility.
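The prompt-tuning idea can be illustrated with a small PyTorch sketch: a frozen shared backbone plus a handful of learnable prompt tokens and a head tuned per target attribute. The module names, shapes, and tiny transformer are assumptions, not PromptST's actual architecture.

```python
# Illustrative prompt tuning for a shared spatio-temporal backbone (not PromptST's code).
import torch
import torch.nn as nn

class PromptedBackbone(nn.Module):
    def __init__(self, backbone, d_model, n_prompt_tokens=8):
        super().__init__()
        self.backbone = backbone                      # pretrained, shared across attributes
        for p in self.backbone.parameters():
            p.requires_grad = False                   # freeze the common knowledge
        # lightweight, attribute-specific parameters
        self.prompt = nn.Parameter(torch.randn(1, n_prompt_tokens, d_model) * 0.02)
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        # x: (batch, seq_len, d_model) spatio-temporal tokens
        prompt = self.prompt.expand(x.size(0), -1, -1)
        h = self.backbone(torch.cat([prompt, x], dim=1))   # backbone sees prompt + data tokens
        return self.head(h[:, 0])                           # predict from the first token

d_model = 64
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
)
model = PromptedBackbone(backbone, d_model)
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)   # only the prompt and head parameters are tuned per attribute
```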

CLIP-based Synergistic Knowledge Transfer for Text-based Person Retrieval

  • paper_url: http://arxiv.org/abs/2309.09496
  • repo_url: None
  • paper_authors: Yating liu, Yaowei Li, Zimo Liu, Wenming Yang, Yaowei Wang, Qingmin Liao
  • for: Text-based person retrieval (TBPR).
  • methods: A CLIP-based Synergistic Knowledge Transfer (CSKT) approach, including text-to-image and image-to-text bidirectional prompts with coupling projections, and a bidirectional knowledge transfer mechanism on the multi-head self-attention of the vision and language branches.
  • results: CSKT outperforms the previous state-of-the-art on three benchmark datasets while training only 7.4% of the model's total parameters, demonstrating its efficiency, effectiveness, and generalization.
    Abstract Text-based Person Retrieval aims to retrieve the target person images given a textual query. The primary challenge lies in bridging the substantial gap between vision and language modalities, especially when dealing with limited large-scale datasets. In this paper, we introduce a CLIP-based Synergistic Knowledge Transfer(CSKT) approach for TBPR. Specifically, to explore the CLIP's knowledge on input side, we first propose a Bidirectional Prompts Transferring (BPT) module constructed by text-to-image and image-to-text bidirectional prompts and coupling projections. Secondly, Dual Adapters Transferring (DAT) is designed to transfer knowledge on output side of Multi-Head Self-Attention (MHSA) in vision and language. This synergistic two-way collaborative mechanism promotes the early-stage feature fusion and efficiently exploits the existing knowledge of CLIP. CSKT outperforms the state-of-the-art approaches across three benchmark datasets when the training parameters merely account for 7.4% of the entire model, demonstrating its remarkable efficiency, effectiveness and generalization.

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

  • paper_url: http://arxiv.org/abs/2309.09493
  • repo_url: None
  • paper_authors: Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani
  • for: High-quality speech synthesis (neural vocoding).
  • methods: Integrate the inverse short-time Fourier transform (iSTFT) into the network and incorporate a harmonic-plus-noise source filter in the time-frequency domain.
  • results: More effective than iSTFTNet and HiFi-GAN, achieving ground-truth-level performance; outperforms BigVGAN-base on unseen speakers and achieves comparable performance to BigVGAN while being four times faster with only $1/6$ of the parameters.
    Abstract Recent advancements in speech synthesis have leveraged GAN-based networks like HiFi-GAN and BigVGAN to produce high-fidelity waveforms from mel-spectrograms. However, these networks are computationally expensive and parameter-heavy. iSTFTNet addresses these limitations by integrating inverse short-time Fourier transform (iSTFT) into the network, achieving both speed and parameter efficiency. In this paper, we introduce an extension to iSTFTNet, termed HiFTNet, which incorporates a harmonic-plus-noise source filter in the time-frequency domain that uses a sinusoidal source from the fundamental frequency (F0) inferred via a pre-trained F0 estimation network for fast inference speed. Subjective evaluations on LJSpeech show that our model significantly outperforms both iSTFTNet and HiFi-GAN, achieving ground-truth-level performance. HiFTNet also outperforms BigVGAN-base on LibriTTS for unseen speakers and achieves comparable performance to BigVGAN while being four times faster with only $1/6$ of the parameters. Our work sets a new benchmark for efficient, high-quality neural vocoding, paving the way for real-time applications that demand high quality speech synthesis.
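A toy version of the harmonic-plus-noise excitation driven by an F0 contour is sketched below; the frame rate, number of harmonics, and mixing weights are illustrative assumptions and do not reproduce HiFTNet's source module.

```python
# Illustrative harmonic-plus-noise source driven by an F0 contour (not HiFTNet's code).
import numpy as np

def harmonic_plus_noise_source(f0_frames, sr=24000, hop=300, n_harmonics=8, noise_scale=0.03):
    """f0_frames: per-frame F0 in Hz (0 for unvoiced). Returns a waveform excitation."""
    f0 = np.repeat(f0_frames, hop).astype(np.float64)        # upsample F0 to the sample rate
    voiced = f0 > 0
    phase = 2 * np.pi * np.cumsum(f0 / sr)                   # instantaneous phase of the fundamental
    harmonics = np.zeros_like(f0)
    for k in range(1, n_harmonics + 1):
        # drop harmonics that would alias above Nyquist
        harmonics += np.where(k * f0 < sr / 2, np.sin(k * phase), 0.0)
    harmonics *= voiced / n_harmonics
    noise = noise_scale * np.random.randn(len(f0))
    return harmonics + noise                                  # periodic part when voiced, noise everywhere

f0_contour = np.concatenate([np.zeros(10), np.full(40, 220.0), np.zeros(10)])  # toy contour
excitation = harmonic_plus_noise_source(f0_contour)
print(excitation.shape)
```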

Mechanic Maker 2.0: Reinforcement Learning for Evaluating Generated Rules

  • paper_url: http://arxiv.org/abs/2309.09476
  • repo_url: None
  • paper_authors: Johor Jara Gonzalez, Seth Cooper, Mathew Guzdial
  • for: Use Reinforcement Learning (RL) as an approximation of human play for automated game rule generation.
  • methods: Use RL agents to approximate human play, and recreate the Mechanic Maker environment in the Unity game engine as a new, open-source rule generation framework.
  • results: Results show that RL produces rule sets distinct from an A* agent baseline, which may be more usable by human players.
    Abstract Automated game design (AGD), the study of automatically generating game rules, has a long history in technical games research. AGD approaches generally rely on approximations of human play, either objective functions or AI agents. Despite this, the majority of these approximators are static, meaning they do not reflect human player's ability to learn and improve in a game. In this paper, we investigate the application of Reinforcement Learning (RL) as an approximator for human play for rule generation. We recreate the classic AGD environment Mechanic Maker in Unity as a new, open-source rule generation framework. Our results demonstrate that RL produces distinct sets of rules from an A* agent baseline, which may be more usable by humans.
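To make the idea of an RL playtester concrete, the sketch below scores a candidate rule set by how well a tabular Q-learning agent learns it; the ToyEnv class and the scoring rule are hypothetical stand-ins for the Unity framework, not the paper's implementation.

```python
# Toy sketch: score a generated rule set by how well a Q-learning "playtester" learns it.
import random
from collections import defaultdict

class ToyEnv:
    """Minimal corridor game standing in for a generated rule set: reach cell `goal`."""
    actions = (-1, +1)
    def __init__(self, length=6, goal=5):
        self.length, self.goal = length, goal
    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos
    def step(self, action):
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        self.t += 1
        done = self.pos == self.goal or self.t >= 20
        reward = 1.0 if self.pos == self.goal else -0.01
        return self.pos, reward, done

def q_learning_score(env, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Train a tabular Q-learner on the candidate rules; return mean return of the last 20 episodes."""
    q = defaultdict(float)
    returns = []
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            if random.random() < epsilon:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(q[(next_state, a)] for a in env.actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, total = next_state, total + reward
        returns.append(total)
    return sum(returns[-20:]) / 20

# A rule generator would call this for each candidate rule set and keep the most "playable" rules.
print(q_learning_score(ToyEnv()))
```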

Exploring and Learning in Sparse Linear MDPs without Computationally Intractable Oracles

  • paper_url: http://arxiv.org/abs/2309.09457
  • repo_url: None
  • paper_authors: Noah Golowich, Ankur Moitra, Dhruv Rohatgi
  • for: Address feature selection in linear Markov decision processes (MDPs): without expert domain knowledge, learn a near-optimal policy from only a small number of interactions with the environment.
  • methods: In a $k$-sparse linear MDP, there is an unknown subset $S \subset [d]$ of size $k$, where $d$ is the feature dimension, containing all the relevant features; the paper gives the first polynomial-time algorithm for this problem.
  • results: The main result is a polynomial-time algorithm that learns a near-optimal policy in poly$(k,\log d)$ interactions. The paper also introduces emulators, succinct approximate representations of the transitions that suffice for computing certain Bellman backups and can be computed efficiently via convex programming, and gives a quasi-polynomial-time algorithm with polynomial sample complexity for learning near-optimal policies in block MDPs whose decoding function is a low-depth decision tree.
    Abstract The key assumption underlying linear Markov Decision Processes (MDPs) is that the learner has access to a known feature map $\phi(x, a)$ that maps state-action pairs to $d$-dimensional vectors, and that the rewards and transitions are linear functions in this representation. But where do these features come from? In the absence of expert domain knowledge, a tempting strategy is to use the ``kitchen sink" approach and hope that the true features are included in a much larger set of potential features. In this paper we revisit linear MDPs from the perspective of feature selection. In a $k$-sparse linear MDP, there is an unknown subset $S \subset [d]$ of size $k$ containing all the relevant features, and the goal is to learn a near-optimal policy in only poly$(k,\log d)$ interactions with the environment. Our main result is the first polynomial-time algorithm for this problem. In contrast, earlier works either made prohibitively strong assumptions that obviated the need for exploration, or required solving computationally intractable optimization problems. Along the way we introduce the notion of an emulator: a succinct approximate representation of the transitions that suffices for computing certain Bellman backups. Since linear MDPs are a non-parametric model, it is not even obvious whether polynomial-sized emulators exist. We show that they do exist and can be computed efficiently via convex programming. As a corollary of our main result, we give an algorithm for learning a near-optimal policy in block MDPs whose decoding function is a low-depth decision tree; the algorithm runs in quasi-polynomial time and takes a polynomial number of samples. This can be seen as a reinforcement learning analogue of classic results in computational learning theory. Furthermore, it gives a natural model where improving the sample complexity via representation learning is computationally feasible.

Are You Worthy of My Trust?: A Socioethical Perspective on the Impacts of Trustworthy AI Systems on the Environment and Human Society

  • paper_url: http://arxiv.org/abs/2309.09450
  • repo_url: None
  • paper_authors: Jamell Dacon
  • for: Examine the societal impacts of AI systems, and how multi-disciplinary collaboration and systemic auditing can help ensure that AI is trustworthy.
  • methods: Use a multi-disciplinary perspective and critical systemic examinations to assess the societal impacts of AI systems.
  • results: The paper highlights impacts including energy consumption and carbon footprint as well as effects on users' social development, and discusses the interconnected, multi-disciplinary risks and unavoidable societal effects of AI systems.
    Abstract With ubiquitous exposure of AI systems today, we believe AI development requires crucial considerations to be deemed trustworthy. While the potential of AI systems is bountiful, it is still largely unknown, as are their risks. In this work, we offer a brief, high-level overview of societal impacts of AI systems. To do so, we highlight the requirement of multi-disciplinary governance and convergence throughout its lifecycle via critical systemic examinations (e.g., energy consumption), and later discuss induced effects on the environment (i.e., carbon footprint) and its users (i.e., social development). In particular, we consider these impacts from a multi-disciplinary perspective: computer science, sociology, environmental science, and so on to discuss its inter-connected societal risks and inability to simultaneously satisfy aspects of well-being. Therefore, we accentuate the necessity of holistically addressing pressing concerns of AI systems from a socioethical impact assessment perspective to explicate its harmful societal effects to truly enable humanity-centered Trustworthy AI.

A Schedule of Duties in the Cloud Space Using a Modified Salp Swarm Algorithm

  • paper_url: http://arxiv.org/abs/2309.09441
  • repo_url: None
  • paper_authors: Hossein Jamali, Ponkoj Chandra Shill, David Feil-Seifer, Frederick C. Harris, Jr., Sergiu M. Dascalu
  • for: Propose a cloud task-scheduling algorithm based on a swarm intelligence algorithm to improve the efficiency and quality of cloud computing services.
  • methods: Extend and improve the Salp Swarm Algorithm (SSA), and compare its performance against genetic algorithms (GA), particle swarm optimization (PSO), continuous ant colony optimization (ACO), and the basic SSA.
  • results: The proposed algorithm performs better on the cloud task-scheduling problem; for example, compared with the basic SSA, it reduces makespan by approximately 21% on average.
    Abstract Cloud computing is a concept introduced in the information technology era, with the main components being the grid, distributed, and valuable computing. The cloud is being developed continuously and, naturally, comes up with many challenges, one of which is scheduling. A schedule or timeline is a mechanism used to optimize the time for performing a duty or set of duties. A scheduling process is accountable for choosing the best resources for performing a duty. The main goal of a scheduling algorithm is to improve the efficiency and quality of the service while at the same time ensuring the acceptability and effectiveness of the targets. The task scheduling problem is one of the most important NP-hard issues in the cloud domain and, so far, many techniques have been proposed as solutions, including using genetic algorithms (GAs), particle swarm optimization, (PSO), and ant colony optimization (ACO). To address this problem, in this paper, one of the collective intelligence algorithms, called the Salp Swarm Algorithm (SSA), has been expanded, improved, and applied. The performance of the proposed algorithm has been compared with that of GAs, PSO, continuous ACO, and the basic SSA. The results show that our algorithm has generally higher performance than the other algorithms. For example, compared to the basic SSA, the proposed method has an average reduction of approximately 21% in makespan.
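The standard Salp Swarm Algorithm updates (leader and follower equations) applied to a toy scheduling objective are sketched below; the makespan surrogate, the decoding of positions into assignments, and all constants are assumptions, and the paper's modifications to SSA are not reproduced.

```python
# Sketch of a basic Salp Swarm Algorithm on a toy task-scheduling objective.
import numpy as np

rng = np.random.default_rng(1)
n_tasks, n_vms = 20, 4
task_lengths = rng.uniform(1, 10, n_tasks)
vm_speeds = rng.uniform(1, 3, n_vms)

def makespan(position):
    """Decode a continuous position into a task-to-VM assignment and compute its makespan."""
    assignment = np.clip(position, 0, n_vms - 1e-9).astype(int)
    loads = np.zeros(n_vms)
    for task, vm in enumerate(assignment):
        loads[vm] += task_lengths[task] / vm_speeds[vm]
    return loads.max()

def salp_swarm(iters=200, pop=30):
    lb, ub = 0.0, float(n_vms)
    salps = rng.uniform(lb, ub, size=(pop, n_tasks))
    food = min(salps, key=makespan).copy()            # best solution found so far
    for t in range(iters):
        c1 = 2 * np.exp(-((4 * t / iters) ** 2))      # standard SSA exploration coefficient
        for i in range(pop):
            if i == 0:                                 # leader moves around the food source
                c2, c3 = rng.uniform(size=n_tasks), rng.uniform(size=n_tasks)
                step = c1 * ((ub - lb) * c2 + lb)
                salps[i] = np.where(c3 < 0.5, food + step, food - step)
            else:                                      # followers move toward the salp ahead
                salps[i] = (salps[i] + salps[i - 1]) / 2
            salps[i] = np.clip(salps[i], lb, ub)
        best = min(salps, key=makespan)
        if makespan(best) < makespan(food):
            food = best.copy()
    return food, makespan(food)

schedule, span = salp_swarm()
print("best makespan:", round(span, 3))
```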

Multi-Agent Deep Reinforcement Learning for Cooperative and Competitive Autonomous Vehicles using AutoDRIVE Ecosystem

  • paper_url: http://arxiv.org/abs/2309.10007
  • repo_url: None
  • paper_authors: Tanmay Vilas Samak, Chinmay Vilas Samak, Venkat Krovi
  • for: Develop a modular and parallelizable multi-agent deep reinforcement learning framework for instilling cooperative and competitive behaviors in autonomous vehicles.
  • methods: Use the AutoDRIVE Ecosystem to develop physically accurate and graphically realistic digital twins of the vehicle platforms, and train and deploy multi-agent reinforcement learning policies within this ecosystem.
  • results: Experiments on an intersection traversal problem and a head-to-head autonomous racing problem show that the framework can instill effective cooperative and competitive behaviors, with robust training and testing under sparse observations and kinodynamic and safety constraints.
    Abstract This work presents a modular and parallelizable multi-agent deep reinforcement learning framework for imbibing cooperative as well as competitive behaviors within autonomous vehicles. We introduce AutoDRIVE Ecosystem as an enabler to develop physically accurate and graphically realistic digital twins of Nigel and F1TENTH, two scaled autonomous vehicle platforms with unique qualities and capabilities, and leverage this ecosystem to train and deploy multi-agent reinforcement learning policies. We first investigate an intersection traversal problem using a set of cooperative vehicles (Nigel) that share limited state information with each other in single as well as multi-agent learning settings using a common policy approach. We then investigate an adversarial head-to-head autonomous racing problem using a different set of vehicles (F1TENTH) in a multi-agent learning setting using an individual policy approach. In either set of experiments, a decentralized learning architecture was adopted, which allowed robust training and testing of the approaches in stochastic environments, since the agents were mutually independent and exhibited asynchronous motion behavior. The problems were further aggravated by providing the agents with sparse observation spaces and requiring them to sample control commands that implicitly satisfied the imposed kinodynamic as well as safety constraints. The experimental results for both problem statements are reported in terms of quantitative metrics and qualitative remarks for training as well as deployment phases.
    摘要 In the first set of experiments, we investigate an intersection traversal problem using a set of cooperative vehicles (Nigel) that share limited state information with each other. We use a common policy approach in both single and multi-agent learning settings. In the second set of experiments, we investigate an adversarial head-to-head autonomous racing problem using a different set of vehicles (F1TENTH) in a multi-agent learning setting using an individual policy approach.In both sets of experiments, we adopt a decentralized learning architecture that allows for robust training and testing of the approaches in stochastic environments. The agents are mutually independent and exhibit asynchronous motion behavior, which adds complexity to the problems. To make the problems more challenging, we provide the agents with sparse observation spaces and require them to sample control commands that implicitly satisfy the imposed kinodynamic and safety constraints.The experimental results for both problem statements are reported in terms of quantitative metrics and qualitative remarks for training and deployment phases. The results demonstrate the effectiveness of the proposed framework in imbibing cooperative and competitive behaviors in autonomous vehicles.

The Optimized path for the public transportation of Incheon in South Korea

  • paper_url: http://arxiv.org/abs/2309.10006
  • repo_url: None
  • paper_authors: Soroor Malekmohammadi faradunbeh, Hongle Li, Mangkyu Kang, Choongjae Iim
  • for: Optimize route selection for a public transportation system to meet passenger demand.
  • methods: Propose a path-finding method based on a modified A* algorithm that can find the shortest path in real time even over large numbers of data points.
  • results: The method outperforms basic path-finding algorithms such as genetic algorithms and Dijkstra's algorithm in quickly finding the shortest path.
    Abstract Path-finding is one of the most popular subjects in the field of computer science. Pathfinding strategies determine a path from a given coordinate to another. The focus of this paper is on finding the optimal path for the bus transportation system based on passenger demand. This study is based on bus stations in Incheon, South Korea, and we show that our modified A* algorithm performs better than other basic pathfinding algorithms such as the Genetic algorithm and Dijkstra's algorithm. Our proposed approach can find the shortest path in real time even for large amounts of data (points).
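Since the abstract does not spell out the modification to A*, the sketch below shows plain A* over a small weighted bus-stop graph with a straight-line heuristic; the stops, coordinates, and edge weights are made up for illustration.

```python
# Plain A* over a toy bus-stop graph (the paper's modified heuristic is not shown here).
import heapq
import math

stops = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}          # hypothetical coordinates
edges = {"A": {"B": 1.0, "C": 1.8}, "B": {"C": 1.0, "D": 2.2}, "C": {"D": 1.1}, "D": {}}

def heuristic(u, v):
    (x1, y1), (x2, y2) = stops[u], stops[v]
    return math.hypot(x1 - x2, y1 - y2)                               # admissible straight-line estimate

def a_star(start, goal):
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        for nbr, w in edges[node].items():
            new_g = g + w
            if new_g < best_g.get(nbr, float("inf")):
                best_g[nbr] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nbr, goal), new_g, nbr, path + [nbr]))
    return None, float("inf")

print(a_star("A", "D"))    # expected: (['A', 'C', 'D'], 2.9)
```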

FactoFormer: Factorized Hyperspectral Transformers with Self-Supervised Pre-Training

  • paper_url: http://arxiv.org/abs/2309.09431
  • repo_url: https://github.com/csiro-robotics/factoformer
  • paper_authors: Shaheer Mohamed, Maryam Haghighat, Tharindu Fernando, Sridha Sridharan, Clinton Fookes, Peyman Moghadam
  • for: Improve performance on the hyperspectral image classification task using transformers.
  • methods: Use a novel factorized spectral-spatial transformer with a self-supervised pre-training procedure and masking-based pre-training strategies.
  • results: Experiments show that the model achieves state-of-the-art performance on the classification task across three publicly available datasets.
    Abstract Hyperspectral images (HSIs) contain rich spectral and spatial information. Motivated by the success of transformers in the field of natural language processing and computer vision where they have shown the ability to learn long range dependencies within input data, recent research has focused on using transformers for HSIs. However, current state-of-the-art hyperspectral transformers only tokenize the input HSI sample along the spectral dimension, resulting in the under-utilization of spatial information. Moreover, transformers are known to be data-hungry and their performance relies heavily on large-scale pre-training, which is challenging due to limited annotated hyperspectral data. Therefore, the full potential of HSI transformers has not been fully realized. To overcome these limitations, we propose a novel factorized spectral-spatial transformer that incorporates factorized self-supervised pre-training procedures, leading to significant improvements in performance. The factorization of the inputs allows the spectral and spatial transformers to better capture the interactions within the hyperspectral data cubes. Inspired by masked image modeling pre-training, we also devise efficient masking strategies for pre-training each of the spectral and spatial transformers. We conduct experiments on three publicly available datasets for HSI classification task and demonstrate that our model achieves state-of-the-art performance in all three datasets. The code for our model will be made available at https://github.com/csiro-robotics/factoformer.

Joint Demosaicing and Denoising with Double Deep Image Priors

  • paper_url: http://arxiv.org/abs/2309.09426
  • repo_url: None
  • paper_authors: Taihui Li, Anish Lahiri, Yutong Dai, Owen Mayer
  • for: joint demosaicing and denoising of RAW images
  • methods: double deep image priors (JDD-DoubleDIP) applied directly to a single RAW image, without requiring training data
  • results: consistently outperforms other compared methods in terms of PSNR, SSIM, and qualitative visual perception
    Abstract Demosaicing and denoising of RAW images are crucial steps in the processing pipeline of modern digital cameras. As only a third of the color information required to produce a digital image is captured by the camera sensor, the process of demosaicing is inherently ill-posed. The presence of noise further exacerbates this problem. Performing these two steps sequentially may distort the content of the captured RAW images and accumulate errors from one step to another. Recent deep neural-network-based approaches have shown the effectiveness of joint demosaicing and denoising to mitigate such challenges. However, these methods typically require a large number of training samples and do not generalize well to different types and intensities of noise. In this paper, we propose a novel joint demosaicing and denoising method, dubbed JDD-DoubleDIP, which operates directly on a single RAW image without requiring any training data. We validate the effectiveness of our method on two popular datasets -- Kodak and McMaster -- with various noises and noise intensities. The experimental results show that our method consistently outperforms other compared methods in terms of PSNR, SSIM, and qualitative visual perception.
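A minimal deep-image-prior style loop for joint demosaicing and denoising is sketched below: an untrained network is fitted to a single RAW frame through a Bayer-sampling operator. The tiny CNN, the RGGB pattern, and the single prior (rather than the paper's double prior) are assumptions.

```python
# Minimal deep-image-prior style sketch for joint demosaicing + denoising of one RAW frame.
# The paper's JDD-DoubleDIP uses two coupled priors; this single-prior loop only illustrates the idea.
import torch
import torch.nn as nn

H = W = 64

def bayer_sample(rgb):
    """Apply an RGGB Bayer mask: keep one color channel per pixel (assumed pattern)."""
    mask = torch.zeros_like(rgb)
    mask[:, 0, 0::2, 0::2] = 1   # R
    mask[:, 1, 0::2, 1::2] = 1   # G
    mask[:, 1, 1::2, 0::2] = 1   # G
    mask[:, 2, 1::2, 1::2] = 1   # B
    return rgb * mask

# Observed noisy mosaic (stand-in for a real RAW capture).
clean = torch.rand(1, 3, H, W)
raw = bayer_sample(clean + 0.05 * torch.randn_like(clean))

net = nn.Sequential(                       # tiny untrained CNN acting as the image prior
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid(),
)
z = torch.randn(1, 32, H, W)               # fixed random input, as in deep image prior
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(500):
    out = net(z)                           # full-color estimate
    loss = ((bayer_sample(out) - raw) ** 2).mean()   # fit only the observed mosaic samples
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final reconstruction loss:", float(loss))
```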

Causal Discovery and Prediction: Methods and Algorithms

  • paper_url: http://arxiv.org/abs/2309.09416
  • repo_url: None
  • paper_authors: Gilles Blondel
  • for: This doctoral thesis proposes algorithms that discover causal models quickly and effectively in many different settings while avoiding unnecessary experimentation on the real world.
  • methods: An a-priori assessment of the cost of each possible intervention, and an active learning algorithm that identifies causal relations using a least-cost sequence of interventions.
  • results: In most scenarios the algorithm can rule out large numbers of candidate causal models with relatively inexpensive interventions that test only one value of the intervened variables, avoids many unnecessary experiments when identifying causal effects, and completes causal discovery within a bounded number of interventions.
    Abstract We are not only observers but also actors of reality. Our capability to intervene and alter the course of some events in the space and time surrounding us is an essential component of how we build our model of the world. In this doctoral thesis we introduce a generic a-priori assessment of each possible intervention, in order to select the most cost-effective interventions only, and avoid unnecessary systematic experimentation on the real world. Based on this a-priori assessment, we propose an active learning algorithm that identifies the causal relations in any given causal model, using a least cost sequence of interventions. There are several novel aspects introduced by our algorithm. It is, in most case scenarios, able to discard many causal model candidates using relatively inexpensive interventions that only test one value of the intervened variables. Also, the number of interventions performed by the algorithm can be bounded by the number of causal model candidates. Hence, fewer initial candidates (or equivalently, more prior knowledge) lead to fewer interventions for causal discovery. Causality is intimately related to time, as causes appear to precede their effects. Cyclical causal processes are a very interesting case of causality in relation to time. In this doctoral thesis we introduce a formal analysis of time cyclical causal settings by defining a causal analog to the purely observational Dynamic Bayesian Networks, and provide a sound and complete algorithm for the identification of causal effects in the cyclic setting. We introduce the existence of two types of hidden confounder variables in this framework, which affect in substantially different ways the identification procedures, a distinction with no analog in either Dynamic Bayesian Networks or standard causal graphs.

Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization

  • paper_url: http://arxiv.org/abs/2309.09405
  • repo_url: None
  • paper_authors: Yoonsoo Nam, Adam Lehavi, Daniel Yang, Digbalay Bose, Swabha Swayamdipta, Shrikanth Narayanan
  • for: Develop an efficient, language-only video summarizer that improves video summarization in computer vision.
  • methods: Train a language transformer model using only textual captions, without image representations; the captions are obtained via a zero-shot approach and condensed by filtering among the representative text vectors.
  • results: The approach achieves higher data efficiency and explainability while retaining results comparable to conventional methods; an ablation on modality and data compression shows that using the language modality alone effectively reduces input data processing without degrading results.
    Abstract Video summarization remains a huge challenge in computer vision due to the size of the input videos to be summarized. We propose an efficient, language-only video summarizer that achieves competitive accuracy with high data efficiency. Using only textual captions obtained via a zero-shot approach, we train a language transformer model and forego image representations. This method allows us to perform filtration amongst the representative text vectors and condense the sequence. With our approach, we gain explainability with natural language that comes easily for human interpretation and textual summaries of the videos. An ablation study that focuses on modality and data compression shows that leveraging text modality only effectively reduces input data processing while retaining comparable results.
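One plausible reading of "filtration amongst the representative text vectors" is to embed the captions and keep one caption per cluster, as sketched below; the embedding model, cluster count, and selection rule are assumptions rather than the paper's actual procedure.

```python
# Illustrative caption filtration: keep one representative caption per cluster.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

captions = [
    "a man opens the door of a car",
    "a man gets into the car",
    "the car drives down a busy street",
    "the car stops at a traffic light",
    "a dog runs across the street",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(captions)                       # (n_captions, d) text vectors

k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)
summary = []
for c in range(k):
    members = np.where(km.labels_ == c)[0]
    center = km.cluster_centers_[c]
    best = members[np.argmin(np.linalg.norm(vectors[members] - center, axis=1))]
    summary.append(captions[best])

print(summary)   # condensed textual summary of the video
```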

(Deployed Application) Promoting Research Collaboration with Open Data Driven Team Recommendation in Response to Call for Proposals

  • paper_url: http://arxiv.org/abs/2309.09404
  • repo_url: None
  • paper_authors: Siva Likitha Valluru, Biplav Srivastava, Sai Teja Paladi, Siwen Yan, Sriraam Natarajan
  • for: The paper aims to recommend teams for collaborative research opportunities in response to funding agencies’ calls for proposals.
  • methods: The system uses various AI methods, including skill extraction from open data and taxonomies, to match demand and supply and create balanced teams that maximize goodness.
  • results: The system was validated through quantitative and qualitative evaluations, showing that it recommends smaller but higher-quality teams, and was found useful and relevant by users in a large-scale user study.
    Abstract Building teams and promoting collaboration are two very common business activities. An example of these are seen in the TeamingForFunding problem, where research institutions and researchers are interested to identify collaborative opportunities when applying to funding agencies in response to latter's calls for proposals. We describe a novel system to recommend teams using a variety of AI methods, such that (1) each team achieves the highest possible skill coverage that is demanded by the opportunity, and (2) the workload of distributing the opportunities is balanced amongst the candidate members. We address these questions by extracting skills latent in open data of proposal calls (demand) and researcher profiles (supply), normalizing them using taxonomies, and creating efficient algorithms that match demand to supply. We create teams to maximize goodness along a novel metric balancing short- and long-term objectives. We validate the success of our algorithms (1) quantitatively, by evaluating the recommended teams using a goodness score and find that more informed methods lead to recommendations of smaller number of teams but higher goodness, and (2) qualitatively, by conducting a large-scale user study at a college-wide level, and demonstrate that users overall found the tool very useful and relevant. Lastly, we evaluate our system in two diverse settings in US and India (of researchers and proposal calls) to establish generality of our approach, and deploy it at a major US university for routine use.
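A small sketch of the matching idea: greedily pick researchers that cover the most still-uncovered skills demanded by a call, while skipping overloaded candidates. The skill sets, the load cap, and the greedy rule are illustrative assumptions, not the deployed system's algorithm or goodness metric.

```python
# Illustrative greedy team formation: maximize skill coverage for a call's demanded skills
# while limiting how many calls each researcher is already assigned to (workload balance).
demanded = {"machine learning", "graph mining", "public health", "nlp"}

researchers = {
    "alice": {"skills": {"machine learning", "nlp"}, "load": 1},
    "bob":   {"skills": {"graph mining"}, "load": 0},
    "carol": {"skills": {"public health", "machine learning"}, "load": 2},
    "dave":  {"skills": {"nlp", "public health"}, "load": 0},
}

def recommend_team(demanded, researchers, max_load=2):
    uncovered, team = set(demanded), []
    candidates = dict(researchers)
    while uncovered and candidates:
        # pick the researcher covering the most uncovered skills; break ties by lighter load
        name = max(candidates,
                   key=lambda r: (len(candidates[r]["skills"] & uncovered), -candidates[r]["load"]))
        gain = candidates[name]["skills"] & uncovered
        if not gain or candidates[name]["load"] >= max_load:
            candidates.pop(name)
            continue
        team.append(name)
        uncovered -= gain
        candidates.pop(name)
    coverage = 1 - len(uncovered) / len(demanded)
    return team, coverage

print(recommend_team(demanded, researchers))   # e.g. (['dave', 'bob', 'alice'], 1.0)
```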

  • paper_url: http://arxiv.org/abs/2309.09403
  • repo_url: None
  • paper_authors: Ekaterina Khramtsova, Shengyao Zhuang, Mahsa Baktashmotlagh, Xi Wang, Guido Zuccon
  • for: This work targets the problem of choosing the most suitable dense retrieval model for searching on a new collection without any labeled data, i.e. in a zero-shot setting.
  • methods: The authors adapt recent methods for unsupervised performance evaluation under domain shift from computer vision and machine learning, and test whether they can select highly effective dense retrievers in the zero-shot setting.
  • results: Empirical results show that existing methods cannot effectively select dense retrievers in the zero-shot setting; this is an important new problem whose solution would help streamline the widespread adoption of dense retrieval.
    Abstract We propose the new problem of choosing which dense retrieval model to use when searching on a new collection for which no labels are available, i.e. in a zero-shot setting. Many dense retrieval models are readily available. Each model however is characterized by very differing search effectiveness -- not just on the test portion of the datasets in which the dense representations have been learned but, importantly, also across different datasets for which data was not used to learn the dense representations. This is because dense retrievers typically require training on a large amount of labeled data to achieve satisfactory search effectiveness in a specific dataset or domain. Moreover, effectiveness gains obtained by dense retrievers on datasets for which they are able to observe labels during training, do not necessarily generalise to datasets that have not been observed during training. This is however a hard problem: through empirical experimentation we show that methods inspired by recent work in unsupervised performance evaluation with the presence of domain shift in the area of computer vision and machine learning are not effective for choosing highly performing dense retrievers in our setup. The availability of reliable methods for the selection of dense retrieval models in zero-shot settings that do not require the collection of labels for evaluation would allow to streamline the widespread adoption of dense retrieval. This is therefore an important new problem we believe the information retrieval community should consider. Implementation of methods, along with raw result files and analysis scripts are made publicly available at https://www.github.com/anonymized.