cs.AI - 2023-11-02

Implicit Chain of Thought Reasoning via Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2311.01460
  • repo_url: https://github.com/da03/implicit_chain_of_thought
  • paper_authors: Yuntian Deng, Kiran Prasad, Roland Fernandez, Paul Smolensky, Vishrav Chaudhary, Stuart Shieber
  • for: Augmenting the reasoning ability of language models without requiring them to explicitly generate chain-of-thought steps in natural language.
  • methods: Uses the language model's internal hidden states to perform implicit reasoning; the implicit steps are distilled from a teacher model trained on explicit chain-of-thought reasoning, so that reasoning happens "vertically" (across the hidden states of different layers) rather than "horizontally" (producing intermediate words one by one).
  • results: Experiments on a multi-digit multiplication task and a grade-school math dataset show that the approach solves tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.
    Abstract To augment language models with the ability to reason, researchers usually prompt or finetune them to produce chain of thought reasoning steps before producing the final answer. However, although people use natural language to reason effectively, it may be that LMs could reason more effectively with some intermediate computation that is not in natural language. In this work, we explore an alternative reasoning approach: instead of explicitly producing the chain of thought reasoning steps, we use the language model's internal hidden states to perform implicit reasoning. The implicit reasoning steps are distilled from a teacher model trained on explicit chain-of-thought reasoning, and instead of doing reasoning "horizontally" by producing intermediate words one-by-one, we distill it such that the reasoning happens "vertically" among the hidden states in different layers. We conduct experiments on a multi-digit multiplication task and a grade school math problem dataset and find that this approach enables solving tasks previously not solvable without explicit chain-of-thought, at a speed comparable to no chain-of-thought.
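To make the "vertical" distillation idea concrete, here is a minimal PyTorch sketch under assumed shapes and names; the paper's exact alignment between teacher CoT positions and student layers is more involved:

```python
# Hypothetical sketch: match each student layer's hidden state (at the answer
# position) to a teacher hidden state taken along its explicit CoT trajectory.
import torch.nn.functional as F

def vertical_distill_loss(student_hiddens, teacher_hiddens):
    """student_hiddens: list of [batch, d] tensors, one per student layer.
    teacher_hiddens: teacher states subsampled along the CoT tokens so the
    two lists have equal length."""
    loss = 0.0
    for h_s, h_t in zip(student_hiddens, teacher_hiddens):
        loss = loss + F.mse_loss(h_s, h_t.detach())  # teacher stays frozen
    return loss / len(student_hiddens)
```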

Conformal Policy Learning for Sensorimotor Control Under Distribution Shifts

  • paper_url: http://arxiv.org/abs/2311.01457
  • repo_url: None
  • paper_authors: Huang Huang, Satvik Sharma, Antonio Loquercio, Anastasios Angelopoulos, Ken Goldberg, Jitendra Malik
  • for: Detecting and reacting to shifts in the distribution of a sensorimotor controller's observables.
  • methods: Switching policies that take conformal quantiles as input (conformal policy learning), giving formal statistical guarantees; the quantiles either switch between base policies with different characteristics (e.g., safety vs. speed) or directly augment a policy's observation for training with reinforcement learning.
  • results: Evaluated extensively on simulated autonomous driving and active perception with a physical quadruped, the approach outperforms five baselines while remaining the simplest of the baseline strategies besides one ablation.
    Abstract This paper focuses on the problem of detecting and reacting to changes in the distribution of a sensorimotor controller's observables. The key idea is the design of switching policies that can take conformal quantiles as input, which we define as conformal policy learning, that allows robots to detect distribution shifts with formal statistical guarantees. We show how to design such policies by using conformal quantiles to switch between base policies with different characteristics, e.g. safety or speed, or directly augmenting a policy observation with a quantile and training it with reinforcement learning. Theoretically, we show that such policies achieve the formal convergence guarantees in finite time. In addition, we thoroughly evaluate their advantages and limitations on two compelling use cases: simulated autonomous driving and active perception with a physical quadruped. Empirical results demonstrate that our approach outperforms five baselines. It is also the simplest of the baseline strategies besides one ablation. Being easy to use, flexible, and with formal guarantees, our work demonstrates how conformal prediction can be an effective tool for sensorimotor learning under uncertainty.
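As a rough illustration of conformal policy switching (all names below are hypothetical, and the nonconformity score depends on the application):

```python
import numpy as np

def conformal_quantile(cal_scores, alpha=0.1):
    # Finite-sample-valid (1 - alpha) quantile over calibration scores
    n = len(cal_scores)
    q = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(cal_scores, q, method="higher")

def act(obs, score, threshold, fast_policy, safe_policy):
    # Fall back to the conservative base policy whenever the score exceeds
    # the calibrated quantile, i.e. a statistically flagged shift
    return (safe_policy if score > threshold else fast_policy)(obs)
```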

RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation

  • paper_url: http://arxiv.org/abs/2311.01455
  • repo_url: None
  • paper_authors: Yufei Wang, Zhou Xian, Feng Chen, Tsun-Hsuan Wang, Yian Wang, Katerina Fragkiadaki, Zackory Erickson, David Held, Chuang Gan
  • for: Transferring the extensive, versatile knowledge embedded in large-scale foundation models to robotics, enabling automated learning of diverse robotic skills at scale.
  • methods: A generative scheme that uses foundation and generative models to automatically produce diversified tasks, scenes, and training supervision via a self-guided propose-generate-learn cycle, scaling up robotic skill learning with minimal human supervision.
  • results: The fully generative pipeline can be queried repeatedly, producing an endless stream of skill demonstrations associated with diverse tasks and environments.
    Abstract We present RoboGen, a generative robotic agent that automatically learns diverse robotic skills at scale via generative simulation. RoboGen leverages the latest advancements in foundation and generative models. Instead of directly using or adapting these models to produce policies or low-level actions, we advocate for a generative scheme, which uses these models to automatically generate diversified tasks, scenes, and training supervisions, thereby scaling up robotic skill learning with minimal human supervision. Our approach equips a robotic agent with a self-guided propose-generate-learn cycle: the agent first proposes interesting tasks and skills to develop, and then generates corresponding simulation environments by populating pertinent objects and assets with proper spatial configurations. Afterwards, the agent decomposes the proposed high-level task into sub-tasks, selects the optimal learning approach (reinforcement learning, motion planning, or trajectory optimization), generates required training supervision, and then learns policies to acquire the proposed skill. Our work attempts to extract the extensive and versatile knowledge embedded in large-scale models and transfer them to the field of robotics. Our fully generative pipeline can be queried repeatedly, producing an endless stream of skill demonstrations associated with diverse tasks and environments.
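A hedged pseudocode outline of the propose-generate-learn cycle described in the abstract; every function name below is a placeholder, not the authors' API:

```python
def robogen_cycle(agent, n_rounds=10):
    skills = []
    for _ in range(n_rounds):
        task = agent.propose_task()              # foundation model proposes a skill
        scene = agent.generate_scene(task)       # populate objects with spatial layout
        for sub_task in agent.decompose(task):   # high-level task -> sub-tasks
            learner = agent.select_learner(sub_task)  # RL, motion planning, or traj-opt
            supervision = agent.generate_supervision(sub_task)
            skills.append(learner.train(scene, supervision))
    return skills
```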

NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities

  • paper_url: http://arxiv.org/abs/2311.01454
  • repo_url: None
  • paper_authors: Ruohan Zhang, Sharon Lee, Minjune Hwang, Ayano Hiranaka, Chen Wang, Wensi Ai, Jin Jie Ryan Tan, Shreya Gupta, Yilun Hao, Gabrael Levine, Ruohan Gao, Anthony Norcia, Li Fei-Fei, Jiajun Wu
  • for: NOIR, a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals.
  • methods: Uses electroencephalography (EEG) to decode the user's intended objects of interest and actions, combined with robot learning algorithms so that NOIR adapts to individual users and predicts their intentions.
  • results: Succeeds on 20 challenging everyday household activities, including cooking, cleaning, personal care, and entertainment, with system effectiveness improved by the integrated robot learning.
    Abstract We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals. Through this interface, humans communicate their intended objects of interest and actions to the robots using electroencephalography (EEG). Our novel system demonstrates success in an expansive array of 20 challenging, everyday household activities, including cooking, cleaning, personal care, and entertainment. The effectiveness of the system is improved by its synergistic integration of robot learning algorithms, allowing for NOIR to adapt to individual users and predict their intentions. Our work enhances the way humans interact with robots, replacing traditional channels of interaction with direct, neural communication. Project website: https://noir-corl.github.io/.

Time Series Anomaly Detection using Diffusion-based Models

  • paper_url: http://arxiv.org/abs/2311.01452
  • repo_url: https://github.com/fbrad/diffusionae
  • paper_authors: Ioana Pintilie, Andrei Manolache, Florin Brad
  • for: Investigating whether diffusion models can be leveraged for anomaly detection (AD) on multivariate time series (MTS).
  • methods: Tests two diffusion-based models against several strong neural baselines, and extends the PA%K protocol with a ROCK-AUC metric that is agnostic to both the detection threshold and the ratio K of correctly detected points.
  • results: The models outperform the baselines on synthetic datasets and are competitive on real-world datasets, illustrating the potential of diffusion-based methods for AD on multivariate time series.
    Abstract Diffusion models have been recently used for anomaly detection (AD) in images. In this paper we investigate whether they can also be leveraged for AD on multivariate time series (MTS). We test two diffusion-based models and compare them to several strong neural baselines. We also extend the PA%K protocol, by computing a ROCK-AUC metric, which is agnostic to both the detection threshold and the ratio K of correctly detected points. Our models outperform the baselines on synthetic datasets and are competitive on real-world datasets, illustrating the potential of diffusion-based methods for AD in multivariate time series.
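One common way to turn a trained diffusion model into an anomaly scorer is to partially noise each window, denoise it, and score by reconstruction error; the sketch below (assumed model signature and noise schedule) illustrates that general idea rather than the paper's exact procedure:

```python
import torch

@torch.no_grad()
def anomaly_score(window, model, alphas_cumprod, t=50):
    """window: [batch, time, channels]; model(x_t, t) predicts the added noise."""
    a_bar = alphas_cumprod[t]
    noise = torch.randn_like(window)
    x_t = a_bar.sqrt() * window + (1 - a_bar).sqrt() * noise
    eps_hat = model(x_t, t)
    x0_hat = (x_t - (1 - a_bar).sqrt() * eps_hat) / a_bar.sqrt()
    return (window - x0_hat).pow(2).mean(dim=(1, 2))  # one score per window
```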

DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing

  • paper_url: http://arxiv.org/abs/2311.01450
  • repo_url: None
  • paper_authors: Vint Lee, Pieter Abbeel, Youngwoon Lee
  • for: Model-based reinforcement learning (MBRL), which learns complex behaviors sample-efficiently by planning actions over imagined trajectories with predicted rewards; reward prediction is often a bottleneck, especially for sparse rewards.
  • methods: DreamSmooth, a simple yet effective reward-smoothing approach that trains the model to predict a temporally-smoothed reward instead of the exact reward at each timestep.
  • results: Achieves state-of-the-art sample efficiency and final performance on long-horizon sparse-reward tasks, without losing performance on common benchmarks such as the DeepMind Control Suite and Atari.
    Abstract Model-based reinforcement learning (MBRL) has gained much attention for its ability to learn complex behaviors in a sample-efficient way: planning actions by generating imaginary trajectories with predicted rewards. Despite its success, we found that surprisingly, reward prediction is often a bottleneck of MBRL, especially for sparse rewards that are challenging (or even ambiguous) to predict. Motivated by the intuition that humans can learn from rough reward estimates, we propose a simple yet effective reward smoothing approach, DreamSmooth, which learns to predict a temporally-smoothed reward, instead of the exact reward at the given timestep. We empirically show that DreamSmooth achieves state-of-the-art performance on long-horizon sparse-reward tasks both in sample efficiency and final performance without losing performance on common benchmarks, such as Deepmind Control Suite and Atari benchmarks.
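A minimal sketch of the reward-smoothing target, assuming a truncated Gaussian kernel (the paper also considers other smoothing functions):

```python
import numpy as np

def smooth_rewards(rewards, sigma=2.0, radius=6):
    """Spread sparse rewards over neighboring timesteps, producing an easier
    prediction target for the world model's reward head."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(rewards, kernel, mode="same")

targets = smooth_rewards(np.array([0.0, 0, 0, 0, 1.0, 0, 0, 0]))
```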

Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models

  • paper_url: http://arxiv.org/abs/2311.01441
  • repo_url: https://github.com/andyz245/discreteadversarialdistillation
  • paper_authors: Andy Zhou, Jindong Wang, Yu-Xiong Wang, Haohan Wang
  • for: Improving the out-of-distribution robustness of vision models.
  • methods: Combines knowledge distillation with data augmentation: a robust teacher generates adversarial examples, and a VQGAN discretizes them into more informative samples (Discrete Adversarial Distillation, DAD).
  • results: Strong gains in out-of-distribution robustness and clean accuracy across different student architectures, with minor computational overhead, and the method combines easily with other data augmentations.
    Abstract We propose a conceptually simple and lightweight framework for improving the robustness of vision models through the combination of knowledge distillation and data augmentation. We address the conjecture that larger models do not make for better teachers by showing strong gains in out-of-distribution robustness when distilling from pretrained foundation models. Following this finding, we propose Discrete Adversarial Distillation (DAD), which leverages a robust teacher to generate adversarial examples and a VQGAN to discretize them, creating more informative samples than standard data augmentation techniques. We provide a theoretical framework for the use of a robust teacher in the knowledge distillation with data augmentation setting and demonstrate strong gains in out-of-distribution robustness and clean accuracy across different student architectures. Notably, our method adds minor computational overhead compared to similar techniques and can be easily combined with other data augmentations for further improvements.
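A rough sketch of the recipe under stated assumptions: PGD against the robust teacher as the adversarial-example generator, a VQGAN encode/decode round-trip as the discretization (the `vqgan` object and its methods are placeholders), and soft-label distillation into the student:

```python
import torch
import torch.nn.functional as F

def dad_step(x, y, teacher, vqgan, student, eps=4 / 255, steps=3, tau=2.0):
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):                            # PGD vs. the frozen teacher
        loss = F.cross_entropy(teacher(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + eps * grad.sign()).clamp(-eps, eps)
        delta = delta.detach().requires_grad_(True)
    x_adv = vqgan.decode(vqgan.encode(x + delta))     # discretize the example
    with torch.no_grad():
        t_logits = teacher(x_adv)
    s_log_probs = F.log_softmax(student(x_adv) / tau, dim=-1)
    return F.kl_div(s_log_probs, F.softmax(t_logits / tau, dim=-1),
                    reduction="batchmean")
```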

Tailoring Mixup to Data using Kernel Warping functions

  • paper_url: http://arxiv.org/abs/2311.01434
  • repo_url: https://github.com/ensta-u2is/torch-uncertainty
  • paper_authors: Quentin Bouniot, Pavlo Mozharovskyi, Florence d’Alché-Buc
  • for: Improving the performance and calibration of deep learning models by tailoring mixup interpolation to the data.
  • methods: Dynamically changes the distribution of interpolation coefficients through kernel warping functions that depend on the similarity between the data points being combined, so that similar points are mixed more frequently and more strongly than dissimilar ones.
  • results: Extensive experiments on classification and regression tasks show improved performance and calibration without losing diversity.
    Abstract Data augmentation is an essential building block for learning efficient deep learning models. Among all augmentation techniques proposed so far, linear interpolation of training data points, also called mixup, has been found to be effective for a large panel of applications. While the majority of works have focused on selecting the right points to mix, or applying complex non-linear interpolation, we are interested in mixing similar points more frequently and strongly than less similar ones. To this end, we propose to dynamically change the underlying distribution of interpolation coefficients through warping functions, depending on the similarity between data points to combine. We define an efficient and flexible framework to do so without losing in diversity. We provide extensive experiments for classification and regression tasks, showing that our proposed method improves both performance and calibration of models. Code available in https://github.com/ENSTA-U2IS/torch-uncertainty
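A sketch of the general shape of similarity-warped mixup; the exponential warping below is one simple choice, not the paper's exact family of kernel warping functions:

```python
import torch

def warped_mixup(x, y, feats, alpha=1.0, tau=2.0):
    perm = torch.randperm(x.size(0))
    lam = torch.distributions.Beta(alpha, alpha).sample((x.size(0),)).to(x.device)
    sim = torch.cosine_similarity(feats, feats[perm], dim=1).clamp(0, 1)
    # Similar pairs (sim ~ 1) keep the raw lam (strong mixing); dissimilar
    # pairs get lam warped toward 1, so each sample stays mostly itself.
    w = lam ** torch.exp(-tau * (1 - sim))
    w_x = w.view(-1, *([1] * (x.dim() - 1)))
    return w_x * x + (1 - w_x) * x[perm], y, y[perm], w
```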

Castor: Causal Temporal Regime Structure Learning

  • paper_url: http://arxiv.org/abs/2311.01412
  • repo_url: None
  • paper_authors: Abdellah Rahmani, Pascal Frossard
  • for: Uncovering causal relationships in multivariate time series that follow multiple a priori unknown regimes, a problem that cuts across domains from climate science to healthcare.
  • methods: CASTOR maximizes a score function via the EM algorithm, inferring the number of regimes and learning linear or non-linear causal relationships (a distinct causal graph) within each regime.
  • results: Exhibits robust convergence and accurately identifies the distinct regimes; exhaustive synthetic experiments and two real-world benchmarks confirm superior, interpretable causal discovery compared to baseline methods.
    Abstract The task of uncovering causal relationships among multivariate time series data stands as an essential and challenging objective that cuts across a broad array of disciplines ranging from climate science to healthcare. Such data entails linear or non-linear relationships, and usually follow multiple a priori unknown regimes. Existing causal discovery methods can infer summary causal graphs from heterogeneous data with known regimes, but they fall short in comprehensively learning both regimes and the corresponding causal graph. In this paper, we introduce CASTOR, a novel framework designed to learn causal relationships in heterogeneous time series data composed of various regimes, each governed by a distinct causal graph. Through the maximization of a score function via the EM algorithm, CASTOR infers the number of regimes and learns linear or non-linear causal relationships in each regime. We demonstrate the robust convergence properties of CASTOR, specifically highlighting its proficiency in accurately identifying unique regimes. Empirical evidence, garnered from exhaustive synthetic experiments and two real-world benchmarks, confirm CASTOR's superior performance in causal discovery compared to baseline methods. By learning a full temporal causal graph for each regime, CASTOR establishes itself as a distinctly interpretable method for causal discovery in heterogeneous time series.

Analysis of Information Propagation in Ethereum Network Using Combined Graph Attention Network and Reinforcement Learning to Optimize Network Efficiency and Scalability

  • paper_url: http://arxiv.org/abs/2311.01406
  • repo_url: None
  • paper_authors: Stefan Kambiz Behfar, Jon Crowcroft
  • for: Analyzing information-propagation dynamics in the Ethereum network to improve network efficiency, security, and scalability.
  • methods: Collects blocks, transactions, and node degrees from the Ethereum blockchain, builds a transaction-graph representation from adjacency matrices, and trains a combined Graph Attention Network (GAT) and reinforcement learning (RL) model; sparse matrices in GraphConv, GraphSAGE, and GAT address scalability.
  • results: On a large-scale Ethereum dataset, the proposed GAT-RL model outperforms other GCN models, effectively propagating information across the network, optimizing gas limits for block processing, and improving network efficiency.
    Abstract Blockchain technology has revolutionized the way information is propagated in decentralized networks. Ethereum plays a pivotal role in facilitating smart contracts and decentralized applications. Understanding information propagation dynamics in Ethereum is crucial for ensuring network efficiency, security, and scalability. In this study, we propose an innovative approach that utilizes Graph Convolutional Networks (GCNs) to analyze the information propagation patterns in the Ethereum network. The first phase of our research involves data collection from the Ethereum blockchain, consisting of blocks, transactions, and node degrees. We construct a transaction graph representation using adjacency matrices to capture the node embeddings; while our major contribution is to develop a combined Graph Attention Network (GAT) and Reinforcement Learning (RL) model to optimize the network efficiency and scalability. It learns the best actions to take in various network states, ultimately leading to improved network efficiency, throughput, and optimize gas limits for block processing. In the experimental evaluation, we analyze the performance of our model on a large-scale Ethereum dataset. We investigate effectively aggregating information from neighboring nodes capturing graph structure and updating node embeddings using GCN with the objective of transaction pattern prediction, accounting for varying network loads and number of blocks. Not only we design a gas limit optimization model and provide the algorithm, but also to address scalability, we demonstrate the use and implementation of sparse matrices in GraphConv, GraphSAGE, and GAT. The results indicate that our designed GAT-RL model achieves superior results compared to other GCN models in terms of performance. It effectively propagates information across the network, optimizing gas limits for block processing and improving network efficiency.
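For scalability, graph attention over a sparse edge list (rather than a dense adjacency matrix) can be written in a few lines with PyTorch Geometric; the encoder below is illustrative, and the RL head is omitted:

```python
import torch
from torch_geometric.nn import GATConv

class TxGraphEncoder(torch.nn.Module):
    def __init__(self, in_dim, hid_dim=64, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hid_dim, heads=heads)
        self.gat2 = GATConv(hid_dim * heads, hid_dim, heads=1)

    def forward(self, x, edge_index):
        # edge_index is a [2, num_edges] COO tensor, so memory scales with
        # the number of transactions rather than num_nodes ** 2
        h = self.gat1(x, edge_index).relu()
        return self.gat2(h, edge_index)
```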

Vision-Language Foundation Models as Effective Robot Imitators

  • paper_url: http://arxiv.org/abs/2311.01378
  • repo_url: None
  • paper_authors: Xinghang Li, Minghuan Liu, Hanbo Zhang, Cunjun Yu, Jie Xu, Hongtao Wu, Chilam Cheang, Ya Jing, Weinan Zhang, Huaping Liu, Hang Li, Tao Kong
  • for: Solving robot manipulation tasks by simply fine-tuning existing vision-language models (VLMs) on robotics data.
  • methods: RoboFlamingo, a simple vision-language manipulation framework built on the open-source OpenFlamingo VLM; it uses the pretrained VLM for single-step vision-language comprehension, models sequential history with an explicit policy head, and is slightly fine-tuned by imitation learning on language-conditioned manipulation datasets only.
  • results: Exceeds state-of-the-art performance by a large margin on the tested benchmark while supporting open-loop control and deployment on low-performance platforms; the experiments also reveal how different pretrained VLMs behave on manipulation tasks, suggesting RoboFlamingo as a cost-effective and easy-to-use solution for robot manipulation.
    Abstract Recent progress in vision language foundation models has shown their ability to understand multimodal data and resolve complicated vision language tasks, including robotics manipulation. We seek a straightforward way of making use of existing vision-language models (VLMs) with simple fine-tuning on robotics data. To this end, we derive a simple and novel vision-language manipulation framework, dubbed RoboFlamingo, built upon the open-source VLMs, OpenFlamingo. Unlike prior works, RoboFlamingo utilizes pre-trained VLMs for single-step vision-language comprehension, models sequential history information with an explicit policy head, and is slightly fine-tuned by imitation learning only on language-conditioned manipulation datasets. Such a decomposition provides RoboFlamingo the flexibility for open-loop control and deployment on low-performance platforms. By exceeding the state-of-the-art performance with a large margin on the tested benchmark, we show RoboFlamingo can be an effective and competitive alternative to adapt VLMs to robot control. Our extensive experimental results also reveal several interesting conclusions regarding the behavior of different pre-trained VLMs on manipulation tasks. We believe RoboFlamingo has the potential to be a cost-effective and easy-to-use solution for robotics manipulation, empowering everyone with the ability to fine-tune their own robotics policy.
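The decomposition described above (frozen per-step VLM comprehension plus an explicit policy head over history) might look roughly like this; dimensions and the action parameterization are assumptions:

```python
import torch.nn as nn

class PolicyHead(nn.Module):
    def __init__(self, feat_dim=1024, hid=512):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hid, batch_first=True)  # sequential history
        self.arm = nn.Linear(hid, 6)      # 6-DoF end-effector delta
        self.gripper = nn.Linear(hid, 1)  # open/close logit

    def forward(self, vlm_feats):         # [batch, T, feat_dim] from the frozen VLM
        h, _ = self.rnn(vlm_feats)
        return self.arm(h), self.gripper(h)
```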

Recognize Any Regions

  • paper_url: http://arxiv.org/abs/2311.01373
  • repo_url: https://github.com/Surrey-UPLab/Recognize-Any-Regions
  • paper_authors: Haosen Yang, Chuofan Ma, Bin Wen, Yi Jiang, Zehuan Yuan, Xiatian Zhu
  • for: Understanding the semantics of individual regions or patches within unconstrained images, as in open-world object detection, where prior approaches are computationally intensive, noise-sensitive, and short on contextual information.
  • methods: RegionSpot, a novel, generic, and efficient region-recognition architecture that integrates position-aware localization knowledge from a frozen localization foundation model (e.g., SAM) with semantics from a frozen vision-language model (e.g., CLIP), optimizing only a lightweight attention-based knowledge-integration module.
  • results: Significant performance gains with substantial computational savings: training on 3 million examples takes a single day on 8 V100 GPUs, and the model outperforms GLIP by 6.5% mean average precision (mAP), with an even larger 14.8% margin on more challenging and rare categories.
    Abstract Understanding the semantics of individual regions or patches within unconstrained images, such as in open-world object detection, represents a critical yet challenging task in computer vision. Building on the success of powerful image-level vision-language (ViL) foundation models like CLIP, recent efforts have sought to harness their capabilities by either training a contrastive model from scratch with an extensive collection of region-label pairs or aligning the outputs of a detection model with image-level representations of region proposals. Despite notable progress, these approaches are plagued by computationally intensive training requirements, susceptibility to data noise, and deficiency in contextual information. To address these limitations, we explore the synergistic potential of off-the-shelf foundation models, leveraging their respective strengths in localization and semantics. We introduce a novel, generic, and efficient region recognition architecture, named RegionSpot, designed to integrate position-aware localization knowledge from a localization foundation model (e.g., SAM) with semantic information extracted from a ViL model (e.g., CLIP). To fully exploit pretrained knowledge while minimizing training overhead, we keep both foundation models frozen, focusing optimization efforts solely on a lightweight attention-based knowledge integration module. Through extensive experiments in the context of open-world object recognition, our RegionSpot demonstrates significant performance improvements over prior alternatives, while also providing substantial computational savings. For instance, training our model with 3 million data in a single day using 8 V100 GPUs. Our model outperforms GLIP by 6.5 % in mean average precision (mAP), with an even larger margin by 14.8 % for more challenging and rare categories.
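A hedged sketch of a lightweight attention-based integration module: region tokens from the frozen localization model attend over frozen ViL features, and only this module is trained (all dimensions below are assumptions):

```python
import torch.nn as nn

class RegionFusion(nn.Module):
    def __init__(self, d_region=256, d_vil=768, d=512, heads=8):
        super().__init__()
        self.q = nn.Linear(d_region, d)
        self.kv = nn.Linear(d_vil, 2 * d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.proj = nn.Linear(d, 512)  # into the ViL text-embedding space

    def forward(self, region_tokens, vil_feats):
        q = self.q(region_tokens)                # [B, num_regions, d]
        k, v = self.kv(vil_feats).chunk(2, -1)   # [B, num_patches, d] each
        fused, _ = self.attn(q, k, v)
        return self.proj(fused)                  # matched against text embeddings
```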

Simplicial Models for the Epistemic Logic of Faulty Agents

  • paper_url: http://arxiv.org/abs/2311.01351
  • repo_url: None
  • paper_authors: Eric Goubault, Roman Kniazev, Jeremy Ledent, Sergio Rajsbaum
  • for: Studying simplicial models of epistemic logic, which are based on higher-dimensional structures called simplicial complexes, and how their properties depend on design choices.
  • methods: Drops the assumption that models must be pure (all worlds of the same dimension), systematically classifies the design choices available for impure simplicial models, and axiomatizes the corresponding logics.
  • results: Illustrates the resulting logics through distributed-computing examples of synchronous systems in which processes may crash during execution.
    Abstract In recent years, several authors have been investigating simplicial models, a model of epistemic logic based on higher-dimensional structures called simplicial complexes. In the original formulation, simplicial models were always assumed to be pure, meaning that all worlds have the same dimension. This is equivalent to the standard S5n semantics of epistemic logic, based on Kripke models. By removing the assumption that models must be pure, we can go beyond the usual Kripke semantics and study epistemic logics where the number of agents participating in a world can vary. This approach has been developed in a number of papers, with applications in fault-tolerant distributed computing where processes may crash during the execution of a system. A difficulty that arises is that subtle design choices in the definition of impure simplicial models can result in different axioms of the resulting logic. In this paper, we classify those design choices systematically, and axiomatize the corresponding logics. We illustrate them via distributed computing examples of synchronous systems where processes may crash.

Like an Open Book? Read Neural Network Architecture with Simple Power Analysis on 32-bit Microcontrollers

  • paper_url: http://arxiv.org/abs/2311.01344
  • repo_url: None
  • paper_authors: Raphael Joud, Pierre-Alain Moellic, Simon Pontie, Jean-Baptiste Rigaud
  • for: Investigating how much of a deep neural network's architecture an adversary can recover from an EM side-channel trace, to inform the protection of AI systems on physically accessible platforms.
  • methods: Combines theoretical knowledge of deep learning practices with an analysis of the widespread ARM CMSIS-NN implementation library, proposing an extraction methodology for traditional MLP and CNN models running on a high-end 32-bit microcontroller (Cortex-M7) that relies only on simple pattern-recognition analysis.
  • results: Despite a few challenging cases, architecture extraction succeeds with relatively low attack complexity (in contrast to parameter extraction), highlighting the urgent need for practicable protections that fit the strong memory and latency requirements of such platforms.
    Abstract Model extraction is a growing concern for the security of AI systems. For deep neural network models, the architecture is the most important information an adversary aims to recover. Being a sequence of repeated computation blocks, neural network models deployed on edge-devices will generate distinctive side-channel leakages. The latter can be exploited to extract critical information when targeted platforms are physically accessible. By combining theoretical knowledge about deep learning practices and analysis of a widespread implementation library (ARM CMSIS-NN), our purpose is to answer this critical question: how far can we extract architecture information by simply examining an EM side-channel trace? For the first time, we propose an extraction methodology for traditional MLP and CNN models running on a high-end 32-bit microcontroller (Cortex-M7) that relies only on simple pattern recognition analysis. Despite few challenging cases, we claim that, contrary to parameters extraction, the complexity of the attack is relatively low and we highlight the urgent need for practicable protections that could fit the strong memory and latency requirements of such platforms.

Offline Imitation from Observation via Primal Wasserstein State Occupancy Matching

  • paper_url: http://arxiv.org/abs/2311.01331
  • repo_url: https://github.com/kaiyan289/pw-dice
  • paper_authors: Kai Yan, Alexander G. Schwing, Yu-xiong Wang
  • for: Reducing the cost of arbitrary environment interactions in real-world scenarios where expert actions are unavailable; offline Learning from Observations (LfO) learns to solve a task from only expert states and task-agnostic non-expert state-action pairs.
  • methods: Existing DIstribution Correction Estimation (DICE) methods minimize the state-occupancy divergence between learner and expert policies, but are limited to $f$-divergences (KL and $\chi^2$) or Wasserstein distance with Rubinstein duality, which constrains the underlying distance metric. Primal Wasserstein DICE (PW-DICE) instead minimizes the primal Wasserstein distance between expert and learner state occupancies, with a pessimistic regularizer and a contrastively learned distance as the underlying metric.
  • results: Theoretically, the framework generalizes the state-of-the-art SMODICE and unifies $f$-divergence and Wasserstein minimization; empirically, PW-DICE improves upon several state-of-the-art methods on multiple testbeds.
    Abstract In real-world scenarios, arbitrary interactions with the environment can often be costly, and actions of expert demonstrations are not always available. To reduce the need for both, Offline Learning from Observations (LfO) is extensively studied, where the agent learns to solve a task with only expert states and \textit{task-agnostic} non-expert state-action pairs. The state-of-the-art DIstribution Correction Estimation (DICE) methods minimize the state occupancy divergence between the learner and expert policies. However, they are limited to either $f$-divergences (KL and $\chi^2$) or Wasserstein distance with Rubinstein duality, the latter of which constrains the underlying distance metric crucial to the performance of Wasserstein-based solutions. To address this problem, we propose Primal Wasserstein DICE (PW-DICE), which minimizes the primal Wasserstein distance between the expert and learner state occupancies with a pessimistic regularizer and leverages a contrastively learned distance as the underlying metric for the Wasserstein distance. Theoretically, we prove that our framework is a generalization of the state-of-the-art, SMODICE, and unifies $f$-divergence and Wasserstein minimization. Empirically, we find that PW-DICE improves upon several state-of-the-art methods on multiple testbeds.
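For intuition, the primal Wasserstein distance between two empirical state occupancies can be computed directly with the POT library; the learned metric is a placeholder here, and PW-DICE itself optimizes a regularized objective rather than calling a solver:

```python
import numpy as np
import ot  # Python Optimal Transport: pip install pot

def primal_wasserstein(expert_states, learner_states, metric_fn):
    n, m = len(expert_states), len(learner_states)
    a, b = np.full(n, 1 / n), np.full(m, 1 / m)          # uniform occupancies
    M = np.array([[metric_fn(e, l) for l in learner_states]
                  for e in expert_states])               # pairwise ground cost
    return ot.emd2(a, b, M)                              # exact OT cost
```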

A Simple Solution for Offline Imitation from Observations and Examples with Possibly Incomplete Trajectories

  • paper_url: http://arxiv.org/abs/2311.01329
  • repo_url: https://github.com/kaiyan289/tailo
  • paper_authors: Kai Yan, Alexander G. Schwing, Yu-Xiong Wang
  • for: Solving offline imitation from observations, where expert actions are unavailable and the task-agnostic data may contain only incomplete trajectories.
  • methods: TAILO performs weighted behavior cloning, weighting each step by a discounted sum, along the future trajectory, of the outputs of a discriminator trained to identify expert states.
  • results: Across multiple testbeds, TAILO is more robust and effective than prior methods, particularly with incomplete trajectories.
    Abstract Offline imitation from observations aims to solve MDPs where only task-specific expert states and task-agnostic non-expert state-action pairs are available. Offline imitation is useful in real-world scenarios where arbitrary interactions are costly and expert actions are unavailable. The state-of-the-art "DIstribution Correction Estimation" (DICE) methods minimize divergence of state occupancy between expert and learner policies and retrieve a policy with weighted behavior cloning; however, their results are unstable when learning from incomplete trajectories, due to a non-robust optimization in the dual domain. To address the issue, in this paper, we propose Trajectory-Aware Imitation Learning from Observations (TAILO). TAILO uses a discounted sum along the future trajectory as the weight for weighted behavior cloning. The terms for the sum are scaled by the output of a discriminator, which aims to identify expert states. Despite simplicity, TAILO works well if there exist trajectories or segments of expert behavior in the task-agnostic data, a common assumption in prior work. In experiments across multiple testbeds, we find TAILO to be more robust and effective, particularly with incomplete trajectories.
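The weighting scheme is simple enough to sketch directly (a minimal version, assuming per-step discriminator scores are available):

```python
import numpy as np

def tailo_weights(disc_scores, gamma=0.98):
    """disc_scores[t]: discriminator's belief that state s_t is expert-like.
    Returns per-timestep weights: sum_{k >= t} gamma^(k - t) * score_k."""
    w = np.zeros(len(disc_scores))
    acc = 0.0
    for t in reversed(range(len(disc_scores))):
        acc = disc_scores[t] + gamma * acc
        w[t] = acc
    return w
# Weighted behavior cloning then minimizes: mean_t w[t] * -log pi(a_t | s_t)
```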

Better Together: Enhancing Generative Knowledge Graph Completion with Language Models and Neighborhood Information

  • paper_url: http://arxiv.org/abs/2311.01326
  • repo_url: https://github.com/screemix/kgc-t5-with-neighbors
  • paper_authors: Alla Chepurova, Aydar Bulatov, Yuri Kuratov, Mikhail Burtsev
  • for: Addressing the incompleteness of real-world knowledge graphs (KGs), which limits their potential performance, via knowledge graph completion (KGC).
  • methods: Uses generative language models (T5, KGT5) to predict tail nodes directly, and incorporates node-neighborhood information as additional model input.
  • results: Including neighborhood information outperforms KGT5 and conventional KGC approaches on both inductive and transductive Wikidata subsets; the analysis shows the importance of neighborhoods for model prediction and points toward significant further gains through more effective neighborhood selection.
    Abstract Real-world Knowledge Graphs (KGs) often suffer from incompleteness, which limits their potential performance. Knowledge Graph Completion (KGC) techniques aim to address this issue. However, traditional KGC methods are computationally intensive and impractical for large-scale KGs, necessitating the learning of dense node embeddings and computing pairwise distances. Generative transformer-based language models (e.g., T5 and recent KGT5) offer a promising solution as they can predict the tail nodes directly. In this study, we propose to include node neighborhoods as additional information to improve KGC methods based on language models. We examine the effects of this imputation and show that, on both inductive and transductive Wikidata subsets, our method outperforms KGT5 and conventional KGC approaches. We also provide an extensive analysis of the impact of neighborhood on model prediction and show its importance. Furthermore, we point the way to significantly improve KGC through more effective neighborhood selection.
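A hedged sketch of serializing a query plus sampled neighbors into a seq2seq input; the paper's exact prompt format may differ, and the example entities are illustrative:

```python
def build_kgc_input(head, relation, neighbors, max_neighbors=5):
    """neighbors: (relation, entity) pairs around the head node."""
    ctx = " | ".join(f"{r}: {e}" for r, e in neighbors[:max_neighbors])
    return f"predict tail: {head} [{relation}] context: {ctx}"

prompt = build_kgc_input(
    "Douglas Adams", "educated at",
    [("occupation", "writer"), ("notable work", "The Hitchhiker's Guide")],
)
# The T5-style model then generates the tail entity as free text.
```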

Scattering Vision Transformer: Spectral Mixing Matters

  • paper_url: http://arxiv.org/abs/2311.01310
  • repo_url: None
  • paper_authors: Badri N. Patro, Vijay Srinivas Agneeswaran
  • for: Image classification, instance segmentation, and object detection, addressing the attention complexity of vision transformers and their ability to capture fine-grained image detail.
  • methods: Scattering Vision Transformer (SVT), with a spectrally scattering network that separates low-frequency and high-frequency components (avoiding the information loss of non-invertible down-sampling) and a spectral gating network that uses Einstein multiplication for token and channel mixing to reduce complexity.
  • results: State-of-the-art on ImageNet with significantly fewer parameters and FLOPS: SVT-H-S reaches 84.2% top-1 accuracy, SVT-H-B 85.2% (state of the art among base versions), and SVT-H-L 85.7% (state of the art among large versions), a 2% improvement over LiTv2 and iFormer; SVT also performs well on instance segmentation and in transfer learning on CIFAR10, CIFAR100, Oxford Flower, and Stanford Cars.
    Abstract Vision transformers have gained significant attention and achieved state-of-the-art performance in various computer vision tasks, including image classification, instance segmentation, and object detection. However, challenges remain in addressing attention complexity and effectively capturing fine-grained information within images. Existing solutions often resort to down-sampling operations, such as pooling, to reduce computational cost. Unfortunately, such operations are non-invertible and can result in information loss. In this paper, we present a novel approach called Scattering Vision Transformer (SVT) to tackle these challenges. SVT incorporates a spectrally scattering network that enables the capture of intricate image details. SVT overcomes the invertibility issue associated with down-sampling operations by separating low-frequency and high-frequency components. Furthermore, SVT introduces a unique spectral gating network utilizing Einstein multiplication for token and channel mixing, effectively reducing complexity. We show that SVT achieves state-of-the-art performance on the ImageNet dataset with a significant reduction in a number of parameters and FLOPS. SVT shows 2\% improvement over LiTv2 and iFormer. SVT-H-S reaches 84.2\% top-1 accuracy, while SVT-H-B reaches 85.2\% (state-of-art for base versions) and SVT-H-L reaches 85.7\% (again state-of-art for large versions). SVT also shows comparable results in other vision tasks such as instance segmentation. SVT also outperforms other transformers in transfer learning on standard datasets such as CIFAR10, CIFAR100, Oxford Flower, and Stanford Car datasets. The project page is available on this webpage.\url{https://badripatro.github.io/svt/}.
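One way to realize spectral gating with an Einstein product is an elementwise complex gate in the token-frequency domain; this sketch illustrates the idea with assumed shapes, not SVT's exact architecture:

```python
import torch
import torch.nn as nn

class SpectralGate(nn.Module):
    def __init__(self, n_tokens, dim):
        super().__init__()
        # one complex gate per (frequency, channel); rfft keeps T//2 + 1 bins
        self.gate = nn.Parameter(torch.randn(n_tokens // 2 + 1, dim, 2) * 0.02)

    def forward(self, x):                         # x: [batch, tokens, dim]
        xf = torch.fft.rfft(x, dim=1)             # token mixing in frequency space
        xf = torch.einsum("bnd,nd->bnd", xf, torch.view_as_complex(self.gate))
        return torch.fft.irfft(xf, n=x.size(1), dim=1)
```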

AWEQ: Post-Training Quantization with Activation-Weight Equalization for Large Language Models

  • paper_url: http://arxiv.org/abs/2311.01305
  • repo_url: None
  • paper_authors: Baisong Li, Xingwang Wang, Haixiao Xu
  • for: Reducing the computational and storage costs of large language models (LLMs) through quantization.
  • methods: AWEQ, a post-training method with no additional training overhead: channel equalization transfers the difficulty of activation quantization to the weights, balancing the quantization difficulty of both, with a refined equalization step that mitigates quantization bias error.
  • results: Extensive experiments on popular models such as LLaMA and OPT show AWEQ outperforms existing post-training quantization methods for large models, in both ultra-low-bit quantization and 8-bit weight-and-activation (W8A8) quantization.
    Abstract Large language models(LLMs) exhibit excellent performance across a variety of tasks, but they come with significant computational and storage costs. Quantizing these models is an effective way to alleviate this issue. However, existing methods struggle to strike a balance between model accuracy and hardware efficiency. This is where we introduce AWEQ, a post-training method that requires no additional training overhead. AWEQ excels in both ultra-low-bit quantization and 8-bit weight and activation (W8A8) quantization. There is an observation that weight quantization is less challenging than activation quantization. AWEQ transfers the difficulty of activation quantization to weights using channel equalization, achieving a balance between the quantization difficulties of both, and thereby maximizing performance. We have further refined the equalization method to mitigate quantization bias error, ensuring the robustness of the model. Extensive experiments on popular models such as LLaMA and OPT demonstrate that AWEQ outperforms all existing post-training quantization methods for large models.
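The balancing idea can be sketched as a per-channel scale folded into the weights; this mirrors the equalization described above in its simplest form, not AWEQ's exact rule:

```python
import torch

def equalize(W, act_absmax, eps=1e-5):
    """W: [out, in] linear weight; act_absmax: [in] calibration-set activation
    range. Scaling moves quantization difficulty from activations to weights
    while keeping (X / s) @ (W * s).T == X @ W.T."""
    w_absmax = W.abs().amax(dim=0)                                    # [in]
    s = (act_absmax.clamp(min=eps) / w_absmax.clamp(min=eps)).sqrt()  # [in]
    return W * s, s   # at runtime, divide incoming activations by s
```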

TRIALSCOPE A Unifying Causal Framework for Scaling Real-World Evidence Generation with Biomedical Language Models

  • paper_url: http://arxiv.org/abs/2311.01301
  • repo_url: None
  • paper_authors: Javier González, Cliff Wong, Zelalem Gero, Jass Bagga, Risa Ueno, Isabel Chien, Eduard Orakvin, Emre Kiciman, Aditya Nori, Roshanthi Weerasinghe, Rom S. Leidner, Brian Piening, Tristan Naumann, Carlo Bifulco, Hoifung Poon
  • for: Optimizing healthcare delivery and accelerating biomedical discovery from real-world data, which is most abundant in unstructured forms such as clinical notes in electronic medical records (EMRs) and is generally plagued by confounders.
  • methods: TRIALSCOPE uses biomedical language models to structure clinical text at scale, advanced probabilistic modeling for denoising and imputation, and state-of-the-art causal inference to combat common confounders, with clinical trial specifications as a generic representation for generating and reasoning with clinical hypotheses.
  • results: On a large-scale real-world dataset of over one million cancer patients from a large US healthcare network, TRIALSCOPE produces high-quality structuring of real-world data and generates results comparable to marquee cancer trials; it may also empower synthetic controls, pragmatic trials, post-market surveillance, and fine-grained patient-like-me reasoning.
    Abstract The rapid digitization of real-world data offers an unprecedented opportunity for optimizing healthcare delivery and accelerating biomedical discovery. In practice, however, such data is most abundantly available in unstructured forms, such as clinical notes in electronic medical records (EMRs), and it is generally plagued by confounders. In this paper, we present TRIALSCOPE, a unifying framework for distilling real-world evidence from population-level observational data. TRIALSCOPE leverages biomedical language models to structure clinical text at scale, employs advanced probabilistic modeling for denoising and imputation, and incorporates state-of-the-art causal inference techniques to combat common confounders. Using clinical trial specification as generic representation, TRIALSCOPE provides a turn-key solution to generate and reason with clinical hypotheses using observational data. In extensive experiments and analyses on a large-scale real-world dataset with over one million cancer patients from a large US healthcare network, we show that TRIALSCOPE can produce high-quality structuring of real-world data and generates comparable results to marquee cancer trials. In addition to facilitating in-silicon clinical trial design and optimization, TRIALSCOPE may be used to empower synthetic controls, pragmatic trials, post-market surveillance, as well as support fine-grained patient-like-me reasoning in precision diagnosis and treatment.

UniFolding: Towards Sample-efficient, Scalable, and Generalizable Robotic Garment Folding

  • paper_url: http://arxiv.org/abs/2311.01267
  • repo_url: https://github.com/xiaoxiaoxh/UniFolding
  • paper_authors: Han Xue, Yutong Li, Wenqiang Xu, Huanyu Li, Dongzhe Zheng, Cewu Lu
  • for: A sample-efficient, scalable, and generalizable robotic system for unfolding and folding various garments.
  • methods: The proposed UFONet neural network integrates unfolding and folding decisions into a single policy model adaptable to different garment types and states, operating on a garment's partial point cloud to aid generalization and reduce sensitivity to texture and shape; training data is collected through a human-centric process with an offline Virtual Reality stage and an online human-in-the-loop stage.
  • results: Tested on two garment types, long-sleeve and short-sleeve shirts, with performance evaluated on 20 shirts exhibiting significant variations in texture, shape, and material.
    Abstract This paper explores the development of UniFolding, a sample-efficient, scalable, and generalizable robotic system for unfolding and folding various garments. UniFolding employs the proposed UFONet neural network to integrate unfolding and folding decisions into a single policy model that is adaptable to different garment types and states. The design of UniFolding is based on a garment's partial point cloud, which aids in generalization and reduces sensitivity to variations in texture and shape. The training pipeline prioritizes low-cost, sample-efficient data collection. Training data is collected via a human-centric process with offline and online stages. The offline stage involves human unfolding and folding actions via Virtual Reality, while the online stage utilizes human-in-the-loop learning to fine-tune the model in a real-world setting. The system is tested on two garment types: long-sleeve and short-sleeve shirts. Performance is evaluated on 20 shirts with significant variations in textures, shapes, and materials. More experiments and videos can be found in the supplementary materials and on the website: https://unifolding.robotflow.ai

Expressive TTS Driven by Natural Language Prompts Using Few Human Annotations

  • paper_url: http://arxiv.org/abs/2311.01260
  • repo_url: None
  • paper_authors: Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu
  • for: A controllable expressive TTS model that requires only minimal style-annotated data.
  • methods: FreeStyleTTS (FS-TTS) uses a large language model (LLM) to recast expressive TTS as a style-retrieval task: the LLM selects the best-matching style references from annotated utterances given an external style prompt (raw input text or a natural-language style description), and the selected reference guides the TTS pipeline to synthesize speech in the intended style.
  • results: Experiments on a Mandarin storytelling corpus show FS-TTS leverages the LLM's semantic inference ability to retrieve desired styles from either input text or user-defined descriptions, yielding synthetic speech closely aligned with the specified styles.
    Abstract Expressive text-to-speech (TTS) aims to synthesize speeches with human-like tones, moods, or even artistic attributes. Recent advancements in expressive TTS empower users with the ability to directly control synthesis style through natural language prompts. However, these methods often require excessive training with a significant amount of style-annotated data, which can be challenging to acquire. Moreover, they may have limited adaptability due to fixed style annotations. In this work, we present FreeStyleTTS (FS-TTS), a controllable expressive TTS model with minimal human annotations. Our approach utilizes a large language model (LLM) to transform expressive TTS into a style retrieval task. The LLM selects the best-matching style references from annotated utterances based on external style prompts, which can be raw input text or natural language style descriptions. The selected reference guides the TTS pipeline to synthesize speeches with the intended style. This innovative approach provides flexible, versatile, and precise style control with minimal human workload. Experiments on a Mandarin storytelling corpus demonstrate FS-TTS's proficiency in leveraging LLM's semantic inference ability to retrieve desired styles from either input text or user-defined descriptions. This results in synthetic speeches that are closely aligned with the specified styles.

Formal Methods for Autonomous Systems

  • paper_url: http://arxiv.org/abs/2311.01258
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Tichakorn Wongpiromsarn, Mahsa Ghasemi, Murat Cubuktepe, Georgios Bakirtzis, Steven Carr, Mustafa O. Karabag, Cyrus Neary, Parham Gohari, Ufuk Topcu
  • for: A survey of the current state of the art in applying formal methods to autonomous systems.
  • methods: Covers correct-by-construction synthesis under closed-system, reactive, and probabilistic formulations, using models and specifications to verify and synthesize system behaviors with formal guarantees.
  • results: Addresses handling uncertainty and bounding the behavior of learning-enabled systems, synthesis with monitoring (so that a system deviating from expected behavior knows how to return to normalcy), overcoming limitations of formal methods with learning, and future directions in reinforcement learning, uncertainty, privacy, explainability, and regulation and certification.
    Abstract Formal methods refer to rigorous, mathematical approaches to system development and have played a key role in establishing the correctness of safety-critical systems. The main building blocks of formal methods are models and specifications, which are analogous to behaviors and requirements in system design and give us the means to verify and synthesize system behaviors with formal guarantees. This monograph provides a survey of the current state of the art on applications of formal methods in the autonomous systems domain. We consider correct-by-construction synthesis under various formulations, including closed systems, reactive, and probabilistic settings. Beyond synthesizing systems in known environments, we address the concept of uncertainty and bound the behavior of systems that employ learning using formal methods. Further, we examine the synthesis of systems with monitoring, a mitigation technique for ensuring that once a system deviates from expected behavior, it knows a way of returning to normalcy. We also show how to overcome some limitations of formal methods themselves with learning. We conclude with future directions for formal methods in reinforcement learning, uncertainty, privacy, explainability of formal methods, and regulation and certification.

  • paper_url: http://arxiv.org/abs/2311.01256
  • repo_url: None
  • paper_authors: Sinan Gultekin, Achille Globo, Andrea Zugarini, Marco Ernandes, Leonardo Rigutini
  • for: Evaluating large language models (LLMs) and traditional approaches (e.g., SVMs) on the LexGLUE benchmark, considering not only performance (standard indices) but also timing, power consumption, and cost, i.e., the carbon footprint.
  • methods: A detailed quantitative comparison that evaluates the prototyping phase (model selection via training-validation-test iterations) and the in-production phase separately, since they follow different implementation procedures and require different resources.
  • results: Very often, the simplest algorithms achieve performance close to that of large LLMs but with far lower power consumption and resource demands, suggesting that companies include such additional evaluations when choosing machine learning (ML) solutions.
    Abstract Most Machine Learning research evaluates the best solutions in terms of performance. However, in the race for the best performing model, many important aspects are often overlooked when, on the contrary, they should be carefully considered. In fact, sometimes the gaps in performance between different approaches are neglectable, whereas factors such as production costs, energy consumption, and carbon footprint must take into consideration. Large Language Models (LLMs) are extensively adopted to address NLP problems in academia and industry. In this work, we present a detailed quantitative comparison of LLM and traditional approaches (e.g. SVM) on the LexGLUE benchmark, which takes into account both performance (standard indices) and alternative metrics such as timing, power consumption and cost, in a word: the carbon-footprint. In our analysis, we considered the prototyping phase (model selection by training-validation-test iterations) and in-production phases separately, since they follow different implementation procedures and also require different resources. The results indicate that very often, the simplest algorithms achieve performance very close to that of large LLMs but with very low power consumption and lower resource demands. The results obtained could suggest companies to include additional evaluations in the choice of Machine Learning (ML) solutions.
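One possible way (not necessarily the authors') to log energy and emissions alongside accuracy during the training-validation-test loop, using the codecarbon package and a simple SVM baseline:

```python
from codecarbon import EmissionsTracker
from sklearn.svm import LinearSVC

def train_with_footprint(X_train, y_train):
    tracker = EmissionsTracker()
    tracker.start()
    model = LinearSVC().fit(X_train, y_train)
    kg_co2 = tracker.stop()   # estimated kg of CO2-equivalent for this run
    return model, kg_co2
```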

Push it to the Demonstrated Limit: Multimodal Visuotactile Imitation Learning with Force Matching

  • paper_url: http://arxiv.org/abs/2311.01248
  • repo_url: None
  • paper_authors: Trevor Ablett, Oliver Limoyo, Adam Sigal, Affan Jilani, Jonathan Kelly, Kaleem Siddiqi, Francois Hogan, Gregory Dudek
  • for: Acquiring dense contact information for contact-rich robotic manipulation with optical tactile sensors.
  • methods: Pairs a "see-through-your-skin" (STS) visuotactile sensor, which provides both visual and tactile modes via a semi-transparent surface and controllable lighting, with imitation learning: tactile force measurements and a novel algorithm during kinesthetic teaching yield a force profile that better matches the human demonstrator's, and visual/tactile STS mode switching is added as a control-policy output.
  • results: Over 3,000 real test episodes on door-opening and closing tasks with a real manipulator highlight the importance of tactile sensing, both for data collection (force matching) and for policy execution (accurate task feedback), across multiple observation configurations compared with a wrist-mounted eye-in-hand camera.
    Abstract Optical tactile sensors have emerged as an effective means to acquire dense contact information during robotic manipulation. A recently-introduced `see-through-your-skin' (STS) variant of this type of sensor has both visual and tactile modes, enabled by leveraging a semi-transparent surface and controllable lighting. In this work, we investigate the benefits of pairing visuotactile sensing with imitation learning for contact-rich manipulation tasks. First, we use tactile force measurements and a novel algorithm during kinesthetic teaching to yield a force profile that better matches that of the human demonstrator. Second, we add visual/tactile STS mode switching as a control policy output, simplifying the application of the sensor. Finally, we study multiple observation configurations to compare and contrast the value of visual/tactile data (both with and without mode switching) with visual data from a wrist-mounted eye-in-hand camera. We perform an extensive series of experiments on a real robotic manipulator with door-opening and closing tasks, including over 3,000 real test episodes. Our results highlight the importance of tactile sensing for imitation learning, both for data collection to allow force matching, and for policy execution to allow accurate task feedback.

FacadeNet: Conditional Facade Synthesis via Selective Editing

  • paper_url: http://arxiv.org/abs/2311.01240
  • repo_url: None
  • paper_authors: Yiangos Georgiou, Marios Loizou, Tom Kelly, Melinos Averkiou
  • for: This paper synthesizes building facade images from diverse viewpoints.
  • methods: The method employs a conditional GAN that takes a single view of a facade along with the desired viewpoint information and generates the facade image from that viewpoint. To precisely modify view-dependent elements (such as windows and doors) while preserving view-independent ones (such as walls), a selective editing module is introduced, leveraging image embeddings extracted from a pre-trained vision transformer.
  • results: Experiments demonstrate state-of-the-art performance on building facade generation, surpassing alternative methods.
    Abstract We introduce FacadeNet, a deep learning approach for synthesizing building facade images from diverse viewpoints. Our method employs a conditional GAN, taking a single view of a facade along with the desired viewpoint information and generates an image of the facade from the distinct viewpoint. To precisely modify view-dependent elements like windows and doors while preserving the structure of view-independent components such as walls, we introduce a selective editing module. This module leverages image embeddings extracted from a pre-trained vision transformer. Our experiments demonstrated state-of-the-art performance on building facade generation, surpassing alternative methods.
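
The selective editing idea can be pictured as deriving a per-patch edit weight from ViT embeddings: patches whose embeddings differ strongly across views (windows, doors) are edited, while stable patches (walls) are carried over. A hedged PyTorch sketch; the shapes and the blending rule are illustrative assumptions, not FacadeNet's actual module:

```python
# Sketch: a per-patch "edit mask" from ViT patch embeddings. Patches that are
# dissimilar between two views are treated as view-dependent and edited more.
import torch

def selective_edit_mask(emb_src, emb_tgt):
    """emb_*: (num_patches, dim) patch embeddings of two views of a facade."""
    sim = torch.nn.functional.cosine_similarity(emb_src, emb_tgt, dim=-1)
    return (1.0 - sim).clamp(0.0, 1.0)  # high where patches disagree

num_patches, dim = 196, 768  # e.g. a 14x14 grid from a ViT-B/16
emb_a = torch.randn(num_patches, dim)
emb_b = torch.randn(num_patches, dim)
mask = selective_edit_mask(emb_a, emb_b)            # (196,)

generated = torch.rand(num_patches, 3)              # generator output per patch
source = torch.rand(num_patches, 3)                 # source-view content
blended = mask[:, None] * generated + (1 - mask[:, None]) * source
```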

Navigating Complex Search Tasks with AI Copilots

  • paper_url: http://arxiv.org/abs/2311.01235
  • repo_url: None
  • paper_authors: Ryen W. White
  • for: This article examines how AI can help search engines better support the complex search tasks that people struggle with every day.
  • methods: It draws on generative AI and the assistive agents (AI copilots) built on it to better support searchers engaged in complex tasks.
  • results: It argues that AI copilots will broadly improve information access and may drive a redesign of search engines, charting a course toward new horizons for the field.
    Abstract As many of us in the information retrieval (IR) research community know and appreciate, search is far from being a solved problem. Millions of people struggle with tasks on search engines every day. Often, their struggles relate to the intrinsic complexity of their task and the failure of search systems to fully understand the task and serve relevant results. The task motivates the search, creating the gap/problematic situation that searchers attempt to bridge/resolve and drives search behavior as they work through different task facets. Complex search tasks require more than support for rudimentary fact finding or re-finding. Research on methods to support complex tasks includes work on generating query and website suggestions, personalizing and contextualizing search, and developing new search experiences, including those that span time and space. The recent emergence of generative artificial intelligence (AI) and the arrival of assistive agents, or copilots, based on this technology, has the potential to offer further assistance to searchers, especially those engaged in complex tasks. There are profound implications from these advances for the design of intelligent systems and for the future of search itself. This article, based on a keynote by the author at the 2023 ACM SIGIR Conference, explores these issues and charts a course toward new horizons in information access guided by AI copilots.

Long Story Short: a Summarize-then-Search Method for Long Video Question Answering

  • paper_url: http://arxiv.org/abs/2311.01233
  • repo_url: None
  • paper_authors: Jiwan Chung, Youngjae Yu
  • for: This paper explores the ability of large language models like GPT-3 to adapt to new tasks without task-specific training data, specifically in the context of long multimodal narratives in multimedia content like drama, movies, and animation.
  • methods: The proposed framework, called Long Story Short, first summarizes the narrative of the video into a short plot and then searches for relevant parts of the video using CLIPCheck.
  • results: The model outperforms state-of-the-art supervised models by a large margin, demonstrating the potential of zero-shot QA for long videos.
    Abstract Large language models such as GPT-3 have demonstrated an impressive capability to adapt to new tasks without requiring task-specific training data. This capability has been particularly effective in settings such as narrative question answering, where the diversity of tasks is immense, but the available supervision data is small. In this work, we investigate if such language models can extend their zero-shot reasoning abilities to long multimodal narratives in multimedia content such as drama, movies, and animation, where the story plays an essential role. We propose Long Story Short, a framework for narrative video QA that first summarizes the narrative of the video to a short plot and then searches parts of the video relevant to the question. We also propose to enhance visual matching with CLIPCheck. Our model outperforms state-of-the-art supervised models by a large margin, highlighting the potential of zero-shot QA for long videos.
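
The summarize-then-search pipeline is easy to picture: an LLM condenses the narrative into a short plot, then candidate frames or segments are ranked by CLIP text-image similarity against the question. A rough sketch of the search step, assuming the open_clip package and random placeholder frames; CLIPCheck itself goes beyond this plain cosine ranking:

```python
# Sketch of the search step: rank candidate video frames by CLIP similarity
# to the question. Real frames would go through `preprocess`; random tensors
# stand in here, and the paper's CLIPCheck also calibrates these scores.
import torch
import open_clip  # pip install open_clip_torch

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

frames = torch.randn(8, 3, 224, 224)          # 8 candidate frames (placeholder)
question = tokenizer(["Why does the hero leave the city?"])

with torch.no_grad():
    img = model.encode_image(frames)
    txt = model.encode_text(question)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    scores = (img @ txt.T).squeeze(-1)        # cosine similarity per frame

best = scores.topk(3).indices                 # frames to re-read for the answer
print(best.tolist())
```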

Multi-Operational Mathematical Derivations in Latent Space

  • paper_url: http://arxiv.org/abs/2311.01230
  • repo_url: https://github.com/neuro-symbolic-ai/latent_mathematical_reasoning
  • paper_authors: Marco Valentino, Jordan Meadows, Lan Zhang, André Freitas
  • for: This paper investigates approximating multiple mathematical operations in latent space for expression derivation.
  • methods: The authors introduce different multi-operational representation paradigms that model mathematical operations as explicit geometric transformations. Leveraging a symbolic engine, they construct a large-scale dataset of 1.7M derivation steps stemming from 61K premises and 6 operators, and analyse the properties of each paradigm when instantiated with state-of-the-art neural encoders.
  • results: The multi-operational paradigm proves crucial for disentangling different operators, whereas discriminating the conclusions of a single operation is achievable in the original expression encoder. Architectural choices also heavily affect the training dynamics, structural organisation, and generalisation of the latent space, producing significant variations across paradigms and encoder classes.
    Abstract This paper investigates the possibility of approximating multiple mathematical operations in latent space for expression derivation. To this end, we introduce different multi-operational representation paradigms, modelling mathematical operations as explicit geometric transformations. By leveraging a symbolic engine, we construct a large-scale dataset comprising 1.7M derivation steps stemming from 61K premises and 6 operators, analysing the properties of each paradigm when instantiated with state-of-the-art neural encoders. Specifically, we investigate how different encoding mechanisms can approximate equational reasoning in latent space, exploring the trade-off between learning different operators and specialising within single operations, as well as the ability to support multi-step derivations and out-of-distribution generalisation. Our empirical analysis reveals that the multi-operational paradigm is crucial for disentangling different operators, while discriminating the conclusions for a single operation is achievable in the original expression encoder. Moreover, we show that architectural choices can heavily affect the training dynamics, structural organisation, and generalisation of the latent space, resulting in significant variations across paradigms and classes of encoders.
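
One way to read "operations as explicit geometric transformations" is a per-operator map applied to premise embeddings, trained so that the result lands near the embedding of the derived expression. A toy PyTorch sketch under that assumption; the paper's paradigms and encoders are considerably richer:

```python
# Toy latent-derivation model: each of the 6 operators is a learned linear map
# acting on the premise embedding; training pulls op(premise) toward the
# embedding of the conclusion. Encoder and data below are stand-ins.
import torch
import torch.nn as nn

class LatentDerivation(nn.Module):
    def __init__(self, encoder: nn.Module, num_ops: int, dim: int):
        super().__init__()
        self.encoder = encoder
        self.ops = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_ops))

    def forward(self, premise, op_id: int):
        z = self.encoder(premise)        # (batch, dim) latent premise
        return self.ops[op_id](z)        # predicted conclusion embedding

dim = 64
enc = nn.Sequential(nn.Linear(32, dim), nn.Tanh())  # stand-in expression encoder
model = LatentDerivation(enc, num_ops=6, dim=dim)

premise = torch.randn(4, 32)
target = enc(torch.randn(4, 32)).detach()           # conclusion embeddings
loss = nn.functional.mse_loss(model(premise, op_id=2), target)
loss.backward()
```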

Diffusion Models for Reinforcement Learning: A Survey

  • paper_url: http://arxiv.org/abs/2311.01223
  • repo_url: https://github.com/apexrl/diff4rlsurvey
  • paper_authors: Zhengbang Zhu, Hanye Zhao, Haoran He, Yichao Zhong, Shenyu Zhang, Yong Yu, Weinan Zhang
  • for: This survey overviews the advances in applying diffusion models to reinforcement learning (RL) and hopes to inspire new research directions.
  • methods: It analyses the challenges encountered by current RL algorithms, then presents a taxonomy of existing methods based on the roles diffusion models play in RL (trajectory planners, expressive policy classes, data synthesizers, etc.) and details how these challenges are addressed.
  • results: It outlines successful applications of diffusion models in various RL-related tasks, discusses the limitations of current approaches, and suggests future directions such as improving model performance and applying diffusion models to broader tasks. Papers and related resources are maintained at https://github.com/apexrl/Diff4RLSurvey.
    Abstract Diffusion models have emerged as a prominent class of generative models, surpassing previous methods regarding sample quality and training stability. Recent works have shown the advantages of diffusion models in improving reinforcement learning (RL) solutions, including as trajectory planners, expressive policy classes, data synthesizers, etc. This survey aims to provide an overview of the advancements in this emerging field and hopes to inspire new avenues of research. First, we examine several challenges encountered by current RL algorithms. Then, we present a taxonomy of existing methods based on the roles played by diffusion models in RL and explore how the existing challenges are addressed. We further outline successful applications of diffusion models in various RL-related tasks while discussing the limitations of current approaches. Finally, we conclude the survey and offer insights into future research directions, focusing on enhancing model performance and applying diffusion models to broader tasks. We are actively maintaining a GitHub repository for papers and other related resources in applying diffusion models in RL: https://github.com/apexrl/Diff4RLSurvey .

Multi-view Relation Learning for Cross-domain Few-shot Hyperspectral Image Classification

  • paper_url: http://arxiv.org/abs/2311.01212
  • repo_url: https://github.com/henulwy/stbdip
  • paper_authors: Chun Liu, Longwei Yang, Zheng Li, Wei Yang, Zhigang Han, Jianzhong Guo, Junyong Yu
  • for: This paper targets cross-domain few-shot hyperspectral image classification: transferring knowledge learned from abundant labeled samples in a source domain to target-domain tasks that contain only a few labeled samples.
  • methods: It proposes learning sample relations from different views and incorporating them into the model learning process. Building on the DCFSL method, which uses a domain discriminator to handle domain-level distribution differences, it applies contrastive learning to learn class-level sample relations for more discriminative features, and a transformer-based cross-attention module to learn set-level relations from query samples to support samples.
  • results: Experiments show that the multi-view relation learning mechanism improves few-shot hyperspectral image classification compared with state-of-the-art methods.
    Abstract Cross-domain few-shot hyperspectral image classification focuses on learning prior knowledge from a large number of labeled samples from source domain and then transferring the knowledge to the tasks which contain only few labeled samples in target domains. Following the metric-based manner, many current methods first extract the features of the query and support samples, and then directly predict the classes of query samples according to their distance to the support samples or prototypes. The relations between samples have not been fully explored and utilized. Different from current works, this paper proposes to learn sample relations from different views and take them into the model learning process, to improve the cross-domain few-shot hyperspectral image classification. Building on current DCFSL method which adopts a domain discriminator to deal with domain-level distribution difference, the proposed method applys contrastive learning to learn the class-level sample relations to obtain more discriminable sample features. In addition, it adopts a transformer based cross-attention learning module to learn the set-level sample relations and acquire the attentions from query samples to support samples. Our experimental results have demonstrated the contribution of the multi-view relation learning mechanism for few-shot hyperspectral image classification when compared with the state of the art methods.
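
The class-level relation learning can be illustrated with a standard supervised contrastive loss over sample embeddings, pulling same-class samples together across domains. A generic sketch of that loss, not the paper's exact formulation:

```python
# Generic supervised contrastive loss: same-class samples are positives,
# everything else is a negative. Temperature and details are assumptions.
import torch
import torch.nn.functional as F

def sup_con_loss(z, labels, tau=0.1):
    """z: (n, d) L2-normalized embeddings; labels: (n,) class ids."""
    sim = z @ z.T / tau                                   # (n, n) similarities
    pos = labels[:, None].eq(labels[None, :]).float()
    pos.fill_diagonal_(0)                                 # drop self-pairs
    logits = sim - torch.eye(len(z)) * 1e9                # exclude self in softmax
    log_prob = logits - logits.logsumexp(dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

z = F.normalize(torch.randn(8, 64), dim=1)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(sup_con_loss(z, labels).item())
```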

Attacking Graph Neural Networks with Bit Flips: Weisfeiler and Lehman Go Indifferent

  • paper_url: http://arxiv.org/abs/2311.01205
  • repo_url: None
  • paper_authors: Lorenz Kummer, Samir Moustafa, Nils N. Kriege, Wilfried N. Gansterer
  • for: This paper attacks the weights and biases of graph neural networks (GNNs), in contrast to traditional graph poisoning and evasion attacks.
  • methods: The authors propose the Injectivity Bit Flip Attack, the first bit flip attack designed specifically for GNNs. It targets the learnable neighborhood aggregation functions in quantized message passing neural networks, degrading their ability to distinguish graph structures and destroying the expressivity of the Weisfeiler-Lehman test.
  • results: Exploiting mathematical properties specific to certain GNN architectures significantly increases their vulnerability to bit flip attacks: flipping only a small fraction of the network's bits degrades maximally expressive Graph Isomorphism Networks trained on various graph property prediction datasets to random output.
    Abstract Prior attacks on graph neural networks have mostly focused on graph poisoning and evasion, neglecting the network's weights and biases. Traditional weight-based fault injection attacks, such as bit flip attacks used for convolutional neural networks, do not consider the unique properties of graph neural networks. We propose the Injectivity Bit Flip Attack, the first bit flip attack designed specifically for graph neural networks. Our attack targets the learnable neighborhood aggregation functions in quantized message passing neural networks, degrading their ability to distinguish graph structures and losing the expressivity of the Weisfeiler-Lehman test. Our findings suggest that exploiting mathematical properties specific to certain graph neural network architectures can significantly increase their vulnerability to bit flip attacks. Injectivity Bit Flip Attacks can degrade the maximal expressive Graph Isomorphism Networks trained on various graph property prediction datasets to random output by flipping only a small fraction of the network's bits, demonstrating its higher destructive power compared to a bit flip attack transferred from convolutional neural networks. Our attack is transparent and motivated by theoretical insights which are confirmed by extensive empirical results.
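
Mechanically, a bit flip on quantized weights is a single XOR on the stored integer; the paper's contribution lies in choosing which bits break the injectivity of the aggregation function. A minimal illustration of the mechanics alone on an int8 tensor:

```python
# Minimal bit-flip mechanics on quantized (int8) weights: XOR one bit of one
# stored integer. The target-bit choice here is arbitrary; selecting bits
# that destroy the aggregation's injectivity is the attack's actual insight.
import torch

def flip_bit(weights_int8: torch.Tensor, index: int, bit: int) -> None:
    # bits 0-6 shown; flipping the sign bit would need a uint8 reinterpret
    flat = weights_int8.view(-1)
    flat[index] = flat[index] ^ (1 << bit)   # XOR toggles the chosen bit

w = torch.randint(-128, 127, (4, 4), dtype=torch.int8)
before = w.clone()
flip_bit(w, index=5, bit=6)                  # flip bit 6 of the 6th weight
print((w != before).sum().item(), "weight changed")
```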

Cross-Modal Information-Guided Network using Contrastive Learning for Point Cloud Registration

  • paper_url: http://arxiv.org/abs/2311.01202
  • repo_url: https://github.com/ivanxie416/cmignet
  • paper_authors: Yifan Xie, Jihua Zhu, Shiqi Li, Pengcheng Shi
  • for: This paper proposes a Cross-Modal Information-Guided Network (CMIGNet) for precise and robust point cloud registration.
  • methods: Point clouds are first projected onto 2D images, and the cross-modal features are fused with an attention mechanism. Two contrastive learning strategies are then employed: overlapping contrastive learning, which focuses on features in overlapping regions, and cross-modal contrastive learning, which emphasizes correspondences between 2D and 3D features. A mask prediction module identifies keypoints in the point clouds.
  • results: Extensive experiments on several benchmark datasets show that the network achieves superior registration performance.
    Abstract The majority of point cloud registration methods currently rely on extracting features from points. However, these methods are limited by their dependence on information obtained from a single modality of points, which can result in deficiencies such as inadequate perception of global features and a lack of texture information. Actually, humans can employ visual information learned from 2D images to comprehend the 3D world. Based on this fact, we present a novel Cross-Modal Information-Guided Network (CMIGNet), which obtains global shape perception through cross-modal information to achieve precise and robust point cloud registration. Specifically, we first incorporate the projected images from the point clouds and fuse the cross-modal features using the attention mechanism. Furthermore, we employ two contrastive learning strategies, namely overlapping contrastive learning and cross-modal contrastive learning. The former focuses on features in overlapping regions, while the latter emphasizes the correspondences between 2D and 3D features. Finally, we propose a mask prediction module to identify keypoints in the point clouds. Extensive experiments on several benchmark datasets demonstrate that our network achieves superior registration performance.

Federated Learning on Edge Sensing Devices: A Review

  • paper_url: http://arxiv.org/abs/2311.01201
  • repo_url: None
  • paper_authors: Berrenur Saylam, Özlem Durmaz İncel
  • for: This review covers federated learning on edge sensing devices, addressing the privacy, hardware, and connectivity limitations faced by conventional machine learning techniques.
  • methods: It reviews Federated Learning (FL) strategies that jointly train models on edge devices without sharing raw data, covering key FL principles, software frameworks, and testbeds.
  • results: It surveys current sensor technologies, properties of sensing devices, and sensing applications where FL is utilized, enabling real-time analysis and decision-making at the edge while preserving privacy, and concludes with open issues and future research directions.
    Abstract The ability to monitor ambient characteristics, interact with them, and derive information about the surroundings has been made possible by the rapid proliferation of edge sensing devices like IoT, mobile, and wearable devices and their measuring capabilities with integrated sensors. Even though these devices are small and have less capacity for data storage and processing, they produce vast amounts of data. Some example application areas where sensor data is collected and processed include healthcare, environmental (including air quality and pollution levels), automotive, industrial, aerospace, and agricultural applications. These enormous volumes of sensing data collected from the edge devices are analyzed using a variety of Machine Learning (ML) and Deep Learning (DL) approaches. However, analyzing them on the cloud or a server presents challenges related to privacy, hardware, and connectivity limitations. Federated Learning (FL) is emerging as a solution to these problems while preserving privacy by jointly training a model without sharing raw data. In this paper, we review the FL strategies from the perspective of edge sensing devices to get over the limitations of conventional machine learning techniques. We focus on the key FL principles, software frameworks, and testbeds. We also explore the current sensor technologies, properties of the sensing devices and sensing applications where FL is utilized. We conclude with a discussion on open issues and future research directions on FL for further studies
    摘要 “随着边缘感应设备的普及,例如IoT、手持式和穿戴式设备的数据量和感应功能,实现了监测环境特点、互动和获取环境信息的能力。这些设备小巧,储存空间和处理能力有限,但生成了巨量数据。一些应用领域包括医疗、环境(包括空气质量和污染水平)、汽车、工业、航空和农业应用。这些边缘感应数据通过多种机器学习(ML)和深度学习(DL)方法进行分析。但是,将数据分析到云端或服务器端存在隐私、硬件和连接限制的问题。联邦学习(FL)正在解决这些问题,并保持隐私性,无需共享原始数据。本文从边缘感应设备的角度,检视FL策略,并评估适用于边缘感应应用的软件框架和实验室。我们也探讨目前的感应技术、感应设备的性能和感应应用中FL的应用。我们结束时讨论未解决的问题和未来研究方向。”

AiluRus: A Scalable ViT Framework for Dense Prediction

  • paper_url: http://arxiv.org/abs/2311.01197
  • repo_url: https://github.com/caddyless/ailurus
  • paper_authors: Jin Li, Yaoming Wang, Xiaopeng Zhang, Bowen Shi, Dongsheng Jiang, Chenglin Li, Wenrui Dai, Hongkai Xiong, Qi Tian
  • for: Improving the performance of vision transformer (ViT) models on long token sequences, especially in dense prediction tasks with high-resolution inputs.
  • methods: Adaptive resolution is applied to different image regions according to their importance. At an intermediate ViT layer, a spatial-aware density-based clustering algorithm selects representative tokens, and the remaining tokens are merged into their nearest representative. Semantically similar tokens thus form low-resolution regions while semantically irrelevant tokens are preserved independently as high-resolution regions, reducing the token count so that subsequent layers process a shorter sequence and run faster (see the sketch after this entry).
  • results: Tested on three different datasets with promising performance. For example, the "Segmenter ViT-L" model can be accelerated by 48% FPS without fine-tuning while maintaining performance. The method also accelerates fine-tuning: experiments show a 52% reduction in training time and a 2.46x FPS speedup with only a 0.09% performance drop. Code is available at https://github.com/caddyless/ailurus/tree/main.
    Abstract Vision transformers (ViTs) have emerged as a prevalent architecture for vision tasks owing to their impressive performance. However, when it comes to handling long token sequences, especially in dense prediction tasks that require high-resolution input, the complexity of ViTs increases significantly. Notably, dense prediction tasks, such as semantic segmentation or object detection, emphasize more on the contours or shapes of objects, while the texture inside objects is less informative. Motivated by this observation, we propose to apply adaptive resolution for different regions in the image according to their importance. Specifically, at the intermediate layer of the ViT, we utilize a spatial-aware density-based clustering algorithm to select representative tokens from the token sequence. Once the representative tokens are determined, we proceed to merge other tokens into their closest representative token. Consequently, semantic similar tokens are merged together to form low-resolution regions, while semantic irrelevant tokens are preserved independently as high-resolution regions. This strategy effectively reduces the number of tokens, allowing subsequent layers to handle a reduced token sequence and achieve acceleration. We evaluate our proposed method on three different datasets and observe promising performance. For example, the "Segmenter ViT-L" model can be accelerated by 48% FPS without fine-tuning, while maintaining the performance. Additionally, our method can be applied to accelerate fine-tuning as well. Experimental results demonstrate that we can save 52% training time while accelerating 2.46 times FPS with only a 0.09% performance drop. The code is available at https://github.com/caddyless/ailurus/tree/main.
    摘要 vision transformers (ViTs) 已经成为视觉任务中广泛使用的主流架构,尤其是在处理长token序列时,它们的性能很出色。然而,在 dense prediction 任务中,特别是 semantic segmentation 或 object detection,需要高分辨率的输入,这时 ViTs 的复杂度会增加显著。我们发现, dense prediction 任务中,对象的 outline 或形状更加重要,而内部的文字则更加不重要。基于这一点,我们提议应用适应性分辨率,对不同的图像区域进行不同的分辨率处理。在 ViT 的中间层次结构中,我们使用空间意识度的density-based clustering算法来选择表示性的token。然后,我们将其他token合并到最近的表示token中。因此,semantic相似的token会合并成低分辨率区域,而semantic不相关的token则独立保留为高分辨率区域。这种策略有效地减少了token数量,使后续层次可以处理减少后的token序列,并实现加速。我们在三个不同的dataset上进行了评估,并观察到了有前景的性能。例如,"Segmenter ViT-L" 模型可以通过48% FPS 的加速而不需要微调。此外,我们的方法还可以用于加速微调。实验结果表明,我们可以在训练时间上Save 52%,并且在加速2.46倍 FPS 时,只减少了0.09%的性能。代码可以在 中找到。

Batch Bayesian Optimization for Replicable Experimental Design

  • paper_url: http://arxiv.org/abs/2311.01195
  • repo_url: None
  • paper_authors: Zhongxiang Dai, Quoc Phong Nguyen, Sebastian Shenghong Tay, Daisuke Urano, Richalynn Leong, Bryan Kian Hsiang Low, Patrick Jaillet
  • for: This paper proposes a framework, Batch Thompson Sampling for Replicable Experimental Design (BTS-RED), for experimental design problems that evaluate multiple conditions in parallel and replicate each condition under large, heteroscedastic observation noise.
  • methods: Three algorithms are proposed: BTS-RED-Known and BTS-RED-Unknown, for known and unknown noise variance respectively, which choose the number of replications adaptively so that inputs with larger noise variance are replicated more times, and Mean-Var-BTS-RED, which targets risk-averse optimization.
  • results: All three algorithms enjoy theoretical guarantees and are asymptotically no-regret, and experiments demonstrate their effectiveness in two real-world applications: precision agriculture and AutoML.
    Abstract Many real-world experimental design problems (a) evaluate multiple experimental conditions in parallel and (b) replicate each condition multiple times due to large and heteroscedastic observation noise. Given a fixed total budget, this naturally induces a trade-off between evaluating more unique conditions while replicating each of them fewer times vs. evaluating fewer unique conditions and replicating each more times. Moreover, in these problems, practitioners may be risk-averse and hence prefer an input with both good average performance and small variability. To tackle both challenges, we propose the Batch Thompson Sampling for Replicable Experimental Design (BTS-RED) framework, which encompasses three algorithms. Our BTS-RED-Known and BTS-RED-Unknown algorithms, for, respectively, known and unknown noise variance, choose the number of replications adaptively rather than deterministically such that an input with a larger noise variance is replicated more times. As a result, despite the noise heteroscedasticity, both algorithms enjoy a theoretical guarantee and are asymptotically no-regret. Our Mean-Var-BTS-RED algorithm aims at risk-averse optimization and is also asymptotically no-regret. We also show the effectiveness of our algorithms in two practical real-world applications: precision agriculture and AutoML.
    摘要 多个实验设计问题(a)会同时评估多个实验条件,并且每个条件会被重复多次,这是因为观察噪声很大且不均匀。给定一个固定的总预算,这会导致评估更多的独特条件 vs. 评估更少的独特条件的费用之间的权衡。此外,在这些问题中,实践者可能会偏爱风险观,因此偏好一个具有良好平均性和小变异性的输入。为了解决这两个挑战,我们提出了批 Thompson 采样 для可重现实验设计(BTS-RED)框架,该框架包括三种算法。我们的 BTS-RED-known 和 BTS-RED-unknown 算法,分别针对已知和未知噪声 variance,选择复制的数量适应而不是决定性地,以便在噪声不均匀的情况下,输入具有更大的噪声 variance 会被复制更多次。由于这些算法具有理论保证和朴素观察的折衔,它们在噪声不均匀情况下是 asymptotically no-regret。我们的 Mean-Var-BTS-RED 算法则是针对偏爱风险优化的,并且也是 asymptotically no-regret。我们还在精准农业和 AutoML 两个实际应用中证明了我们的算法的效果。

Contextual Confidence and Generative AI

  • paper_url: http://arxiv.org/abs/2311.01193
  • repo_url: None
  • paper_authors: Shrey Jain, Zoë Hitzig, Pamela Mishkin
  • for: This paper confronts the threats generative AI models pose to effective human communication and describes strategies that stabilize communication in the face of these challenges.
  • methods: It surveys tools, technologies, and policies in two broad categories: containment strategies, which aim to reassert context in environments where it is threatened by generative AI, and mobilization strategies, which use advances in AI to proactively raise expectations around privacy and authenticity in mediated communication.
  • results: It argues that suitable strategies can stabilize communication under the threat of generative AI models and raise the expected privacy and authenticity of mediated communication.
    Abstract Generative AI models perturb the foundations of effective human communication. They present new challenges to contextual confidence, disrupting participants' ability to identify the authentic context of communication and their ability to protect communication from reuse and recombination outside its intended context. In this paper, we describe strategies--tools, technologies and policies--that aim to stabilize communication in the face of these challenges. The strategies we discuss fall into two broad categories. Containment strategies aim to reassert context in environments where it is currently threatened--a reaction to the context-free expectations and norms established by the internet. Mobilization strategies, by contrast, view the rise of generative AI as an opportunity to proactively set new and higher expectations around privacy and authenticity in mediated communication.
    摘要 生成AI模型对人类交流的基础产生了巨大的挑战。它们使得参与者无法正确地识别交流的 authentics 上下文和保护交流从其不良上下文中的重用和复制。在这篇论文中,我们描述了一些策略——工具、技术和政策——以稳定交流面临这些挑战。我们所讨论的策略分为两个大类。封装策略 aim to reassert context in environments where it is currently threatened——一种应对互联网所建立的无上下文期望和规范的反应。 mobilization strategies, by contrast, view the rise of generative AI as an opportunity to proactively set new and higher expectations around privacy and authenticity in mediated communication.

VIGraph: Self-supervised Learning for Class-Imbalanced Node Classification

  • paper_url: http://arxiv.org/abs/2311.01191
  • repo_url: None
  • paper_authors: Yulan Hu, Sheng Ouyang, Zhirui Yang, Yong Liu
  • for: This work addresses class imbalance in graph data to improve class-imbalanced node classification.
  • methods: It proposes a self-supervised learning (SSL) approach, VIGraph, based on the self-supervised Variational Graph Auto-Encoder (VGAE), which leverages variational inference to synthesize minority nodes from the data itself. It strictly adheres to the concept of imbalance when constructing imbalanced graphs and adds a novel Siamese contrastive strategy at the decoding phase to improve the quality of the generated nodes.
  • results: Experiments on multiple real-world datasets show that the VGAE-based VIGraph generates high-quality minority nodes and improves class-imbalanced node classification performance.
    Abstract Class imbalance in graph data poses significant challenges for node classification. Existing methods, represented by SMOTE-based approaches, partially alleviate this issue but still exhibit limitations during imbalanced scenario construction. Self-supervised learning (SSL) offers a promising solution by synthesizing minority nodes from the data itself, yet its potential remains unexplored. In this paper, we analyze the limitations of SMOTE-based approaches and introduce VIGraph, a novel SSL model based on the self-supervised Variational Graph Auto-Encoder (VGAE) that leverages Variational Inference (VI) to generate minority nodes. Specifically, VIGraph strictly adheres to the concept of imbalance when constructing imbalanced graphs and utilizes the generative VGAE to generate minority nodes. Moreover, VIGraph introduces a novel Siamese contrastive strategy at the decoding phase to improve the overall quality of generated nodes. VIGraph can generate high-quality nodes without reintegrating them into the original graph, eliminating the "Generating, Reintegrating, and Retraining" process found in SMOTE-based methods. Experiments on multiple real-world datasets demonstrate that VIGraph achieves promising results for class-imbalanced node classification tasks.
    摘要 classe 不均衡在图数据中存在 significativ 挑战,现有的方法,表示 SMOTE 基于方法, partially 缓解了这种情况,但仍然在不均衡enario 构建中存在限制。自我supervised 学习(SSL)提供了一个有前途的解决方案,可以自动生成少数节点,但其潜力仍然没有得到充分利用。本文分析了 SMOTE 基于方法的局限性,并引入 VIGraph,一种新的 SSL 模型,基于自我supervised Variational Graph Auto-Encoder(VGAE),利用 Variational Inference(VI)生成少数节点。具体来说,VIGraph 严格遵循不均衡概念在构建不均衡图时,并利用生成的 VGAE 来生成少数节点。此外,VIGraph 引入了一种新的对比策略在解码阶段,以提高生成节点的质量。VIGraph 可以生成高质量节点,无需将其重新 integrate 到原始图中,从而消除 SMOTE 基于方法中的 "生成、重新集成、重新训练" 过程。实验表明,VIGraph 在多个真实世界数据集上取得了优秀的结果 для类均衡节点分类任务。

Revolutionizing Healthcare Image Analysis in Pandemic-Based Fog-Cloud Computing Architectures

  • paper_url: http://arxiv.org/abs/2311.01185
  • repo_url: None
  • paper_authors: Al Zahraa Elsayed, Khalil Mohamed, Hany Harb
  • for: This research paper proposes an innovative healthcare architecture that tackles efficiency and accuracy problems in healthcare image analysis.
  • methods: It combines fog computing with a modified Convolutional Neural Network (CNN) for medical image analysis, thoroughly exploring and evaluating different CNN layer architectures to maximize overall performance.
  • results: Compared against models such as VGG16, VGG19, MobileNet, and related studies, the proposed approach achieves 99.88% accuracy on normal cases, with a 96.5% validation rate, 100% precision and recall, and a 100% F1 score. These results show the broad potential of fog computing and modified CNNs for healthcare image analysis and diagnosis, not only during pandemics but also in the future.
    Abstract The emergence of pandemics has significantly emphasized the need for effective solutions in healthcare data analysis. One particular challenge in this domain is the manual examination of medical images, such as X-rays and CT scans. This process is time-consuming and involves the logistical complexities of transferring these images to centralized cloud computing servers. Additionally, the speed and accuracy of image analysis are vital for efficient healthcare image management. This research paper introduces an innovative healthcare architecture that tackles the challenges of analysis efficiency and accuracy by harnessing the capabilities of Artificial Intelligence (AI). Specifically, the proposed architecture utilizes fog computing and presents a modified Convolutional Neural Network (CNN) designed specifically for image analysis. Different architectures of CNN layers are thoroughly explored and evaluated to optimize overall performance. To demonstrate the effectiveness of the proposed approach, a dataset of X-ray images is utilized for analysis and evaluation. Comparative assessments are conducted against recent models such as VGG16, VGG19, MobileNet, and related research papers. Notably, the proposed approach achieves an exceptional accuracy rate of 99.88% in classifying normal cases, accompanied by a validation rate of 96.5%, precision and recall rates of 100%, and an F1 score of 100%. These results highlight the immense potential of fog computing and modified CNNs in revolutionizing healthcare image analysis and diagnosis, not only during pandemics but also in the future. By leveraging these technologies, healthcare professionals can enhance the efficiency and accuracy of medical image analysis, leading to improved patient care and outcomes.
    摘要 随着疫情的出现,医疗数据分析领域面临着有效解决方案的强烈需求。一个特定的挑战在这个领域是手动检查医疗图像,如X光和CT扫描图像。这个过程浪费时间,同时也存在将图像传输到中央云计算服务器的logistical复杂性。此外,图像分析的速度和准确率对医疗图像管理是非常重要的。本研究论文提出了一种革命性的医疗架构,通过人工智能(AI)技术解决了分析效率和准确率的挑战。具体来说,该架构利用了fog computing技术,并提出了一种特殊的卷积神经网络(CNN),用于图像分析。不同的CNN层的架构被全面探讨和评估,以便优化总性性能。为证明提出的方法的效果,本文使用了一个X光图像集进行分析和评估。与之比较的是,VGG16、VGG19、MobileNet等现有模型和相关研究论文。结果表明,提出的方法在分类正常情况时 achieved an exceptional accuracy rate of 99.88%,并且 validation rate为96.5%,准确率和回归率均为100%,F1分数也为100%。这些结果表明,fog computing和特殊的CNN可以在医疗图像分析和诊断方面发挥革命性的作用,不仅在疫情期间,而且在未来也会发挥重要作用。通过利用这些技术,医疗专业人员可以提高医疗图像分析的效率和准确率,从而提高患者的病情和结果。

Generative Input: Towards Next-Generation Input Methods Paradigm

  • paper_url: http://arxiv.org/abs/2311.01166
  • repo_url: None
  • paper_authors: Keyu Ding, Yongcan Wang, Zihang Xu, Zhenzhen Jia, Shijin Wang, Cong Liu, Enhong Chen
  • for: This paper explores how generative models can improve the performance of Chinese input methods.
  • methods: It proposes a novel Generative Input paradigm, GeneInput, which uses prompts to handle all input scenarios and other intelligent auxiliary input functions, and optimizes the model with user feedback to deliver personalized results.
  • results: GeneInput achieves state-of-the-art performance for the first time on the Full-mode Key-sequence to Characters (FK2C) task. A novel reward-model training method eliminates the need for additional manual annotation and surpasses GPT-4 on tasks involving intelligent association and conversational assistance. Compared with traditional paradigms, GeneInput also exhibits better robustness, scalability, and online learning capability.
    Abstract Since the release of ChatGPT, generative models have achieved tremendous success and become the de facto approach for various NLP tasks. However, their application in the field of input methods remains under-explored. Many neural network approaches have been applied to the construction of Chinese input method engines (IMEs). Previous research often assumed that the input pinyin was correct and focused on the Pinyin-to-Character (P2C) task, which significantly falls short of meeting users' demands. Moreover, previous research could not leverage user feedback to optimize the model and provide personalized results. In this study, we propose a novel Generative Input paradigm named GeneInput. It uses prompts to handle all input scenarios and other intelligent auxiliary input functions, optimizing the model with user feedback to deliver personalized results. The results demonstrate that we have achieved state-of-the-art performance for the first time in the Full-mode Key-sequence to Characters (FK2C) task. We propose a novel reward model training method that eliminates the need for additional manual annotations and the performance surpasses GPT-4 in tasks involving intelligent association and conversational assistance. Compared to traditional paradigms, GeneInput not only demonstrates superior performance but also exhibits enhanced robustness, scalability, and online learning capabilities.

Weakly Supervised Semantic Parsing with Execution-based Spurious Program Filtering

  • paper_url: http://arxiv.org/abs/2311.01161
  • repo_url: None
  • paper_authors: Kang-il Lee, Segwang Kim, Kyomin Jung
  • for: This work tackles the long-standing problem of spurious programs when training a semantic parser from weak supervision.
  • methods: It proposes a domain-agnostic filtering mechanism based on program execution results: for each program obtained through search, a representation capturing the program's semantics as execution results under various inputs is constructed, and a majority vote over these representations identifies and filters out programs whose semantics differ significantly from the others.
  • results: The method is orthogonal to the program search process, so it can easily augment any existing weakly supervised semantic parsing framework; applying it to existing semantic parsers yields significantly improved performance on Natural Language Visual Reasoning and WikiTableQuestions.
    Abstract The problem of spurious programs is a longstanding challenge when training a semantic parser from weak supervision. To eliminate such programs that have wrong semantics but correct denotation, existing methods focus on exploiting similarities between examples based on domain-specific knowledge. In this paper, we propose a domain-agnostic filtering mechanism based on program execution results. Specifically, for each program obtained through the search process, we first construct a representation that captures the program's semantics as execution results under various inputs. Then, we run a majority vote on these representations to identify and filter out programs with significantly different semantics from the other programs. In particular, our method is orthogonal to the program search process so that it can easily augment any of the existing weakly supervised semantic parsing frameworks. Empirical evaluations on the Natural Language Visual Reasoning and WikiTableQuestions demonstrate that applying our method to the existing semantic parsers induces significantly improved performances.
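
The filtering step is easy to make concrete: run every candidate program on a shared set of probe inputs, treat the tuple of outputs as its semantic signature, and keep only the programs in the majority signature group. A small sketch with toy arithmetic lambdas standing in for parsed programs:

```python
# Execution-based filtering sketch: a program's outputs on probe inputs form
# its semantic signature; programs whose signature disagrees with the
# majority are dropped as (likely) spurious.
from collections import Counter

def semantic_signature(program, probe_inputs):
    return tuple(program(x) for x in probe_inputs)

def filter_spurious(programs, probe_inputs):
    sigs = [semantic_signature(p, probe_inputs) for p in programs]
    majority, _ = Counter(sigs).most_common(1)[0]
    return [p for p, s in zip(programs, sigs) if s == majority]

# all three candidates return 6 on the training input x=3, but only two
# share the same semantics on other inputs
candidates = [lambda x: x * 2, lambda x: x + x, lambda x: x + 3]
kept = filter_spurious(candidates, probe_inputs=[0, 1, 5])
print(len(kept))  # 2 -> the "x + 3" program is filtered out as spurious
```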

A Review of Digital Twins and their Application in Cybersecurity based on Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2311.01154
  • repo_url: None
  • paper_authors: MohammadHossein Homaei, Oscar Mogollon Gutierrez, Jose Carlos Sancho Nunez, Mar Avila Vegas, Andres Caro Lindo
  • for: This study investigates the applications of digital twin technology across different fields and their associated risks, and how artificial intelligence can provide cybersecurity for digital twin versions of various industries.
  • methods: It reviews the literature, practice reports, and surveys on digital twins and their security, examining how the strong interaction between digital twins and AI tools can be leveraged to improve the cybersecurity of these digital platforms.
  • results: It identifies applications and open problems of digital twins across domains, including data privacy and security issues, and serves as a roadmap for researchers and others interested in cybersecurity and digital security.
    Abstract The potential of digital twin technology is yet to be fully realized due to its diversity and untapped potential. Digital twins enable systems' analysis, design, optimization, and evolution to be performed digitally or in conjunction with a cyber-physical approach to improve speed, accuracy, and efficiency over traditional engineering methods. Industry 4.0, factories of the future, and digital twins continue to benefit from the technology and provide enhanced efficiency within existing systems. Due to the lack of information and security standards associated with the transition to cyber digitization, cybercriminals have been able to take advantage of the situation. Access to a digital twin of a product or service is equivalent to threatening the entire collection. There is a robust interaction between digital twins and artificial intelligence tools, which leads to strong interaction between these technologies, so it can be used to improve the cybersecurity of these digital platforms based on their integration with these technologies. This study aims to investigate the role of artificial intelligence in providing cybersecurity for digital twin versions of various industries, as well as the risks associated with these versions. In addition, this research serves as a road map for researchers and others interested in cybersecurity and digital security.
    摘要 “数字双工程技术的潜力仍未得到完全实现,这主要归功于其多样性和未发掘的潜力。数字双工程技术可以在数字或融合物理方式下进行系统分析、设计、优化和演化,从而提高速度、准确性和效率,并且可以与工业4.0、未来的制造厂和数字双工程技术相结合,提高现有系统的效率。然而,由于数字化转型的缺乏信息和安全标准,黑客有机会利用这种情况。访问一个产品或服务的数字双版本等于对整个收藏的威胁。数字双工程技术和人工智能工具之间存在强烈的互动,因此可以通过这些技术的结合来提高数字平台的安全性。本研究旨在调查不同领务中数字双版本的人工智能在提供网络安全方面的作用,以及这些版本的风险。此外,这项研究还可 serve as a roadmap for researchers and others interested in cybersecurity and digital security.”Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China. If you need Traditional Chinese, please let me know.

Revisiting the Knowledge Injection Frameworks

  • paper_url: http://arxiv.org/abs/2311.01150
  • repo_url: None
  • paper_authors: Peng Fu, Yiming Zhang, Haobo Wang, Weikang Qiu, Junbo Zhao
  • for: This paper studies how to adapt large language models (LLMs) to vertical domain-specific tasks by injecting external knowledge.
  • methods: Prior work mostly relies on an alignment heuristic that injects the corresponding knowledge tuple into the associated text sample. The authors find, however, that injecting unaligned (i.e., random) knowledge tuples achieves comparable, and sometimes better, results than injecting aligned knowledge.
  • results: After investigating this finding across related prior work and providing a chain of potential interpretations, the authors offer a simple remedial technique rooted in pruning and purifying the external knowledge base before injection. Integrated into most knowledge injection frameworks and recent LLMs, it overcomes the aforementioned sanity problem and further pushes the performance of domain-adaptive LLMs.
    Abstract In recent years, large language models (LLMs), such as GPTs, have attained great impact worldwide. However, how to adapt these LLMs to better suit the vertical domain-specific tasks by utilizing external knowledge remains not completely solved. Indeed, there have emerged a few works on this line where most of them rely on an alignment heuristic that is built to inject the corresponding knowledge tuple into the associated text sample. However, despite the promise, we identify a pivotal problem in this work ubiquitously. Simply put, we find that injecting unaligned (i.e., random) knowledge tuple into the LLMs achieves comparable (and sometimes better) results than the aligned knowledge being injected. We therefore take a thorough investigation of this frustrating finding on a variety of related prior work and further provide a chain of potential interpretations for the phenomenon. Based on all that, we offer a simple remediated technique. Briefly, the core of this technique is rooted in an ideological emphasis on the pruning and purification of the external knowledge base to be injected into LLMs. At last, we show that by integrating this technique into most (if not all) knowledge injection frameworks and recent LLMs, it manages to overcome the aforementioned sanity problem and further pushes the boundary of the performance of the domain-adaptive LLMs.

GREEMA: Proposal and Experimental Verification of Growing Robot by Eating Environmental MAterial for Landslide Disaster

  • paper_url: http://arxiv.org/abs/2311.01107
  • repo_url: None
  • paper_authors: Yusuke Tsunoda, Yuya Sato, Koichi Osuka
  • for: This study develops multiple autonomous mobile robot systems that can replace human workers in areas inaccessible to humans, such as the lunar surface and landslide sites. At river channel blockages in particular, water and sediment must be removed as soon as possible, but conventional construction machines are so large and heavy that transporting several of them to a site incurs significant cost and time.
  • methods: The study proposes GREEMA, a novel growing robot that is lightweight and compact during transportation but functions by eating environmental materials once it arrives on site. GREEMA actively takes in environmental materials such as water and sediment, uses them as its own structure, and removes them by moving itself.
  • results: Two types of GREEMA were developed and experimentally verified: a fin-type swimming robot that passively takes water into its body using a water-absorbing polymer to form a body expressing its swimming function, and an arm-type robot that eats soil to increase the rigidity of its body. The results of the two experiments are discussed from the viewpoint of explicit-implicit control, and the design theory of GREEMA is described.
    Abstract In areas that are inaccessible to humans, such as the lunar surface and landslide sites, there is a need for multiple autonomous mobile robot systems that can replace human workers. In particular, at landslide sites such as river channel blockages, robots are required to remove water and sediment from the site as soon as possible. Conventionally, several construction machines have been deployed to the site for civil engineering work. However, because of the large size and weight of conventional construction equipment, it is difficult to move multiple units of construction equipment to the site, resulting in significant transportation costs and time. To solve such problems, this study proposes a novel growing robot by eating environmental material called GREEMA, which is lightweight and compact during transportation, but can function by eating on environmental materials once it arrives at the site. GREEMA actively takes in environmental materials such as water and sediment, uses them as its structure, and removes them by moving itself. In this paper, we developed and experimentally verified two types of GREEMAs. First, we developed a fin-type swimming robot that passively takes water into its body using a water-absorbing polymer and forms a body to express its swimming function. Second, we constructed an arm-type robot that eats soil to increase the rigidity of its body. We discuss the results of these two experiments from the viewpoint of Explicit-Implicit control and describe the design theory of GREEMA.
    摘要 在人类无法进入的区域,如月面和滥覆现场,需要多个自主移动 робо辅助人工工作。特别是在河道堵塞现场,机器人需要尽快将水和淤泥从现场除去。 conventionally,数量多的建筑机械被派往现场进行土木工程。然而,由于传统的建筑机械庞大和重量,运输成本和时间均很高。为解决这些问题,本研究提出了一种新型增长机器人,即吃环境材料called GREEMA,它轻量级和压缩的交通时间。GREEMA在到达现场后通过吃环境材料来形成结构,并将其移除。在这篇论文中,我们开发并实验验证了两种GREEMA的类型。首先,我们开发了一种螺旋型游泳机器人,通过吸收水的水吸收聚合物来形成身体表现游泳功能。其次,我们建立了一种吃土的机器人,通过吃土来增加机器人的体硬度。我们从Explicit-Implicit控制的视角来讲述GREEMA的设计理论。

Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO

  • paper_url: http://arxiv.org/abs/2311.01057
  • repo_url: None
  • paper_authors: Julian Moosmann, Pietro Bonazzi, Yawei Li, Sizhen Bian, Philipp Mayer, Luca Benini, Michele Magno
  • for: The paper is written for researchers and developers who are interested in integrating AI into smart glasses, specifically those who are looking to achieve prolonged continuous operation with limited battery capacity.
  • methods: The paper describes the design and implementation of tiny machine-learning algorithms that exploit novel low-power processors to enable energy- and latency-efficient object detection on smart glasses. The authors developed a family of novel tiny deep-learning models based on YOLO with sub-million parameters customized for microcontroller-based inference.
  • results: The paper reports that the proposed TinyissimoYOLO models achieve an inference latency of 17ms and energy consumption of 1.59mJ per inference, with acceptable detection accuracy. The end-to-end latency from image capturing to algorithm prediction is 56ms (equivalent to 18 fps), with a total power consumption of 62.9mW, which is equivalent to 9.3 hours of continuous run time on a 154mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 fps.
    Abstract Smart glasses are rapidly gaining advanced functionality thanks to cutting-edge computing technologies, accelerated hardware architectures, and tiny AI algorithms. Integrating AI into smart glasses featuring a small form factor and limited battery capacity is still challenging when targeting full-day usage for a satisfactory user experience. This paper illustrates the design and implementation of tiny machine-learning algorithms exploiting novel low-power processors to enable prolonged continuous operation in smart glasses. We explore the energy- and latency-efficient of smart glasses in the case of real-time object detection. To this goal, we designed a smart glasses prototype as a research platform featuring two microcontrollers, including a novel milliwatt-power RISC-V parallel processor with a hardware accelerator for visual AI, and a Bluetooth low-power module for communication. The smart glasses integrate power cycling mechanisms, including image and audio sensing interfaces. Furthermore, we developed a family of novel tiny deep-learning models based on YOLO with sub-million parameters customized for microcontroller-based inference dubbed TinyissimoYOLO v1.3, v5, and v8, aiming at benchmarking object detection with smart glasses for energy and latency. Evaluations on the prototype of the smart glasses demonstrate TinyissimoYOLO's 17ms inference latency and 1.59mJ energy consumption per inference while ensuring acceptable detection accuracy. Further evaluation reveals an end-to-end latency from image capturing to the algorithm's prediction of 56ms or equivalently 18 fps, with a total power consumption of 62.9mW, equivalent to a 9.3 hours of continuous run time on a 154mAh battery. These results outperform MCUNet (TinyNAS+TinyEngine), which runs a simpler task (image classification) at just 7.3 fps per second.
    摘要 智能眼镜在技术上不断提高,感谢于高级计算技术、加速器硬件体系和小型AI算法。但是在将AI集成到智能眼镜中,具有小型化的形态和有限的电池容量仍然是一大挑战,以实现满意的用户体验。本文描述了在智能眼镜中实现小型机器学习算法的设计和实现,以提高智能眼镜的连续运行时间。我们开发了一款智能眼镜原型,其包括两个微控制器,包括一个新的低功耗RISC-V并行处理器和一个蓝牙低功耗模块。智能眼镜还包括图像和音频感知接口。此外,我们开发了一家小型深度学习模型基于YOLO,称为TinyissimoYOLO v1.3、v5和v8,以实现智能眼镜中对物体检测的能效评估。我们对智能眼镜原型进行评估,发现TinyissimoYOLO的推理延迟时间为17毫秒,电能消耗为1.59毫瓦,并保持了可接受的检测精度。进一步的评估表明,从图像捕获到算法预测的总时间为56毫秒(相当于18帧/秒),总电力消耗为62.9毫瓦,等于9.3小时的连续运行时间。这些结果超过了MCUNet(TinyNAS+TinyEngine),它在更简单的任务(图像分类)中只能达到7.3帧/秒。

Multi-dimensional data refining strategy for effective fine-tuning LLMs

  • paper_url: http://arxiv.org/abs/2311.01049
  • repo_url: None
  • paper_authors: Thanh Nguyen Ngoc, Quang Nhat Tran, Arthur Tang, Bao Nguyen, Thuy Nguyen, Thanh Pham
  • for: This study provides the data foundation for fine-tuning large language models; acquiring suitable data remains challenging due to data scarcity, linguistic diversity, and domain-specific content.
  • methods: It uses a multidimensional strategy, including leveraging existing English-language datasets and developing customized data-crawling scripts with the assistance of generative AI tools.
  • results: A Vietnamese language model fine-tuned on the resulting datasets performed well when generating Vietnamese news articles from prompts. The study offers practical solutions and guidance for future fine-tuning of models in languages like Vietnamese.
    Abstract Data is a cornerstone for fine-tuning large language models, yet acquiring suitable data remains challenging. Challenges encompassed data scarcity, linguistic diversity, and domain-specific content. This paper presents lessons learned while crawling and refining data tailored for fine-tuning Vietnamese language models. Crafting such a dataset, while accounting for linguistic intricacies and striking a balance between inclusivity and accuracy, demands meticulous planning. Our paper presents a multidimensional strategy including leveraging existing datasets in the English language and developing customized data-crawling scripts with the assistance of generative AI tools. A fine-tuned LLM model for the Vietnamese language, which was produced using resultant datasets, demonstrated good performance while generating Vietnamese news articles from prompts. The study offers practical solutions and guidance for future fine-tuning models in languages like Vietnamese.
    摘要 数据是大语言模型精度调整的基estone,但获得适合的数据仍然是一大挑战。这些挑战包括数据稀缺、语言多样性和领域特定内容。本文介绍了在爬取和修剪适合精度调整越南语言模型的数据时所学到的经验。制作这类数据集需要仔细规划,考虑语言细节和兼顾准确性和包容性。我们的文章提出了多维度策略,包括利用英语语料库和开发自定义爬取脚本,并通过生成AI工具来帮助。通过使用结果数据集,我们生成的越南语言模型进行了良好的表现,从提示生成越南新闻文章。这种研究提供了实用的解决方案和指导,以便未来的语言模型精度调整。

AI-assisted Learning for Electronic Engineering Courses in High Education

  • paper_url: http://arxiv.org/abs/2311.01048
  • repo_url: None
  • paper_authors: Thanh Nguyen Ngoc, Quang Nhat Tran, Arthur Tang, Bao Nguyen, Thuy Nguyen, Thanh Pham
  • for: This paper is written to evaluate the effectiveness of ChatGPT as a teaching and learning support tool in an integrated circuit systems course at a higher education institution in an Asian country.
  • methods: The study uses various question types to assess ChatGPT’s responses and gain valuable insights for further investigation. The study also includes the evaluation and reflection of different stakeholders: students, lecturers, and engineers.
  • results: The findings of this study shed light on the benefits and limitations of ChatGPT as an AI tool, paving the way for innovative learning approaches in technical disciplines. The study contributes to our understanding of how digital transformation is likely to unfold in the education sector.
    Abstract This study evaluates the efficacy of ChatGPT as an AI teaching and learning support tool in an integrated circuit systems course at a higher education institution in an Asian country. Various question types were completed, and ChatGPT responses were assessed to gain valuable insights for further investigation. The objective is to assess ChatGPT's ability to provide insights, personalized support, and interactive learning experiences in engineering education. The study includes the evaluation and reflection of different stakeholders: students, lecturers, and engineers. The findings of this study shed light on the benefits and limitations of ChatGPT as an AI tool, paving the way for innovative learning approaches in technical disciplines. Furthermore, the study contributes to our understanding of how digital transformation is likely to unfold in the education sector.
    摘要 这项研究评估了 chatGPT 在大学技术课程中作为人工智能教学支持工具的效果。在一个亚洲国家的高等教育机构中,学生、讲师和工程师参与了多种问题的回答,以获得有价值的发现和反思。研究的目的是评估 chatGPT 是否能提供个性化支持、互动式学习体验和工程教育中的洞察。这项研究还包括不同参与者的评估和反思:学生、讲师和工程师。研究结果为我们提供了 chatGPT 作为人工智能工具的优缺点,并为我们更好地理解技术领域教育领域的数字变革。

A Survey of Large Language Models for Autonomous Driving

  • paper_url: http://arxiv.org/abs/2311.01043
  • repo_url: https://github.com/thinklab-sjtu/awesome-llm4ad
  • paper_authors: Zhenjie Yang, Xiaosong Jia, Hongyang Li, Junchi Yan
  • for: This survey explores the application of large language models (LLMs) to autonomous driving, aiming to improve the explainability and traceability of autonomous driving systems.
  • methods: It reviews work that combines LLMs with foundation vision models to open the door to open-world understanding, reasoning, and few-shot learning, which current autonomous driving systems lack.
  • results: It systematically reviews the technological development of LLM4AD, distinctly outlining the principal challenges and prospective directions for the field, and provides real-time updates on the latest advances along with relevant open-source resources at https://github.com/Thinklab-SJTU/Awesome-LLM4AD for researchers in academia and industry.
    Abstract Autonomous driving technology, a catalyst for revolutionizing transportation and urban mobility, tends to transition from rule-based systems to data-driven strategies. Traditional module-based systems are constrained by cumulative errors among cascaded modules and inflexible pre-set rules. In contrast, end-to-end autonomous driving systems have the potential to avoid error accumulation due to their fully data-driven training process, although they often lack transparency due to their ``black box" nature, complicating the validation and traceability of decisions. Recently, large language models (LLMs) have demonstrated abilities including understanding context, logical reasoning, and generating answers. A natural thought is to utilize these abilities to empower autonomous driving. By combining LLM with foundation vision models, it could open the door to open-world understanding, reasoning, and few-shot learning, which current autonomous driving systems are lacking. In this paper, we systematically review a research line about \textit{Large Language Models for Autonomous Driving (LLM4AD)}. This study evaluates the current state of technological advancements, distinctly outlining the principal challenges and prospective directions for the field. For the convenience of researchers in academia and industry, we provide real-time updates on the latest advances in the field as well as relevant open-source resources via the designated link: https://github.com/Thinklab-SJTU/Awesome-LLM4AD.
    摘要 自主驾驶技术,一种可以革新交通和城市流动的技术,正在从规则基于系统向数据驱动策略过渡。传统的模块化系统受到累加误差的限制,以及硬性预先设置的规则。相比之下,端到端自主驾驶系统具有避免误差累加的潜力,尽管它们常常lack transparency due to their "black box" nature,复杂化决策的验证和跟踪。现在,大型语言模型(LLM)已经展示了理解上下文、逻辑推理和生成答案的能力。一种自然的想法是利用这些能力来 empower autonomous driving。将 LLM 与基础视觉模型结合,可以开启开放世界理解、逻辑推理和几招学习,现在的自主驾驶系统缺乏。在这篇论文中,我们系统地回顾了关于《大型语言模型 для自主驾驶(LLM4AD)》的研究线。这项研究评估了当前技术前进的状况,明确地描述了主要挑战和未来方向。为研究人员在学术和industry中方便,我们提供了实时更新的最新进展和相关开源资源,via the designated link: .

Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism

  • paper_url: http://arxiv.org/abs/2311.01041
  • repo_url: https://github.com/windszzlang/Learn-to-Refuse
  • paper_authors: Lang Cao
  • for: This paper aims to mitigate hallucination in large language models (LLMs) through a refusal mechanism, particularly in the context of question answering.
  • methods: It proposes a simple yet effective solution, Learn to Refuse (L2R), which incorporates a refusal mechanism so the LLM can recognize and refuse to answer questions it finds difficult to address, backed by a structured knowledge base that is separate from the LLM and progressively expanded with validated knowledge. A method for automatically and efficiently expanding the knowledge base is also proposed.
  • results: Qualitative and quantitative analyses demonstrate that the approach enhances the controllability and reliability of LLMs.
    Abstract Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless and often produce responses that contain errors or misinformation. These inaccuracies, commonly referred to as hallucinations, render LLMs unreliable and even unusable in many scenarios. In this paper, our focus is on mitigating the issue of hallucination in LLMs, particularly in the context of question-answering. Instead of attempting to answer all questions, we explore a refusal mechanism that instructs LLMs to refuse to answer challenging questions in order to avoid errors. We then propose a simple yet effective solution called Learn to Refuse (L2R), which incorporates the refusal mechanism to enable LLMs to recognize and refuse to answer questions that they find difficult to address. To achieve this, we utilize a structured knowledge base to represent all the LLM's understanding of the world, enabling it to provide traceable gold knowledge. This knowledge base is separate from the LLM and initially empty, and it is progressively expanded with validated knowledge. When an LLM encounters questions outside its domain, the system recognizes its knowledge scope and determines whether it can answer the question independently. Additionally, we introduce a method for automatically and efficiently expanding the knowledge base of LLMs. Through qualitative and quantitative analysis, we demonstrate that our approach enhances the controllability and reliability of LLMs.

ATHENA: Mathematical Reasoning with Thought Expansion

  • paper_url: http://arxiv.org/abs/2311.01036
  • repo_url: https://github.com/the-jb/athena-math
  • paper_authors: JB. Kim, Hazel Kim, Joonghyuk Hahn, Yo-Sub Han
  • for: Solving real-world math word problems depends on how the problems are articulated, i.e., the lens through which models view human linguistic expressions.
  • methods: The paper introduces the Attention-based THought Expansion Network Architecture (ATHENA), which mimics human thought-expansion mechanisms in the form of neural network propagation to tackle the challenges of real-world practice.
  • results: Experiments show that ATHENA reaches a new state of the art and remains compelling on variant questions, even when the informativeness of the training examples is limited.
    Abstract Solving math word problems depends on how to articulate the problems, the lens through which models view human linguistic expressions. Real-world settings count on such a method even more due to the diverse practices of the same mathematical operations. Earlier works constrain available thinking processes by limited prediction strategies without considering their significance in acquiring mathematical knowledge. We introduce Attention-based THought Expansion Network Architecture (ATHENA) to tackle the challenges of real-world practices by mimicking human thought expansion mechanisms in the form of neural network propagation. A thought expansion recurrently generates the candidates carrying the thoughts of possible math expressions driven from the previous step and yields reasonable thoughts by selecting the valid pathways to the goal. Our experiments show that ATHENA achieves a new state-of-the-art stage toward the ideal model that is compelling in variant questions even when the informativeness in training examples is restricted.

Non-Autoregressive Diffusion-based Temporal Point Processes for Continuous-Time Long-Term Event Prediction

  • paper_url: http://arxiv.org/abs/2311.01033
  • repo_url: None
  • paper_authors: Wang-Tao Zhou, Zhao Kang, Ling Tian
  • for: Long-term prediction of event sequences in continuous time.
  • methods: A non-autoregressive temporal point process model based on a diffusion process, which predicts the entire future event sequence at once.
  • results: Improves prediction quality over state-of-the-art methods.
    Abstract Continuous-time long-term event prediction plays an important role in many application scenarios. Most existing works rely on autoregressive frameworks to predict event sequences, which suffer from error accumulation, thus compromising prediction quality. Inspired by the success of denoising diffusion probabilistic models, we propose a diffusion-based non-autoregressive temporal point process model for long-term event prediction in continuous time. Instead of generating events one at a time in an autoregressive way, our model predicts the future event sequence entirely as a whole. In order to perform diffusion processes on event sequences, we develop a bidirectional map between target event sequences and the Euclidean vector space. Furthermore, we design a novel denoising network to capture both sequential and contextual features for better sample quality. Extensive experiments are conducted to prove the superiority of our proposed model over state-of-the-art methods on long-term event prediction in continuous time. To the best of our knowledge, this is the first work to apply diffusion methods to long-term event prediction problems.
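The prerequisite for running diffusion on event sequences is a bidirectional map between sequences and a Euclidean vector space. Below is a minimal sketch of such a map, assuming a simple log transform on inter-arrival times; the paper's learned map will differ.

```python
# Toy bidirectional map between an event sequence and R^n: strictly positive
# inter-arrival times go to R via log (safe for additive Gaussian noise) and
# come back via exp + cumsum. Illustrative only; the paper learns its map.
import numpy as np

def to_euclidean(arrival_times: np.ndarray) -> np.ndarray:
    gaps = np.diff(arrival_times, prepend=0.0)  # strictly positive inter-arrival times
    return np.log(gaps)

def to_sequence(z: np.ndarray) -> np.ndarray:
    return np.cumsum(np.exp(z))                 # exp guarantees positive gaps, cumsum restores times

t = np.array([0.4, 1.1, 2.5, 2.9])
z = to_euclidean(t)
print(np.allclose(to_sequence(z), t))           # True: the map is invertible
```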

Joint Learning of Local and Global Features for Aspect-based Sentiment Classification

  • paper_url: http://arxiv.org/abs/2311.01030
  • repo_url: None
  • paper_authors: Hao Niu, Yun Xiong, Xiaosu Wang, Philip S. Yu
  • for: This paper targets aspect-based sentiment classification (ASC), i.e., judging the sentiment polarity a sentence conveys toward a given aspect term.
  • methods: It proposes a model built on local and global features: a Gaussian mask layer and a covariance self-attention layer (local) plus a dual-level graph attention network (global), so that local and global information is modeled explicitly.
  • results: The model achieves state-of-the-art performance on the SemEval 2014 and Twitter datasets.
    Abstract Aspect-based sentiment classification (ASC) aims to judge the sentiment polarity conveyed by the given aspect term in a sentence. The sentiment polarity is not only determined by the local context but is also related to words far away from the given aspect term. Most recent attention-based models cannot sufficiently distinguish which words they should pay more attention to in some cases. Meanwhile, graph-based models are coming into ASC to encode syntactic dependency tree information. But these models do not fully leverage syntactic dependency trees, as they neglect to incorporate dependency relation tag information into representation learning effectively. In this paper, we address these problems by effectively modeling the local and global features. Firstly, we design a local encoder containing a Gaussian mask layer and a covariance self-attention layer. The Gaussian mask layer adaptively adjusts the receptive field around aspect terms to deemphasize the effects of unrelated words and pay more attention to local information. The covariance self-attention layer can distinguish the attention weights of different words more clearly. Furthermore, we propose a dual-level graph attention network as a global encoder, fully employing dependency tag information to capture long-distance information effectively. Our model achieves state-of-the-art performance on both the SemEval 2014 and Twitter datasets.
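A minimal numpy sketch of the Gaussian mask idea: token weights decay with distance from the aspect term, shrinking the effective receptive field around it. The fixed width sigma below stands in for what the paper adjusts adaptively.

```python
# Toy Gaussian mask around an aspect term; sigma is fixed here for
# illustration, whereas the paper adapts the receptive field.
import numpy as np

def gaussian_mask(seq_len: int, aspect_pos: int, sigma: float = 2.0) -> np.ndarray:
    positions = np.arange(seq_len)
    return np.exp(-((positions - aspect_pos) ** 2) / (2 * sigma ** 2))

tokens = "the battery life is great but the screen is dim".split()
mask = gaussian_mask(len(tokens), aspect_pos=tokens.index("battery"))
for tok, w in zip(tokens, mask):
    print(f"{tok:>8s} {w:.3f}")  # words near "battery" keep most of their weight
```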

Distance-Based Propagation for Efficient Knowledge Graph Reasoning

  • paper_url: http://arxiv.org/abs/2311.01024
  • repo_url: https://github.com/harryshomer/tagnet
  • paper_authors: Harry Shomer, Yao Ma, Juanhui Li, Bo Wu, Charu C. Aggarwal, Jiliang Tang
  • for: Predicting unseen edges in knowledge graphs (KGs) in order to discover new facts.
  • methods: Path-aggregation methods handle this task well but suffer from efficiency issues; recent attempts at learnable path pruning typically sacrifice performance for efficiency.
  • results: The paper proposes TAGNet, which propagates information efficiently by aggregating paths only within a fixed window for each source-target pair; its complexity is shown to be independent of the number of layers. Empirically, TAGNet prunes up to 90% of propagated messages while remaining competitive on multiple KG datasets. Code is available at https://github.com/HarryShomer/TAGNet.
    Abstract Knowledge graph completion (KGC) aims to predict unseen edges in knowledge graphs (KGs), resulting in the discovery of new facts. A new class of methods have been proposed to tackle this problem by aggregating path information. These methods have shown tremendous ability in the task of KGC. However they are plagued by efficiency issues. Though there are a few recent attempts to address this through learnable path pruning, they often sacrifice the performance to gain efficiency. In this work, we identify two intrinsic limitations of these methods that affect the efficiency and representation quality. To address the limitations, we introduce a new method, TAGNet, which is able to efficiently propagate information. This is achieved by only aggregating paths in a fixed window for each source-target pair. We demonstrate that the complexity of TAGNet is independent of the number of layers. Extensive experiments demonstrate that TAGNet can cut down on the number of propagated messages by as much as 90% while achieving competitive performance on multiple KG datasets. The code is available at https://github.com/HarryShomer/TAGNet.
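A toy sketch of the fixed-window intuition: during layer-wise propagation from a source, a message into node v only matters in the few layers compatible with v's distance from the source, so everything outside that window can be skipped. The graph, window semantics, and message counting below are illustrative, not the paper's exact model.

```python
# Count messages under dense vs. distance-windowed propagation. A message
# u -> v at layer L is kept only if dist(source, v) falls in [L - delta, L].
from collections import deque

def bfs_distances(adj, source):
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def count_messages(adj, source, num_layers=6, delta=None):
    dist = bfs_distances(adj, source)
    sent = 0
    for layer in range(1, num_layers + 1):
        for u in adj:
            for v in adj[u]:
                # dense propagation (delta=None) sends every edge's message at
                # every layer; the window keeps only "on-time" messages
                if delta is None or (v in dist and layer - delta <= dist[v] <= layer):
                    sent += 1
    return sent

adj = {i: [(i + 1) % 8, (i - 1) % 8] for i in range(8)}  # an 8-node cycle
print(count_messages(adj, 0))            # dense: 96 messages
print(count_messages(adj, 0, delta=1))   # windowed: 30 messages, far fewer
```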

Augmentation is AUtO-Net: Augmentation-Driven Contrastive Multiview Learning for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2311.01023
  • repo_url: None
  • paper_authors: Yanming Guo
  • for: Improving deep learning segmentation for medical image diagnosis, with a focus on retinal blood vessel segmentation.
  • methods: A contrastive multiview learning framework and a hybrid network architecture that integrates an attention mechanism into a convolutional neural network to capture the complex, continuous, curvilinear structure of retinal vessels.
  • results: Validated on the CHASE-DB1 dataset, the method attains an F1 score of 83.46% and an Intersection over Union (IOU) of 71.62%, surpassing existing benchmark methods. The work also identifies two major limitations of current approaches: data size constraints and dependency on high computational resources.
    Abstract Deep learning segmentation algorithms, which learn complex organ and tissue patterns and extract essential regions of interest from noisy backgrounds, have achieved impressive results in Medical Image Computing (MIC) by improving the visual information available for medical image diagnosis. This thesis focuses on retinal blood vessel segmentation tasks, providing an extensive literature review of deep learning-based medical image segmentation approaches while comparing their methodologies and empirical performance. The work also examines the limitations of current state-of-the-art methods by pointing out two significant existing limitations: data size constraints and the dependency on high computational resources. To address these problems, this work proposes a novel, efficient, and simple multiview learning framework that contrastively learns invariant vessel feature representations by comparing multiple augmented views produced by various transformations, overcoming data shortage and improving generalisation ability. Moreover, the hybrid network architecture integrates an attention mechanism into a convolutional neural network to further capture complex, continuous, curvilinear vessel structures. Validated on the CHASE-DB1 dataset, the proposed method attains the highest F1 score of 83.46% and the highest Intersection over Union (IOU) score of 71.62% with a UNet structure, surpassing existing benchmark UNet-based methods by 1.95% and 2.8%, respectively. The combination of these metrics indicates that the model detects vessels accurately, with locations highly coincident with the ground truth. Moreover, the proposed approach can be trained within 30 minutes while consuming less than 3 GB of GPU RAM, characteristics that support efficient implementation for real-world applications and deployments.

NeuroWrite: Predictive Handwritten Digit Classification using Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2311.01022
  • repo_url: None
  • paper_authors: Kottakota Asish, P. Sarath Teja, R. Kishan Chander, Dr. D. Deva Hema
  • for: This article presents NeuroWrite, a deep neural network approach to handwritten digit classification.
  • methods: It covers data preparation, network design, and training for handwritten digit recognition, combining convolutional neural networks (CNNs) and recurrent neural networks (RNNs) to improve accuracy and generalization.
  • results: NeuroWrite achieves high classification accuracy and robust generalization on handwritten digit datasets such as MNIST, and the article evaluates its potential in real-world applications, including digit recognition in digitized documents, signature verification, and automated postal code recognition.
    Abstract The rapid evolution of deep neural networks has revolutionized the field of machine learning, enabling remarkable advancements in various domains. In this article, we introduce NeuroWrite, a unique method for predicting the categorization of handwritten digits using deep neural networks. Our model exhibits outstanding accuracy in identifying and categorising handwritten digits by utilising the strength of convolutional neural networks (CNNs) and recurrent neural networks (RNNs). In this article, we give a thorough examination of the data preparation methods, network design, and training methods used in NeuroWrite. By implementing state-of-the-art techniques, we showcase how NeuroWrite can achieve high classification accuracy and robust generalization on handwritten digit datasets, such as MNIST. Furthermore, we explore the model's potential for real-world applications, including digit recognition in digitized documents, signature verification, and automated postal code recognition. NeuroWrite is a useful tool for computer vision and pattern recognition because of its performance and adaptability. The architecture, training procedure, and evaluation metrics of NeuroWrite are covered in detail in this study, illustrating how it can improve a number of applications that call for handwritten digit classification. The outcomes show that NeuroWrite is a promising method for raising the bar for deep neural network-based handwritten digit recognition.
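For readers unfamiliar with the building blocks, here is a minimal PyTorch sketch of a small CNN for 10-way digit classification; the layer sizes are illustrative and not NeuroWrite's actual architecture, which the abstract does not specify.

```python
# Minimal CNN for 28x28 grayscale digit classification (illustrative sizes).
import torch
import torch.nn as nn

class DigitCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)  # 28 -> 14 -> 7 after pooling

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))

model = DigitCNN()
logits = model(torch.randn(8, 1, 28, 28))  # a batch of 28x28 grayscale digits
print(logits.shape)                        # torch.Size([8, 10])
```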

Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion

  • paper_url: http://arxiv.org/abs/2311.01017
  • repo_url: None
  • paper_authors: Lunjun Zhang, Yuwen Xiong, Ze Yang, Sergio Casas, Rui Hu, Raquel Urtasun
  • for: This paper aims to improve the efficiency and effectiveness of world modeling for robotic applications such as autonomous driving.
  • methods: The proposed approach uses a novel combination of VQVAE and discrete diffusion to tokenize and predict the future of sensor observations.
  • results: The proposed method achieves significant improvements in reducing prior SOTA Chamfer distance for 1s and 3s predictions on three datasets (NuScenes, KITTI Odometry, and Argoverse2). Specifically, it reduces the Chamfer distance by more than 65% for 1s predictions and more than 50% for 3s predictions.
    Abstract Learning world models can teach an agent how the world works in an unsupervised manner. Even though it can be viewed as a special case of sequence modeling, progress for scaling world models on robotic applications such as autonomous driving has been somewhat less rapid than scaling language models with Generative Pre-trained Transformers (GPT). We identify two reasons as major bottlenecks: dealing with complex and unstructured observation space, and having a scalable generative model. Consequently, we propose a novel world modeling approach that first tokenizes sensor observations with VQVAE, then predicts the future via discrete diffusion. To efficiently decode and denoise tokens in parallel, we recast Masked Generative Image Transformer into the discrete diffusion framework with a few simple changes, resulting in notable improvement. When applied to learning world models on point cloud observations, our model reduces prior SOTA Chamfer distance by more than 65% for 1s prediction, and more than 50% for 3s prediction, across NuScenes, KITTI Odometry, and Argoverse2 datasets. Our results demonstrate that discrete diffusion on tokenized agent experience can unlock the power of GPT-like unsupervised learning for robotic agents.

Revamping AI Models in Dermatology: Overcoming Critical Challenges for Enhanced Skin Lesion Diagnosis

  • paper_url: http://arxiv.org/abs/2311.01009
  • repo_url: None
  • paper_authors: Deval Mehta, Brigid Betz-Stablein, Toan D Nguyen, Yaniv Gal, Adrian Bowling, Martin Haskett, Maithili Sashindranath, Paul Bonnington, Victoria Mar, H Peter Soyer, Zongyuan Ge
  • for: Deep learning models for diagnosing skin lesions from images are proliferating, but they face obstacles in clinical practice: a limited number of possible diagnostic outputs, insufficient testing on uncommon skin lesions, and no way to flag out-of-distribution images.
  • methods: The authors propose a comprehensive Hierarchical-Out of Distribution-Clinical Triage (HOT) model that produces three outputs for a clinical image: a hierarchical prediction, an alert for out-of-distribution images, and a recommendation to acquire a dermoscopic image when the clinical image alone is insufficient; when the recommendation is followed, clinical and dermoscopic images are integrated for the final diagnosis.
  • results: Extensive experiments on a representative skin lesion dataset demonstrate the effectiveness and complementarity of each component, providing valuable decision support for lesion diagnosis and setting a promising precedent for medical AI applications.
    Abstract The surge in developing deep learning models for diagnosing skin lesions through image analysis is notable, yet their clinical adoption faces challenges. Current dermatology AI models have limitations: a limited number of possible diagnostic outputs, lack of real-world testing on uncommon skin lesions, inability to detect out-of-distribution images, and over-reliance on dermoscopic images. To address these, we present an All-In-One Hierarchical-Out of Distribution-Clinical Triage (HOT) model. For a clinical image, our model generates three outputs: a hierarchical prediction, an alert for out-of-distribution images, and a recommendation for dermoscopy if the clinical image alone is insufficient for diagnosis. When the recommendation is pursued, it integrates both clinical and dermoscopic images to deliver the final diagnosis. Extensive experiments on a representative cutaneous lesion dataset demonstrate the effectiveness and synergy of each component within our framework. Our versatile model provides valuable decision support for lesion diagnosis and sets a promising precedent for medical AI applications.

Effective Human-AI Teams via Learned Natural Language Rules and Onboarding

  • paper_url: http://arxiv.org/abs/2311.01007
  • repo_url: https://github.com/clinicalml/onboarding_human_ai
  • paper_authors: Hussein Mozannar, Jimin J Lee, Dennis Wei, Prasanna Sattigeri, Subhro Das, David Sontag
  • for: This work learns rules, grounded in data regions and described in natural language, that teach humans when to rely on, collaborate with, or ignore an AI, improving the accuracy of human-AI teams.
  • methods: A novel region-discovery algorithm finds local regions in an embedding space where the human prior needs correcting, and an iterative, contrastive procedure has a large language model describe each region in human-readable form; the rules are then taught to humans in an onboarding stage.
  • results: User studies on object detection and question-answering tasks show the method yields more accurate human-AI teams; the region discovery and description algorithms are also evaluated separately.
    Abstract People are relying on AI agents to assist them with various tasks. The human must know when to rely on the agent, collaborate with the agent, or ignore its suggestions. In this work, we propose to learn rules grounded in data regions and described in natural language that illustrate how the human should collaborate with the AI. Our novel region discovery algorithm finds local regions in the data as neighborhoods in an embedding space that corrects the human prior. Each region is then described using an iterative and contrastive procedure where a large language model describes the region. We then teach these rules to the human via an onboarding stage. Through user studies on object detection and question-answering tasks, we show that our method can lead to more accurate human-AI teams. We also evaluate our region discovery and description algorithms separately.
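A toy sketch of the region-discovery step: collect examples where the human's prior about the AI disagrees with the AI's actual correctness, cluster them in embedding space, and hand each cluster to an LLM for description. Plain k-means on synthetic data is a simplified stand-in for the paper's algorithm.

```python
# Toy region discovery: cluster the examples where the human prior needs
# correcting; each cluster is a candidate region to describe in language.
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 8))              # example embeddings (synthetic)
ai_correct = rng.random(200) > 0.3           # did the AI get each example right?
human_trusts = rng.random(200) > 0.5         # the human's prior to rely on the AI
mistaken = emb[human_trusts != ai_correct]   # where the prior disagrees with reality

def kmeans(x, k=3, iters=20):
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.stack([x[assign == j].mean(0) if np.any(assign == j)
                            else centers[j] for j in range(k)])
    return assign

assign = kmeans(mistaken)
for j in range(3):
    print(f"region {j}: {(assign == j).sum()} examples to describe with an LLM")
```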

Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning

  • paper_url: http://arxiv.org/abs/2311.01004
  • repo_url: None
  • paper_authors: Gaoang Wang, Zhenyu Zhang, Benlu Wang, Weijie Liang, Yizhi Li, Xuechen Guo, Guanhong Wang, Shiyan Li
  • for: This paper proposes a deep learning-based medical image captioning method that can offer better diagnostic recommendations.
  • methods: It uses the Segment Anything Model (SAM) to obtain enhanced encoding with both general and fine-grained feature extraction, together with a distinctive pre-training strategy based on mixed semantic learning that captures both the overall information and the finer details of medical images.
  • results: The method proves effective, outperforming the pre-trained BLIP2 model on various evaluation metrics for describing medical images.
    Abstract With the development of multimodality and large language models, the deep learning-based technique for medical image captioning holds the potential to offer valuable diagnostic recommendations. However, current generic text and image pre-trained models do not yield satisfactory results when it comes to describing intricate details within medical images. In this paper, we present a novel medical image captioning method guided by the segment anything model (SAM) to enable enhanced encoding with both general and detailed feature extraction. In addition, our approach employs a distinctive pre-training strategy with mixed semantic learning to simultaneously capture both the overall information and finer details within medical images. We demonstrate the effectiveness of this approach, as it outperforms the pre-trained BLIP2 model on various evaluation metrics for generating descriptions of medical images.

Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy

  • paper_url: http://arxiv.org/abs/2311.01002
  • repo_url: None
  • paper_authors: Dongmin Park, Seola Choi, Doyoung Kim, Hwanjun Song, Jae-Gil Lee
  • for: Reducing the computational cost of deep learning by pruning large training sets into small informative subsets, under label noise.
  • methods: A data pruning algorithm that scores each example by the prediction confidence of its neighborhood in the subset and selects the subset maximizing total neighborhood confidence, thereby maximizing re-labeling accuracy.
  • results: Extensive experiments on four real and one synthetic noisy dataset show that Prune4Rel outperforms baselines with re-labeling models by up to 9.1% and those with a standard model by up to 21.6%.
    Abstract Data pruning, which aims to downsize a large training set into a small informative subset, is crucial for reducing the enormous computational costs of modern deep learning. Though large-scale data collections invariably contain annotation noise and numerous robust learning methods have been developed, data pruning for the noise-robust learning scenario has received little attention. With state-of-the-art Re-labeling methods that self-correct erroneous labels while training, it is challenging to identify which subset induces the most accurate re-labeling of erroneous labels in the entire training set. In this paper, we formalize the problem of data pruning with re-labeling. We first show that the likelihood of a training example being correctly re-labeled is proportional to the prediction confidence of its neighborhood in the subset. Therefore, we propose a novel data pruning algorithm, Prune4Rel, that finds a subset maximizing the total neighborhood confidence of all training examples, thereby maximizing the re-labeling accuracy and generalization performance. Extensive experiments on four real and one synthetic noisy datasets show that Prune4Rel outperforms the baselines with Re-labeling models by up to 9.1% as well as those with a standard model by up to 21.6%.
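A toy sketch of the core scoring rule, assuming Euclidean k-NN in an embedding space: each example is scored by the total prediction confidence of its neighborhood, and the top-scoring examples are kept. The greedy top-k selection simplifies Prune4Rel's actual optimization.

```python
# Toy neighborhood-confidence pruning on synthetic embeddings and confidences.
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))         # example embeddings (synthetic)
conf = rng.uniform(0.3, 1.0, size=100)   # model prediction confidence per example

def neighborhood_confidence(emb, conf, k=5):
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, 1:k + 1]  # k nearest neighbors, self excluded
    return conf[knn].sum(axis=1)             # total confidence around each example

keep_ratio = 0.5
scores = neighborhood_confidence(emb, conf)
kept = np.argsort(scores)[-int(keep_ratio * len(emb)):]  # greedy stand-in
print(f"kept {len(kept)} of {len(emb)} examples")
```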

Fully Quantized Always-on Face Detector Considering Mobile Image Sensors

  • paper_url: http://arxiv.org/abs/2311.01001
  • repo_url: None
  • paper_authors: Haechang Lee, Wongi Jeong, Dongil Ryu, Hyunwoo Je, Albert No, Kijeong Kim, Se Young Chun
  • for: Bridging the gap in always-on face detection for mobile image sensor applications.
  • methods: The proposed model takes sensor-aware synthetic RAW inputs, simulating always-on face detection processed "before" the ISP chain, and uses ternary (-1, 0, 1) weights for potential in-sensor implementation, yielding a simple, shallow, extremely low-bitwidth network.
  • results: Simulation studies demonstrate reasonable face detection performance and excellent efficiency.
    Abstract Despite significant research on lightweight deep neural networks (DNNs) designed for edge devices, the current face detectors do not fully meet the requirements for "intelligent" CMOS image sensors (iCISs) integrated with embedded DNNs. These sensors are essential in various practical applications, such as energy-efficient mobile phones and surveillance systems with always-on capabilities. One noteworthy limitation is the absence of suitable face detectors for the always-on scenario, a crucial aspect of image sensor-level applications. These detectors must operate directly with sensor RAW data before the image signal processor (ISP) takes over. This gap poses a significant challenge in achieving optimal performance in such scenarios. Further research and development are necessary to bridge this gap and fully leverage the potential of iCIS applications. In this study, we aim to bridge the gap by exploring extremely low-bit lightweight face detectors, focusing on the always-on face detection scenario for mobile image sensor applications. To achieve this, our proposed model utilizes sensor-aware synthetic RAW inputs, simulating always-on face detection processed "before" the ISP chain. Our approach employs ternary (-1, 0, 1) weights for potential implementations in image sensors, resulting in a relatively simple network architecture with shallow layers and extremely low-bitwidth. Our method demonstrates reasonable face detection performance and excellent efficiency in simulation studies, offering promising possibilities for practical always-on face detectors in real-world applications.
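Below is a minimal sketch of ternary (-1, 0, 1) weight quantization of the kind such in-sensor deployment requires; the threshold rule follows the common Ternary Weight Networks heuristic, which may differ from the paper's scheme.

```python
# Ternarize a weight tensor to {-1, 0, 1} plus a per-tensor scale, following
# the TWN-style heuristic delta = 0.7 * mean(|w|) (illustrative, not the
# paper's exact quantizer).
import numpy as np

def ternarize(w: np.ndarray):
    delta = 0.7 * np.abs(w).mean()                   # threshold separating 0 from +/-1
    t = np.where(w > delta, 1, np.where(w < -delta, -1, 0))
    mask = t != 0
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # scale for dequantization
    return t.astype(np.int8), float(alpha)

w = np.random.default_rng(1).normal(scale=0.1, size=(4, 4))
t, alpha = ternarize(w)
print(t)                      # weights stored as {-1, 0, 1}
print(f"scale={alpha:.4f}")   # dequantized weight is approximately alpha * t
```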

Replicable Benchmarking of Neural Machine Translation (NMT) on Low-Resource Local Languages in Indonesia

  • paper_url: http://arxiv.org/abs/2311.00998
  • repo_url: https://github.com/exqrch/indonesiannmt
  • paper_authors: Lucky Susanto, Ryandito Diandaru, Adila Krisnadhi, Ayu Purwarianti, Derry Wijaya
  • for: Neural machine translation (NMT) for low-resource local languages in Indonesia faces significant challenges, including the need for representative benchmarks and limited data availability.
  • methods: The study comprehensively analyzes training NMT systems for Javanese, Sundanese, Minangkabau, and Balinese across various training approaches, paradigms, and data sizes, including a preliminary study on using large language models to generate synthetic parallel data for low-resource languages.
  • results: Despite limited computational resources and textual data, several of the NMT systems achieve competitive translation quality rivaling zero-shot gpt-3.5-turbo, offering valuable guidance for researchers in similar settings.
    Abstract Neural machine translation (NMT) for low-resource local languages in Indonesia faces significant challenges, including the need for a representative benchmark and limited data availability. This work addresses these challenges by comprehensively analyzing training NMT systems for four low-resource local languages in Indonesia: Javanese, Sundanese, Minangkabau, and Balinese. Our study encompasses various training approaches, paradigms, data sizes, and a preliminary study into using large language models for synthetic low-resource languages parallel data generation. We reveal specific trends and insights into practical strategies for low-resource language translation. Our research demonstrates that despite limited computational resources and textual data, several of our NMT systems achieve competitive performances, rivaling the translation quality of zero-shot gpt-3.5-turbo. These findings significantly advance NMT for low-resource languages, offering valuable guidance for researchers in similar contexts.

Optimizing Inventory Routing: A Decision-Focused Learning Approach using Neural Networks

  • paper_url: http://arxiv.org/abs/2311.00983
  • repo_url: None
  • paper_authors: MD Shafikul Islam, Azmine Toushik Wasi
  • for: Solving the Inventory Routing Problem (IRP), a key challenge in supply chain management that requires optimizing route selection under uncertain inventory demand.
  • methods: IRPs are usually solved with a two-stage approach that first predicts demand with machine learning and then minimizes routing costs with an optimization algorithm.
  • results: The experiments show that machine learning models cannot reach perfect accuracy because inventory levels are shaped by a dynamic business environment, which leads to sub-optimal decisions in the downstream optimization stage. The paper therefore proposes a decision-focused learning approach that integrates demand prediction and routing optimization in a single end-to-end system, potentially ensuring a robust supply chain strategy.
    Abstract Inventory Routing Problem (IRP) is a crucial challenge in supply chain management as it involves optimizing efficient route selection while considering the uncertainty of inventory demand planning. To solve IRPs, usually a two-stage approach is employed, where demand is predicted using machine learning techniques first, and then an optimization algorithm is used to minimize routing costs. Our experiment shows machine learning models fall short of achieving perfect accuracy because inventory levels are influenced by the dynamic business environment, which, in turn, affects the optimization problem in the next stage, resulting in sub-optimal decisions. In this paper, we formulate and propose a decision-focused learning-based approach to solving real-world IRPs. This approach directly integrates inventory prediction and routing optimization within an end-to-end system potentially ensuring a robust supply chain strategy.

An Integrated Framework Integrating Monte Carlo Tree Search and Supervised Learning for Train Timetabling Problem

  • paper_url: http://arxiv.org/abs/2311.00971
  • repo_url: None
  • paper_authors: Feiyu Yang
  • for: Solving the single-track railway train timetabling problem (TTP), an important and complex problem.
  • methods: The paper proposes an integrated Monte Carlo Tree Search (MCTS) framework that combines heuristic methods, unsupervised learning, and supervised learning to handle TTP's discrete action space; heuristics act as planners while learned value networks act as learners within the search.
  • results: Experiments show the heuristic MCTS method benefits TTP, and integrating planners and learners improves the data efficiency of solving TTP, providing a new paradigm for the problem.
    Abstract The single-track railway train timetabling problem (TTP) is an important and complex problem. This article proposes an integrated Monte Carlo Tree Search (MCTS) computing framework that combines heuristic methods, unsupervised learning methods, and supervised learning methods for solving TTP in discrete action spaces. This article first describes the mathematical model and simulation system dynamics of TTP, analyzes the characteristics of the solution from the perspective of MCTS, and proposes some heuristic methods to improve MCTS. This article considers these methods as planners in the proposed framework. Secondly, this article utilizes deep convolutional neural networks to approximate the value of nodes and further applies them to the MCTS search process, referred to as learners. The experiment shows that the proposed heuristic MCTS method is beneficial for solving TTP; The algorithm framework that integrates planners and learners can improve the data efficiency of solving TTP; The proposed method provides a new paradigm for solving TTP.
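A compact skeleton of the planner/learner split described above: MCTS handles selection, expansion, and backup, while a learned value function replaces random rollouts at the leaves. The toy action space and value function are placeholders for the paper's TTP simulator and trained networks.

```python
# MCTS with a learned leaf evaluator (toy problem: pick four dispatch
# decisions from {1, 2, 3} so their sum approaches 10).
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.total = [], 0, 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")                  # always try unvisited children first
        return self.total / self.visits + c * math.sqrt(
            math.log(self.parent.visits) / self.visits)

def value_net(state):                            # learner: stands in for a trained network
    return -abs(sum(state) - 10) / 10.0          # reward peaks when the plan sums to 10

def actions(state):                              # planner's discrete action space
    return [] if len(state) >= 4 else [1, 2, 3]

def mcts(root_state, iters=300):
    root = Node(root_state)
    for _ in range(iters):
        node = root
        while node.children:                     # 1. select by UCB
            node = max(node.children, key=Node.ucb)
        for a in actions(node.state):            # 2. expand
            node.children.append(Node(node.state + [a], parent=node))
        leaf = random.choice(node.children) if node.children else node
        value = value_net(leaf.state)            # 3. evaluate with the learner (no rollout)
        while leaf is not None:                  # 4. back up
            leaf.visits, leaf.total = leaf.visits + 1, leaf.total + value
            leaf = leaf.parent
    node = root                                  # read out the most-visited path
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
    return node.state

random.seed(0)
print(mcts([]))  # e.g. a 4-step plan steered toward sum 10
```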

Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

  • paper_url: http://arxiv.org/abs/2311.00968
  • repo_url: https://github.com/amaai-lab/video2music
  • paper_authors: Jaeyong Kang, Soujanya Poria, Dorien Herremans
  • for: Developing a generative AI framework, Video2Music, that produces music matching a provided video.
  • methods: The authors curate a unique collection of music videos and extract semantic, scene offset, motion, and emotion features, which serve as guiding input to an Affective Multimodal Transformer (AMT) music generation model; audio is transcribed into MIDI and chords, with features such as note density and loudness, forming the MuVi-Sync dataset.
  • results: The AMT model enforces affective similarity between video and music, and a biGRU-based regression model estimates note density and loudness from video features so the generated chords render dynamically with varying rhythm and volume; experiments and a user study confirm the generated music matches the video content in emotion and quality.
    Abstract Numerous studies in the field of music generation have demonstrated impressive performance, yet virtually no models are able to directly generate music to match accompanying videos. In this work, we develop a generative music AI framework, Video2Music, that can match a provided video. We first curated a unique collection of music videos. Then, we analysed the music videos to obtain semantic, scene offset, motion, and emotion features. These distinct features are then employed as guiding input to our music generation model. We transcribe the audio files into MIDI and chords, and extract features such as note density and loudness. This results in a rich multimodal dataset, called MuVi-Sync, on which we train a novel Affective Multimodal Transformer (AMT) model to generate music given a video. This model includes a novel mechanism to enforce affective similarity between video and music. Finally, post-processing is performed based on a biGRU-based regression model to estimate note density and loudness based on the video features. This ensures a dynamic rendering of the generated chords with varying rhythm and volume. In a thorough experiment, we show that our proposed framework can generate music that matches the video content in terms of emotion. The musical quality, along with the quality of music-video matching is confirmed in a user study. The proposed AMT model, along with the new MuVi-Sync dataset, presents a promising step for the new task of music generation for videos.

Vision-Language Interpreter for Robot Task Planning

  • paper_url: http://arxiv.org/abs/2311.00967
  • repo_url: https://github.com/omron-sinicx/vilain
  • paper_authors: Keisuke Shirai, Cristian C. Beltran-Hernandez, Masashi Hamaya, Atsushi Hashimoto, Shohei Tanaka, Kento Kawaharazuka, Kazutoshi Tanaka, Yoshitaka Ushiku, Shinsuke Mori
  • for: This work proposes a new task, multimodal planning problem specification, that lets language-guided systems drive interpretable symbolic planners.
  • methods: The proposed Vision-Language Interpreter (ViLaIn) uses state-of-the-art large language models and vision-language models to generate problem descriptions (PDs) from language instructions and scene observations, refining the generated PDs via error-message feedback from the symbolic planner.
  • results: On the new problem description generation (ProDG) dataset, ViLaIn generates syntactically correct problems with more than 99% accuracy and valid plans with more than 58% accuracy.
    Abstract Large language models (LLMs) are accelerating the development of language-guided robot planners. Meanwhile, symbolic planners offer the advantage of interpretability. This paper proposes a new task that bridges these two trends, namely, multimodal planning problem specification. The aim is to generate a problem description (PD), a machine-readable file used by the planners to find a plan. By generating PDs from language instruction and scene observation, we can drive symbolic planners in a language-guided framework. We propose a Vision-Language Interpreter (ViLaIn), a new framework that generates PDs using state-of-the-art LLM and vision-language models. ViLaIn can refine generated PDs via error message feedback from the symbolic planner. Our aim is to answer the question: How accurately can ViLaIn and the symbolic planner generate valid robot plans? To evaluate ViLaIn, we introduce a novel dataset called the problem description generation (ProDG) dataset. The framework is evaluated with four new evaluation metrics. Experimental results show that ViLaIn can generate syntactically correct problems with more than 99% accuracy and valid plans with more than 58% accuracy.
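A hedged sketch of the generate-validate-refine loop: a vision-language model drafts a problem description, the symbolic planner validates it, and error messages are fed back into the next draft. Both stand-in functions below are hypothetical toys, not ViLaIn's components.

```python
# Toy generate-validate-refine loop for problem descriptions (PDs).
def vlm_generate(instruction: str, scene: str, feedback: str = "") -> str:
    # toy stand-in for the VLM: it "fixes" the draft once it has seen an error
    if feedback:
        return "(define (problem stack) (:goal (on a b)))"
    return "(define (problem stack) (:goal (on a b))"      # first draft: unbalanced

def planner_check(pd: str) -> str:
    # toy stand-in for planner validation: only checks balanced parentheses
    return "" if pd.count("(") == pd.count(")") else "error: unbalanced parentheses"

def generate_pd(instruction: str, scene: str, max_rounds: int = 3) -> str:
    feedback = ""
    for _ in range(max_rounds):
        pd = vlm_generate(instruction, scene, feedback)
        feedback = planner_check(pd)
        if not feedback:            # the planner accepted the PD
            return pd
    return pd                       # best effort after max_rounds refinements

print(generate_pd("stack block a on b", "blocks a and b on the table"))
```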

IndoToD: A Multi-Domain Indonesian Benchmark For End-to-End Task-Oriented Dialogue Systems

  • paper_url: http://arxiv.org/abs/2311.00958
  • repo_url: https://github.com/dehanalkautsar/indotod
  • paper_authors: Muhammad Dehan Al Kautsar, Rahmah Khoirussyifa’ Nurdini, Samuel Cahyawijaya, Genta Indra Winata, Ayu Purwarianti
  • for: Task-oriented dialogue (ToD) systems have mostly been built for high-resource languages such as English and Chinese; developing ToD systems for regional or local languages broadens their ability to comprehend dialogue contexts in more languages.
  • methods: Two English ToD datasets are extended to Indonesian, covering four domains, using delexicalization to efficiently reduce annotation size; native speakers are hired to manually translate the dialogues to ensure high quality.
  • results: The resulting IndoToD benchmark supports evaluating Indonesian and English ToD systems and exploring the potential benefits of cross-lingual and bilingual transfer learning approaches.
    Abstract Task-oriented dialogue (ToD) systems have been mostly created for high-resource languages, such as English and Chinese. However, there is a need to develop ToD systems for other regional or local languages to broaden their ability to comprehend the dialogue contexts in various languages. This paper introduces IndoToD, an end-to-end multi domain ToD benchmark in Indonesian. We extend two English ToD datasets to Indonesian, comprising four different domains by delexicalization to efficiently reduce the size of annotations. To ensure a high-quality data collection, we hire native speakers to manually translate the dialogues. Along with the original English datasets, these new Indonesian datasets serve as an effective benchmark for evaluating Indonesian and English ToD systems as well as exploring the potential benefits of cross-lingual and bilingual transfer learning approaches.

Gaussian Mixture Solvers for Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.00941
  • repo_url: https://github.com/guohanzhong/gms
  • paper_authors: Hanzhong Guo, Cheng Lu, Fan Bao, Tianyu Pang, Shuicheng Yan, Chao Du, Chongxuan Li
  • for: Improving sample generation in diffusion models, where sampling amounts to solving the reverse diffusion SDEs or the corresponding probability-flow ODEs.
  • methods: A new class of SDE-based solvers, Gaussian Mixture Solvers (GMS), which estimate the first three moments and fit the parameters of a Gaussian mixture transition kernel via generalized methods of moments at each sampling step.
  • results: GMS outperforms numerous SDE-based solvers in sample quality on image generation and stroke-based synthesis across various diffusion models.
    Abstract Recently, diffusion models have achieved great success in generative tasks. Sampling from diffusion models is equivalent to solving the reverse diffusion stochastic differential equations (SDEs) or the corresponding probability flow ordinary differential equations (ODEs). In comparison, SDE-based solvers can generate samples of higher quality and are suited for image translation tasks like stroke-based synthesis. During inference, however, existing SDE-based solvers are severely constrained by the efficiency-effectiveness dilemma. Our investigation suggests that this is because the Gaussian assumption in the reverse transition kernel is frequently violated (even in the case of simple mixture data) given a limited number of discretization steps. To overcome this limitation, we introduce a novel class of SDE-based solvers called \emph{Gaussian Mixture Solvers (GMS)} for diffusion models. Our solver estimates the first three-order moments and optimizes the parameters of a Gaussian mixture transition kernel using generalized methods of moments in each step during sampling. Empirically, our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis in various diffusion models, which validates the motivation and effectiveness of GMS. Our code is available at https://github.com/Guohanzhong/GMS.

Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance

  • paper_url: http://arxiv.org/abs/2311.00938
  • repo_url: None
  • paper_authors: Niket Patel, Luis Salamanca, Luis Barba
  • for: This work highlights a discrepancy between how diffusion models are trained and the conditional sampling behavior desired under classifier-free guidance, and asks how to improve generation quality.
  • methods: An updated loss function that better aligns the training objective with the sampling behavior.
  • results: FID scores on CIFAR-10 show that the updated loss yields higher-quality samples with fewer sampling timesteps and is more robust to the choice of guidance scale $w$; early fine-tuning experiments suggest large diffusion models such as Stable Diffusion may also benefit.
    Abstract Diffusion models have emerged as a pivotal advancement in generative models, setting new standards to the quality of the generated instances. In the current paper we aim to underscore a discrepancy between conventional training methods and the desired conditional sampling behavior of these models. While the prevalent classifier-free guidance technique works well, it's not without flaws. At higher values for the guidance scale parameter $w$, we often get out of distribution samples and mode collapse, whereas at lower values for $w$ we may not get the desired specificity. To address these challenges, we introduce an updated loss function that better aligns training objectives with sampling behaviors. Experimental validation with FID scores on CIFAR-10 elucidates our method's ability to produce higher quality samples with fewer sampling timesteps, and be more robust to the choice of guidance scale $w$. We also experiment with fine-tuning Stable Diffusion on the proposed loss, to provide early evidence that large diffusion models may also benefit from this refined loss function.
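For context, the standard classifier-free guidance rule the paper analyzes queries the denoiser with and without conditioning and extrapolates by the guidance scale $w$ (so $w = 0$ recovers the conditional model); the paper proposes changing the training loss, not this sampling rule.

```python
# Standard classifier-free guidance combination, shown with dummy values.
import numpy as np

def guided_eps(eps_cond: np.ndarray, eps_uncond: np.ndarray, w: float) -> np.ndarray:
    # extrapolate from the unconditional toward the conditional prediction
    return (1 + w) * eps_cond - w * eps_uncond

eps_c = np.array([0.5, -0.2])   # noise prediction given the condition (dummy)
eps_u = np.array([0.1, 0.0])    # unconditional noise prediction (dummy)
for w in (0.0, 1.0, 5.0):
    print(w, guided_eps(eps_c, eps_u, w))  # larger w extrapolates further from eps_u
```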

Scalable Counterfactual Distribution Estimation in Multivariate Causal Models

  • paper_url: http://arxiv.org/abs/2311.00927
  • repo_url: None
  • paper_authors: Thong Pham, Shohei Shimizu, Hideitsu Hino, Tam Le
  • for: 估计多个量关注(例如结果)的共轭事件分布 Function: 多ivariate causal model中的共轭事件分布估计问题
  • methods: 利用一个可靠的一维隐藏空间,基于所有维度信息构建,以便更好地捕捉相关结构并生成良好的共轭事件分布估计
  • results: 相比现有方法,提供更好的共轭事件分布估计,并在实际数据上显示出更高的准确性和稳定性
    Abstract We consider the problem of estimating the counterfactual joint distribution of multiple quantities of interests (e.g., outcomes) in a multivariate causal model extended from the classical difference-in-difference design. Existing methods for this task either ignore the correlation structures among dimensions of the multivariate outcome by considering univariate causal models on each dimension separately and hence produce incorrect counterfactual distributions, or poorly scale even for moderate-size datasets when directly dealing with such multivariate causal model. We propose a method that alleviates both issues simultaneously by leveraging a robust latent one-dimensional subspace of the original high-dimension space and exploiting the efficient estimation from the univariate causal model on such space. Since the construction of the one-dimensional subspace uses information from all the dimensions, our method can capture the correlation structures and produce good estimates of the counterfactual distribution. We demonstrate the advantages of our approach over existing methods on both synthetic and real-world data.

M2T2: Multi-Task Masked Transformer for Object-centric Pick and Place

  • paper_url: http://arxiv.org/abs/2311.00926
  • repo_url: None
  • paper_authors: Wentao Yuan, Adithyavairavan Murali, Arsalan Mousavian, Dieter Fox
  • for: This paper builds a single model that supplies different types of low-level actions that work robustly on arbitrary objects in cluttered scenes.
  • methods: M2T2 is a transformer that reasons about contact points and predicts valid gripper poses for different action modes given a raw point cloud of the scene.
  • results: Trained on a large-scale synthetic dataset with 128K scenes, M2T2 achieves zero-shot sim2real transfer on a real robot, outperforming a baseline built from state-of-the-art task-specific models by about 19% overall and 37.5% in challenging scenes where the object must be re-oriented for collision-free placement; it also achieves state-of-the-art results on a subset of language-conditioned tasks in RLBench.
    Abstract With the advent of large language models and large-scale robotic datasets, there has been tremendous progress in high-level decision-making for object manipulation. These generic models are able to interpret complex tasks using language commands, but they often have difficulties generalizing to out-of-distribution objects due to the inability of low-level action primitives. In contrast, existing task-specific models excel in low-level manipulation of unknown objects, but only work for a single type of action. To bridge this gap, we present M2T2, a single model that supplies different types of low-level actions that work robustly on arbitrary objects in cluttered scenes. M2T2 is a transformer model which reasons about contact points and predicts valid gripper poses for different action modes given a raw point cloud of the scene. Trained on a large-scale synthetic dataset with 128K scenes, M2T2 achieves zero-shot sim2real transfer on the real robot, outperforming the baseline system with state-of-the-art task-specific models by about 19% in overall performance and 37.5% in challenging scenes where the object needs to be re-oriented for collision-free placement. M2T2 also achieves state-of-the-art results on a subset of language conditioned tasks in RLBench. Videos of robot experiments on unseen objects in both real world and simulation are available on our project website https://m2-t2.github.io.

The Power of the Senses: Generalizable Manipulation from Vision and Touch through Masked Multimodal Learning

  • paper_url: http://arxiv.org/abs/2311.00924
  • repo_url: None
  • paper_authors: Carmelo Sferrazza, Younggyo Seo, Hao Liu, Youngwoon Lee, Pieter Abbeel
  • for: Developing a multimodal learning method that fuses visual and tactile information to improve robots' object manipulation.
  • methods: Masked Multimodal Learning (M3L) jointly learns a policy and visual-tactile representations based on masked autoencoding in a reinforcement learning setting.
  • results: Learning in a multimodal setting improves sample efficiency and unlocks generalization beyond what either sense achieves alone, and even vision-only policies benefit from the multimodal training at test time; the method is evaluated on three simulated environments: robotic insertion, door opening, and dexterous in-hand manipulation.
    Abstract Humans rely on the synergy of their senses for most essential tasks. For tasks requiring object manipulation, we seamlessly and effectively exploit the complementarity of our senses of vision and touch. This paper draws inspiration from such capabilities and aims to find a systematic approach to fuse visual and tactile information in a reinforcement learning setting. We propose Masked Multimodal Learning (M3L), which jointly learns a policy and visual-tactile representations based on masked autoencoding. The representations jointly learned from vision and touch improve sample efficiency, and unlock generalization capabilities beyond those achievable through each of the senses separately. Remarkably, representations learned in a multimodal setting also benefit vision-only policies at test time. We evaluate M3L on three simulated environments with both visual and tactile observations: robotic insertion, door opening, and dexterous in-hand manipulation, demonstrating the benefits of learning a multimodal policy. Code and videos of the experiments are available at https://sferrazza.cc/m3l_site.

Artificial Intelligence Ethics Education in Cybersecurity: Challenges and Opportunities: a focus group report

  • paper_url: http://arxiv.org/abs/2311.00903
  • repo_url: None
  • paper_authors: Diane Jackson, Sorin Adam Matei, Elisa Bertino
  • for: This report examines the opportunities and uncertainties that AI tools create for cybersecurity education.
  • methods: A focus group with advanced graduate students in cybersecurity, probing the depth and breadth of the challenges and opportunities.
  • results: The salient issues are access to open-source or free tools, documentation, curricular diversity, and clear articulation of ethical principles for AI cybersecurity education. Confronting the "black box" mentality in AI cybersecurity work, along with deeper prior education in foundational AI, systems thinking, and effective communication, was found to be essential.
    Abstract The emergence of AI tools in cybersecurity creates many opportunities and uncertainties. A focus group with advanced graduate students in cybersecurity revealed the potential depth and breadth of the challenges and opportunities. The salient issues are access to open source or free tools, documentation, curricular diversity, and clear articulation of ethical principles for AI cybersecurity education. Confronting the "black box" mentality in AI cybersecurity work is also of the greatest importance, doubled by deeper and prior education in foundational AI work. Systems thinking and effective communication were considered relevant areas of educational improvement. Future AI educators and practitioners need to address these issues by implementing rigorous technical training curricula, clear documentation, and frameworks for ethically monitoring AI combined with critical and system's thinking and communication skills.

cs.CL - 2023-11-02

TopicGPT: A Prompt-based Topic Modeling Framework

  • paper_url: http://arxiv.org/abs/2311.01449
  • repo_url: https://github.com/chtmp223/topicgpt
  • paper_authors: Chau Minh Pham, Alexander Hoyle, Simeng Sun, Mohit Iyyer
  • for: Exploring latent topics in text collections while producing higher-quality, more interpretable topic groupings.
  • methods: A prompt-based framework that uses large language models (LLMs) to uncover latent topics, giving users semantic control over topics through prompting, without model retraining.
  • results: TopicGPT achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics versus 0.64 for the strongest baseline, and its topics come with natural language labels and free-form descriptions rather than ambiguous bags of words.
    Abstract Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal semantic control over topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics within a provided text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods: for example, it achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics compared to 0.64 for the strongest baseline. Its topics are also more interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions. Moreover, the framework is highly adaptable, allowing users to specify constraints and modify topics without the need for model retraining. TopicGPT can be further extended to hierarchical topical modeling, enabling users to explore topics at various levels of granularity. By streamlining access to high-quality and interpretable topics, TopicGPT represents a compelling, human-centered approach to topic modeling.

Server-side Rescoring of Spoken Entity-centric Knowledge Queries for Virtual Assistants

  • paper_url: http://arxiv.org/abs/2311.01398
  • repo_url: None
  • paper_authors: Youyuan Zhang, Sashank Gondala, Thiago Fraga-Silva, Christophe Van Gysel
  • for: This paper focuses on recognizing entity-rich spoken knowledge-domain queries for on-device virtual assistants powered by automatic speech recognition (ASR).
  • methods: An empirical study of server-side rescoring with various categories of language models (N-gram word LMs, sub-word neural LMs), combining on-device and server-side signals, with a comparison against LMs trained on domain data and a GPT-3 variant offered by OpenAI.
  • results: Integrating various server-side LMs yields 23%-35% WER improvements on entity-centric query subpopulations compared to on-device ASR alone, and fusing multiple server-side LMs trained from scratch most effectively combines their complementary strengths and domain-specific knowledge.
    Abstract On-device Virtual Assistants (VAs) powered by Automatic Speech Recognition (ASR) require effective knowledge integration for the challenging entity-rich query recognition. In this paper, we conduct an empirical study of modeling strategies for server-side rescoring of spoken information domain queries using various categories of Language Models (LMs) (N-gram word LMs, sub-word neural LMs). We investigate the combination of on-device and server-side signals, and demonstrate significant WER improvements of 23%-35% on various entity-centric query subpopulations by integrating various server-side LMs compared to performing ASR on-device only. We also perform a comparison between LMs trained on domain data and a GPT-3 variant offered by OpenAI as a baseline. Furthermore, we also show that model fusion of multiple server-side LMs trained from scratch most effectively combines complementary strengths of each model and integrates knowledge learned from domain-specific data to a VA ASR system.
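
A toy sketch of the rescoring step itself: combine the on-device ASR score with one or more server-side LM scores via log-linear interpolation over an N-best list. The interpolation weights and the unigram "LM" below are illustrative assumptions, not the paper's models.

```python
import math
from collections import Counter

def make_unigram_lm(corpus_tokens):
    """Toy server-side LM: add-one-smoothed unigram log-probabilities."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    def log_prob(hypothesis: str) -> float:
        return sum(math.log((counts[w] + 1) / (total + len(counts) + 1))
                   for w in hypothesis.split())
    return log_prob

def rescore(nbest, server_lms, weights):
    """nbest: list of (hypothesis, on_device_score). Returns the best pair."""
    def combined(hyp, asr_score):
        return asr_score + sum(w * lm(hyp) for w, lm in zip(weights, server_lms))
    return max(nbest, key=lambda h: combined(*h))

lm = make_unigram_lm("play songs by the beatles play the news".split())
nbest = [("play songs by the beetles", -4.1), ("play songs by the beatles", -4.3)]
print(rescore(nbest, [lm], weights=[0.5]))
```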

Can Language Models Be Tricked by Language Illusions? Easier with Syntax, Harder with Semantics

  • paper_url: http://arxiv.org/abs/2311.01386
  • repo_url: https://github.com/forrestdavis/languageillusions
  • paper_authors: Yuhan Zhang, Edward Gibson, Forrest Davis
  • for: Testing whether language models (LMs) mimic human behavior when humans are systematically "tricked" by language illusions.
  • methods: Evaluates LMs on three language illusions: the comparative illusion (e.g., "More people have been to Russia than I have"), the depth-charge illusion (e.g., "No head injury is too trivial to be ignored"), and the negative polarity item (NPI) illusion (e.g., "The hunter who no villager believed to be trustworthy will ever shoot a bear").
  • results: LM probabilities aligned with human judgments of being "tricked" more often for the NPI illusion, which hinges on a structural dependency, than for the comparative and depth-charge illusions, which require sophisticated semantic understanding. No LM or metric was entirely consistent with human behavior, suggesting LMs remain limited as cognitive models of human language processing.
    Abstract Language models (LMs) have been argued to overlap substantially with human beings in grammaticality judgment tasks. But when humans systematically make errors in language processing, should we expect LMs to behave like cognitive models of language and mimic human behavior? We answer this question by investigating LMs' more subtle judgments associated with "language illusions" -- sentences that are vague in meaning, implausible, or ungrammatical but receive unexpectedly high acceptability judgments by humans. We looked at three illusions: the comparative illusion (e.g. "More people have been to Russia than I have"), the depth-charge illusion (e.g. "No head injury is too trivial to be ignored"), and the negative polarity item (NPI) illusion (e.g. "The hunter who no villager believed to be trustworthy will ever shoot a bear"). We found that probabilities represented by LMs were more likely to align with human judgments of being "tricked" by the NPI illusion which examines a structural dependency, compared to the comparative and the depth-charge illusions which require sophisticated semantic understanding. No single LM or metric yielded results that are entirely consistent with human behavior. Ultimately, we show that LMs are limited both in their construal as cognitive models of human language processing and in their capacity to recognize nuanced but critical information in complicated language materials.
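
One common way to run this kind of probe is to compare a causal LM's total log-probability for an illusion sentence against a matched control. The sketch below uses GPT-2 as an illustrative stand-in; the model choice and the control sentence are assumptions, not necessarily items from the paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is mean NLL per predicted token; rescale to a total log-prob.
    return -out.loss.item() * (ids.shape[1] - 1)

illusion = "More people have been to Russia than I have."
control = "More people have been to Russia than I thought."
print(sentence_logprob(illusion), sentence_logprob(control))
```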

GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks

  • paper_url: http://arxiv.org/abs/2311.01361
  • repo_url: None
  • paper_authors: Xinlu Zhang, Yujie Lu, Weizhi Wang, An Yan, Jun Yan, Lianke Qin, Heng Wang, Xifeng Yan, William Yang Wang, Linda Ruth Petzold
  • for: Assessing whether GPT-4V can serve as a generalist automatic evaluator for vision-language tasks.
  • methods: Evaluates GPT-4V across tasks ranging from image-to-text and text-to-image synthesis to image-to-image translation and multi-image-to-text alignment, using two evaluation methods: single-answer grading and pairwise comparison.
  • results: GPT-4V shows promising agreement with human judgments across tasks and evaluation methods. Despite limitations such as restricted visual-clarity grading and complex real-world reasoning, its ability to provide human-aligned scores with detailed explanations points to its potential as a universal automatic evaluator.
    Abstract Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details. Although GPT-4V has shown promising results in various multi-modal tasks, leveraging GPT-4V as a generalist evaluator for these tasks has not yet been systematically explored. We comprehensively validate GPT-4V's capabilities for evaluation purposes, addressing tasks ranging from foundational image-to-text and text-to-image synthesis to high-level image-to-image translations and multi-images to text alignment. We employ two evaluation methods, single-answer grading and pairwise comparison, using GPT-4V. Notably, GPT-4V shows promising agreement with humans across various tasks and evaluation methods, demonstrating immense potential for multi-modal LLMs as evaluators. Despite limitations like restricted visual clarity grading and real-world complex reasoning, its ability to provide human-aligned scores enriched with detailed explanations is promising for universal automatic evaluator.

The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models

  • paper_url: http://arxiv.org/abs/2311.01307
  • repo_url: None
  • paper_authors: Lovisa Hagström, Denitsa Saynova, Tobias Norlund, Moa Johansson, Richard Johansson
  • for: Understanding why large language models (LLMs) give inconsistent answers to semantically equivalent factual questions.
  • methods: Evaluates two mitigation strategies, up-scaling and augmenting the LM with a retrieval corpus, on the LLaMA and Atlas models; both reduce inconsistency, with retrieval augmentation considerably more efficient.
  • results: Disentangles the consistency contributions of the different components of Atlas, and finds that, for all evaluated LMs, syntactic form and other evaluation-task artifacts affect consistency.
    Abstract Large Language Models (LLMs) make natural interfaces to factual knowledge, but their usefulness is limited by their tendency to deliver inconsistent answers to semantically equivalent questions. For example, a model might predict both "Anne Redpath passed away in Edinburgh." and "Anne Redpath's life ended in London." In this work, we identify potential causes of inconsistency and evaluate the effectiveness of two mitigation strategies: up-scaling and augmenting the LM with a retrieval corpus. Our results on the LLaMA and Atlas models show that both strategies reduce inconsistency while retrieval augmentation is considerably more efficient. We further consider and disentangle the consistency contributions of different components of Atlas. For all LMs evaluated we find that syntactical form and other evaluation task artifacts impact consistency. Taken together, our results provide a better understanding of the factors affecting the factual consistency of language models.

FlashDecoding++: Faster Large Language Model Inference on GPUs

  • paper_url: http://arxiv.org/abs/2311.01282
  • repo_url: None
  • paper_authors: Ke Hong, Guohao Dai, Jiaming Xu, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Hanyu Dong, Yu Wang
  • for: Accelerating large language model (LLM) inference on GPUs.
  • methods: Asynchronized softmax with a unified max value, flat-GEMM optimization with double buffering, and heuristic dataflow adapted to hardware resources and input dynamics.
  • results: Up to 4.86x and 2.18x speedup on NVIDIA and AMD GPUs respectively over Hugging Face implementations, and an average 1.37x speedup over state-of-the-art LLM inference engines on mainstream LLMs.
    Abstract As the Large Language Model (LLM) becomes increasingly important in various domains. However, the following challenges still remain unsolved in accelerating LLM inference: (1) Synchronized partial softmax update. The softmax operation requires a synchronized update operation among each partial softmax result, leading to ~20% overheads for the attention computation in LLMs. (2) Under-utilized computation of flat GEMM. The shape of matrices performing GEMM in LLM inference is flat, leading to under-utilized computation and >50% performance loss after padding zeros in previous designs. (3) Performance loss due to static dataflow. Kernel performance in LLM depends on varied input data features, hardware configurations, etc. A single and static dataflow may lead to a 50.25% performance loss for GEMMs of different shapes in LLM inference. We present FlashDecoding++, a fast LLM inference engine supporting mainstream LLMs and hardware back-ends. To tackle the above challenges, FlashDecoding++ creatively proposes: (1) Asynchronized softmax with unified max value. FlashDecoding++ introduces a unified max value technique for different partial softmax computations to avoid synchronization. (2) Flat GEMM optimization with double buffering. FlashDecoding++ points out that flat GEMMs with different shapes face varied bottlenecks. Then, techniques like double buffering are introduced. (3) Heuristic dataflow with hardware resource adaptation. FlashDecoding++ heuristically optimizes dataflow using different hardware resource considering input dynamics. Due to the versatility of optimizations in FlashDecoding++, FlashDecoding++ can achieve up to 4.86x and 2.18x speedup on both NVIDIA and AMD GPUs compared to Hugging Face implementations. FlashDecoding++ also achieves an average speedup of 1.37x compared to state-of-the-art LLM inference engines on mainstream LLMs.
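
A NumPy sketch of the asynchronized-softmax idea: every partial block exponentiates against one shared max value phi instead of a synchronized running max, so partial results combine with a single final normalization. Here phi is hand-picked for the example; how it is chosen from score statistics in practice is beyond this sketch.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # standard synchronized safe softmax
    return e / e.sum()

def unified_max_softmax(x, phi, num_blocks=4):
    numerators, partial_sums = [], []
    for block in np.array_split(x, num_blocks):  # independent partial blocks
        e = np.exp(block - phi)                  # no cross-block max exchange
        numerators.append(e)
        partial_sums.append(e.sum())
    return np.concatenate(numerators) / np.sum(partial_sums)

x = np.random.randn(1024).astype(np.float64)
# Any constant shift cancels in the ratio, so the results agree exactly
# (up to floating-point rounding) as long as exp(x - phi) does not overflow.
assert np.allclose(unified_max_softmax(x, phi=5.0), softmax(x))
```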

Finding Common Ground: Annotating and Predicting Common Ground in Spoken Conversations

  • paper_url: http://arxiv.org/abs/2311.01273
  • repo_url: https://github.com/cogstates/2023-emnlp-common-ground
  • paper_authors: Magdalena Markowska, Mohammad Taghizadeh, Adil Soubki, Seyed Abolghasem Mirroshandel, Owen Rambow
  • for: Capturing common ground - the content the speaker believes, believes the audience believes, and so on - which has been studied extensively in cognitive science but has received little attention in NLP.
  • methods: Introduces a new annotation scheme and corpus for common ground.
  • results: Describes initial experiments extracting propositions from dialog and tracking their status in the common ground from the perspective of each speaker.
    Abstract When we communicate with other humans, we do not simply generate a sequence of words. Rather, we use our cognitive state (beliefs, desires, intentions) and our model of the audience's cognitive state to create utterances that affect the audience's cognitive state in the intended manner. An important part of cognitive state is the common ground, which is the content the speaker believes, and the speaker believes the audience believes, and so on. While much attention has been paid to common ground in cognitive science, there has not been much work in natural language processing. In this paper, we introduce a new annotation and corpus to capture common ground. We then describe some initial experiments extracting propositions from dialog and tracking their status in the common ground from the perspective of each speaker.

People Make Better Edits: Measuring the Efficacy of LLM-Generated Counterfactually Augmented Data for Harmful Language Detection

  • paper_url: http://arxiv.org/abs/2311.01270
  • repo_url: None
  • paper_authors: Indira Sen, Dennis Assenmacher, Mattia Samory, Isabelle Augenstein, Wil van der Aalst, Claudia Wagner
  • for: The paper aims to improve the robustness of NLP models to spurious features by automating the process of generating Counterfactually Augmented Data (CADs).
  • methods: The authors use three generative NLP models - Polyjuice, ChatGPT, and Flan-T5 - to automatically generate CADs, and evaluate their effectiveness in improving model robustness compared to manually-generated CADs.
  • results: Manually-generated CADs are still the most effective, with CADs generated by ChatGPT a close second; however, the changes introduced by the automated methods are often insufficient to flip the original label, which limits their performance.
    Abstract NLP models are used in a variety of critical social computing tasks, such as detecting sexist, racist, or otherwise hateful content. Therefore, it is imperative that these models are robust to spurious features. Past work has attempted to tackle such spurious features using training data augmentation, including Counterfactually Augmented Data (CADs). CADs introduce minimal changes to existing training data points and flip their labels; training on them may reduce model dependency on spurious features. However, manually generating CADs can be time-consuming and expensive. Hence in this work, we assess if this task can be automated using generative NLP models. We automatically generate CADs using Polyjuice, ChatGPT, and Flan-T5, and evaluate their usefulness in improving model robustness compared to manually-generated CADs. By testing both model performance on multiple out-of-domain test sets and individual data point efficacy, our results show that while manual CADs are still the most effective, CADs generated by ChatGPT come a close second. One key reason for the lower performance of automated methods is that the changes they introduce are often insufficient to flip the original label.
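
The label-flip failure mode suggests a simple post-hoc filter: keep a generated counterfactual only if a classifier assigns it the opposite label. The sketch below uses an off-the-shelf sentiment model as a stand-in for a harmful-language detector; the model choice and helper names are illustrative assumptions, not the paper's setup.

```python
from transformers import pipeline

# Stand-in binary classifier; swap in a harmful-language detector in practice.
clf = pipeline("text-classification",
               model="distilbert-base-uncased-finetuned-sst-2-english")

def keep_flipped(original_label: str, candidates: list[str]) -> list[str]:
    """Retain only counterfactual edits whose predicted label actually flips."""
    kept = []
    for text in candidates:
        pred = clf(text)[0]["label"]
        if pred != original_label:
            kept.append(text)
    return kept

print(keep_flipped("NEGATIVE", ["The movie was wonderful.", "The movie was bad."]))
```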

A Study of Continual Learning Under Language Shift

  • paper_url: http://arxiv.org/abs/2311.01200
  • repo_url: None
  • paper_authors: Evangelia Gogoulou, Timothée Lesort, Magnus Boman, Joakim Nivre
  • for: Studying the benefits and downsides of updating a language model when new data arrives in new languages - continual learning under language shift.
  • methods: Starting from a monolingual English model, incrementally adds Norwegian and Icelandic data to study how forward and backward transfer depend on pre-training order and language characteristics, across model sizes and learning-rate schedulers.
  • results: Forward transfer is largely positive and independent of language order, whereas backward transfer can be positive or negative depending on the order and characteristics of the new languages; syntactic similarity correlates best with the observed patterns.
    Abstract The recent increase in data and model scale for language model pre-training has led to huge training costs. In scenarios where new data become available over time, updating a model instead of fully retraining it would therefore provide significant gains. In this paper, we study the benefits and downsides of updating a language model when new data comes from new languages - the case of continual learning under language shift. Starting from a monolingual English language model, we incrementally add data from Norwegian and Icelandic to investigate how forward and backward transfer effects depend on the pre-training order and characteristics of languages, for different model sizes and learning rate schedulers. Our results show that, while forward transfer is largely positive and independent of language order, backward transfer can be either positive or negative depending on the order and characteristics of new languages. To explain these patterns we explore several language similarity metrics and find that syntactic similarity appears to have the best correlation with our results.

CRUSH4SQL: Collective Retrieval Using Schema Hallucination For Text2SQL

  • paper_url: http://arxiv.org/abs/2311.01173
  • repo_url: None
  • paper_authors: Mayank Kothyari, Dhruva Dhingra, Sunita Sarawagi, Soumen Chakrabarti
  • for: Making text-to-SQL generation efficient over large databases, without encoding the entire schema alongside the user text.
  • methods: A two-stage approach: first prompt an LLM to hallucinate a minimal database schema adequate for answering the query, then use the hallucinated schema to select a subset of the actual schema by composing results from multiple dense retrievals.
  • results: Hallucination turns out to be a useful bridging mechanism, and the method achieves significantly higher recall than state-of-the-art retrieval-based augmentation methods; three new benchmarks for schema subsetting are introduced (4502, 798, and 17844 schema elements).
    Abstract Existing Text-to-SQL generators require the entire schema to be encoded with the user text. This is expensive or impractical for large databases with tens of thousands of columns. Standard dense retrieval techniques are inadequate for schema subsetting of a large structured database, where the correct semantics of retrieval demands that we rank sets of schema elements rather than individual elements. In response, we propose a two-stage process for effective coverage during retrieval. First, we instruct an LLM to hallucinate a minimal DB schema deemed adequate to answer the query. We use the hallucinated schema to retrieve a subset of the actual schema, by composing the results from multiple dense retrievals. Remarkably, hallucination -- generally considered a nuisance -- turns out to be actually useful as a bridging mechanism. Since no existing benchmarks exist for schema subsetting on large databases, we introduce three benchmarks. Two semi-synthetic datasets are derived from the union of schemas in two well-known datasets, SPIDER and BIRD, resulting in 4502 and 798 schema elements respectively. A real-life benchmark called SocialDB is sourced from an actual large data warehouse comprising 17844 schema elements. We show that our method leads to significantly higher recall than SOTA retrieval-based augmentation methods.
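
A sketch of the collective-retrieval stage: embed each hallucinated schema element, retrieve its nearest real schema elements, and take the union. The encoder name, example schemas, and top-k value are illustrative assumptions.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def collective_retrieve(hallucinated: list[str], real_schema: list[str], k: int = 3):
    h = encoder.encode(hallucinated, normalize_embeddings=True)
    r = encoder.encode(real_schema, normalize_embeddings=True)
    sims = h @ r.T                     # cosine similarity via normalized dot
    selected = set()
    for row in sims:                   # one dense retrieval per hallucinated element
        selected.update(np.argsort(-row)[:k])
    return [real_schema[i] for i in sorted(selected)]

hallucinated = ["student.name", "course.title", "enrollment.grade"]
real_schema = ["students.full_name", "students.dob", "courses.course_title",
               "takes.grade", "departments.budget"]
print(collective_retrieve(hallucinated, real_schema, k=1))
```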

ACES: Translation Accuracy Challenge Sets at WMT 2023

  • paper_url: http://arxiv.org/abs/2311.01153
  • repo_url: None
  • paper_authors: Chantal Amrhein, Nikita Moghe, Liane Guillou
  • for: Benchmarking the performance of segment-level metrics submitted to WMT 2023 using the ACES Challenge Set (Amrhein et al., 2022).
  • methods: Evaluates metrics on 36K examples covering 68 phenomena and 146 language pairs, providing a per-metric profile over error categories and an overall ACES-Score for quick comparison; also measures incremental performance between the 2023 and 2022 metric submissions.
  • results: 1) There is no clear winner among the metrics submitted to WMT 2023, and 2) performance change between the 2023 and 2022 versions is highly variable. Recommendations: build ensembles of metrics from different design families, develop metrics that pay more attention to the source and rely less on surface-level overlap, and carefully determine the influence of multilingual embeddings on MT evaluation.
    Abstract We benchmark the performance of segmentlevel metrics submitted to WMT 2023 using the ACES Challenge Set (Amrhein et al., 2022). The challenge set consists of 36K examples representing challenges from 68 phenomena and covering 146 language pairs. The phenomena range from simple perturbations at the word/character level to more complex errors based on discourse and real-world knowledge. For each metric, we provide a detailed profile of performance over a range of error categories as well as an overall ACES-Score for quick comparison. We also measure the incremental performance of the metrics submitted to both WMT 2023 and 2022. We find that 1) there is no clear winner among the metrics submitted to WMT 2023, and 2) performance change between the 2023 and 2022 versions of the metrics is highly variable. Our recommendations are similar to those from WMT 2022. Metric developers should focus on: building ensembles of metrics from different design families, developing metrics that pay more attention to the source and rely less on surface-level overlap, and carefully determining the influence of multilingual embeddings on MT evaluation.

Predicting Question-Answering Performance of Large Language Models through Semantic Consistency

  • paper_url: http://arxiv.org/abs/2311.01152
  • repo_url: None
  • paper_authors: Ella Rabinovich, Samuel Ackerman, Orna Raz, Eitan Farchi, Ateret Anaby-Tavor
  • for: Assessing the question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset of high-quality paraphrases for factual questions, released to the community.
  • methods: Combines the semantic-consistency metric with additional measurements from prior work that correlate with LLM QA accuracy, to build and evaluate a framework for reference-less prediction of factual QA performance.
  • results: Evaluated on five contemporary LLMs, the framework significantly outperforms baselines at predicting whether a model will answer a question accurately.
    Abstract Semantic consistency of a language model is broadly defined as the model's ability to produce semantically-equivalent outputs, given semantically-equivalent inputs. We address the task of assessing question-answering (QA) semantic consistency of contemporary large language models (LLMs) by manually creating a benchmark dataset with high-quality paraphrases for factual questions, and release the dataset to the community. We further combine the semantic consistency metric with additional measurements suggested in prior work as correlating with LLM QA accuracy, for building and evaluating a framework for factual QA reference-less performance prediction -- predicting the likelihood of a language model to accurately answer a question. Evaluating the framework on five contemporary LLMs, we demonstrate encouraging, significantly outperforming baselines, results.
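
A toy sketch of one natural paraphrase-consistency score: ask the model the same factual question in several paraphrased forms and measure agreement of the answers. The `answer` callable and the majority-agreement score are illustrative placeholders, not the paper's exact metric.

```python
from collections import Counter

def answer(question: str) -> str:
    """Placeholder QA model call; plug in an LLM here."""
    raise NotImplementedError

def consistency(paraphrases: list[str]) -> float:
    """Fraction of answers matching the most common answer across paraphrases."""
    answers = [answer(q).strip().lower() for q in paraphrases]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)
```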

ChineseWebText: Large-scale high-quality Chinese web text extracted with effective evaluation model

  • paper_url: http://arxiv.org/abs/2311.01149
  • repo_url: https://github.com/casia-lm/chinesewebtext
  • paper_authors: Jianghao Chen, Pu Jian, Tengxiao Xi, Yidong Yi, Chenglin Ding, Qianlong Du, Guibo Zhu, Chengqing Zong, Jinqiao Wang, Jiajun Zhang
  • for: Advancing research on Chinese large language models (LLMs), which requires large-scale, high-quality pre-training data; existing large corpora focus mainly on English, and Chinese lacks both a complete extraction tool-chain and fine-grained quality information.
  • methods: Proposes EvalWeb, a complete tool-chain for extracting clean Chinese text from noisy web data: hand-crafted rules first discard explicitly noisy text from raw crawled content, then an evaluation model scores the remaining relatively clean data, assigning each text a quality score.
  • results: Releases ChineseWebText, 1.42 TB of high-quality Chinese web text with a quality score attached to each text, plus a much cleaner 600 GB subset whose quality exceeds 90%.
    Abstract During the development of large language models (LLMs), the scale and quality of the pre-training data play a crucial role in shaping LLMs' capabilities. To accelerate the research of LLMs, several large-scale datasets, such as C4 [1], Pile [2], RefinedWeb [3] and WanJuan [4], have been released to the public. However, most of the released corpus focus mainly on English, and there is still lack of complete tool-chain for extracting clean texts from web data. Furthermore, fine-grained information of the corpus, e.g. the quality of each text, is missing. To address these challenges, we propose in this paper a new complete tool-chain EvalWeb to extract Chinese clean texts from noisy web data. First, similar to previous work, manually crafted rules are employed to discard explicit noisy texts from the raw crawled web contents. Second, a well-designed evaluation model is leveraged to assess the remaining relatively clean data, and each text is assigned a specific quality score. Finally, we can easily utilize an appropriate threshold to select the high-quality pre-training data for Chinese. Using our proposed approach, we release the largest and latest large-scale high-quality Chinese web text ChineseWebText, which consists of 1.42 TB and each text is associated with a quality score, facilitating the LLM researchers to choose the data according to the desired quality thresholds. We also release a much cleaner subset of 600 GB Chinese data with the quality exceeding 90%.
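
A minimal sketch of the pipeline's two filtering stages: a hand-written rule pass over raw text, then selection by the evaluation model's quality score. The specific rule and the 0.9 threshold are illustrative assumptions (the released subset is described as exceeding 90% quality).

```python
def rule_filter(text: str) -> bool:
    """Illustrative hand-crafted rule: drop very short or symbol-heavy texts."""
    symbol_ratio = sum(not c.isalnum() and not c.isspace() for c in text) / max(len(text), 1)
    return len(text) >= 50 and symbol_ratio < 0.3

def select_pretraining_data(scored_texts, threshold=0.9):
    """scored_texts: iterable of (text, quality_score in [0, 1]) pairs."""
    return [t for t, s in scored_texts if rule_filter(t) and s >= threshold]
```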

Noise-Robust Fine-Tuning of Pretrained Language Models via External Guidance

  • paper_url: http://arxiv.org/abs/2311.01108
  • repo_url: None
  • paper_authors: Song Wang, Zhen Tan, Ruocheng Guo, Jundong Li
  • for: Fine-tuning pretrained language models (PLMs) for NLP tasks when data labels are noisy, as is common in real-world annotation.
  • methods: A novel PLM fine-tuning strategy that leverages guidance from large language models (LLMs) such as ChatGPT to distinguish clean from noisy samples and to provide supplementary information beyond the noisy labels, boosting learning during fine-tuning.
  • results: Extensive experiments on synthetic and real-world noisy datasets show clear advantages over state-of-the-art baselines.
    Abstract Adopting a two-stage paradigm of pretraining followed by fine-tuning, Pretrained Language Models (PLMs) have achieved substantial advancements in the field of natural language processing. However, in real-world scenarios, data labels are often noisy due to the complex annotation process, making it essential to develop strategies for fine-tuning PLMs with such noisy labels. To this end, we introduce an innovative approach for fine-tuning PLMs using noisy labels, which incorporates the guidance of Large Language Models (LLMs) like ChatGPT. This guidance assists in accurately distinguishing between clean and noisy samples and provides supplementary information beyond the noisy labels, thereby boosting the learning process during fine-tuning PLMs. Extensive experiments on synthetic and real-world noisy datasets further demonstrate the superior advantages of our framework over the state-of-the-art baselines.

DistilWhisper: Efficient Distillation of Multi-task Speech Models via Language-Specific Experts

  • paper_url: http://arxiv.org/abs/2311.01070
  • repo_url: None
  • paper_authors: Thomas Palmeira Ferraz, Marcely Zanon Boito, Caroline Brun, Vassilina Nikoulina
  • for: Improving automatic speech recognition (ASR) performance on under-represented languages in Whisper.
  • methods: Lightweight modular ASR fine-tuning of whisper-small with language-specific experts, combined with knowledge distillation from whisper-large-v2.
  • results: More effective than standard fine-tuning or LoRA adapters, boosting ASR performance in the targeted languages on both in- and out-of-domain test sets while adding only a negligible parameter overhead at inference.
    Abstract Whisper is a multitask and multilingual speech model covering 99 languages. It yields commendable automatic speech recognition (ASR) results in a subset of its covered languages, but the model still under-performs on a non-negligible number of under-represented languages, a problem exacerbated in smaller model versions. In this work, we propose DistilWhisper, an approach able to bridge the performance gap in ASR for these languages while retaining the advantages of multitask and multilingual capabilities. Our approach involves two key strategies: lightweight modular ASR fine-tuning of whisper-small using language-specific experts, and knowledge distillation from whisper-large-v2. This dual approach allows us to effectively boost ASR performance while keeping the robustness inherited from the multitask and multilingual pre-training. Results demonstrate that our approach is more effective than standard fine-tuning or LoRA adapters, boosting performance in the targeted languages for both in- and out-of-domain test sets, while introducing only a negligible parameter overhead at inference.
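
A generic sketch of the distillation term such a teacher-student setup relies on: cross-entropy on the labels plus a temperature-scaled KL term against the teacher's logits. The weighting, temperature, and how this combines with the language-specific expert modules are assumptions, not DistilWhisper's exact recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # standard rescaling so gradients stay comparable across T
    return alpha * ce + (1 - alpha) * kl

student = torch.randn(8, 100)   # (batch, vocab)
teacher = torch.randn(8, 100)
labels = torch.randint(0, 100, (8,))
print(distillation_loss(student, teacher, labels).item())
```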

COPAL-ID: Indonesian Language Reasoning with Local Culture and Nuances

  • paper_url: http://arxiv.org/abs/2311.01012
  • repo_url: https://github.com/haryoa/copal-id
  • paper_authors: Haryo Akbarianto Wibowo, Erland Hilman Fuadi, Made Nindyatama Nityasya, Radityo Eko Prasojo, Alham Fikri Aji
  • for: Introducing COPAL-ID, a novel Indonesian commonsense reasoning dataset.
  • methods: Written from scratch by native speakers, COPAL-ID incorporates Indonesian local and cultural nuances, giving a more natural and more fluent portrayal of day-to-day causal reasoning than the translated XCOPA-ID; it is provided in both standard Indonesian and Jakartan Indonesian.
  • results: Even the current best open-source multilingual model struggles on COPAL-ID, achieving 65.47% accuracy, significantly lower than on the culturally devoid XCOPA-ID (79.40%); GPT-4 suffers the same degradation and still falls short of human performance, showing these models are far from grasping local Indonesian nuances.
    Abstract We present publicly available COPAL-ID, a novel Indonesian language common sense reasoning dataset. Unlike the previous Indonesian COPA dataset (XCOPA-ID), COPAL-ID incorporates Indonesian local and cultural nuances, and therefore, provides a more natural portrayal of day-to-day causal reasoning within the Indonesian cultural sphere. Professionally written by natives from scratch, COPAL-ID is more fluent and free from awkward phrases, unlike the translated XCOPA-ID. In addition, we present COPAL-ID in both standard Indonesian and in Jakartan Indonesian--a dialect commonly used in daily conversation. COPAL-ID poses a greater challenge for existing open-sourced and closed state-of-the-art multilingual language models, yet is trivially easy for humans. Our findings suggest that even the current best open-source, multilingual model struggles to perform well, achieving 65.47% accuracy on COPAL-ID, significantly lower than on the culturally-devoid XCOPA-ID (79.40%). Despite GPT-4's impressive score, it suffers the same performance degradation compared to its XCOPA-ID score, and it still falls short of human performance. This shows that these language models are still way behind in comprehending the local nuances of Indonesian.

Blending Reward Functions via Few Expert Demonstrations for Faithful and Accurate Knowledge-Grounded Dialogue Generation

  • paper_url: http://arxiv.org/abs/2311.00953
  • repo_url: None
  • paper_authors: Wanyu Du, Yangfeng Ji
    for: 实现对话式资讯搜寻系统的可靠性和准确性,需要对话模型能够基于相关知识文本生成 faithful和 precisel 的回答。methods: 我们使用了增强学习算法,创建了一个新的赏罚函数,让模型能够在不可靠的知识文本中快速地学习并生成高质量的回答。results: 我们的方法在两个对话式资讯搜寻 datasets 上进行了实验,结果显示我们的方法可以与其他强制学习基于标签的基底搜寻方法竞争。
    Abstract The development of trustworthy conversational information-seeking systems relies on dialogue models that can generate faithful and accurate responses based on relevant knowledge texts. However, two main challenges hinder this task. Firstly, language models may generate hallucinations due to data biases present in their pretraining corpus. Secondly, knowledge texts often contain redundant and irrelevant information that distracts the model's attention from the relevant text span. Previous works use additional data annotations on the knowledge texts to learn a knowledge identification module in order to bypass irrelevant information, but collecting such high-quality span annotations can be costly. In this work, we leverage reinforcement learning algorithms to overcome the above challenges by introducing a novel reward function. Our reward function combines an accuracy metric and a faithfulness metric to provide a balanced quality judgment of generated responses, which can be used as a cost-effective approximation to a human preference reward model when only a few preference annotations are available. Empirical experiments on two conversational information-seeking datasets demonstrate that our method can compete with other strong supervised learning baselines.

E3 TTS: Easy End-to-End Diffusion-based Text to Speech

  • paper_url: http://arxiv.org/abs/2311.00945
  • repo_url: None
  • paper_authors: Yuan Gao, Nobuyuki Morioka, Yu Zhang, Nanxin Chen
  • for: Developing a simple and efficient end-to-end text-to-speech model that generates high-quality audio waveforms via a diffusion process.
  • methods: E3 TTS takes plain text as input and generates the waveform through iterative refinement, modeling the temporal structure of the waveform via diffusion without intermediate representations such as spectrogram features or alignment information.
  • results: Generates high-fidelity audio approaching the performance of a state-of-the-art neural TTS system, and its flexible latent structure supports zero-shot tasks such as editing without additional training. Audio samples are available at https://e3tts.github.io.
    Abstract We propose Easy End-to-End Diffusion-based Text to Speech, a simple and efficient end-to-end text-to-speech model based on diffusion. E3 TTS directly takes plain text as input and generates an audio waveform through an iterative refinement process. Unlike many prior work, E3 TTS does not rely on any intermediate representations like spectrogram features or alignment information. Instead, E3 TTS models the temporal structure of the waveform through the diffusion process. Without relying on additional conditioning information, E3 TTS could support flexible latent structure within the given audio. This enables E3 TTS to be easily adapted for zero-shot tasks such as editing without any additional training. Experiments show that E3 TTS can generate high-fidelity audio, approaching the performance of a state-of-the-art neural TTS system. Audio samples are available at https://e3tts.github.io.

Task-Agnostic Low-Rank Adapters for Unseen English Dialects

  • paper_url: http://arxiv.org/abs/2311.00915
  • repo_url: https://github.com/zedian/hyperlora
  • paper_authors: Zedian Xiao, William Held, Yanchen Liu, Diyi Yang
  • for: Accommodating the diversity of English dialects in language technologies, rather than forcing dialect speakers to accommodate the models.
  • methods: HyperLoRA leverages expert linguistic knowledge to enable resource-efficient adaptation via hypernetworks, disentangling dialect-specific from cross-dialectal information to generalize to unseen dialects in a task-agnostic fashion.
  • results: HyperLoRA is more scalable in parameter count and achieves the best or most competitive performance across 5 dialects in a zero-shot setting.
    Abstract Large Language Models (LLMs) are trained on corpora disproportionally weighted in favor of Standard American English. As a result, speakers of other dialects experience significantly more failures when interacting with these technologies. In practice, these speakers often accommodate their speech to be better understood. Our work shares the belief that language technologies should be designed to accommodate the diversity in English dialects and not the other way around. However, prior works on dialect struggle with generalizing to evolving and emerging dialects in a scalable manner. To fill this gap, our method, HyperLoRA, leverages expert linguistic knowledge to enable resource-efficient adaptation via hypernetworks. By disentangling dialect-specific and cross-dialectal information, HyperLoRA improves generalization to unseen dialects in a task-agnostic fashion. Not only is HyperLoRA more scalable in the number of parameters, but it also achieves the best or most competitive performance across 5 dialects in a zero-shot setting. In this way, our approach facilitates access to language technology for billions of English dialect speakers who are traditionally underrepresented.

Self-Influence Guided Data Reweighting for Language Model Pre-training

  • paper_url: http://arxiv.org/abs/2311.00913
  • repo_url: None
  • paper_authors: Megh Thakkar, Tolga Bolukbasi, Sriram Ganapathy, Shikhar Vashishth, Sarath Chandar, Partha Talukdar
  • for: Proposing a sample-reweighting method to improve the quality and stability of language model (LM) pre-training.
  • methods: PRESENCE jointly reweights pre-training samples using self-influence (SI) scores as an indicator of sample importance.
  • results: Extensive analysis across model sizes, datasets, and tasks shows that PRESENCE promotes novelty and stability in model pre-training, positioning it as an important first step toward sample reweighting for pre-training language models.
    Abstract Language Models (LMs) pre-trained with self-supervision on large text corpora have become the default starting point for developing models for various NLP tasks. Once the pre-training corpus has been assembled, all data samples in the corpus are treated with equal importance during LM pre-training. However, due to varying levels of relevance and quality of data, equal importance to all the data samples may not be the optimal choice. While data reweighting has been explored in the context of task-specific supervised learning and LM fine-tuning, model-driven reweighting for pre-training data has not been explored. We fill this important gap and propose PRESENCE, a method for jointly reweighting samples by leveraging self-influence (SI) scores as an indicator of sample importance and pre-training. PRESENCE promotes novelty and stability for model pre-training. Through extensive analysis spanning multiple model sizes, datasets, and tasks, we present PRESENCE as an important first step in the research direction of sample reweighting for pre-training language models.

Re-weighting Tokens: A Simple and Effective Active Learning Strategy for Named Entity Recognition

  • paper_url: http://arxiv.org/abs/2311.00906
  • repo_url: None
  • paper_authors: Haocheng Luo, Wei Tan, Ngoc Dang Nguyen, Lan Du
  • for: Improving active learning for Named Entity Recognition (NER), where data imbalance leaves sequence labelers with insufficient learning signals.
  • methods: A reweighting-based active learning strategy that assigns dynamic smoothed weights to individual tokens, compatible with various token-level acquisition functions.
  • results: Experiments on multiple corpora show substantial performance improvements when the reweighting strategy is combined with existing acquisition functions.
    Abstract Active learning, a widely adopted technique for enhancing machine learning models in text and image classification tasks with limited annotation resources, has received relatively little attention in the domain of Named Entity Recognition (NER). The challenge of data imbalance in NER has hindered the effectiveness of active learning, as sequence labellers lack sufficient learning signals. To address these challenges, this paper presents a novel reweighting-based active learning strategy that assigns dynamic smoothed weights to individual tokens. This adaptable strategy is compatible with various token-level acquisition functions and contributes to the development of robust active learners. Experimental results on multiple corpora demonstrate the substantial performance improvement achieved by incorporating our re-weighting strategy into existing acquisition functions, validating its practical efficacy.
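
A sketch of what a token-reweighted acquisition score can look like: per-token uncertainty, weighted so that tokens of rare (entity) tags count more than the dominant O tag. The smoothed inverse-frequency weighting below is an illustrative stand-in for the paper's dynamic smoothed weights.

```python
import numpy as np

def acquisition_score(token_probs, predicted_tags, tag_frequencies, beta=0.5):
    """token_probs: max predicted probability per token (lower = less certain).
    tag_frequencies: corpus-level frequency of each tag, values in (0, 1]."""
    uncertainties = 1.0 - np.asarray(token_probs)
    # Smoothed inverse-frequency weights: rarer tags get larger weight.
    weights = np.array([1.0 / (tag_frequencies[t] ** beta) for t in predicted_tags])
    weights /= weights.sum()
    return float(np.dot(weights, uncertainties))

freq = {"O": 0.85, "B-PER": 0.05, "I-PER": 0.10}
print(acquisition_score([0.99, 0.60, 0.70], ["O", "B-PER", "I-PER"], freq))
```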

cs.LG - 2023-11-02

PPI++: Efficient Prediction-Powered Inference

  • paper_url: http://arxiv.org/abs/2311.01453
  • repo_url: https://github.com/aangelopoulos/ppi_py
  • paper_authors: Anastasios N. Angelopoulos, John C. Duchi, Tijana Zrnic
  • for: Proposing a computationally lightweight method for estimation and inference from a small labeled dataset together with a typically much larger dataset of machine-learning predictions.
  • methods: Builds on prediction-powered inference (PPI), automatically adapting to the quality of the available predictions to compute easy-to-compute confidence sets - for parameters of any dimensionality - that always improve on classical intervals using only the labeled data.
  • results: Real and synthetic experiments demonstrate the computational and statistical efficiency gains of the proposed adaptations.
    Abstract We present PPI++: a computationally lightweight methodology for estimation and inference based on a small labeled dataset and a typically much larger dataset of machine-learning predictions. The methods automatically adapt to the quality of available predictions, yielding easy-to-compute confidence sets -- for parameters of any dimensionality -- that always improve on classical intervals using only the labeled data. PPI++ builds on prediction-powered inference (PPI), which targets the same problem setting, improving its computational and statistical efficiency. Real and synthetic experiments demonstrate the benefits of the proposed adaptations.
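
For the simplest case, mean estimation, the prediction-powered estimator with a power-tuning parameter lambda can be written out directly. The sketch below, with a synthetic predictor, follows that form; the lambda formula is the variance-minimizing choice under this simplified setup and the data are invented for illustration.

```python
import numpy as np

def ppi_pp_mean(y_labeled, f_labeled, f_unlabeled):
    """Estimate E[Y] from a small labeled set and many unlabeled predictions.
    lambda = 1 recovers plain PPI; lambda = 0 ignores the predictions."""
    n, N = len(y_labeled), len(f_unlabeled)
    cov = np.cov(y_labeled, f_labeled)[0, 1]
    var_f = np.var(f_labeled, ddof=1)
    lam = cov / (var_f * (1 + n / N))  # variance-minimizing power tuning
    return lam * f_unlabeled.mean() + (y_labeled - lam * f_labeled).mean()

rng = np.random.default_rng(0)
x_lab, x_unlab = rng.normal(size=200), rng.normal(size=20000)
f = lambda x: x + 0.3 * rng.normal(size=x.shape)  # imperfect predictor
y_lab = x_lab + 0.1 * rng.normal(size=200)
print(ppi_pp_mean(y_lab, f(x_lab), f(x_unlab)))   # estimates E[Y], here ~0
```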

Deep Double Descent for Time Series Forecasting: Avoiding Undertrained Models

  • paper_url: http://arxiv.org/abs/2311.01442
  • repo_url: None
  • paper_authors: Valentino Assandri, Sam Heshmati, Burhaneddin Yaman, Anton Iakovlev, Ariel Emiliano Repetur
  • for: Studying how deep learning models for time series forecasting are trained, independently of model architecture.
  • methods: Extensive experiments investigating epoch-wise deep double descent in several Transformer models trained on public time series datasets, showing that overfitting can be reverted with more epochs.
  • results: Achieves state-of-the-art results for long-sequence time series forecasting on nearly 70% of the 72 benchmarks tested, and introduces a taxonomy of training-schema modifications covering data augmentation, model inputs, model targets, time series per model, and computational budget.
    Abstract Deep learning models, particularly Transformers, have achieved impressive results in various domains, including time series forecasting. While existing time series literature primarily focuses on model architecture modifications and data augmentation techniques, this paper explores the training schema of deep learning models for time series; how models are trained regardless of their architecture. We perform extensive experiments to investigate the occurrence of deep double descent in several Transformer models trained on public time series data sets. We demonstrate epoch-wise deep double descent and that overfitting can be reverted using more epochs. Leveraging these findings, we achieve state-of-the-art results for long sequence time series forecasting in nearly 70% of the 72 benchmarks tested. This suggests that many models in the literature may possess untapped potential. Additionally, we introduce a taxonomy for classifying training schema modifications, covering data augmentation, model inputs, model targets, time series per model, and computational budget.

Contrastive Moments: Unsupervised Halfspace Learning in Polynomial Time

  • paper_url: http://arxiv.org/abs/2311.01435
  • repo_url: None
  • paper_authors: Xinyuan Cao, Santosh S. Vempala
  • for: Learning high-dimensional halfspaces with margins to within a desired total variation (TV) distance.
  • methods: A polynomial-time, label-free algorithm based only on the first two moments of suitable re-weightings of the empirical distribution (contrastive moments), which establishes unique and efficient identifiability of the hidden halfspace under the paper's distributional assumption.
  • results: Sample and time complexity are polynomial in the dimension and 1/epsilon, with TV-distance guarantees that go beyond the Gaussian case handled by prior work.
    Abstract We give a polynomial-time algorithm for learning high-dimensional halfspaces with margins in $d$-dimensional space to within desired TV distance when the ambient distribution is an unknown affine transformation of the $d$-fold product of an (unknown) symmetric one-dimensional logconcave distribution, and the halfspace is introduced by deleting at least an $\epsilon$ fraction of the data in one of the component distributions. Notably, our algorithm does not need labels and establishes the unique (and efficient) identifiability of the hidden halfspace under this distributional assumption. The sample and time complexity of the algorithm are polynomial in the dimension and $1/\epsilon$. The algorithm uses only the first two moments of suitable re-weightings of the empirical distribution, which we call contrastive moments; its analysis uses classical facts about generalized Dirichlet polynomials and relies crucially on a new monotonicity property of the moment ratio of truncations of logconcave distributions. Such algorithms, based only on first and second moments were suggested in earlier work, but hitherto eluded rigorous guarantees. Prior work addressed the special case when the underlying distribution is Gaussian via Non-Gaussian Component Analysis. We improve on this by providing polytime guarantees based on Total Variation (TV) distance, in place of existing moment-bound guarantees that can be super-polynomial. Our work is also the first to go beyond Gaussians in this setting.

Identifying Alzheimer Disease Dementia Levels Using Machine Learning Methods

  • paper_url: http://arxiv.org/abs/2311.01428
  • repo_url: None
  • paper_authors: Md Gulzar Hussain, Ye Shiren
  • for: Providing an effective machine learning approach for classifying the stages of Alzheimer's disease (AD) dementia.
  • methods: Uses RF, SVM, and CNN algorithms, augmented with watershed segmentation for feature extraction from MRI images.
  • results: SVM with watershed features achieves 96.25% accuracy on the ADNI dataset, surpassing the other classification methods; including watershed segmentation enhances model performance.
    Abstract Dementia, a prevalent neurodegenerative condition, is a major manifestation of Alzheimer's disease (AD). As the condition progresses from mild to severe, it significantly impairs the individual's ability to perform daily tasks independently, necessitating the need for timely and accurate AD classification. Machine learning or deep learning models have emerged as effective tools for this purpose. In this study, we suggested an approach for classifying the four stages of dementia using RF, SVM, and CNN algorithms, augmented with watershed segmentation for feature extraction from MRI images. Our results reveal that SVM with watershed features achieves an impressive accuracy of 96.25%, surpassing other classification methods. The ADNI dataset is utilized to evaluate the effectiveness of our method, and we observed that the inclusion of watershed segmentation contributes to the enhanced performance of the models.

Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data

  • paper_url: http://arxiv.org/abs/2311.01420
  • repo_url: None
  • paper_authors: Cheng-Hao Tu, Hong-You Chen, Zheda Mai, Jike Zhong, Vardaan Pahuja, Tanya Berger-Wolf, Song Gao, Charles Stewart, Yu Su, Wei-Lun Chao
  • for: Adapting a pre-trained source model to a target domain to classify all classes that appeared in the source data, using target data that covers only a partial label space - a practical setting, since end-users cannot realistically collect data for every class before adaptation, yet one that has received limited attention. Benchmark datasets and extensive experiments reveal a dilemma: adapting to the new target domain matters for performance, but preserving (let alone improving) accuracy on classes missing from the adaptation data is highly challenging.
  • methods: Two key directions to resolve the dilemma: 1) disentangling domain gradients from classification gradients, and 2) preserving class relationships, realized through several effective solutions.
  • results: The proposed solutions maintain accuracy on the missing classes while improving overall performance, establishing solid baselines for holistic transfer of pre-trained models with partial target data.
    Abstract We propose a learning problem involving adapting a pre-trained source model to the target domain for classifying all classes that appeared in the source data, using target data that covers only a partial label space. This problem is practical, as it is unrealistic for the target end-users to collect data for all classes prior to adaptation. However, it has received limited attention in the literature. To shed light on this issue, we construct benchmark datasets and conduct extensive experiments to uncover the inherent challenges. We found a dilemma -- on the one hand, adapting to the new target domain is important to claim better performance; on the other hand, we observe that preserving the classification accuracy of classes missing in the target adaptation data is highly challenging, let alone improving them. To tackle this, we identify two key directions: 1) disentangling domain gradients from classification gradients, and 2) preserving class relationships. We present several effective solutions that maintain the accuracy of the missing classes and enhance the overall performance, establishing solid baselines for holistic transfer of pre-trained models with partial target data.

A Coreset-based, Tempered Variational Posterior for Accurate and Scalable Stochastic Gaussian Process Inference

  • paper_url: http://arxiv.org/abs/2311.01409
  • repo_url: None
  • paper_authors: Mert Ketenci, Adler Perotte, Noémie Elhadad, Iñigo Urteaga
  • for: Proposing a new stochastic variational Gaussian process (GP) inference method that is both accurate and scalable.
  • methods: CVTGP defines a coreset-based, variational tempered family for GPs in terms of the GP prior and the data likelihood, placing a posterior over a learnable set of weighted pseudo input-output points (a coreset); its lower bound on the log-marginal likelihood is derived by marginalizing over the latent GP coreset variables and is amenable to stochastic optimization.
  • results: CVTGP reduces the learnable parameter size to O(M), enjoys numerical stability with O(M^3) time and O(M^2) space complexity, and provides sparse, explainable representations of the data; on simulated and real-world regression problems with Gaussian noise it yields better evidence lower-bound estimates and predictive RMSE than alternative stochastic GP inference methods.
    Abstract We present a novel stochastic variational Gaussian process ($\mathcal{GP}$) inference method, based on a posterior over a learnable set of weighted pseudo input-output points (coresets). Instead of a free-form variational family, the proposed coreset-based, variational tempered family for $\mathcal{GP}$s (CVTGP) is defined in terms of the $\mathcal{GP}$ prior and the data-likelihood; hence, accommodating the modeling inductive biases. We derive CVTGP's lower bound for the log-marginal likelihood via marginalization of the proposed posterior over latent $\mathcal{GP}$ coreset variables, and show it is amenable to stochastic optimization. CVTGP reduces the learnable parameter size to $\mathcal{O}(M)$, enjoys numerical stability, and maintains $\mathcal{O}(M^3)$ time- and $\mathcal{O}(M^2)$ space-complexity, by leveraging a coreset-based tempered posterior that, in turn, provides sparse and explainable representations of the data. Results on simulated and real-world regression problems with Gaussian observation noise validate that CVTGP provides better evidence lower-bound estimates and predictive root mean squared error than alternative stochastic $\mathcal{GP}$ inference methods.

Normalizing flows as approximations of optimal transport maps via linear-control neural ODEs

  • paper_url: http://arxiv.org/abs/2311.01404
  • repo_url: None
  • paper_authors: Alessandro Scagliotti, Sara Farinelli
  • for: The goal is to compute the optimal transport map between two probability measures $\mu$ and $\nu$.
  • methods: Invertible transport maps are constructed with deep neural networks, realizing the $W_2$-optimal transport map as the flow of a linear-control neural ODE.
  • results: The paper proposes a numerical method, justified by a $\Gamma$-convergence argument, that enables practical computation of the approximate optimal transport map.
    Abstract The term "Normalizing Flows" is related to the task of constructing invertible transport maps between probability measures by means of deep neural networks. In this paper, we consider the problem of recovering the $W_2$-optimal transport map $T$ between absolutely continuous measures $\mu,\nu\in\mathcal{P}(\mathbb{R}^n)$ as the flow of a linear-control neural ODE. We first show that, under suitable assumptions on $\mu,\nu$ and on the controlled vector fields, the optimal transport map is contained in the $C^0_c$-closure of the flows generated by the system. Assuming that discrete approximations $\mu_N,\nu_N$ of the original measures $\mu,\nu$ are available, we use a discrete optimal coupling $\gamma_N$ to define an optimal control problem. With a $\Gamma$-convergence argument, we prove that its solutions correspond to flows that approximate the optimal transport map $T$. Finally, taking advantage of the Pontryagin Maximum Principle, we propose an iterative numerical scheme for the resolution of the optimal control problem, resulting in an algorithm for the practical computation of the approximated optimal transport map.
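
A linear-control neural ODE drives the state with a control-weighted sum of fixed vector fields, and integrating it forward yields the (approximate) transport map. Below is a minimal sketch using explicit Euler integration; the fields `F_i`, the control values, and the step size are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def flow_endpoint(x0, controls, fields, dt=0.01):
    """Explicit-Euler integration of x'(t) = sum_i u_i(t) * F_i(x(t)).

    controls: (n_steps, n_fields) array of piecewise-constant control values.
    fields:   list of callables, each mapping a state (d,) to a velocity (d,).
    Returns the endpoint of the flow, i.e. the approximate transport of x0.
    """
    x = np.asarray(x0, dtype=float)
    for u in controls:
        velocity = sum(u_i * F(x) for u_i, F in zip(u, fields))
        x = x + dt * velocity
    return x

# Example: two linear vector fields on R^2 with constant controls.
fields = [lambda x: np.array([x[1], -x[0]]), lambda x: -x]
controls = np.tile([0.5, 0.1], (100, 1))
print(flow_endpoint(np.array([1.0, 0.0]), controls, fields))
```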

Time-series Generation by Contrastive Imitation

  • paper_url: http://arxiv.org/abs/2311.01388
  • repo_url: None
  • paper_authors: Daniel Jarrett, Ioana Bica, Mihaela van der Schaar
  • for: The goal is to learn generative models of time-series data while addressing the challenges unique to the sequential setting: the generator must capture the conditional dynamics of stepwise transitions, while its open-loop rollouts must also preserve the joint distribution of multi-step trajectories.
  • methods: The paper uses a hybrid framework: a global (but stepwise-decomposable) energy model is learned by contrastive estimation, and a local (but forward-looking) transition policy is optimized against it. At training time the two components are learned cooperatively, avoiding the instabilities typical of adversarial objectives; at inference time the learned policy serves as the generator and the learned energy as a trajectory-level measure of sample quality.
  • results: Experiments show that the approach generates predictively useful samples and performs at the standard of existing benchmarks.
    Abstract Consider learning a generative model for time-series data. The sequential setting poses a unique challenge: Not only should the generator capture the conditional dynamics of (stepwise) transitions, but its open-loop rollouts should also preserve the joint distribution of (multi-step) trajectories. On one hand, autoregressive models trained by MLE allow learning and computing explicit transition distributions, but suffer from compounding error during rollouts. On the other hand, adversarial models based on GAN training alleviate such exposure bias, but transitions are implicit and hard to assess. In this work, we study a generative framework that seeks to combine the strengths of both: Motivated by a moment-matching objective to mitigate compounding error, we optimize a local (but forward-looking) transition policy, where the reinforcement signal is provided by a global (but stepwise-decomposable) energy model trained by contrastive estimation. At training, the two components are learned cooperatively, avoiding the instabilities typical of adversarial objectives. At inference, the learned policy serves as the generator for iterative sampling, and the learned energy serves as a trajectory-level measure for evaluating sample quality. By expressly training a policy to imitate sequential behavior of time-series features in a dataset, this approach embodies "generation by imitation". Theoretically, we illustrate the correctness of this formulation and the consistency of the algorithm. Empirically, we evaluate its ability to generate predictively useful samples from real-world datasets, verifying that it performs at the standard of existing benchmarks.

Monotone Generative Modeling via a Gromov-Monge Embedding

  • paper_url: http://arxiv.org/abs/2311.01375
  • repo_url: None
  • paper_authors: Wonjun Lee, Yifei Yang, Dongmian Zou, Gilad Lerman
  • for: The goal is a deep generative modeling approach that addresses the sensitivity to initialization and the mode collapse of Generative Adversarial Networks (GANs).
  • methods: The method uses the Gromov-Monge embedding (GME) to identify the low-dimensional structure of the underlying data measure and map it, while preserving its geometry, into a low-dimensional latent space. The generative map is guaranteed to satisfy $c$-cyclical monotonicity, where $c$ is the intrinsic embedding cost employed by the GME.
  • results: Numerical experiments show that the method generates high-quality images, avoids mode collapse, and is robust to different initializations.
    Abstract Generative Adversarial Networks (GANs) are powerful tools for creating new content, but they face challenges such as sensitivity to starting conditions and mode collapse. To address these issues, we propose a deep generative model that utilizes the Gromov-Monge embedding (GME). It helps identify the low-dimensional structure of the underlying measure of the data and then maps it, while preserving its geometry, into a measure in a low-dimensional latent space, which is then optimally transported to the reference measure. We guarantee the preservation of the underlying geometry by the GME and $c$-cyclical monotonicity of the generative map, where $c$ is an intrinsic embedding cost employed by the GME. The latter property is a first step in guaranteeing better robustness to initialization of parameters and mode collapse. Numerical experiments demonstrate the effectiveness of our approach in generating high-quality images, avoiding mode collapse, and exhibiting robustness to different starting conditions.

Respiratory Anomaly Detection using Reflected Infrared Light-wave Signals

  • paper_url: http://arxiv.org/abs/2311.01367
  • repo_url: None
  • paper_authors: Md Zobaer Islam, Brenden Martin, Carly Gotcher, Tyler Martinez, John F. O’Hara, Sabit Ekin
  • for: This study develops a non-contact respiratory anomaly detection method that senses breathing parameters from incoherent light-wave signals reflected off the chest of a mechanical robot that breathes like a human.
  • methods: The method uses only a low-cost, ubiquitous infrared LED light source and a photodetector, together with machine learning models that recognize breathing anomalies from variations in reflected light intensity.
  • results: Experiments show the method classifies 7 types of breathing data with up to 96.6% average accuracy at a 0.5m-1.5m range, and can also flag faulty data that contains no breathing information. The system could serve as a smart, non-contact, and discreet respiration monitor at home or in healthcare facilities.
    Abstract In this study, we present a non-contact respiratory anomaly detection method using incoherent light-wave signals reflected from the chest of a mechanical robot that can breathe like human beings. In comparison to existing radar and camera-based sensing systems for vitals monitoring, this technology uses only a low-cost ubiquitous light source (e.g., infrared light emitting diode) and sensor (e.g., photodetector). This light-wave sensing (LWS) system recognizes different breathing anomalies from the variations of light intensity reflected from the chest of the robot within a 0.5m-1.5m range. The anomaly detection model demonstrates up to 96.6% average accuracy in classifying 7 different types of breathing data using machine learning. The model can also detect faulty data collected by the system that does not contain breathing information. The developed system can be utilized at home or healthcare facilities as a smart, non-contact and discreet respiration monitoring method.

On the Lipschitz constant of random neural networks

  • paper_url: http://arxiv.org/abs/2311.01356
  • repo_url: None
  • paper_authors: Paul Geuchen, Thomas Heindl, Dominik Stöger, Felix Voigtlaender
  • for: This paper studies the Lipschitz constant of random ReLU neural networks.
  • methods: The paper develops theoretical tools to characterize the Lipschitz constant of networks whose weights are chosen at random.
  • results: For shallow networks, the Lipschitz constant is characterized up to an absolute numerical constant; for sufficiently wide deep networks, matching upper and lower bounds are proved, up to a logarithmic factor depending on the depth.
    Abstract Empirical studies have widely demonstrated that neural networks are highly sensitive to small, adversarial perturbations of the input. The worst-case robustness against these so-called adversarial examples can be quantified by the Lipschitz constant of the neural network. However, only few theoretical results regarding this quantity exist in the literature. In this paper, we initiate the study of the Lipschitz constant of random ReLU neural networks, i.e., neural networks whose weights are chosen at random and which employ the ReLU activation function. For shallow neural networks, we characterize the Lipschitz constant up to an absolute numerical constant. Moreover, we extend our analysis to deep neural networks of sufficiently large width where we prove upper and lower bounds for the Lipschitz constant. These bounds match up to a logarithmic factor that depends on the depth.
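
The quantity studied here can be probed empirically: the largest input-gradient norm over sampled inputs lower-bounds a network's Lipschitz constant. A small sketch for a randomly initialized ReLU network follows; width, depth, and sample count are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

def empirical_lipschitz_lower_bound(width: int = 256, depth: int = 3, n_samples: int = 512) -> float:
    """Lower-bound the Lipschitz constant of a random ReLU network by the
    largest input-gradient norm observed over random Gaussian inputs."""
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 1))
    net = nn.Sequential(*layers)  # weights drawn from the default random init

    x = torch.randn(n_samples, width, requires_grad=True)
    net(x).sum().backward()  # per-sample gradients of the scalar output
    return x.grad.norm(dim=1).max().item()

print(empirical_lipschitz_lower_bound())
```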

Unreading Race: Purging Protected Features from Chest X-ray Embeddings

  • paper_url: http://arxiv.org/abs/2311.01349
  • repo_url: None
  • paper_authors: Tobias Weber, Michael Ingrisch, Bernd Bischl, David Rügamer
  • for: This work analyzes and removes the effects of protected features in the embeddings of medical imaging models.
  • methods: An orthogonalization technique is used to remove the influence of protected features (e.g., age, sex, race) from chest radiograph embeddings.
  • results: Protected features are found to influence pathology predictions, but orthogonalization removes these effects while maintaining predictive performance.
    Abstract Purpose: To analyze and remove protected feature effects in chest radiograph embeddings of deep learning models. Materials and Methods: An orthogonalization is utilized to remove the influence of protected features (e.g., age, sex, race) in chest radiograph embeddings, ensuring feature-independent results. To validate the efficacy of the approach, we retrospectively study the MIMIC and CheXpert datasets using three pre-trained models, namely a supervised contrastive, a self-supervised contrastive, and a baseline classifier model. Our statistical analysis involves comparing the original versus the orthogonalized embeddings by estimating protected feature influences and evaluating the ability to predict race, age, or sex using the two types of embeddings. Results: Our experiments reveal a significant influence of protected features on predictions of pathologies. Applying orthogonalization removes these feature effects. Apart from removing any influence on pathology classification, while maintaining competitive predictive performance, orthogonalized embeddings further make it infeasible to directly predict protected attributes and mitigate subgroup disparities. Conclusion: The presented work demonstrates the successful application and evaluation of the orthogonalization technique in the domain of chest X-ray classification.
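
For intuition, the simplest form of such an orthogonalization removes the linear span of the protected attributes from the embeddings via least-squares residuals. The sketch below assumes column-centered inputs and is a generic version of the idea, not the paper's exact procedure.

```python
import numpy as np

def orthogonalize(embeddings: np.ndarray, protected: np.ndarray) -> np.ndarray:
    """Remove the linear span of protected attributes from embeddings.

    embeddings: (n_samples, d) features from a pre-trained model, column-centered.
    protected:  (n_samples, k) encoded protected attributes (e.g. age, sex, race),
                also column-centered.
    Returns residual embeddings orthogonal to the protected columns, so a
    linear probe on them can no longer recover the protected attributes.
    """
    coef, *_ = np.linalg.lstsq(protected, embeddings, rcond=None)
    return embeddings - protected @ coef
```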

High-dimensional Linear Bandits with Knapsacks

  • paper_url: http://arxiv.org/abs/2311.01327
  • repo_url: None
  • paper_authors: Wanteng Ma, Dong Xia, Jiashuo Jiang
  • for: This work studies the contextual bandits with knapsacks (CBwK) problem in the high-dimensional setting, where the reward of pulling each arm equals the product of a sparse high-dimensional weight vector and the feature of the current arrival, plus random noise.
  • methods: The paper develops an online variant of the hard thresholding algorithm for sparse estimation, and combines it with a primal-dual framework in which a dual variable is assigned to each knapsack constraint and updated by an online learning algorithm to control knapsack consumption.
  • results: The integrated approach achieves sublinear regret that depends only logarithmically on the feature dimension, and attains optimal regret in both the data-poor and data-rich regimes for the high-dimensional contextual bandit problem without knapsack constraints. Numerical experiments confirm efficient empirical performance in high dimensions.
    Abstract We study the contextual bandits with knapsack (CBwK) problem under the high-dimensional setting where the dimension of the feature is large. The reward of pulling each arm equals the multiplication of a sparse high-dimensional weight vector and the feature of the current arrival, with additional random noise. In this paper, we investigate how to exploit this sparsity structure to achieve improved regret for the CBwK problem. To this end, we first develop an online variant of the hard thresholding algorithm that performs the sparse estimation in an online manner. We further combine our online estimator with a primal-dual framework, where we assign a dual variable to each knapsack constraint and utilize an online learning algorithm to update the dual variable, thereby controlling the consumption of the knapsack capacity. We show that this integrated approach allows us to achieve a sublinear regret that depends logarithmically on the feature dimension, thus improving the polynomial dependency established in the previous literature. We also apply our framework to the high-dimension contextual bandit problem without the knapsack constraint and achieve optimal regret in both the data-poor regime and the data-rich regime. We finally conduct numerical experiments to show the efficient empirical performance of our algorithms under the high dimensional setting.
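
The sparse-estimation ingredient can be illustrated with a plain online hard-thresholding step: a gradient update on the squared reward-prediction error followed by keeping only the largest-magnitude coordinates. This sketch omits the primal-dual knapsack machinery and uses assumed names throughout.

```python
import numpy as np

def hard_threshold(theta: np.ndarray, sparsity: int) -> np.ndarray:
    """Keep the `sparsity` largest-magnitude coordinates and zero the rest."""
    out = np.zeros_like(theta)
    keep = np.argsort(np.abs(theta))[-sparsity:]
    out[keep] = theta[keep]
    return out

def online_sparse_step(theta, x, reward, step_size, sparsity):
    """Gradient step on the squared reward-prediction error, then threshold."""
    grad = (x @ theta - reward) * x
    return hard_threshold(theta - step_size * grad, sparsity)
```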

Long-Range Neural Atom Learning for Molecular Graphs

  • paper_url: http://arxiv.org/abs/2311.01276
  • repo_url: None
  • paper_authors: Xuan Li, Zhanke Zhou, Jiangchao Yao, Yu Rong, Lu Zhang, Bo Han
  • for: The goal is to improve Graph Neural Networks (GNNs) for drug discovery by capturing long-range interactions (LRI), which are crucial for predicting molecular properties.
  • methods: The method implicitly projects the original atoms into a few abstract "Neural Atoms" that summarize the collective information of atomic groups within a molecule. Information is exchanged explicitly among the neural atoms and projected back to enhance the atom representations, establishing communication channels between distant nodes and reducing the interaction scope of arbitrary node pairs to a single hop.
  • results: Extensive experiments on three long-range graph benchmarks, covering both graph-level and link-level tasks, demonstrate that the method can be paired with any GNN and helps capture LRI.
    Abstract Graph Neural Networks (GNNs) have been widely adopted for drug discovery with molecular graphs. Nevertheless, current GNNs are mainly good at leveraging short-range interactions (SRI) but struggle to capture long-range interactions (LRI), both of which are crucial for determining molecular properties. To tackle this issue, we propose a method that implicitly projects all original atoms into a few Neural Atoms, which abstracts the collective information of atomic groups within a molecule. Specifically, we explicitly exchange the information among neural atoms and project them back to the atoms' representations as an enhancement. With this mechanism, neural atoms establish the communication channels among distant nodes, effectively reducing the interaction scope of arbitrary node pairs into a single hop. To provide an inspection of our method from a physical perspective, we reveal its connection with the traditional LRI calculation method, Ewald Summation. We conduct extensive experiments on three long-range graph benchmarks, covering both graph-level and link-level tasks on molecular graphs. We empirically justify that our method can be equipped with an arbitrary GNN and help to capture LRI.
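
One plausible way to realize the projection onto a few neural atoms is attention-based pooling: atom features attend to K learnable queries, information is mixed among the resulting neural atoms, and the result is broadcast back. The sketch below is an assumed instantiation for intuition, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class NeuralAtomPool(nn.Module):
    """Project atom features onto K learnable 'neural atoms' via attention,
    exchange information among them, and broadcast back to the atoms."""

    def __init__(self, dim: int, n_neural_atoms: int = 8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_neural_atoms, dim))
        self.mix = nn.Linear(dim, dim)

    def forward(self, atom_feats: torch.Tensor) -> torch.Tensor:
        # atom_feats: (n_atoms, dim)
        scale = atom_feats.shape[-1] ** 0.5
        attn = torch.softmax(self.queries @ atom_feats.T / scale, dim=-1)
        neural_atoms = self.mix(attn @ atom_feats)   # (K, dim) message exchange
        back = torch.softmax(atom_feats @ self.queries.T / scale, dim=-1)
        return atom_feats + back @ neural_atoms      # broadcast enhancement
```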

Sanitized Clustering against Confounding Bias

  • paper_url: http://arxiv.org/abs/2311.01252
  • repo_url: https://github.com/evaflower/scab
  • paper_authors: Yinghua Yao, Yuangang Pan, Jing Li, Ivor W. Tsang, Xin Yao
  • for: This paper addresses confounding bias in cluster analysis by proposing a new framework called Sanitized Clustering Against confounding Bias (SCAB).
  • methods: SCAB uses a Variational Auto-Encoder (VAE) to eliminate the confounding bias in the semantic latent space of complex data by minimizing the mutual information between the confounding factor and the latent representation.
  • results: Extensive experiments on complex datasets demonstrate that removing the confounding bias yields a significant gain in clustering performance. The code is available at https://github.com/EvaFlower/SCAB.
    Abstract Real-world datasets inevitably contain biases that arise from different sources or conditions during data collection. Consequently, such inconsistency itself acts as a confounding factor that disturbs the cluster analysis. Existing methods eliminate the biases by projecting data onto the orthogonal complement of the subspace expanded by the confounding factor before clustering. Therein, the interested clustering factor and the confounding factor are coarsely considered in the raw feature space, where the correlation between the data and the confounding factor is ideally assumed to be linear for convenient solutions. These approaches are thus limited in scope as the data in real applications is usually complex and non-linearly correlated with the confounding factor. This paper presents a new clustering framework named Sanitized Clustering Against confounding Bias (SCAB), which removes the confounding factor in the semantic latent space of complex data through a non-linear dependence measure. To be specific, we eliminate the bias information in the latent space by minimizing the mutual information between the confounding factor and the latent representation delivered by Variational Auto-Encoder (VAE). Meanwhile, a clustering module is introduced to cluster over the purified latent representations. Extensive experiments on complex datasets demonstrate that our SCAB achieves a significant gain in clustering performance by removing the confounding bias. The code is available at \url{https://github.com/EvaFlower/SCAB}.

Gaussian Processes on Cellular Complexes

  • paper_url: http://arxiv.org/abs/2311.01198
  • repo_url: None
  • paper_authors: Mathieu Alain, So Takao, Brooks Paige, Marc Peter Deisenroth
  • for: This paper develops machine learning models on graphs and their higher-order generalizations that exploit topological inductive biases, while also accounting for uncertainty.
  • methods: The paper proposes Gaussian processes (GPs) on cellular complexes, a generalization of graphs that captures interactions among vertices, edges, and higher-order cells. Two novel kernels are derived: one generalizing the graph Matérn kernel, and one that additionally mixes information of different cell types.
  • results: Experiments indicate that GP models on cellular complexes achieve higher accuracy and better consistency on complex graph data, and better capture topological structure.
    Abstract In recent years, there has been considerable interest in developing machine learning models on graphs in order to account for topological inductive biases. In particular, recent attention was given to Gaussian processes on such structures since they can additionally account for uncertainty. However, graphs are limited to modelling relations between two vertices. In this paper, we go beyond this dyadic setting and consider polyadic relations that include interactions between vertices, edges and one of their generalisations, known as cells. Specifically, we propose Gaussian processes on cellular complexes, a generalisation of graphs that captures interactions between these higher-order cells. One of our key contributions is the derivation of two novel kernels, one that generalises the graph Mat\'ern kernel and one that additionally mixes information of different cell types.

Combating Bilateral Edge Noise for Robust Link Prediction

  • paper_url: http://arxiv.org/abs/2311.01196
  • repo_url: https://github.com/tmlr-group/rgib
  • paper_authors: Zhanke Zhou, Jiangchao Yao, Jiaxu Liu, Xiawei Guo, Quanming Yao, Li He, Liang Wang, Bo Zheng, Bo Han
  • for: This work improves the robustness of graph neural networks (GNNs) for link prediction under edge noise.
  • methods: The paper proposes an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse. Unlike the basic information bottleneck, RGIB decouples and balances the mutual dependence among graph topology, target labels, and representation, yielding new learning objectives that are robust to bilateral edge noise. Two instantiations, RGIB-SSL and RGIB-REP, leverage self-supervised learning and data reparameterization for implicit and explicit denoising, respectively.
  • results: Extensive experiments on six datasets and three GNNs under diverse noisy scenarios verify the effectiveness of the RGIB instantiations.
    Abstract Although link prediction on graphs has achieved great success with the development of graph neural networks (GNNs), the potential robustness under the edge noise is still less investigated. To close this gap, we first conduct an empirical study to disclose that the edge noise bilaterally perturbs both input topology and target label, yielding severe performance degradation and representation collapse. To address this dilemma, we propose an information-theory-guided principle, Robust Graph Information Bottleneck (RGIB), to extract reliable supervision signals and avoid representation collapse. Different from the basic information bottleneck, RGIB further decouples and balances the mutual dependence among graph topology, target labels, and representation, building new learning objectives for robust representation against the bilateral noise. Two instantiations, RGIB-SSL and RGIB-REP, are explored to leverage the merits of different methodologies, i.e., self-supervised learning and data reparameterization, for implicit and explicit data denoising, respectively. Extensive experiments on six datasets and three GNNs with diverse noisy scenarios verify the effectiveness of our RGIB instantiations. The code is publicly available at: https://github.com/tmlr-group/RGIB.

Add and Thin: Diffusion for Temporal Point Processes

  • paper_url: http://arxiv.org/abs/2311.01139
  • repo_url: None
  • paper_authors: David Lüdke, Marin Biloš, Oleksandr Shchur, Marten Lienen, Stephan Günnemann
  • for: This paper proposes a principled probabilistic denoising diffusion model for temporal point processes (TPPs), targeting density estimation and forecasting of continuous-time event data.
  • methods: The model operates on entire event sequences within the TPP framework. Unlike existing diffusion approaches, it naturally handles data with both discrete and continuous components, avoiding the error accumulation of autoregressive, one-step-ahead generation.
  • results: On synthetic and real-world datasets, the model matches state-of-the-art TPP models in density estimation and strongly outperforms them in forecasting.
    Abstract Autoregressive neural networks within the temporal point process (TPP) framework have become the standard for modeling continuous-time event data. Even though these models can expressively capture event sequences in a one-step-ahead fashion, they are inherently limited for long-term forecasting applications due to the accumulation of errors caused by their sequential nature. To overcome these limitations, we derive ADD-THIN, a principled probabilistic denoising diffusion model for TPPs that operates on entire event sequences. Unlike existing diffusion approaches, ADD-THIN naturally handles data with discrete and continuous components. In experiments on synthetic and real-world datasets, our model matches the state-of-the-art TPP models in density estimation and strongly outperforms them in forecasting.
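
The name suggests the two primitive corruptions of a point process: superposing new events ("add") and randomly dropping events ("thin"). The forward-noising sketch below conveys that intuition under assumed rates; the paper's actual noise schedule and learned reverse process are not reproduced.

```python
import numpy as np

def noising_step(event_times, t_max, add_rate, keep_prob, rng=None):
    """Forward corruption of an event sequence on [0, t_max]: superpose a
    homogeneous Poisson process ('add'), then independently drop each
    event with probability 1 - keep_prob ('thin')."""
    rng = rng or np.random.default_rng()
    n_new = rng.poisson(add_rate * t_max)
    added = np.concatenate([event_times, rng.uniform(0.0, t_max, size=n_new)])
    return np.sort(added[rng.random(added.size) < keep_prob])
```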

Generating QM1B with PySCF$_{\text{IPU}}$

  • paper_url: http://arxiv.org/abs/2311.01135
  • repo_url: https://github.com/graphcore-research/pyscf-ipu
  • paper_authors: Alexander Mathiasen, Hatem Helal, Kerstin Klaser, Paul Balanca, Josef Dean, Carlo Luschi, Dominique Beaini, Andrew Fitzgibbon, Dominic Masters
  • for: This work aims to bring the progress driven by foundation models in computer vision and natural language processing to quantum chemistry tasks.
  • methods: The paper introduces the data generator PySCF$_{\text{IPU}}$, which uses Intelligence Processing Units (IPUs) to create training examples at scale, and uses it to generate QM1B, a dataset with one billion training examples containing 9-11 heavy atoms.
  • results: A simple baseline neural network (SchNet 9M) improves its performance simply by increasing the amount of training data, without additional inductive biases.
    Abstract The emergence of foundation models in Computer Vision and Natural Language Processing has resulted in immense progress on downstream tasks. This progress was enabled by datasets with billions of training examples. Similar benefits are yet to be unlocked for quantum chemistry, where the potential of deep learning is constrained by comparatively small datasets with 100k to 20M training examples. These datasets are limited in size because the labels are computed using the accurate (but computationally demanding) predictions of Density Functional Theory (DFT). Notably, prior DFT datasets were created using CPU supercomputers without leveraging hardware acceleration. In this paper, we take a first step towards utilising hardware accelerators by introducing the data generator PySCF$_{\text{IPU}}$ using Intelligence Processing Units (IPUs). This allowed us to create the dataset QM1B with one billion training examples containing 9-11 heavy atoms. We demonstrate that a simple baseline neural network (SchNet 9M) improves its performance by simply increasing the amount of training data without additional inductive biases. To encourage future researchers to use QM1B responsibly, we highlight several limitations of QM1B and emphasise the low-resolution of our DFT options, which also serves as motivation for even larger, more accurate datasets. Code and dataset are available on Github: http://github.com/graphcore-research/pyscf-ipu

AI for Interpretable Chemistry: Predicting Radical Mechanistic Pathways via Contrastive Learning

  • paper_url: http://arxiv.org/abs/2311.01118
  • repo_url: None
  • paper_authors: Mohammadamin Tavakoli, Yin Ting T. Chiu, Alexander Shmakov, Ann Marie Carlton, David Van Vranken, Pierre Baldi
  • for: This paper addresses the limitations of existing deep learning-based reaction predictors by introducing RMechRP, a system that leverages contrastive learning and mechanistic pathways to provide more interpretable and generalizable predictions of radical reactions.
  • methods: The authors use RMechDB, a public database of radical reactions, to develop and train multiple deep-learning models. Contrastive learning is employed to learn a representation of chemical reactions based on mechanistic pathways, the most interpretable representation of chemical reactions.
  • results: RMechRP provides accurate and interpretable predictions of radical reactions, establishing the first benchmark for predicting radical reactions, and shows potential for various applications in atmospheric chemistry.
    Abstract Deep learning-based reaction predictors have undergone significant architectural evolution. However, their reliance on reactions from the US Patent Office results in a lack of interpretable predictions and limited generalization capability to other chemistry domains, such as radical and atmospheric chemistry. To address these challenges, we introduce a new reaction predictor system, RMechRP, that leverages contrastive learning in conjunction with mechanistic pathways, the most interpretable representation of chemical reactions. Specifically designed for radical reactions, RMechRP provides different levels of interpretation of chemical reactions. We develop and train multiple deep-learning models using RMechDB, a public database of radical reactions, to establish the first benchmark for predicting radical reactions. Our results demonstrate the effectiveness of RMechRP in providing accurate and interpretable predictions of radical reactions, and its potential for various applications in atmospheric chemistry.

In Defense of Softmax Parametrization for Calibrated and Consistent Learning to Defer

  • paper_url: http://arxiv.org/abs/2311.01106
  • repo_url: None
  • paper_authors: Yuzhou Cao, Hussein Mozannar, Lei Feng, Hongxin Wei, Bo An
  • for: This work aims to improve the safety and performance of machine learning classifiers by jointly learning how to classify and when to defer to a downstream expert.
  • methods: Within the learning-to-defer framework, the paper shows that the miscalibration and unboundedness of prior estimators stem from the symmetric surrogate losses used, not from the softmax parameterization itself, and proposes a statistically consistent asymmetric softmax-based surrogate loss.
  • results: The proposed estimator produces valid (bounded) probability estimates without the unboundedness issue, enjoys good non-asymptotic properties, and is empirically validated on benchmark datasets in both performance and calibration.
    Abstract Enabling machine learning classifiers to defer their decision to a downstream expert when the expert is more accurate will ensure improved safety and performance. This objective can be achieved with the learning-to-defer framework which aims to jointly learn how to classify and how to defer to the expert. In recent studies, it has been theoretically shown that popular estimators for learning to defer parameterized with softmax provide unbounded estimates for the likelihood of deferring which makes them uncalibrated. However, it remains unknown whether this is due to the widely used softmax parameterization and if we can find a softmax-based estimator that is both statistically consistent and possesses a valid probability estimator. In this work, we first show that the cause of the miscalibrated and unbounded estimator in prior literature is due to the symmetric nature of the surrogate losses used and not due to softmax. We then propose a novel statistically consistent asymmetric softmax-based surrogate loss that can produce valid estimates without the issue of unboundedness. We further analyze the non-asymptotic properties of our method and empirically validate its performance and calibration on benchmark datasets.
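
In learning to defer, a common softmax parametrization uses K+1 outputs, with the extra output scoring the "defer to expert" option. The inference rule below is the generic version of this idea; the paper's asymmetric surrogate changes how the model is trained, which is not sketched here.

```python
import torch

def defer_decision(logits: torch.Tensor):
    """Inference with a (K+1)-output learning-to-defer model: the last
    logit scores the 'defer to expert' option."""
    probs = torch.softmax(logits, dim=-1)
    class_probs, defer_prob = probs[..., :-1], probs[..., -1]
    prediction = class_probs.argmax(dim=-1)
    defer = defer_prob > class_probs.max(dim=-1).values
    return prediction, defer
```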

Contrastive Modules with Temporal Attention for Multi-Task Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.01075
  • repo_url: https://github.com/niiceMing/CMTA
  • paper_authors: Siming Lan, Rui Zhang, Qi Yi, Jiaming Guo, Shaohui Peng, Yunkai Gao, Fan Wu, Ruizhi Chen, Zidong Du, Xing Hu, Xishan Zhang, Ling Li, Yunji Chen
  • for: This paper focuses on the modular principle in multi-task reinforcement learning, which specializes functionality into different modules and combines them appropriately to prevent the negative transfer caused by conflicts between tasks.
  • methods: The paper proposes Contrastive Modules with Temporal Attention (CMTA), which constrains modules to differ from one another via contrastive learning and combines shared modules at a finer granularity than the task level using temporal attention, alleviating negative transfer within a task and improving both the generalization ability and the performance of modular methods.
  • results: On Meta-World, a multi-task RL benchmark of robotics manipulation tasks, CMTA outperforms learning each task individually for the first time and achieves substantial improvements over the baselines.
    Abstract In the field of multi-task reinforcement learning, the modular principle, which involves specializing functionalities into different modules and combining them appropriately, has been widely adopted as a promising approach to prevent the negative transfer problem that performance degradation due to conflicts between tasks. However, most of the existing multi-task RL methods only combine shared modules at the task level, ignoring that there may be conflicts within the task. In addition, these methods do not take into account that without constraints, some modules may learn similar functions, resulting in restricting the model's expressiveness and generalization capability of modular methods. In this paper, we propose the Contrastive Modules with Temporal Attention(CMTA) method to address these limitations. CMTA constrains the modules to be different from each other by contrastive learning and combining shared modules at a finer granularity than the task level with temporal attention, alleviating the negative transfer within the task and improving the generalization ability and the performance for multi-task RL. We conducted the experiment on Meta-World, a multi-task RL benchmark containing various robotics manipulation tasks. Experimental results show that CMTA outperforms learning each task individually for the first time and achieves substantial performance improvements over the baselines.
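
A simple stand-in for the contrastive constraint between modules is a penalty on the mean off-diagonal cosine similarity of their representations, which discourages modules from learning similar functions. The sketch below is an assumed simplification of the paper's contrastive objective.

```python
import torch
import torch.nn.functional as F

def module_diversity_penalty(module_outputs: torch.Tensor) -> torch.Tensor:
    """Mean off-diagonal cosine similarity between module representations.

    module_outputs: (n_modules, d). Adding this penalty to the task loss
    pushes modules toward distinct functions.
    """
    z = F.normalize(module_outputs, dim=-1)
    sim = z @ z.T
    n = z.shape[0]
    return (sim.sum() - sim.diagonal().sum()) / (n * (n - 1))
```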

Deep Learning for real-time neural decoding of grasp

  • paper_url: http://arxiv.org/abs/2311.01061
  • repo_url: None
  • paper_authors: Paolo Viviani, Ilaria Gesmundo, Elios Ghinato, Andres Agudelo-Toro, Chiara Vercellino, Giacomo Vitali, Letizia Bergamasco, Alberto Scionti, Marco Ghislieri, Valentina Agostini, Olivier Terzo, Hansjörg Scherberger
  • for: This paper presents a deep learning-based neural decoding approach for classifying grasp types from neural signals.
  • methods: The method uses LSTM networks to classify time series of neural data (i.e., spike trains) into classes representing the object being grasped, without relying on prior neuroscience knowledge.
  • results: On a pre-existing dataset of monkey motor cortex recordings, the approach significantly improves classification accuracy over previous work on the same dataset, even when considering simulated real-time decoding.
    Abstract Neural decoding involves correlating signals acquired from the brain to variables in the physical world like limb movement or robot control in Brain Machine Interfaces. In this context, this work starts from a specific pre-existing dataset of neural recordings from monkey motor cortex and presents a Deep Learning-based approach to the decoding of neural signals for grasp type classification. Specifically, we propose here an approach that exploits LSTM networks to classify time series containing neural data (i.e., spike trains) into classes representing the object being grasped. The main goal of the presented approach is to improve over state-of-the-art decoding accuracy without relying on any prior neuroscience knowledge, and leveraging only the capability of deep learning models to extract correlations from data. The paper presents the results achieved for the considered dataset and compares them with previous works on the same dataset, showing a significant improvement in classification accuracy, even if considering simulated real-time decoding.
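
A minimal version of such a decoder is an LSTM over binned spike counts followed by a linear classification head over grasp types. The sketch below makes assumed choices for channel count, hidden size, and binning; the paper's architecture details may differ.

```python
import torch
import torch.nn as nn

class SpikeTrainClassifier(nn.Module):
    """Minimal LSTM classifier mapping binned spike counts to grasp classes."""

    def __init__(self, n_channels: int, n_classes: int, hidden: int = 128):
        super().__init__()
        self.lstm = nn.LSTM(n_channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, spikes: torch.Tensor) -> torch.Tensor:
        # spikes: (batch, time_bins, n_channels) binned spike counts
        _, (h_n, _) = self.lstm(spikes)
        return self.head(h_n[-1])  # logits over grasp types
```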

Adapt On-the-Go: Behavior Modulation for Single-Life Robot Deployment

  • paper_url: http://arxiv.org/abs/2311.01059
  • repo_url: None
  • paper_authors: Annie S. Chen, Govind Chada, Laura Smith, Archit Sharma, Zipeng Fu, Sergey Levine, Chelsea Finn
  • for: This work helps robots adapt at deployment time to situations that differ from those seen during training, which is necessary for success in the real world.
  • methods: The approach uses a mechanism based on the perceived value of pre-trained behaviors to select and adapt them to the situation at hand, all within a single episode at test time and without human supervision.
  • results: The method adapts rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, and handles a variety of out-of-distribution situations over 2x as efficiently as existing methods.
    Abstract To succeed in the real world, robots must cope with situations that differ from those seen during training. We study the problem of adapting on-the-fly to such novel scenarios during deployment, by drawing upon a diverse repertoire of previously learned behaviors. Our approach, RObust Autonomous Modulation (ROAM), introduces a mechanism based on the perceived value of pre-trained behaviors to select and adapt pre-trained behaviors to the situation at hand. Crucially, this adaptation process all happens within a single episode at test time, without any human supervision. We provide theoretical analysis of our selection mechanism and demonstrate that ROAM enables a robot to adapt rapidly to changes in dynamics both in simulation and on a real Go1 quadruped, even successfully moving forward with roller skates on its feet. Our approach adapts over 2x as efficiently compared to existing methods when facing a variety of out-of-distribution situations during deployment by effectively choosing and adapting relevant behaviors on-the-fly.
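
The selection step can be pictured as scoring each pre-trained behavior by its learned value at the current observation and acting with the highest-scoring one. This is a bare sketch of value-guided switching with assumed interfaces; the paper's method additionally adapts the chosen behavior within the episode.

```python
def modulate_behavior(observation, policies, value_fns):
    """Score each pre-trained behavior by its learned value at the current
    observation, then act with the highest-scoring behavior."""
    scores = [value(observation) for value in value_fns]
    best = max(range(len(policies)), key=scores.__getitem__)
    return policies[best](observation)
```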

Resilient Multiple Choice Learning: A learned scoring scheme with application to audio scene analysis

  • paper_url: http://arxiv.org/abs/2311.01052
  • repo_url: https://github.com/victorletzelter/code-rmcl
  • paper_authors: Victor Letzelter, Mathieu Fontaine, Mickaël Chen, Patrick Pérez, Gael Richard, Slim Essid
  • for: This paper addresses conditional distribution estimation in regression settings where multiple targets may be sampled for each training input.
  • methods: The paper introduces Resilient Multiple Choice Learning (rMCL), which extends the Multiple Choice Learning approach, based on the Winner-Takes-All (WTA) loss over a set of hypotheses, with a novel learned scoring scheme grounded in Voronoi tessellations of the output space.
  • results: rMCL preserves the diversity of the predictions and admits a probabilistic interpretation derived from the Voronoi tessellation. After validation on synthetic data, experiments on sound source localization demonstrate its practical usefulness and the relevance of its interpretation.
    Abstract We introduce Resilient Multiple Choice Learning (rMCL), an extension of the MCL approach for conditional distribution estimation in regression settings where multiple targets may be sampled for each training input. Multiple Choice Learning is a simple framework to tackle multimodal density estimation, using the Winner-Takes-All (WTA) loss for a set of hypotheses. In regression settings, the existing MCL variants focus on merging the hypotheses, thereby eventually sacrificing the diversity of the predictions. In contrast, our method relies on a novel learned scoring scheme underpinned by a mathematical framework based on Voronoi tessellations of the output space, from which we can derive a probabilistic interpretation. After empirically validating rMCL with experiments on synthetic data, we further assess its merits on the sound source localization problem, demonstrating its practical usefulness and the relevance of its interpretation.
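
The Winner-Takes-All ingredient of MCL backpropagates only through the hypothesis closest to each target, which is what keeps the remaining heads diverse. A minimal PyTorch sketch of the base WTA loss follows; rMCL's learned scoring scheme is built on top of this and is not shown.

```python
import torch

def winner_takes_all_loss(hypotheses: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Winner-Takes-All loss for multi-hypothesis regression.

    hypotheses: (batch, n_hyp, dim) predictions from n_hyp heads.
    target:     (batch, dim) ground-truth targets.
    Only the closest hypothesis per sample receives gradient.
    """
    errors = ((hypotheses - target.unsqueeze(1)) ** 2).sum(dim=-1)  # (batch, n_hyp)
    return errors.min(dim=1).values.mean()
```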

Application and Energy-Aware Data Aggregation using Vector Synchronization in Distributed Battery-less IoT Networks

  • paper_url: http://arxiv.org/abs/2311.01050
  • repo_url: None
  • paper_authors: Chetna Singhal, Subhrajit Barick, Rishabh Sonkar
  • for: The goal is a mechanism that aggregates sensor data and provides sustainable application support in distributed battery-less IoT networks.
  • methods: The paper proposes an application-aware task and energy manager (ATEM) and a vector-synchronization based data aggregator (VSDA). ATEM is supported by device-level federated energy harvesting and system-level energy-aware heterogeneous application management; VSDA forecasts the power available from the ambient energy harvester with an LSTM model and sets the device profile and application task rates accordingly.
  • results: The proposed scheme meets heterogeneous application requirements with negligible overhead, reduces data loss and packet delay, increases hardware component availability, and makes components available sooner than the state-of-the-art.
    Abstract The battery-less Internet of Things (IoT) devices are a key element in the sustainable green initiative for the next-generation wireless networks. These battery-free devices use the ambient energy, harvested from the environment. The energy harvesting environment is dynamic and causes intermittent task execution. The harvested energy is stored in small capacitors and it is challenging to assure the application task execution. The main goal is to provide a mechanism to aggregate the sensor data and provide a sustainable application support in the distributed battery-less IoT network. We model the distributed IoT network system consisting of many battery-free IoT sensor hardware modules and heterogeneous IoT applications that are being supported in the device-edge-cloud continuum. The applications require sensor data from a distributed set of battery-less hardware modules and there is provision of joint control over the module actuators. We propose an application-aware task and energy manager (ATEM) for the IoT devices and a vector-synchronization based data aggregator (VSDA). The ATEM is supported by device-level federated energy harvesting and system-level energy-aware heterogeneous application management. In our proposed framework the data aggregator forecasts the available power from the ambient energy harvester using long-short-term-memory (LSTM) model and sets the device profile as well as the application task rates accordingly. Our proposed scheme meets the heterogeneous application requirements with negligible overhead; reduces the data loss and packet delay; increases the hardware component availability; and makes the components available sooner as compared to the state-of-the-art.
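
The forecasting component can be sketched as a small LSTM that maps a recent window of harvested-power readings to a one-step-ahead prediction. Input shape, hidden size, and the single-feature encoding are assumptions for illustration.

```python
import torch
import torch.nn as nn

class EnergyForecaster(nn.Module):
    """One-step-ahead forecast of harvested ambient power from a recent window."""

    def __init__(self, hidden: int = 32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, power_window: torch.Tensor) -> torch.Tensor:
        # power_window: (batch, window_len, 1) past harvested-power readings
        _, (h_n, _) = self.lstm(power_window)
        return self.out(h_n[-1])  # predicted next-interval power
```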

Improving Robustness via Tilted Exponential Layer: A Communication-Theoretic Perspective

  • paper_url: http://arxiv.org/abs/2311.01047
  • repo_url: https://github.com/bhagyapuranik/texp_for_robustness
  • paper_authors: Bhagyashree Puranik, Ahmad Beirami, Yao Qin, Upamanyu Madhow
  • for: The goal is to enhance the robustness of deep networks beyond what empirical risk minimization with suitable data augmentation provides.
  • methods: Motivated by communication theory, the paper raises the signal-to-noise ratio at the output of a network layer via neural competition during learning and inference: neurons compete to sparsely represent layer inputs by maximizing a tilted exponential (TEXP) objective, and inference replaces batch norm with a tilted softmax.
  • results: Experiments on standard image datasets show that TEXP learning and inference improve robustness against noise and other common corruptions without requiring data augmentation, and further cumulative gains are obtained by appropriately combining TEXP with data augmentation techniques.
    Abstract State-of-the-art techniques for enhancing robustness of deep networks mostly rely on empirical risk minimization with suitable data augmentation. In this paper, we propose a complementary approach motivated by communication theory, aimed at enhancing the signal-to-noise ratio at the output of a neural network layer via neural competition during learning and inference. In addition to minimization of a standard end-to-end cost, neurons compete to sparsely represent layer inputs by maximization of a tilted exponential (TEXP) objective function for the layer. TEXP learning can be interpreted as maximum likelihood estimation of matched filters under a Gaussian model for data noise. Inference in a TEXP layer is accomplished by replacing batch norm by a tilted softmax, which can be interpreted as computation of posterior probabilities for the competing signaling hypotheses represented by each neuron. After providing insights via simplified models, we show, by experimentation on standard image datasets, that TEXP learning and inference enhances robustness against noise and other common corruptions, without requiring data augmentation. Further cumulative gains in robustness against this array of distortions can be obtained by appropriately combining TEXP with data augmentation techniques.
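
At inference, replacing batch norm with a tilted softmax amounts to computing a sharpened posterior over which neuron's matched filter best explains the input. The one-liner below conveys that reading with an assumed tilt parameter; the TEXP objective used during training has more structure than this.

```python
import torch

def tilted_softmax(activations: torch.Tensor, tilt: float = 2.0) -> torch.Tensor:
    """Sharpened posterior over which neuron's matched filter explains
    the input; `tilt` plays the role of an inverse temperature."""
    return torch.softmax(tilt * activations, dim=-1)
```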

Time-Independent Information-Theoretic Generalization Bounds for SGLD

  • paper_url: http://arxiv.org/abs/2311.01046
  • repo_url: None
  • paper_authors: Futoshi Futami, Masahiro Fujisawa
  • for: This paper provides novel information-theoretic generalization bounds for stochastic gradient Langevin dynamics (SGLD), a workhorse of sampling and non-convex optimization research.
  • methods: Under smoothness and dissipativity assumptions, the generalization error bounds are derived by focusing on the time evolution of the Kullback-Leibler divergence, which relates to the stability of datasets and upper-bounds the mutual information between output parameters and the input dataset.
  • results: The bounds are time-independent and decay to zero as the sample size increases, regardless of the number of iterations and whether the step size is fixed. The paper also establishes the first information-theoretic generalization bound for the case where training and test loss coincide, by showing that an SGLD loss function is sub-exponential; this bound is likewise time-independent, removes the problematic step-size dependence of existing work, and leads to an improved excess risk bound.
    Abstract We provide novel information-theoretic generalization bounds for stochastic gradient Langevin dynamics (SGLD) under the assumptions of smoothness and dissipativity, which are widely used in sampling and non-convex optimization studies. Our bounds are time-independent and decay to zero as the sample size increases, regardless of the number of iterations and whether the step size is fixed. Unlike previous studies, we derive the generalization error bounds by focusing on the time evolution of the Kullback--Leibler divergence, which is related to the stability of datasets and is the upper bound of the mutual information between output parameters and an input dataset. Additionally, we establish the first information-theoretic generalization bound when the training and test loss are the same by showing that a loss function of SGLD is sub-exponential. This bound is also time-independent and removes the problematic step size dependence in existing work, leading to an improved excess risk bound by combining our analysis with the existing non-convex optimization error bounds.
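
For reference, a single SGLD iteration perturbs the gradient step with Gaussian noise scaled by the step size and inverse temperature. The sketch below is the standard update, with `grad_fn` standing in for a stochastic gradient oracle.

```python
import numpy as np

def sgld_step(theta, grad_fn, step_size, inv_temp=1.0, rng=None):
    """One SGLD update: theta <- theta - eta * g(theta) + sqrt(2*eta/beta) * xi."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(theta.shape)
    return theta - step_size * grad_fn(theta) + np.sqrt(2.0 * step_size / inv_temp) * noise
```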

Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2311.01038
  • repo_url: https://github.com/galina0217/apt
  • paper_authors: Jiarong Xu, Renhong Huang, Xin Jiang, Yuxuan Cao, Carl Yang, Chunping Wang, Yang Yang
  • for: Pre-training on graph neural networks aims to learn transferable knowledge from unlabeled data for downstream tasks.
  • methods: The paper proposes a "better-with-less" framework: fewer, but carefully chosen, data are fed into a GNN model during pre-training. The data-active graph pre-training (APT) framework combines a graph selector, which chooses the most representative and instructive data points based on inherent graph properties and predictive uncertainty, with a pre-training model whose predictive uncertainty serves as feedback measuring the model's confidence in the data.
  • results: Experiments show that APT obtains an efficient pre-training model with fewer training data and better downstream performance.
    Abstract Pre-training on graph neural networks (GNNs) aims to learn transferable knowledge for downstream tasks with unlabeled data, and it has recently become an active research area. The success of graph pre-training models is often attributed to the massive amount of input data. In this paper, however, we identify the curse of big data phenomenon in graph pre-training: more training data do not necessarily lead to better downstream performance. Motivated by this observation, we propose a better-with-less framework for graph pre-training: fewer, but carefully chosen data are fed into a GNN model to enhance pre-training. The proposed pre-training pipeline is called the data-active graph pre-training (APT) framework, and is composed of a graph selector and a pre-training model. The graph selector chooses the most representative and instructive data points based on the inherent properties of graphs as well as predictive uncertainty. The proposed predictive uncertainty, as feedback from the pre-training model, measures the confidence level of the model in the data. When fed with the chosen data, on the other hand, the pre-training model grasps an initial understanding of the new, unseen data, and at the same time attempts to remember the knowledge learned from previous data. Therefore, the integration and interaction between these two components form a unified framework (APT), in which graph pre-training is performed in a progressive and iterative way. Experiment results show that the proposed APT is able to obtain an efficient pre-training model with fewer training data and better downstream performance.
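
One concrete proxy for the predictive-uncertainty feedback is the entropy of the model's predictive distribution per candidate graph; the selector then keeps the graphs the model is least confident about. The sketch below is an assumed instantiation and omits APT's graph-property criteria.

```python
import torch

def select_by_uncertainty(logits_per_graph: torch.Tensor, k: int) -> torch.Tensor:
    """Pick the k graphs with the highest predictive entropy.

    logits_per_graph: (n_graphs, n_classes) model outputs for each candidate.
    Returns the indices of the k most uncertain graphs.
    """
    probs = torch.softmax(logits_per_graph, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    return torch.topk(entropy, k).indices
```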

Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game

  • paper_url: http://arxiv.org/abs/2311.01011
  • repo_url: None
  • paper_authors: Sam Toyer, Olivia Watkins, Ethan Adrian Mendes, Justin Svegliato, Luke Bailey, Tiffany Wang, Isaac Ong, Karim Elmaaroufi, Pieter Abbeel, Trevor Darrell, Alan Ritter, Stuart Russell
  • for: Researchers can use these attack and defense samples to study attack behaviors and defense mechanisms in instruction-following LLMs.
  • methods: All attacks and defenses were created by players of the online game Tensor Trust.
  • results: Many models are vulnerable to the attack strategies in the dataset, and some attack strategies generalize to deployed LLM-based applications.
    Abstract While Large Language Models (LLMs) are increasingly being used in real-world applications, they remain vulnerable to prompt injection attacks: malicious third party prompts that subvert the intent of the system designer. To help researchers study this problem, we present a dataset of over 126,000 prompt injection attacks and 46,000 prompt-based "defenses" against prompt injection, all created by players of an online game called Tensor Trust. To the best of our knowledge, this is currently the largest dataset of human-generated adversarial examples for instruction-following LLMs. The attacks in our dataset have a lot of easily interpretable stucture, and shed light on the weaknesses of LLMs. We also use the dataset to create a benchmark for resistance to two types of prompt injection, which we refer to as prompt extraction and prompt hijacking. Our benchmark results show that many models are vulnerable to the attack strategies in the Tensor Trust dataset. Furthermore, we show that some attack strategies from the dataset generalize to deployed LLM-based applications, even though they have a very different set of constraints to the game. We release all data and source code at https://tensortrust.ai/paper

Scalable Probabilistic Forecasting in Retail with Gradient Boosted Trees: A Practitioner’s Approach

  • paper_url: http://arxiv.org/abs/2311.00993
  • repo_url: None
  • paper_authors: Xueying Long, Quang Bui, Grady Oktavian, Daniel F. Schmidt, Christoph Bergmeir, Rakshitha Godahewa, Seong Per Lee, Kaifeng Zhao, Paul Condylis
  • for: This paper addresses the forecasting challenges faced by a large e-commerce company, which differ in important ways from those of brick-and-mortar retail: the datasets are larger (hundreds of thousands of time series) and the wider assortment leads to more intermittent data.
  • methods: The paper proposes a two-layer hierarchy: a top-down approach first forecasts at an aggregated level, with fewer and less intermittent series, and then disaggregates to obtain the decision-level forecasts; probabilistic forecasts are generated under distributional assumptions. Direct training at the lower level with subsamples is explored as an alternative way of scaling.
  • results: The scalable methods are evaluated on a proprietary dataset, the Favorita dataset, and the M5 dataset, demonstrating scalability and accuracy and highlighting the differences between e-commerce and brick-and-mortar retail data. Notably, the top-down framework enters the top 50 of the original M5 competition, even with models trained at a higher level under a much simpler setting.
    Abstract The recent M5 competition has advanced the state-of-the-art in retail forecasting. However, we notice important differences between the competition challenge and the challenges we face in a large e-commerce company. The datasets in our scenario are larger (hundreds of thousands of time series), and e-commerce can afford to have a larger assortment than brick-and-mortar retailers, leading to more intermittent data. To scale to larger dataset sizes with feasible computational effort, firstly, we investigate a two-layer hierarchy and propose a top-down approach to forecasting at an aggregated level with less amount of series and intermittency, and then disaggregating to obtain the decision-level forecasts. Probabilistic forecasts are generated under distributional assumptions. Secondly, direct training at the lower level with subsamples can also be an alternative way of scaling. Performance of modelling with subsets is evaluated with the main dataset. Apart from a proprietary dataset, the proposed scalable methods are evaluated using the Favorita dataset and the M5 dataset. We are able to show the differences in characteristics of the e-commerce and brick-and-mortar retail datasets. Notably, our top-down forecasting framework enters the top 50 of the original M5 competition, even with models trained at a higher level under a much simpler setting.
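The top-down step can be sketched in a few lines. This is a simplified reading under stated assumptions: quantile gradient boosting at the aggregate level (here sklearn's `GradientBoostingRegressor`, one of several possible GBT backends) and disaggregation by fixed historical sales proportions; the paper's actual features and disaggregation scheme may differ.

```python
# Minimal top-down probabilistic forecasting sketch.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def topdown_quantile_forecast(X_train, y_agg_train, X_future,
                              item_history, quantile=0.9):
    # 1) Forecast the aggregate series (fewer, less intermittent).
    model = GradientBoostingRegressor(loss="quantile", alpha=quantile)
    model.fit(X_train, y_agg_train)
    agg_forecast = model.predict(X_future)

    # 2) Disaggregate with fixed historical proportions per item.
    totals = item_history.sum(axis=0)           # item_history: (time, items)
    proportions = totals / totals.sum()
    return np.outer(agg_forecast, proportions)  # (horizon, n_items)
```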

Autonomous Learning of Generative Models with Chemical Reaction Network Ensembles

  • paper_url: http://arxiv.org/abs/2311.00975
  • repo_url: None
  • paper_authors: William Poole, Thomas E. Ouldridge, Manoj Gopalkrishnan
  • for: This paper asks whether a micron-sized sack of interacting molecules can autonomously learn an internal model of a complex and fluctuating environment.
  • methods: Drawing on control theory, machine learning theory, chemical reaction network theory, and statistical physics, the paper develops a general architecture by which a broad class of chemical systems can autonomously learn complex distributions. The construction is a chemical implementation of machine learning's optimization workhorse: gradient descent on the relative entropy cost function (written out after the abstract).
  • results: The method can optimize any detailed balanced chemical reaction network and can use hidden units to learn complex distributions. This result is then recast as a form of integral feedback control; because an explicit physical model of learning is used, the thermodynamic costs and trade-offs associated with the process can be derived.
    Abstract Can a micron sized sack of interacting molecules autonomously learn an internal model of a complex and fluctuating environment? We draw insights from control theory, machine learning theory, chemical reaction network theory, and statistical physics to develop a general architecture whereby a broad class of chemical systems can autonomously learn complex distributions. Our construction takes the form of a chemical implementation of machine learning's optimization workhorse: gradient descent on the relative entropy cost function. We show how this method can be applied to optimize any detailed balanced chemical reaction network and that the construction is capable of using hidden units to learn complex distributions. This result is then recast as a form of integral feedback control. Finally, due to our use of an explicit physical model of learning, we are able to derive thermodynamic costs and trade-offs associated to this process.
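For reference, the cost function named above and the update it induces can be written explicitly. This is a generic sketch in our own notation, not the paper's chemical implementation:

```latex
% Relative entropy (KL divergence) between a target distribution p and a
% parameterized model q_\theta, and the induced gradient-descent update.
D(p \,\|\, q_\theta) = \sum_x p(x)\,\log\frac{p(x)}{q_\theta(x)},
\qquad
\theta \leftarrow \theta - \eta\,\nabla_\theta D(p \,\|\, q_\theta)
       = \theta + \eta \sum_x p(x)\,\nabla_\theta \log q_\theta(x).
```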

Federated Linear Bandits with Finite Adversarial Actions

  • paper_url: http://arxiv.org/abs/2311.00973
  • repo_url: None
  • paper_authors: Li Fan, Ruida Zhou, Chao Tian, Cong Shen
  • for: This paper studies a federated learning approach to the linear contextual bandit problem with finite adversarial action sets that may differ across clients.
  • methods: The proposed FedSupLinUCB algorithm extends the principles of the SupLinUCB and OFUL algorithms to this adversarial finite-action federated setting (the per-step optimism computation is sketched after the abstract).
  • results: FedSupLinUCB achieves total regret of $\tilde{O}(\sqrt{dT})$, matching the minimax lower bound and therefore order-optimal (up to polylog terms).
    Abstract We study a federated linear bandits model, where $M$ clients communicate with a central server to solve a linear contextual bandits problem with finite adversarial action sets that may be different across clients. To address the unique challenges of adversarial finite action sets, we propose the FedSupLinUCB algorithm, which extends the principles of SupLinUCB and OFUL algorithms in linear contextual bandits. We prove that FedSupLinUCB achieves a total regret of $\tilde{O}(\sqrt{d T})$, where $T$ is the total number of arm pulls from all clients, and $d$ is the ambient dimension of the linear model. This matches the minimax lower bound and thus is order-optimal (up to polylog terms). We study both asynchronous and synchronous cases and show that the communication cost can be controlled as $O(d M^2 \log(d)\log(T))$ and $O(\sqrt{d^3 M^3} \log(d))$, respectively. The FedSupLinUCB design is further extended to two scenarios: (1) variance-adaptive, where a total regret of $\tilde{O} (\sqrt{d \sum \nolimits_{t=1}^{T} \sigma_t^2})$ can be achieved with $\sigma_t^2$ being the noise variance of round $t$; and (2) adversarial corruption, where a total regret of $\tilde{O}(\sqrt{dT} + d C_p)$ can be achieved with $C_p$ being the total corruption budget. Experiment results corroborate the theoretical analysis and demonstrate the effectiveness of FedSupLinUCB on both synthetic and real-world datasets.
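As background, here is a minimal sketch of the single-client LinUCB-style step that SupLinUCB/OFUL-type methods refine; the federated algorithm adds phase structure and client-server communication on top, which this sketch omits. The exploration parameter `beta` stands in for the paper's confidence radius.

```python
# One optimistic arm selection plus the rank-one statistics update.
import numpy as np

def linucb_select(A, b, contexts, beta=1.0):
    """A: (d, d) design matrix; b: (d,) vector; contexts: (K, d) arms."""
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                        # ridge estimate
    widths = np.einsum("kd,de,ke->k", contexts, A_inv, contexts)
    ucb = contexts @ theta_hat + beta * np.sqrt(widths)
    return int(np.argmax(ucb))                   # optimistic arm

def linucb_update(A, b, x, reward):
    A += np.outer(x, x)                          # rank-one update
    b += reward * x
    return A, b
```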

Invariant-Feature Subspace Recovery: A New Class of Provable Domain Generalization Algorithms

  • paper_url: http://arxiv.org/abs/2311.00966
  • repo_url: https://github.com/facebookresearch/InvarianceUnitTests
  • paper_authors: Haoxiang Wang, Gargi Balasubramaniam, Haozhe Si, Bo Li, Han Zhao
  • for: This paper proposes a new class of algorithms for provable domain generalization, i.e., predictors that remain robust in unseen test environments.
  • methods: The proposed Invariant-feature Subspace Recovery (ISR) algorithms recover the subspace spanned by invariant features from the first-order (ISR-Mean) and second-order (ISR-Cov) moments of the class-conditional distributions, bypassing the non-convexity issues of IRM-style methods (the first-order intuition is sketched after the abstract).
  • results: ISR achieves provable domain generalization with far fewer training environments (down to $O(1)$ for ISR-Cov), enjoys global convergence guarantees, performs well on synthetic benchmarks, and can be used as a post-processing step for feature extractors such as neural nets.
    Abstract Domain generalization asks for models trained over a set of training environments to generalize well in unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) have been proposed for domain generalization. However, Rosenfeld et al. (2021) shows that in a simple linear data model, even if non-convexity issues are ignored, IRM and its extensions cannot generalize to unseen environments with less than $d_s+1$ training environments, where $d_s$ is the dimension of the spurious-feature subspace. In this work, we propose Invariant-feature Subspace Recovery (ISR): a new class of algorithms to achieve provable domain generalization across the settings of classification and regression problems. First, in the binary classification setup of Rosenfeld et al. (2021), we show that our first algorithm, ISR-Mean, can identify the subspace spanned by invariant features from the first-order moments of the class-conditional distributions, and achieve provable domain generalization with $d_s+1$ training environments. Our second algorithm, ISR-Cov, further reduces the required number of training environments to $O(1)$ using the information of second-order moments. Notably, unlike IRM, our algorithms bypass non-convexity issues and enjoy global convergence guarantees. Next, we extend ISR-Mean to the more general setting of multi-class classification and propose ISR-Multiclass, which leverages class information and provably recovers the invariant-feature subspace with $\lceil d_s/k\rceil+1$ training environments for $k$-class classification. Finally, for regression problems, we propose ISR-Regression that can identify the invariant-feature subspace with $d_s+1$ training environments. Empirically, we demonstrate the superior performance of our ISRs on synthetic benchmarks. Further, ISR can be used as post-processing methods for feature extractors such as neural nets.
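A rough sketch of the ISR-Mean intuition, not the paper's exact procedure: class-conditional means that vary across environments expose spurious directions, and projecting them out leaves a subspace in which a robust classifier can be fit.

```python
import numpy as np

def isr_mean_projection(features_by_env, labels_by_env, d_spu):
    """features_by_env: list of (n_i, d) arrays; labels_by_env: list of (n_i,)."""
    mus = [X[y == 1].mean(axis=0)                # class-1 mean per environment
           for X, y in zip(features_by_env, labels_by_env)]
    M = np.stack(mus)                            # (n_envs, d)
    M = M - M.mean(axis=0)                       # variation across environments
    # Top right-singular vectors span the (estimated) spurious directions.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    spurious = Vt[:d_spu].T                      # (d, d_spu)
    return np.eye(M.shape[1]) - spurious @ spurious.T  # invariant projector
```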

On Finding Bi-objective Pareto-optimal Fraud Prevention Rule Sets for Fintech Applications

  • paper_url: http://arxiv.org/abs/2311.00964
  • repo_url: None
  • paper_authors: Chengyao Wen, Yin Lou
  • for: This paper focuses on finding high-quality rule subsets from an initial pool of rules, in order to improve fraud-prevention decisions.
  • methods: The task is framed as finding non-dominated rule subsets in a bi-objective space (such as precision and recall), i.e., a Pareto front (a minimal non-domination filter is sketched after the abstract); a heuristic framework called PORS centers on the problem of solution selection on the front (SSF), and a spectral variant of sequential covering called SpectralRules diversifies the initial rule set.
  • results: Experiments on public and proprietary datasets, including two real application scenarios within Alipay, demonstrate the advantages of the proposed methodology over existing work.
    Abstract Rules are widely used in Fintech institutions to make fraud prevention decisions, since rules are highly interpretable thanks to their intuitive if-then structure. In practice, a two-stage framework of fraud prevention decision rule set mining is usually employed in large Fintech institutions. This paper is concerned with finding high-quality rule subsets in a bi-objective space (such as precision and recall) from an initial pool of rules. To this end, we adopt the concept of Pareto optimality and aim to find a set of non-dominated rule subsets, which constitutes a Pareto front. We propose a heuristic-based framework called PORS and we identify that the core of PORS is the problem of solution selection on the front (SSF). We provide a systematic categorization of the SSF problem and a thorough empirical evaluation of various SSF methods on both public and proprietary datasets. We also introduce a novel variant of sequential covering algorithm called SpectralRules to encourage the diversity of the initial rule set and we empirically find that SpectralRules further improves the quality of the found Pareto front. On two real application scenarios within Alipay, we demonstrate the advantages of our proposed methodology compared to existing work.
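A minimal sketch of extracting the Pareto front of rule subsets scored in a bi-objective (precision, recall) space, where both objectives are maximized; the paper's SSF methods then choose a single point from this front.

```python
def pareto_front(candidates):
    """candidates: list of (precision, recall, rule_subset) tuples."""
    front = []
    for p, r, rules in candidates:
        # A point is dominated if another point is at least as good on
        # both objectives and strictly better on at least one.
        dominated = any(p2 >= p and r2 >= r and (p2, r2) != (p, r)
                        for p2, r2, _ in candidates)
        if not dominated:
            front.append((p, r, rules))
    return front
```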

Dynamic Fair Federated Learning Based on Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00959
  • repo_url: None
  • paper_authors: Weikang Chen, Junping Du, Yingxia Shao, Jia Wang, Yangxi Zhou
  • for: Addressing the unfair representation of the global model across devices in federated learning.
  • methods: A dynamic q-fairness federated learning algorithm (DQFFL) combined with reinforcement learning; α-fairness transforms the preservation of fairness during federated aggregation into the distribution of client weights (one plausible weighting is sketched after the abstract), and reinforcement learning tunes the sensitive fairness parameters dynamically during aggregation.
  • results: DQFFL outperforms existing methods in terms of overall performance of the global federated model, fairness, and convergence speed.
    Abstract Federated learning enables a collaborative training and optimization of global models among a group of devices without sharing local data samples. However, the heterogeneity of data in federated learning can lead to unfair representation of the global model across different devices. To address the fairness issue in federated learning, we propose a dynamic q fairness federated learning algorithm with reinforcement learning, called DQFFL. DQFFL aims to mitigate the discrepancies in device aggregation and enhance the fairness of treatment for all groups involved in federated learning. To quantify fairness, DQFFL leverages the performance of the global federated model on each device and incorporates {\alpha}-fairness to transform the preservation of fairness during federated aggregation into the distribution of client weights in the aggregation process. Considering the sensitivity of parameters in measuring fairness, we propose to utilize reinforcement learning for dynamic parameters during aggregation. Experimental results demonstrate that our DQFFL outperforms the state-of-the-art methods in terms of overall performance, fairness and convergence speed.
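The abstract does not spell out the exact weighting, but α-fairness-style aggregation is commonly implemented by upweighting clients on which the global model performs worse. A hedged sketch, where `alpha` is the knob the paper proposes to tune with reinforcement learning:

```python
import numpy as np

def alpha_fair_weights(client_losses, alpha):
    # Higher-loss clients receive larger aggregation weights; alpha = 0
    # recovers uniform weighting.
    w = np.asarray(client_losses, dtype=float) ** alpha
    return w / w.sum()

def aggregate(client_models, weights):
    # Weighted average of parameter vectors (one ndarray per client).
    return sum(w * m for w, m in zip(weights, client_models))
```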

Stochastic Smoothed Gradient Descent Ascent for Federated Minimax Optimization

  • paper_url: http://arxiv.org/abs/2311.00944
  • repo_url: None
  • paper_authors: Wei Shen, Minhui Huang, Jiawei Zhang, Cong Shen
  • for: This paper studies how smoothing techniques can help solve nonconvex minimax optimization problems in the federated learning setting.
  • methods: A new algorithm, Federated Stochastic Smoothed Gradient Descent Ascent (FESS-GDA), applies the smoothing technique to federated minimax optimization (the single-client smoothing idea is sketched after the abstract).
  • results: FESS-GDA is shown to apply uniformly to several classes of federated minimax problems with new or better convergence guarantees, and its practical efficiency is demonstrated on training generative adversarial networks (GANs) and fair classification.
    Abstract In recent years, federated minimax optimization has attracted growing interest due to its extensive applications in various machine learning tasks. While Smoothed Alternative Gradient Descent Ascent (Smoothed-AGDA) has proved its success in centralized nonconvex minimax optimization, how and whether smoothing technique could be helpful in federated setting remains unexplored. In this paper, we propose a new algorithm termed Federated Stochastic Smoothed Gradient Descent Ascent (FESS-GDA), which utilizes the smoothing technique for federated minimax optimization. We prove that FESS-GDA can be uniformly used to solve several classes of federated minimax problems and prove new or better analytical convergence results for these settings. We showcase the practical efficiency of FESS-GDA in practical federated learning tasks of training generative adversarial networks (GANs) and fair classification.
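For orientation, one common form of smoothed GDA in the centralized setting augments the primal update with a proximal anchor $z$. A sketch in our notation; the stochastic gradients and federated averaging structure of FESS-GDA are omitted, and the paper's exact update may differ:

```latex
% One smoothed-GDA iteration for \min_x \max_y f(x, y) with anchor z:
\begin{aligned}
x_{t+1} &= x_t - \eta_x \nabla_x\!\Big( f(x_t, y_t)
            + \tfrac{p}{2}\,\lVert x_t - z_t\rVert^2 \Big), \\
y_{t+1} &= y_t + \eta_y \nabla_y f(x_t, y_t), \\
z_{t+1} &= z_t + \beta\,(x_{t+1} - z_t).
\end{aligned}
```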

Learning Defect Prediction from Unrealistic Data

  • paper_url: http://arxiv.org/abs/2311.00931
  • repo_url: None
  • paper_authors: Kamel Alrashedy, Vincent J. Hellendoorn, Alessandro Orso
  • for: This work investigates the gap between large-scale synthetic training data (such as programs with artificially injected bugs) and real-world programs, and how such data can still improve models for code understanding tasks.
  • methods: An approach for identifying the samples in large but unrealistic datasets that are most similar to real-world examples: a neural model extracts high-dimensional embeddings of real and artificial programs, artificial samples are scored by their distance to the nearest real-world sample, and only the representationally most similar samples are used for training (a minimal filtering sketch follows the abstract).
  • results: Training on only the nearest samples yields consistent improvements for two popular pretrained models of code on two code understanding tasks; the paper also highlights the limitations of applying AI models to predict vulnerabilities and bugs in real-world applications.
    Abstract Pretrained models of code, such as CodeBERT and CodeT5, have become popular choices for code understanding and generation tasks. Such models tend to be large and require commensurate volumes of training data, which are rarely available for downstream tasks. Instead, it has become popular to train models with far larger but less realistic datasets, such as functions with artificially injected bugs. Models trained on such data, however, tend to only perform well on similar data, while underperforming on real world programs. In this paper, we conjecture that this discrepancy stems from the presence of distracting samples that steer the model away from the real-world task distribution. To investigate this conjecture, we propose an approach for identifying the subsets of these large yet unrealistic datasets that are most similar to examples in real-world datasets based on their learned representations. Our approach extracts high-dimensional embeddings of both real-world and artificial programs using a neural model and scores artificial samples based on their distance to the nearest real-world sample. We show that training on only the nearest, representationally most similar samples while discarding samples that are not at all similar in representations yields consistent improvements across two popular pretrained models of code on two code understanding tasks. Our results are promising, in that they show that training models on a representative subset of an unrealistic dataset can help us harness the power of large-scale synthetic data generation while preserving downstream task performance. Finally, we highlight the limitations of applying AI models for predicting vulnerabilities and bugs in real-world applications
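A minimal sketch of the filtering step: score each synthetic sample by its distance to the nearest real-world sample in embedding space, then keep the closest fraction for training. The embedding model itself (for example, a pretrained code encoder) is assumed given.

```python
import numpy as np

def select_realistic(synthetic_emb, real_emb, keep_fraction=0.5):
    """Both inputs: (n, d) arrays of embeddings."""
    # Pairwise Euclidean distances, synthetic x real (fine for a sketch;
    # use an ANN index for large corpora).
    d = np.linalg.norm(
        synthetic_emb[:, None, :] - real_emb[None, :, :], axis=-1)
    nearest = d.min(axis=1)              # distance to closest real sample
    k = int(keep_fraction * len(synthetic_emb))
    return np.argsort(nearest)[:k]       # indices of most realistic samples
```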

A Review and Roadmap of Deep Causal Model from Different Causal Structures and Representations

  • paper_url: http://arxiv.org/abs/2311.00923
  • repo_url: None
  • paper_authors: Hang Chen, Keqing Du, Chenguang Li, Xinyu Yang
  • for: This work examines the fusion of causal models with deep learning on increasingly intricate datasets, such as causal associations within images or between textual components.
  • methods: Causal data are redefined into three categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data.
  • results: Definite data chiefly pertains to statistical data used in conventional causal scenarios; semi-definite data spans data formats common in deep learning, including time series, images, and text; indefinite data is an emergent research area that remains underdeveloped.
    Abstract The fusion of causal models with deep learning introducing increasingly intricate data sets, such as the causal associations within images or between textual components, has surfaced as a focal research area. Nonetheless, the broadening of original causal concepts and theories to such complex, non-statistical data has been met with serious challenges. In response, our study proposes redefinitions of causal data into three distinct categories from the standpoint of causal structure and representation: definite data, semi-definite data, and indefinite data. Definite data chiefly pertains to statistical data used in conventional causal scenarios, while semi-definite data refers to a spectrum of data formats germane to deep learning, including time-series, images, text, and others. Indefinite data is an emergent research sphere inferred from the progression of data forms by us. To comprehensively present these three data paradigms, we elaborate on their formal definitions, differences manifested in datasets, resolution pathways, and development of research. We summarize key tasks and achievements pertaining to definite and semi-definite data from myriad research undertakings, present a roadmap for indefinite data, beginning with its current research conundrums. Lastly, we classify and scrutinize the key datasets presently utilized within these three paradigms.

MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training

  • paper_url: http://arxiv.org/abs/2311.00919
  • repo_url: None
  • paper_authors: Jiacheng Li, Ninghui Li, Bruno Ribeiro
  • for: The paper addresses membership inference (MI) attacks in machine learning, defending against attacks that try to determine whether a particular instance was used to train a model.
  • methods: A novel method called Membership-Invariant Subspace Training (MIST) leverages counterfactually-invariant representations and subspace learning methods to defend against MI attacks; MIST avoids overfitting the vulnerable instances without significantly impacting other instances.
  • results: Based on extensive experimental studies against several state-of-the-art MI attacks, MIST outperforms other state-of-the-art MI defenses while resulting in minimal reduction in testing accuracy.
    Abstract In Membership Inference (MI) attacks, the adversary tries to determine whether an instance was used to train a machine learning (ML) model. MI attacks are a major privacy concern when using private data to train ML models. Most MI attacks in the literature take advantage of the fact that ML models are trained to fit the training data well, and thus have very low loss on training instances. Most defenses against MI attacks therefore try to make the model fit the training data less well. Doing so, however, generally results in lower accuracy. We observe that training instances have different degrees of vulnerability to MI attacks. Most instances will have low loss even when not included in training. For these instances, the model can fit them well without concerns of MI attacks. An effective defense only needs to (possibly implicitly) identify instances that are vulnerable to MI attacks and avoid overfitting them. A major challenge is how to achieve such an effect in an efficient training process. Leveraging two distinct recent advancements in representation learning: counterfactually-invariant representations and subspace learning methods, we introduce a novel Membership-Invariant Subspace Training (MIST) method to defend against MI attacks. MIST avoids overfitting the vulnerable instances without significant impact on other instances. We have conducted extensive experimental studies, comparing MIST with various other state-of-the-art (SOTA) MI defenses against several SOTA MI attacks. We find that MIST outperforms other defenses while resulting in minimal reduction in testing accuracy.

eess.IV - 2023-11-02

Unveiling the deep plumbing system of a volcano by a reflection matrix analysis of seismic noise

  • paper_url: http://arxiv.org/abs/2311.01296
  • repo_url: None
  • paper_authors: Elsa Giraudat, Arnaud Burtin, Arthur Le Ber, Mathias Fink, Jean-Christophe Komorowski, Alexandre Aubry
  • for: This paper images the inner structure and magmatic-hydrothermal plumbing system of the La Soufrière volcano in Guadeloupe.
  • methods: Seismic noise recorded by a sparse array of geophones is exploited: spatio-temporal cross-correlation of the noise provides the impulse responses between virtual geophones located inside the volcano (the cross-correlation principle is sketched after the abstract), despite the multi-scale heterogeneities of fluids and rocks and the volcano's complex nonlinear dynamics.
  • results: The resulting reflection matrix can be used to numerically auto-focus seismic waves on any underground reflector, yielding an unprecedented view of the volcano's inner structure at half-wavelength resolution and fundamental information for conceptual modeling and high-resolution monitoring.
    Abstract In geophysics, volcanoes are particularly difficult to image because of the multi-scale heterogeneities of fluids and rocks that compose them and their complex non-linear dynamics. By exploiting seismic noise recorded by a sparse array of geophones, we are able to reveal the magmatic and hydrothermal plumbing system of La Soufri\`ere volcano in Guadeloupe. Spatio-temporal cross-correlation of seismic noise actually provides the impulse responses between virtual geophones located inside the volcano. The resulting reflection matrix can be exploited to numerically perform an auto-focus of seismic waves on any reflector of the underground. An unprecedented view on the volcano's inner structure is obtained at a half-wavelength resolution. This innovative observable provides fundamental information for the conceptual modeling and high-resolution monitoring of volcanoes.
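A minimal sketch of the ambient-noise principle used here: the cross-correlation of diffuse noise recorded at two sensors approximates the impulse response (Green's function) between them. The paper assembles many such correlations into a full reflection matrix between virtual sources and receivers, which this sketch does not attempt.

```python
import numpy as np

def noise_crosscorrelation(rec_a, rec_b):
    """rec_a, rec_b: equal-length noise records from two geophones."""
    n = len(rec_a)
    # FFT-based circular cross-correlation, normalized by record length.
    spec = np.fft.rfft(rec_a) * np.conj(np.fft.rfft(rec_b))
    return np.fft.irfft(spec, n) / n
```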

eess.SP - 2023-11-02

Supervised Learning Based Real-Time Adaptive Beamforming On-board Multibeam Satellites

  • paper_url: http://arxiv.org/abs/2311.01334
  • repo_url: None
  • paper_authors: Flor Ortiz, Juan A. Vasquez-Peralvo, Jorge Querol, Eva Lagunas, Jorge L. Gonzalez Rios, Marcele O. K. Mendonca, Luis Garces, Victor Monzon Baeza, Symeon Chatzinotas
  • for: This study aims to make resource management in satellite communication systems efficient enough to maintain quality of service (QoS) amidst dynamic traffic demands, which static multibeam configurations struggle to do.
  • methods: A Direct Radiating Array (DRA) with circular polarization in the 17.7 - 20.2 GHz band is designed, together with a supervised-learning-based algorithm for real-time adaptive beamforming on board multibeam GEO satellites with software-defined payloads.
  • results: The adaptive approach meets precise beam-projection needs while dynamically adjusting beamwidth, minimizing sidelobe levels (SLL), and optimizing effective isotropic radiated power (EIRP).
    Abstract Satellite communications (SatCom) are crucial for global connectivity, especially in the era of emerging technologies like 6G and narrowing the digital divide. Traditional SatCom systems struggle with efficient resource management due to static multibeam configurations, hindering quality of service (QoS) amidst dynamic traffic demands. This paper introduces an innovative solution - real-time adaptive beamforming on multibeam satellites with software-defined payloads in geostationary orbit (GEO). Utilizing a Direct Radiating Array (DRA) with circular polarization in the 17.7 - 20.2 GHz band, the paper outlines DRA design and a supervised learning-based algorithm for on-board beamforming. This adaptive approach not only meets precise beam projection needs but also dynamically adjusts beamwidth, minimizes sidelobe levels (SLL), and optimizes effective isotropic radiated power (EIRP).

Statistical Results of Multivariate Fox-H Function for Exact Performance Analysis of RIS-Assisted Wireless Communication

  • paper_url: http://arxiv.org/abs/2311.01312
  • repo_url: None
  • paper_authors: vinay kumar chapala, S. M. Zafaruddin
  • for: This paper aims to provide an exact analysis of the ergodic capacity and outage probability of RIS-assisted wireless systems, using a multivariate Fox-H function to characterize the statistical properties of the signal-to-noise ratio (SNR).
  • methods: A novel method obtains the distribution of the sum of independent and non-identically distributed (i.ni.d) random variables characterized by the multivariate Fox-H function, and a general framework is developed for an exact analysis of the ergodic capacity when the multivariate Fox-H function characterizes the statistics of the SNR.
  • results: Exact expressions are derived for the outage probability and ergodic capacity of RIS-assisted wireless systems under Rician fading channels with phase errors, validated through computer simulations under various practically relevant scenarios.
    Abstract Existing research provides statistical results on the sum of single-variate Fox-H functions to analyze the performance of diversity receivers and reconfigurable intelligent surfaces (RIS) based wireless systems. There is a research gap in exact performance analysis when more than a single-variate Fox-H function represents the statistical characterization of wireless systems. In this paper, we propose a novel approach to obtain the distribution of the sum of independent and non-identically distributed (i.ni.d) random variables characterized by the multivariate Fox-H function. Further, we develop a general framework for an exact analysis of the ergodic capacity when the multivariate Fox-H function characterizes the statistics of signal-to-noise ratio (SNR). We apply the derived results to conduct an exact performance analysis of outage probability and ergodic capacity, taking an example of RIS-assisted communication over Rician fading channels with phase errors. We conduct computer simulations to validate the exact analysis and demonstrate performance of the RIS-assisted system under various practically relevant scenarios for a better performance assessment.

Map-assisted TDOA Localization Enhancement Based On CNN

  • paper_url: http://arxiv.org/abs/2311.01291
  • repo_url: None
  • paper_authors: Yiwen Chen, Tianqi Xiang, Xi Chen, Xin Zhang
  • for: This work aims to correct the localization errors caused by non-line-of-sight (NLOS) multipath effects in wireless localization.
  • methods: A convolutional neural network (CNN) based correction method extracts obstacle features from maps to predict the localization errors caused by NLOS multipath, and a compensation scheme is structured around the predicted error (a minimal model sketch follows the abstract).
  • results: Four prediction tasks over the building distributions in the maps and an urban propagation model yield CNN models with high prediction accuracy; compared with the plain TDOA localization algorithm, compensating with the CNN-predicted errors markedly improves localization accuracy.
    Abstract For signal processing related to localization technologies, non line of sight (NLOS) multipaths have great impact over the localization error level. This study proposes a localization correction method based on convolution neural network (CNN) that extracts obstacles' features from maps to predict the localization errors caused by NLOS effects. A novel compensation scheme is developed and structured around the localization error predicted by the CNN. Four prediction tasks are executed over the building distributions within the maps and the propagation model in urban zones, resulting in CNN models with high prediction accuracy. Finally, a thorough comparison of the accuracy performance between the time difference of arrival (TDOA) localization algorithm and the results after the error compensation reveals that, generally, the CNN prediction approach demonstrates a great localization error correction performance. It can be observed that the powerful feature extraction capability of CNN can be exploited by processing surrounding maps to predict localization error distribution, which has great potential in further enhancement of TDOA performance under challenging scenarios with rich multi-path propagation.
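A minimal PyTorch sketch of the idea: regress the 2-D localization error from a rasterized obstacle map around the receiver. Layer sizes are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MapErrorCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(64), nn.ReLU(), nn.Linear(64, 2))

    def forward(self, obstacle_map):                    # (B, 1, H, W)
        return self.head(self.features(obstacle_map))  # predicted (dx, dy)

# Compensation step: corrected = tdoa_estimate - model(local_map)
```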

ExPECA: An Experimental Platform for Trustworthy Edge Computing Applications

  • paper_url: http://arxiv.org/abs/2311.01279
  • repo_url: None
  • paper_authors: Samie Mostafavi, Vishnu Narayanan Moothedath, Stefan Rönngren, Neelabhro Roy, Gourav Prateek Sharma, Sangwon Seo, Manuel Olguín Muñoz, James Gross
  • for: This paper presents a testbed for edge computing and wireless communication research that tackles two pressing challenges: comprehensive end-to-end experimentation and high levels of experimental reproducibility.
  • methods: The testbed builds on the OpenStack-based Chameleon Infrastructure (CHI) framework for its proven flexibility and ease of operation, and is located in a unique, isolated underground facility that provides a highly controlled setting for wireless experiments.
  • results: Using OpenRTiST, a latency-sensitive, bandwidth-intensive application, the paper exemplifies the experimental possibilities of the testbed and analyzes its performance; its containerized computational environments and diverse SDR and COTS links benefit a range of research domains, including closed-loop applications and time-sensitive networking.
    Abstract This paper presents ExPECA, an edge computing and wireless communication research testbed designed to tackle two pressing challenges: comprehensive end-to-end experimentation and high levels of experimental reproducibility. Leveraging OpenStack-based Chameleon Infrastructure (CHI) framework for its proven flexibility and ease of operation, ExPECA is located in a unique, isolated underground facility, providing a highly controlled setting for wireless experiments. The testbed is engineered to facilitate integrated studies of both communication and computation, offering a diverse array of Software-Defined Radios (SDR) and Commercial Off-The-Shelf (COTS) wireless and wired links, as well as containerized computational environments. We exemplify the experimental possibilities of the testbed using OpenRTiST, a latency-sensitive, bandwidth-intensive application, and analyze its performance. Lastly, we highlight an array of research domains and experimental setups that stand to gain from ExPECA's features, including closed-loop applications and time-sensitive networking.

Decentralized Federated Learning on the Edge over Wireless Mesh Networks

  • paper_url: http://arxiv.org/abs/2311.01186
  • repo_url: None
  • paper_authors: Abdelaziz Salama, Achilleas Stergioulis, Syed Ali Zaidi, Des McLernon
  • for: This study proposes decentralized federated learning (DFL) over a wireless mesh network as the communication backbone, avoiding the single point of failure and susceptibility to attacks of the conventional centralized architecture.
  • methods: Network performance is analyzed with stochastic geometry theory and physical interference models, and system simulations assess the architecture under various network parameters and aggregator methods (FedAvg, Krum, and Median); a genetic-algorithm-based compression technique minimizes model size at the edge (a minimal aggregation sketch follows the abstract).
  • results: On EMNIST handwritten-digit classification, the compressed decentralized architecture achieves accuracy and average loss comparable to the centralized baseline and traditional DFL, while compressing participants' local models to nearly half their original size, effectively reducing complexity and communication overhead.
    Abstract The rapid growth of Internet of Things (IoT) devices has generated vast amounts of data, leading to the emergence of federated learning as a novel distributed machine learning paradigm. Federated learning enables model training at the edge, leveraging the processing capacity of edge devices while preserving privacy and mitigating data transfer bottlenecks. However, the conventional centralized federated learning architecture suffers from a single point of failure and susceptibility to malicious attacks. In this study, we delve into an alternative approach called decentralized federated learning (DFL) conducted over a wireless mesh network as the communication backbone. We perform a comprehensive network performance analysis using stochastic geometry theory and physical interference models, offering fresh insights into the convergence analysis of DFL. Additionally, we conduct system simulations to assess the proposed decentralized architecture under various network parameters and different aggregator methods such as FedAvg, Krum and Median methods. Our model is trained on the widely recognized EMNIST dataset for benchmarking handwritten digit classification. To minimize the model's size at the edge and reduce communication overhead, we employ a cutting-edge compression technique based on genetic algorithms. Our simulation results reveal that the compressed decentralized architecture achieves performance comparable to the baseline centralized architecture and traditional DFL in terms of accuracy and average loss for our classification task. Moreover, it significantly reduces the size of shared models over the wireless channel by compressing participants' local model sizes to nearly half of their original size compared to the baselines, effectively reducing complexity and communication overhead.
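A minimal sketch of one decentralized-FL round over a mesh: each node trains locally, then averages parameters with its one-hop neighbors, FedAvg-style but without any central server. Compression and the Krum/Median aggregators are omitted here.

```python
import numpy as np

def dfl_round(params, neighbors, local_update):
    """params: dict node -> parameter vector;
    neighbors: dict node -> list of adjacent nodes;
    local_update: fn(node, vector) -> vector after local training."""
    trained = {n: local_update(n, p) for n, p in params.items()}
    new_params = {}
    for n in trained:
        group = [trained[n]] + [trained[m] for m in neighbors[n]]
        new_params[n] = np.mean(group, axis=0)   # neighborhood average
    return new_params
```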

Enhanced Traffic Congestion Management with Fog Computing: A Simulation-based Investigation using iFog-Simulator

  • paper_url: http://arxiv.org/abs/2311.01181
  • repo_url: None
  • paper_authors: Alzahraa Elsayed, Khalil Mohamed, Hany Harb
  • for: To provide accurate, low-latency computation for traffic congestion management in heavily populated smart cities, where processing the vast data generated by connected devices purely in the cloud is not optimal.
  • methods: Fog computing is used to enable processing at the edge while still allowing communication with the cloud; the proposed Intelligent Traffic Congestion Mitigation System (ITCMS) is implemented with fog computing and tested in a crowded city.
  • results: Compared with other approaches addressing the same issue, including IOV and STL, the proposed system performs better across multiple metrics, including traffic efficiency, energy savings, latency, average traffic flow rate, and waiting time.
    Abstract Accurate latency computation is essential for the Internet of Things (IoT) since the connected devices generate a vast amount of data that is processed on cloud infrastructure. However, the cloud is not an optimal solution. To overcome this issue, fog computing is used to enable processing at the edge while still allowing communication with the cloud. Many applications rely on fog computing, including traffic management. In this paper, an Intelligent Traffic Congestion Mitigation System (ITCMS) is proposed to address traffic congestion in heavily populated smart cities. The proposed system is implemented using fog computing and tested in a crowded city. Its performance is evaluated based on multiple metrics, such as traffic efficiency, energy savings, reduced latency, average traffic flow rate, and waiting time. The obtained results are compared with similar techniques that tackle the same issue. The results obtained indicate that the execution time of the simulation is 4,538 seconds, and the delay in the application loop is 49.67 seconds. The paper addresses various issues, including CPU usage, heap memory usage, throughput, and the total average delay, which are essential for evaluating the performance of the ITCMS. Our system model is also compared with other models to assess its performance. A comparison is made using two parameters, namely throughput and the total average delay, between the ITCMS, IOV (Internet of Vehicle), and STL (Seasonal-Trend Decomposition Procedure based on LOESS). Consequently, the results confirm that the proposed system outperforms the others in terms of higher accuracy, lower latency, and improved traffic efficiency.

Modulation Design and Optimization for RIS-Assisted Symbiotic Radios

  • paper_url: http://arxiv.org/abs/2311.01167
  • repo_url: None
  • paper_authors: Hu Zhou, Bowen Cai, Qianqian Zhang, Ruizhe Long, Yiyang Pei, Ying-Chang Liang
  • for: To improve the performance of the reflecting link in RIS-assisted symbiotic radio (SR) and to resolve the ambiguity problem in joint detection caused by the multiplicative relationship between the primary and secondary signals, which is especially severe when the direct link is blocked.
  • methods: A novel modulation scheme divides the RIS phase-shift matrix into two components: a symbol-invariant component that assists the primary transmission and a symbol-varying component that carries the secondary signal (a generic signal model is sketched after the abstract); the two components are designed by minimizing the bit error rate (BER) of the composite signal.
  • results: A closed-form solution for the optimal components is derived, related to the channel strength ratio of the direct link to the reflecting link, and simulations show the proposed scheme outperforms its conventional counterpart.
    Abstract In reconfigurable intelligent surface (RIS)-assisted symbiotic radio (SR), the RIS acts as a secondary transmitter by modulating its information bits over the incident primary signal and simultaneously assists the primary transmission, then a cooperative receiver is used to jointly decode the primary and secondary signals. Most existing works of SR focus on using RIS to enhance the reflecting link while ignoring the ambiguity problem for the joint detection caused by the multiplication relationship of the primary and secondary signals. Particularly, in case of a blocked direct link, joint detection will suffer from severe performance loss due to the ambiguity, when using the conventional on-off keying and binary phase shift keying modulation schemes for RIS. To address this issue, we propose a novel modulation scheme for RIS-assisted SR that divides the phase-shift matrix into two components: the symbol-invariant and symbol-varying components, which are used to assist the primary transmission and carry the secondary signal, respectively. To design these two components, we focus on the detection of the composite signal formed by the primary and secondary signals, through which a problem of minimizing the bit error rate (BER) of the composite signal is formulated to improve both the BER performance of the primary and secondary ones. By solving the problem, we derive the closed-form solution of the optimal symbol-invariant and symbol-varying components, which is related to the channel strength ratio of the direct link to the reflecting link. Moreover, theoretical BER performance is analyzed. Finally, simulation results show the superiority of the proposed modulation scheme over its conventional counterpart.
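For orientation, a generic received-signal sketch for RIS-assisted SR, in our notation rather than the paper's: the direct link $h_d$ and the cascaded RIS link carry the primary symbol $s$, while the diagonal phase matrix embeds the secondary symbol $c$. Writing the phase matrix as a product of the two components is one plausible reading of the symbol-invariant/symbol-varying split, not the paper's confirmed formulation.

```latex
% Composite received signal at the cooperative receiver:
y = \Big( h_d + \mathbf{h}_r^{\mathsf{T}}\,
      \Theta_{\mathrm{inv}}\,\Theta_{\mathrm{var}}(c)\,\mathbf{g} \Big)\, s + n
```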

Combating Inter-Operator Pilot Contamination in Reconfigurable Intelligent Surfaces Assisted Multi-Operator Networks

  • paper_url: http://arxiv.org/abs/2311.01151
  • repo_url: None
  • paper_authors: Doğa Gürgünoğlu, Emil Björnson, Gábor Fodor
  • for: This paper studies a new kind of pilot contamination arising in multi-operator RIS-assisted networks: operators serve their respective users on dedicated frequency bands, but each RIS inadvertently reflects the uplink signals of user equipment in multiple bands, so concurrently reflected pilot signals during channel estimation introduce an inter-operator pilot contamination effect.
  • methods: The implications of this effect are investigated in systems with either deterministic or correlated Rayleigh fading channels, focusing on its impact on channel estimation quality, signal equalization, and channel capacity.
  • results: Numerical results demonstrate the substantial degradation in system performance caused by this phenomenon; to combat it, the paper proposes orthogonal RIS configurations during uplink pilot transmission (one way to build such configurations is sketched after the abstract), which mitigate or eliminate the negative effect at the expense of some inter-operator information exchange and orchestration.
    Abstract In this paper, we study a new kind of pilot contamination appearing in multi-operator reconfigurable intelligent surfaces (RIS) assisted networks, where multiple operators provide services to their respective served users. The operators use dedicated frequency bands, but each RIS inadvertently reflects the transmitted uplink signals of the user equipment devices in multiple bands. Consequently, the concurrent reflection of pilot signals during the channel estimation phase introduces a new inter-operator pilot contamination effect. We investigate the implications of this effect in systems with either deterministic or correlated Rayleigh fading channels, specifically focusing on its impact on channel estimation quality, signal equalization, and channel capacity. The numerical results demonstrate the substantial degradation in system performance caused by this phenomenon and highlight the pressing need to address inter-operator pilot contamination in multi-operator RIS deployments. To combat the negative effect of this new type of pilot contamination, we propose to use orthogonal RIS configurations during uplink pilot transmission, which can mitigate or eliminate the negative effect of inter-operator pilot contamination at the expense of some inter-operator information exchange and orchestration.
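One illustration of how mutually orthogonal, unit-modulus RIS configuration sequences can be built: the columns of a DFT matrix are orthogonal, so operator $k$ could drive its RIS phases over the pilot block with the $k$-th column. This is a sketch of the idea, not the paper's exact construction.

```python
import numpy as np

def dft_ris_configs(n_pilot_slots):
    t = np.arange(n_pilot_slots)
    F = np.exp(-2j * np.pi * np.outer(t, t) / n_pilot_slots)
    return F  # F[:, k]: unit-modulus config sequence for operator k

F = dft_ris_configs(8)
assert np.allclose(F.conj().T @ F, 8 * np.eye(8))  # mutual orthogonality
```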

Comparison of Different Segmentations in Automated Detection of Hypertension Using Electrocardiography with Empirical Mode Decomposition

  • paper_url: http://arxiv.org/abs/2311.01142
  • repo_url: None
  • paper_authors: Y. E. Erdoğan, A. Narin
  • for: Early and accurate automated detection of hypertension (HPT) patients from electrocardiogram (ECG) signals.
  • methods: Empirical Mode Decomposition extracts 5-layer Intrinsic Mode Function (IMF) signals, entropy measurements are derived from them, and nine features extracted from each IMF are used for classification (a minimal pipeline sketch follows the abstract).
  • results: With the 5-fold cross-validation technique and decision tree algorithms, accuracy rates of 99.9991% and 99.9989% are achieved on ECG records of five thousand and ten thousand data points, respectively, indicating the method's potential usefulness in diagnosing HPT.
    Abstract Hypertension (HPT) refers to a condition where the pressure exerted on the walls of arteries by blood pumped from the heart to the body reaches levels that can lead to various ailments. Annually, a significant number of lives are lost globally due to diseases linked to HPT. Therefore, the early and accurate diagnosis of HPT is of utmost importance. This study aimed to automatically and with minimal error detect patients suffering from HPT by utilizing electrocardiogram (ECG) signals. The research involved the collection of ECG signals from two distinct groups. These groups consisted of ECG data of both five thousand and ten thousand data points in length, respectively. The performance in HPT detection was evaluated using entropy measurements derived from the 5-layer Intrinsic Mode Function(IMF) signals through the application of the Empirical Mode Decomposition method. The resulting performances were compared based on the nine features extracted from each IMF. To summarize, employing the 5-fold cross-validation technique, the most exceptional accuracy rates achieved were 99.9991% and 99.9989% for ECG data of lengths five thousand and ten thousand,respectively, using decision tree algorithms. These remarkable performance results indicate the potential usefulness of this method in assisting medical professionals to identify individuals with HPT.
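A hedged sketch of the pipeline: decompose an ECG segment into IMFs with EMD, compute an entropy-style feature per IMF, and classify with a decision tree. It uses the PyEMD package, and the single crude entropy below stands in for the paper's nine features per IMF.

```python
import numpy as np
from PyEMD import EMD
from sklearn.tree import DecisionTreeClassifier

def imf_entropy_features(signal, n_imfs=5):
    imfs = EMD()(signal)[:n_imfs]            # first 5 IMF layers
    feats = []
    for imf in imfs:
        p = np.abs(imf) / np.abs(imf).sum()  # crude probability mass
        feats.append(-(p * np.log(p + 1e-12)).sum())  # Shannon entropy
    return feats

# X = [imf_entropy_features(seg) for seg in ecg_segments]
# clf = DecisionTreeClassifier().fit(X, labels)
```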

Noncontact Detection of Sleep Apnea Using Radar and Expectation-Maximization Algorithm

  • paper_url: http://arxiv.org/abs/2311.01084
  • repo_url: None
  • paper_authors: Takato Koda, Shigeaki Okumura, Hirofumi Taki, Satoshi Hamada, Hironobu Sunadome, Susumu Sato, Kazuo Chin, Takuya Sakamoto
  • for: This study proposes a novel radar-based method for accurately detecting sleep apnea events without the discomfort of the contact-type sensors used in conventional polysomnography.
  • methods: The expectation-maximization algorithm extracts the respiratory features that form normal and abnormal breathing patterns, yielding an adaptive apnea-detection capability without any empirical parameters (a simplified mixture-model sketch follows the abstract).
  • results: In simultaneous polysomnography and radar measurements on five patients with symptoms of sleep apnea syndrome, the method detects the number of apnea and hypopnea events per hour with an error of 4.8 times/hour, a 1.8-fold improvement in accuracy over the conventional threshold-based method.
    Abstract Sleep apnea syndrome requires early diagnosis because this syndrome can lead to a variety of health problems. If sleep apnea events can be detected in a noncontact manner using radar, we can then avoid the discomfort caused by the contact-type sensors that are used in conventional polysomnography. This study proposes a novel radar-based method for accurate detection of sleep apnea events. The proposed method uses the expectation-maximization algorithm to extract the respiratory features that form normal and abnormal breathing patterns, resulting in an adaptive apnea detection capability without any requirement for empirical parameters. We conducted an experimental quantitative evaluation of the proposed method by performing polysomnography and radar measurements simultaneously in five patients with the symptoms of sleep apnea syndrome. Through these experiments, we show that the proposed method can detect the number of apnea and hypopnea events per hour with an error of 4.8 times/hour; this represents an improvement in the accuracy by 1.8 times when compared with the conventional threshold-based method and demonstrates the effectiveness of our proposed method.
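A sketch of the EM idea under simplifying assumptions: fit a two-component Gaussian mixture (EM under the hood) to per-epoch respiratory features and flag the low-amplitude component as candidate apnea/hypopnea epochs. The paper's actual feature extraction and model are more elaborate.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def flag_apnea_epochs(features):
    """features: (n_epochs, d) respiratory features from radar."""
    gm = GaussianMixture(n_components=2, random_state=0).fit(features)
    labels = gm.predict(features)
    # Assumption: the first feature column is breathing amplitude, and
    # the component with the smaller mean amplitude is abnormal.
    abnormal = int(np.argmin(gm.means_[:, 0]))
    return labels == abnormal
```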

Fourier Analysis of Signals on Directed Acyclic Graphs (DAG) Using Graph Zero-Padding

  • paper_url: http://arxiv.org/abs/2311.01073
  • repo_url: None
  • paper_authors: Ljubisa Stankovic, Milos Dakovic, Ali Bagheri Bardi, Milos Brajovic, Isidora Stankovic
  • for: This paper addresses a problem with directed acyclic graphs (DAGs), which model causal relationships, dependencies, and flows: spectral analysis becomes impractical because the eigendecomposition of the adjacency matrix yields all eigenvalues equal to zero, making frequency components of signals on such graphs indistinguishable.
  • methods: A graph zero-padding approach is proposed: the original DAG is augmented with additional vertices connected to the existing structure, with the signal values on the added vertices set to zero (the underlying spectral degeneracy is demonstrated after the abstract).
  • results: The approach enables spectral evaluation of system outputs on DAGs, i.e., computation of vertex-domain convolution, without the aliasing that would follow from changing the graph structure by adding edges.
    Abstract Directed acyclic graphs (DAGs) are used for modeling causal relationships, dependencies, and flows in various systems. However, spectral analysis becomes impractical in this setting because the eigendecomposition of the adjacency matrix yields all eigenvalues equal to zero. This inherent property of DAGs results in an inability to differentiate between frequency components of signals on such graphs. This problem can be addressed by adding edges in DAG. However, this approach changes the physics of the considered problem. To address this limitation, we propose a graph zero-padding approach. This approach involves augmenting the original DAG with additional vertices that are connected to the existing structure. The added vertices are characterized by signal values set to zero. The proposed technique enables the spectral evaluation of system outputs on DAGs, that is the computation of vertex-domain convolution without the adverse effects of aliasing due to changes in graph structure.
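A small demonstration of the structural fact the paper starts from: the adjacency matrix of a DAG is nilpotent (its vertices admit a topological order that makes the matrix strictly triangular), so all of its eigenvalues are zero and the usual graph Fourier basis degenerates.

```python
import numpy as np

# A 4-vertex DAG: 0 -> 1 -> 3 and 0 -> 2 -> 3.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]], dtype=float)

print(np.linalg.eigvals(A))            # all (numerically) zero
print(np.linalg.matrix_power(A, 4))    # A^n = 0: nilpotent
```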

Continuous Fluid Antenna Systems: Modeling and Analysis

  • paper_url: http://arxiv.org/abs/2311.01058
  • repo_url: None
  • paper_authors: Constantinos Psomas, Peter J. Smith, Himal A. Suraweera, Ioannis Krikidis
  • for: This paper concerns fluid antenna (FA) technology, a promising way to introduce flexibility and reconfigurability into wireless networks; unlike prior work that assumes a discrete number of liquid positions, it models the liquid moving continuously to any point inside the FA.
  • methods: A general framework is presented for the design and analytical evaluation of a continuous FA system (CFAS), deriving closed-form analytical expressions for the level crossing rate (LCR) and the average fade duration of the continuous signal-to-interference ratio (SIR) process over the FA's length; the LCR expression yields a bound on the cumulative distribution function of the SIR's supremum, characterizing outage performance.
  • results: The results confirm that the CFAS outperforms its discrete counterpart and thus provides the performance limits of FA-based systems.
    Abstract Fluid antennas (FAs) is a promising technology for introducing flexibility and reconfigurability in wireless networks. Recent research efforts have highlighted the potential gains that can be achieved in comparison to conventional antennas. These works assume that the FA has a discrete number of positions that the liquid can take. However, from a practical standpoint, the liquid moves in a continuous fashion to any point inside the FA. In this paper, we focus on a continuous FA system (CFAS) and present a general framework for its design and analytical evaluation. In particular, we derive closed-form analytical expressions for the level crossing rate (LCR) and the average fade duration of the continuous signal-to-interference ratio (SIR) process over the FA's length. Then, by leveraging the LCR expression, we characterize the system's outage performance with a bound on the cumulative distribution function of the SIR's supremum. Our results confirm that the CFAS outperforms its discrete counterpart and thus provides the performance limits of FA-based systems.

From 5G to 6G: Revolutionizing Satellite Networks through TRANTOR Foundation

  • paper_url: http://arxiv.org/abs/2311.01055
  • repo_url: None
  • paper_authors: Pol Henarejos, Xavier Artiga, Miguel A. Vázquez, Màrius Caus, Musbah Shaat, Joan Bas, Lluís Blanco, Ana I. Pérez-Neira
  • for: aims to build a standardised 5G ecosystem adapted to satellite needs, giving satellite internet providers higher data speeds, massive network capacity, lower latency, improved reliability, and increased availability.
  • methods: develops a multi-orbit, multi-band antenna for satellite user equipment (UE), together with gNodeB (gNB) and UE 5G non-terrestrial network equipment supporting multi-connectivity, to handle heterogeneous satellite traffic demands and capacities.
  • results: targets a scalable, secure, and cost-effective satellite network management solution that can meet growing and diverse demands, along with flexible 6G non-terrestrial access architectures.
    Abstract 5G technology will drastically change the way satellite internet providers deliver services by offering higher data speeds, massive network capacity, reduced latency, improved reliability and increased availability. A standardised 5G ecosystem will enable adapting 5G to satellite needs. The EU-funded TRANTOR project will seek to develop novel and secure satellite network management solutions that allow scaling up heterogeneous satellite traffic demands and capacities in a cost-effective and highly dynamic way. Researchers also target the development of flexible 6G non-terrestrial access architectures. The focus will be on the design of a multi-orbit and multi-band antenna for satellite user equipment (UE), as well as the development of gNodeB (gNB) and UE 5G non-terrestrial network equipment to support multi-connectivity.

Mathematical Properties of the Zadoff-Chu Sequences

  • paper_url: http://arxiv.org/abs/2311.01035
  • repo_url: None
  • paper_authors: David Gregoratti, Xavier Arteaga, Joaquim Broquetas
  • for: a compilation of well-known results about Zadoff-Chu sequences, including all proofs in a consistent mathematical notation, for easy reference.
  • methods: derives a formula that computes the first term (frequency zero) of the discrete Fourier transform of a Zadoff-Chu sequence $x_u[n]$ of prime length $N_{\text{ZC}}$ and root index $u$ with constant complexity, independent of the sequence length.
  • results: the resulting formula for $X_u[0]$ complements the fact that the discrete Fourier transform of a Zadoff-Chu sequence is itself a Zadoff-Chu sequence whose terms are scaled by $X_u[0]$.
    Abstract This paper is a compilation of well-known results about Zadoff-Chu sequences, including all proofs with a consistent mathematical notation, for easy reference. Moreover, for a Zadoff-Chu sequence $x_u[n]$ of prime length $N_{\text{ZC}}$ and root index $u$, a formula is derived that allows computing the first term (frequency zero) of its discrete Fourier transform, $X_u[0]$, with constant complexity independent of the sequence length, as opposed to accumulating all its $N_{\text{ZC}}$ terms. The formula stems from a famous result in analytic number theory and is an interesting complement to the fact that the discrete Fourier transform of a Zadoff-Chu sequence is itself a Zadoff-Chu sequence whose terms are scaled by $X_u[0]$. Finally, the paper concludes with a brief analysis of time-continuous signals derived from Zadoff-Chu sequences, especially those obtained by OFDM-modulating a Zadoff-Chu sequence.
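A small NumPy sketch of the objects involved: it generates a Zadoff-Chu sequence of odd prime length, checks the constant-amplitude property in both domains, and computes $X_u[0]$ by direct $O(N_{\text{ZC}})$ accumulation, the quantity the paper's formula obtains in $O(1)$. The closed-form itself is not reproduced here.

```python
import numpy as np

def zadoff_chu(u: int, n_zc: int) -> np.ndarray:
    """x_u[n] = exp(-j*pi*u*n*(n+1)/N_ZC) for odd length N_ZC."""
    n = np.arange(n_zc)
    return np.exp(-1j * np.pi * u * n * (n + 1) / n_zc)

n_zc, u = 839, 25            # prime length and root index (LTE PRACH values)
x = zadoff_chu(u, n_zc)
X = np.fft.fft(x)

# CAZAC property: unit amplitude in time, constant amplitude sqrt(N_ZC)
# in frequency, consistent with the DFT being a scaled Zadoff-Chu sequence.
assert np.allclose(np.abs(x), 1.0)
assert np.allclose(np.abs(X), np.sqrt(n_zc))

# X_u[0] by brute-force accumulation of all N_ZC terms; the paper derives
# a constant-complexity formula for this same value.
X0 = x.sum()
print(X0, abs(X0))           # |X_u[0]| = sqrt(N_ZC)
```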

cs.SD - 2023-11-01

Investigating Self-Supervised Deep Representations for EEG-based Auditory Attention Decoding

  • paper_url: http://arxiv.org/abs/2311.00814
  • repo_url: None
  • paper_authors: Karan Thakkar, Jiarui Hai, Mounya Elhilali
  • for: investigates the feasibility of deep self-supervised (SS) representations for EEG-based auditory attention decoding, i.e., isolating the attended sound source in complex acoustic environments directly from brain activity.
  • methods: evaluates linear decoders on 12 deep and 2 shallow representations, applied to EEG data from multiple studies spanning 57 subjects and several languages.
  • results: deep features consistently outperform shallow ones at decoding background speakers, regardless of dataset and analysis window, suggesting a nonlinear encoding of unattended signals in the brain; the impact of SS-representation layer and window size on AAD performance is also analysed.
    Abstract Auditory Attention Decoding (AAD) algorithms play a crucial role in isolating desired sound sources within challenging acoustic environments directly from brain activity. Although recent research has shown promise in AAD using shallow representations such as auditory envelope and spectrogram, there has been limited exploration of deep Self-Supervised (SS) representations on a larger scale. In this study, we undertake a comprehensive investigation into the performance of linear decoders across 12 deep and 2 shallow representations, applied to EEG data from multiple studies spanning 57 subjects and multiple languages. Our experimental results consistently reveal the superiority of deep features for AAD at decoding background speakers, regardless of the datasets and analysis windows. This result indicates possible nonlinear encoding of unattended signals in the brain that are revealed using deep nonlinear features. Additionally, we analyze the impact of different layers of SS representations and window sizes on AAD performance. These findings underscore the potential for enhancing EEG-based AAD systems through the integration of deep feature representations.

C2C: Cough to COVID-19 Detection in BHI 2023 Data Challenge

  • paper_url: http://arxiv.org/abs/2311.00364
  • repo_url: None
  • paper_authors: Woo-Jin Chung, Miseul Kim, Hong-Goo Kang
  • for: submission to the BHI 2023 Data Competition: Sensor challenge, with the goal of developing an acoustic-based COVID-19 diagnosis system.
  • methods: pre-processing of input signals, cough-related representation extraction leveraging Wav2vec2.0, and data augmentation, forming the Cough to COVID-19 (C2C) system.
  • results: demonstrates C2C's promising potential to enhance the diagnostic accuracy of COVID-19 via cough signals, achieving a ROC-AUC value of 0.7810.
    Abstract This report describes our submission to BHI 2023 Data Competition: Sensor challenge. Our Audio Alchemists team designed an acoustic-based COVID-19 diagnosis system, Cough to COVID-19 (C2C), and won the 1st place in the challenge. C2C involves three key contributions: pre-processing of input signals, cough-related representation extraction leveraging Wav2vec2.0, and data augmentation. Through experimental findings, we demonstrate C2C's promising potential to enhance the diagnostic accuracy of COVID-19 via cough signals. Our proposed model achieves a ROC-AUC value of 0.7810 in the context of COVID-19 diagnosis. The implementation details and the python code can be found in the following link: https://github.com/Woo-jin-Chung/BHI_2023_challenge_Audio_Alchemists
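A minimal sketch of the representation-extraction step, assuming the Hugging Face transformers implementation of Wav2vec2.0; the checkpoint, mean-pooling, and the toy classification head are illustrative assumptions, not the authors' exact pipeline (see their repository for that).

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Embed a cough recording with a pretrained Wav2vec2.0 encoder and pool it
# into a fixed-size vector for a downstream COVID-19 classifier.
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
encoder.eval()

waveform = torch.randn(16000 * 3)    # stand-in for a 3 s cough clip at 16 kHz

inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(inputs.input_values).last_hidden_state   # (1, T', 768)

embedding = hidden.mean(dim=1)       # mean-pool over time -> (1, 768)
logit = torch.nn.Linear(768, 1)(embedding)  # toy, untrained binary head
print(torch.sigmoid(logit))          # pseudo-probability of COVID-19
```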

eess.AS - 2023-11-01

Reverberant sound field equalisation for an enhanced stereo playback experience

  • paper_url: http://arxiv.org/abs/2311.00624
  • repo_url: None
  • paper_authors: James Brooks-Park, Steven van de Par
  • for: improving the quality of stereo playback in reverberant rooms.
  • methods: a novel equalisation technique that adds gammatone filter-band energy to the reverberant sound field via two surround loudspeakers, leaving the direct sound from the primary loudspeakers unaltered while equalising the sum of direct and reverberant energy at the listening position.
  • results: a subjective listening test shows the proposed technique is preferred over a traditional room equalisation technique and plain stereo playback.
    Abstract The topic of room equalisation has been at the forefront of research and product development for many years, with the aim of increasing the playback quality of loudspeakers in reverberant rooms. Traditional room equalisation systems comprise of a number of filters that when applied to the primary loudspeakers, additional room colouration is compensated for. This publication introduces a novel equalisation technique where gammatone filter band energy is added to the reverberant sound field via two surround loudspeakers, leaving the direct sound from the primary loudspeakers unaltered, but the sum of direct and reverberant energy is equalised at the listening position. Unlike traditional systems, this method allows the target function of the direct sound to differ from the reverberant sound field. The proposed method is motivated by the different roles direct and reverberant sound components play in humans perception of sound. Along with introducing the proposed method, results from a subjective listening test are presented, demonstrating the preference towards the proposed technique when compared to a traditional room equalisation technique and stereo playback.

An analysis of large speech models-based representations for speech emotion recognition

  • paper_url: http://arxiv.org/abs/2311.00394
  • repo_url: None
  • paper_authors: Adrian Bogdan Stânea, Vlad Striletchi, Cosmin Striletchi, Adriana Stan
  • for: analyses features derived from large speech models for the task of speech emotion recognition.
  • methods: uses pretrained large speech models and probes their inherent abstractions of emotion with simple classification methods, so as not to interfere with or add knowledge to the task.
  • results: even without finetuning, some large speech models' representations carry enough information to reach performances close to, and even beyond, state-of-the-art results across six standard speech emotion recognition datasets.
    Abstract Large speech models-derived features have recently shown increased performance over signal-based features across multiple downstream tasks, even when the networks are not finetuned towards the target task. In this paper we show the results of an analysis of several signal- and neural models-derived features for speech emotion recognition. We use pretrained models and explore their inherent potential abstractions of emotions. Simple classification methods are used so as to not interfere or add knowledge to the task. We show that, even without finetuning, some of these large neural speech models' representations can enclose information that enables performances close to, and even beyond state-of-the-art results across six standard speech emotion recognition datasets.

cs.CV - 2023-11-01

A Call to Arms: AI Should be Critical for Social Media Analysis of Conflict Zones

  • paper_url: http://arxiv.org/abs/2311.00810
  • repo_url: None
  • paper_authors: Afia Abedin, Abdul Bais, Cody Buntain, Laura Courchesne, Brian McQuinn, Matthew E. Taylor, Muhib Ullah
  • for: uses computer vision to identify specific weapon systems and the insignias of the armed groups using them in the Ukraine conflict, and to track how these weapons are distributed.
  • methods: computer vision applied to social media imagery to recognise and track weapon systems and armed-group insignias.
  • results: such a system could track which weapon types are used by different state and non-state military actors and how weapons flow through networks of armed units, ultimately supporting real-time conflict understanding, including where humanitarian and medical aid is most needed.
    Abstract The massive proliferation of social media data represents a transformative moment in conflict studies. This data can provide unique insights into the spread and use of weaponry, but the scale and types of data are problematic for traditional open-source intelligence. This paper presents preliminary, transdisciplinary work using computer vision to identify specific weapon systems and the insignias of the armed groups using them. There is potential to not only track how weapons are distributed through networks of armed units but also to track which types of weapons are being used by the different types of state and non-state military actors in Ukraine. Such a system could ultimately be used to understand conflicts in real-time, including where humanitarian and medical aid is most needed. We believe that using AI to help automate such processes should be a high-priority goal for our community, with near-term real-world payoffs.

VQA-GEN: A Visual Question Answering Benchmark for Domain Generalization

  • paper_url: http://arxiv.org/abs/2311.00807
  • repo_url: None
  • paper_authors: Suraj Jyothi Unni, Raha Moraffah, Huan Liu
  • for: proposes a multi-modal benchmark dataset for evaluating the generalization of visual question answering models under distribution shifts.
  • methods: a shift-induced pipeline generates joint multi-modal (visual and textual) distribution shifts, on which existing VQA models are evaluated.
  • results: experiments show VQA-GEN exposes the vulnerability of existing VQA models to joint multi-modal shifts; models trained on VQA-GEN improve in both cross-domain and in-domain performance, and the contribution of each shift technique to generalization is analysed.
    Abstract Visual question answering (VQA) models are designed to demonstrate visual-textual reasoning capabilities. However, their real-world applicability is hindered by a lack of comprehensive benchmark datasets. Existing domain generalization datasets for VQA exhibit a unilateral focus on textual shifts while VQA being a multi-modal task contains shifts across both visual and textual domains. We propose VQA-GEN, the first ever multi-modal benchmark dataset for distribution shift generated through a shift induced pipeline. Experiments demonstrate VQA-GEN dataset exposes the vulnerability of existing methods to joint multi-modal distribution shifts. validating that comprehensive multi-modal shifts are critical for robust VQA generalization. Models trained on VQA-GEN exhibit improved cross-domain and in-domain performance, confirming the value of VQA-GEN. Further, we analyze the importance of each shift technique of our pipeline contributing to the generalization of the model.

Automatic counting of planting microsites via local visual detection and global count estimation

  • paper_url: http://arxiv.org/abs/2311.00796
  • repo_url: None
  • paper_authors: Ahmed Zgaren, Wassim Bouachir, Nizar Bouguila
  • for: automatically estimating the number of planting mounds on a planting block prior to planting operations.
  • methods: computer vision and machine learning frame the counting task as supervised learning with two prediction models: a local detection model first detects visible mounds from deep features, and a global prediction function then produces the final estimate from block-level features.
  • results: on a challenging purpose-built UAV dataset, the proposed method outperforms manual field surveys in precision while significantly reducing time and cost.
    Abstract In forest industry, mechanical site preparation by mounding is widely used prior to planting operations. One of the main problems when planning planting operations is the difficulty in estimating the number of mounds present on a planting block, as their number may greatly vary depending on site characteristics. This estimation is often carried out through field surveys by several forestry workers. However, this procedure is prone to error and slowness. Motivated by recent advances in UAV imagery and artificial intelligence, we propose a fully automated framework to estimate the number of mounds on a planting block. Using computer vision and machine learning, we formulate the counting task as a supervised learning problem using two prediction models. A local detection model is firstly used to detect visible mounds based on deep features, while a global prediction function is subsequently applied to provide a final estimation based on block-level features. To evaluate the proposed method, we constructed a challenging UAV dataset representing several plantation blocks with different characteristics. The performed experiments demonstrated the robustness of the proposed method, which outperforms manual methods in precision, while significantly reducing time and cost.

What User Behaviors Make the Differences During the Process of Visual Analytics?

  • paper_url: http://arxiv.org/abs/2311.00690
  • repo_url: None
  • paper_authors: Shahin Doroudian, Zekun Wu, Aidong Lu
  • for: improving understanding of the visual analytics process in order to advance visual designs and interaction functions.
  • methods: a comprehensive collection of user behaviors, analysed with time-series classification methods.
  • results: user behaviors can be distinguished during visual analytics, with a potentially strong association between users' physical behaviors and the visualization tasks they perform; the models can also interpret open sessions automatically, enabling the study of sensemaking without tedious manual annotation.
    Abstract The understanding of visual analytics process can benefit visualization researchers from multiple aspects, including improving visual designs and developing advanced interaction functions. However, the log files of user behaviors are still hard to analyze due to the complexity of sensemaking and our lack of knowledge on the related user behaviors. This work presents a study on a comprehensive data collection of user behaviors, and our analysis approach with time-series classification methods. We have chosen a classical visualization application, Covid-19 data analysis, with common analysis tasks covering geo-spatial, time-series and multi-attributes. Our user study collects user behaviors on a diverse set of visualization tasks with two comparable systems, desktop and immersive visualizations. We summarize the classification results with three time-series machine learning algorithms at two scales, and explore the influences of behavior features. Our results reveal that user behaviors can be distinguished during the process of visual analytics and there is a potentially strong association between the physical behaviors of users and the visualization tasks they perform. We also demonstrate the usage of our models by interpreting open sessions of visual analytics, which provides an automatic way to study sensemaking without tedious manual annotations.

Collaboration in Immersive Environments: Challenges and Solutions

  • paper_url: http://arxiv.org/abs/2311.00689
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Shahin Doroudian, Zachary Wartell
  • for: This paper provides an overview of the current state of research on collaboration in immersive environments, including Virtual Reality (VR) and Augmented Reality (AR) settings.
  • methods: The paper discusses the different types of immersive environments, including VR and AR, and the different forms of collaboration that can occur in these environments.
  • results: The paper highlights the challenges and limitations of collaboration in immersive environments, such as the lack of physical cues, cost and usability, and the need for further research in this area.
    Abstract Virtual Reality (VR) and Augmented Reality (AR) tools have been applied in all engineering fields in order to avoid the use of physical prototypes, to train in high-risk situations, and to interpret real or simulated results. In order to complete a shared task or assign tasks to the agents in such immersive environments, collaboration or Shared Cooperative Activities are a necessity. Collaboration in immersive environments is an emerging field of research that aims to study and enhance the ways in which people interact and work together in Virtual and Augmented Reality settings. Collaboration in immersive environments is a complex process that involves different factors such as communication, coordination, and social presence. This paper provides an overview of the current state of research on collaboration in immersive environments. It discusses the different types of immersive environments, including VR and AR, and the different forms of collaboration that can occur in these environments. The paper also highlights the challenges and limitations of collaboration in immersive environments, such as the lack of physical cues, cost and usability and the need for further research in this area. Overall, collaboration in immersive environments is a promising field with a wide range of potential applications, from education to industry, and it can benefit both individuals and groups by enhancing their ability to work together effectively.

ProcSim: Proxy-based Confidence for Robust Similarity Learning

  • paper_url: http://arxiv.org/abs/2311.00668
  • repo_url: None
  • paper_authors: Oriol Barbany, Xiaofan Lin, Muhammet Bastan, Arnab Dhua
  • for: learning an embedding space in which distances between inputs closely reflect their inherent semantic similarity, robustly in the presence of label noise.
  • methods: the ProcSim framework assigns each sample a confidence score based on its normalized distance to the class representative.
  • results: achieves state-of-the-art performance on DML benchmark datasets injected with uniform noise and with the proposed semantically coherent noise.
    Abstract Deep Metric Learning (DML) methods aim at learning an embedding space in which distances are closely related to the inherent semantic similarity of the inputs. Previous studies have shown that popular benchmark datasets often contain numerous wrong labels, and DML methods are susceptible to them. Intending to study the effect of realistic noise, we create an ontology of the classes in a dataset and use it to simulate semantically coherent labeling mistakes. To train robust DML models, we propose ProcSim, a simple framework that assigns a confidence score to each sample using the normalized distance to its class representative. The experimental results show that the proposed method achieves state-of-the-art performance on the DML benchmark datasets injected with uniform and the proposed semantically coherent noise.
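A minimal sketch of a proxy-based confidence score in the spirit of ProcSim: each sample is weighted by how close its embedding lies to its own class proxy, relative to all proxies. The softmax normalization and the weighted proxy loss below are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def proxy_confidence(embeddings, labels, proxies, temperature=0.1):
    """embeddings: (B, D) L2-normalized; proxies: (C, D) L2-normalized.
    Returns a per-sample confidence in [0, 1]."""
    sims = embeddings @ proxies.T                    # (B, C) cosine similarities
    probs = F.softmax(sims / temperature, dim=1)     # normalized over classes
    return probs[torch.arange(len(labels)), labels]  # own-class probability

B, D, C = 8, 128, 10
emb = F.normalize(torch.randn(B, D), dim=1)
proxies = F.normalize(torch.randn(C, D), dim=1)
labels = torch.randint(0, C, (B,))

conf = proxy_confidence(emb, labels, proxies)
# Down-weight likely-mislabeled samples in a simple proxy-based DML loss:
per_sample_loss = F.cross_entropy((emb @ proxies.T) / 0.1, labels,
                                  reduction="none")
weighted_loss = (conf.detach() * per_sample_loss).mean()
```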

TPSeNCE: Towards Artifact-Free Realistic Rain Generation for Deraining and Object Detection in Rain

  • paper_url: http://arxiv.org/abs/2311.00660
  • repo_url: https://github.com/shenzheng2000/tpsence
  • paper_authors: Shen Zheng, Changjie Lu, Srinivasa G. Narasimhan
  • for: proposes an unpaired image-to-image translation framework that generates more realistic rainy images while reducing artifacts and distortions.
  • methods: a Triangular Probability Similarity (TPS) constraint guides generated images toward clear and rainy images in the discriminator manifold, minimizing artifacts and distortions; a Semantic Noise Contrastive Estimation (SeNCE) strategy reweights the pushing force of negative samples according to the semantic similarity between clear and rainy images and the feature similarity between anchor and negatives.
  • results: experiments show realistic rain generation with minimal artifacts and distortions, benefiting image deraining and object detection in rain; the method also generates realistic snowy and night images, underscoring its broader applicability. Code: https://github.com/ShenZheng2000/TPSeNCE.
    Abstract Rain generation algorithms have the potential to improve the generalization of deraining methods and scene understanding in rainy conditions. However, in practice, they produce artifacts and distortions and struggle to control the amount of rain generated due to a lack of proper constraints. In this paper, we propose an unpaired image-to-image translation framework for generating realistic rainy images. We first introduce a Triangular Probability Similarity (TPS) constraint to guide the generated images toward clear and rainy images in the discriminator manifold, thereby minimizing artifacts and distortions during rain generation. Unlike conventional contrastive learning approaches, which indiscriminately push negative samples away from the anchors, we propose a Semantic Noise Contrastive Estimation (SeNCE) strategy and reassess the pushing force of negative samples based on the semantic similarity between the clear and the rainy images and the feature similarity between the anchor and the negative samples. Experiments demonstrate realistic rain generation with minimal artifacts and distortions, which benefits image deraining and object detection in rain. Furthermore, the method can be used to generate realistic snowy and night images, underscoring its potential for broader applicability. Code is available at https://github.com/ShenZheng2000/TPSeNCE.
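A minimal sketch of a reweighted InfoNCE-style loss in the spirit of SeNCE: negatives are not pushed away uniformly but carry a per-negative weight. The heuristic weight used below (down-weighting negatives that look similar to the anchor) is an assumption standing in for the paper's semantic- and feature-similarity weighting.

```python
import torch
import torch.nn.functional as F

def weighted_info_nce(anchor, positive, negatives, weights, tau=0.07):
    """anchor: (D,), positive: (D,), negatives: (K, D), weights: (K,)."""
    pos = torch.exp(F.cosine_similarity(anchor, positive, dim=0) / tau)
    neg_sims = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1)
    neg = (weights * torch.exp(neg_sims / tau)).sum()
    return -torch.log(pos / (pos + neg))

D, K = 256, 16
anchor, positive = torch.randn(D), torch.randn(D)
negatives = torch.randn(K, D)

# Down-weight negatives that are close to the anchor, so likely false
# negatives are pushed away less forcefully (assumed heuristic):
sim_to_anchor = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1)
weights = 1.0 - sim_to_anchor.clamp(min=0)
loss = weighted_info_nce(anchor, positive, negatives, weights)
```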

De-Diffusion Makes Text a Strong Cross-Modal Interface

  • paper_url: http://arxiv.org/abs/2311.00618
  • repo_url: None
  • paper_authors: Chen Wei, Chenxi Liu, Siyuan Qiao, Zhishuai Zhang, Alan Yuille, Jiahui Yu
  • for: proposes text as a strong cross-modal interface: representing an image as text confers the interpretability and flexibility of natural language.
  • methods: an autoencoder whose encoder transforms an input image into text, which a fixed pre-trained text-to-image diffusion decoder then uses to reconstruct the original input, a process termed De-Diffusion.
  • results: De-Diffusion text represents images precisely and comprehensively; it can be readily ingested by off-the-shelf text-to-image tools and LLMs, e.g., a single De-Diffusion model provides transferable prompts for different text-to-image tools and sets a new state of the art on open-ended vision-language tasks by few-shot prompting large language models.
    Abstract We demonstrate text as a strong cross-modal interface. Rather than relying on deep embeddings to connect image and language as the interface representation, our approach represents an image as text, from which we enjoy the interpretability and flexibility inherent to natural language. We employ an autoencoder that uses a pre-trained text-to-image diffusion model for decoding. The encoder is trained to transform an input image into text, which is then fed into the fixed text-to-image diffusion decoder to reconstruct the original input -- a process we term De-Diffusion. Experiments validate both the precision and comprehensiveness of De-Diffusion text representing images, such that it can be readily ingested by off-the-shelf text-to-image tools and LLMs for diverse multi-modal tasks. For example, a single De-Diffusion model can generalize to provide transferable prompts for different text-to-image tools, and also achieves a new state of the art on open-ended vision-language tasks by simply prompting large language models with few-shot examples.

Occluded Person Re-Identification with Deep Learning: A Survey and Perspectives

  • paper_url: http://arxiv.org/abs/2311.00603
  • repo_url: None
  • paper_authors: Enhao Ning, Changshuo Wang, Huang Zhangc, Xin Ning, Prayag Tiwari
  • for: reviews occluded person re-identification techniques, which must cope with pedestrian information loss, noise interference, and perspective misalignment.
  • methods: provides an overview of datasets and evaluation schemes, then systematically classifies, analyses, and compares existing deep-learning-based occluded person Re-ID methods from multiple perspectives.
  • results: identifies the state-of-the-art approaches and presents an outlook on the future development of occluded person Re-ID.
    Abstract Person re-identification (Re-ID) technology plays an increasingly crucial role in intelligent surveillance systems. Widespread occlusion significantly impacts the performance of person Re-ID. Occluded person Re-ID refers to a pedestrian matching method that deals with challenges such as pedestrian information loss, noise interference, and perspective misalignment. It has garnered extensive attention from researchers. Over the past few years, several occlusion-solving person Re-ID methods have been proposed, tackling various sub-problems arising from occlusion. However, there is a lack of comprehensive studies that compare, summarize, and evaluate the potential of occluded person Re-ID methods in detail. In this review, we start by providing a detailed overview of the datasets and evaluation scheme used for occluded person Re-ID. Next, we scientifically classify and analyze existing deep learning-based occluded person Re-ID methods from various perspectives, summarizing them concisely. Furthermore, we conduct a systematic comparison among these methods, identify the state-of-the-art approaches, and present an outlook on the future development of occluded person Re-ID.

PAUMER: Patch Pausing Transformer for Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2311.00586
  • repo_url: None
  • paper_authors: Evann Courdier, Prabhu Teja Sivaprasad, François Fleuret
  • for: improving the efficiency of segmentation transformers by spending different amounts of computation on different parts of the image.
  • methods: uses the entropy of predictions computed from intermediate activations as the criterion for pausing computation on patches before the final decoder.
  • results: on the Cityscapes and ADE20K segmentation datasets, the method runs with about 50% higher throughput at mIoU drops of only 0.65% and 4.6%, respectively; a single trained network can be adapted at inference to various runtime requirements by modulating its pausing parameters.
    Abstract We study the problem of improving the efficiency of segmentation transformers by using disparate amounts of computation for different parts of the image. Our method, PAUMER, accomplishes this by pausing computation for patches that are deemed to not need any more computation before the final decoder. We use the entropy of predictions computed from intermediate activations as the pausing criterion, and find this aligns well with semantics of the image. Our method has a unique advantage that a single network trained with the proposed strategy can be effortlessly adapted at inference to various run-time requirements by modulating its pausing parameters. On two standard segmentation datasets, Cityscapes and ADE20K, we show that our method operates with about a $50\%$ higher throughput with an mIoU drop of about $0.65\%$ and $4.6\%$ respectively.
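A minimal sketch of entropy-based patch pausing: an auxiliary per-patch prediction is read off at an intermediate block, and low-entropy (confident) patches skip the remaining computation. The threshold and the auxiliary head are assumptions; `remaining_blocks` is a hypothetical stand-in for the rest of the transformer.

```python
import torch

def pause_mask(aux_logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """aux_logits: (B, N, C) per-patch class logits from an auxiliary head.
    Returns a bool mask (B, N): True means the patch keeps computing."""
    p = aux_logits.softmax(dim=-1)
    entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=-1)   # (B, N)
    return entropy > threshold          # low-entropy patches are paused

B, N, C, D = 2, 196, 19, 256            # batch, patches, classes, width
tokens = torch.randn(B, N, D)
aux_logits = torch.randn(B, N, C)       # stand-in intermediate predictions

keep = pause_mask(aux_logits)
for b in range(B):
    active = tokens[b, keep[b]]         # only these go through later blocks
    # active = remaining_blocks(active) # hypothetical remaining layers
    tokens[b, keep[b]] = active         # paused tokens pass through unchanged

print("fraction of patches still computed:", keep.float().mean().item())
```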

A Robust Deep Learning Method with Uncertainty Estimation for the Pathological Classification of Renal Cell Carcinoma based on CT Images

  • paper_url: http://arxiv.org/abs/2311.00567
  • repo_url: None
  • paper_authors: Ni Yao, Hang Hu, Kaicong Chen, Chen Zhao, Yuan Guo, Boya Li, Jiaofen Nan, Yanting Li, Chuang Han, Fubao Zhu, Weihua Zhou, Li Tian
  • for: developing a deep learning model to help radiologists differentiate the pathological subtypes of renal cell carcinoma (RCC) preoperatively from CT images, improving diagnostic accuracy and reliability.
  • methods: a deep learning model incorporating uncertainty estimation, developed with five-fold cross-validation, classifies RCC into clear cell, papillary, and chromophobe subtypes.
  • results: the model reached an AUC of 0.868 (95% CI: 0.826-0.923) for ccRCC in five-fold cross-validation and 0.856 (95% CI: 0.838-0.882) on the external validation set, demonstrating accurate and reliable preoperative subtype prediction.
    Abstract Objectives To develop and validate a deep learning-based diagnostic model incorporating uncertainty estimation so as to facilitate radiologists in the preoperative differentiation of the pathological subtypes of renal cell carcinoma (RCC) based on CT images. Methods Data from 668 consecutive patients, pathologically proven RCC, were retrospectively collected from Center 1. By using five-fold cross-validation, a deep learning model incorporating uncertainty estimation was developed to classify RCC subtypes into clear cell RCC (ccRCC), papillary RCC (pRCC), and chromophobe RCC (chRCC). An external validation set of 78 patients from Center 2 further evaluated the model's performance. Results In the five-fold cross-validation, the model's area under the receiver operating characteristic curve (AUC) for the classification of ccRCC, pRCC, and chRCC was 0.868 (95% CI: 0.826-0.923), 0.846 (95% CI: 0.812-0.886), and 0.839 (95% CI: 0.802-0.88), respectively. In the external validation set, the AUCs were 0.856 (95% CI: 0.838-0.882), 0.787 (95% CI: 0.757-0.818), and 0.793 (95% CI: 0.758-0.831) for ccRCC, pRCC, and chRCC, respectively. Conclusions The developed deep learning model demonstrated robust performance in predicting the pathological subtypes of RCC, while the incorporated uncertainty emphasized the importance of understanding model confidence, which is crucial for assisting clinical decision-making for patients with renal tumors. Clinical relevance statement Our deep learning approach, integrated with uncertainty estimation, offers clinicians a dual advantage: accurate RCC subtype predictions complemented by diagnostic confidence references, promoting informed decision-making for patients with RCC.

CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders

  • paper_url: http://arxiv.org/abs/2311.00566
  • repo_url: https://github.com/antofuller/croma
  • paper_authors: Anthony Fuller, Koreen Millard, James R. Green
  • for: develops a self-supervised learning framework that learns rich unimodal and multimodal representations for remote sensing.
  • methods: combines contrastive and reconstruction self-supervised objectives: masked-out multispectral optical and synthetic aperture radar samples, aligned in space and time, are encoded separately with cross-modal contrastive learning, while a fusion encoder produces joint multimodal encodings used to predict the masked patches via a lightweight decoder; X- and 2D-ALiBi spatially bias the attention matrices.
  • results: the model extrapolates effectively to images up to 17.6x larger at test time and is evaluated on four classification benchmarks (finetuning, linear and nonlinear probing, kNN classification, K-means clustering) and three segmentation benchmarks.
    Abstract A vital and rapidly growing application, remote sensing offers vast yet sparsely labeled, spatially aligned multimodal data; this makes self-supervised learning algorithms invaluable. We present CROMA: a framework that combines contrastive and reconstruction self-supervised objectives to learn rich unimodal and multimodal representations. Our method separately encodes masked-out multispectral optical and synthetic aperture radar samples -- aligned in space and time -- and performs cross-modal contrastive learning. Another encoder fuses these sensors, producing joint multimodal encodings that are used to predict the masked patches via a lightweight decoder. We show that these objectives are complementary when leveraged on spatially aligned multimodal data. We also introduce X- and 2D-ALiBi, which spatially biases our cross- and self-attention matrices. These strategies improve representations and allow our models to effectively extrapolate to images up to 17.6x larger at test-time. CROMA outperforms the current SoTA multispectral model, evaluated on: four classification benchmarks -- finetuning (avg. 1.8%), linear (avg. 2.4%) and nonlinear (avg. 1.4%) probing, kNN classification (avg. 3.5%), and K-means clustering (avg. 8.4%); and three segmentation benchmarks (avg. 6.4%). CROMA's rich, optionally multimodal representations can be widely leveraged across remote sensing applications.

MNN: Mixed Nearest-Neighbors for Self-Supervised Learning

  • paper_url: http://arxiv.org/abs/2311.00562
  • repo_url: https://github.com/pc-cp/mnn
  • paper_authors: Chen Peng, Xianzhong Long, Yun Li
  • for: improving the performance and training efficiency of self-supervised learning while mitigating false neighbors among positive samples.
  • methods: a simple self-supervised framework, Mixed Nearest-Neighbors for Self-Supervised Learning (MNN), that optimizes the influence of neighbor samples on the semantics of positive samples through an intuitive weighting approach and image mixture operations.
  • results: exceptional generalization performance and training efficiency on four benchmark datasets.
    Abstract In contrastive self-supervised learning, positive samples are typically drawn from the same image but in different augmented views, resulting in a relatively limited source of positive samples. An effective way to alleviate this problem is to incorporate the relationship between samples, which involves including the top-k nearest neighbors of positive samples in the framework. However, the problem of false neighbors (i.e., neighbors that do not belong to the same category as the positive sample) is an objective but often overlooked challenge due to the query of neighbor samples without human supervision. In this paper, we present a simple Self-supervised learning framework called Mixed Nearest-Neighbors for Self-Supervised Learning (MNN). MNN optimizes the influence of neighbor samples on the semantics of positive samples through an intuitive weighting approach and image mixture operations. The results of our study demonstrate that MNN exhibits exceptional generalization performance and training efficiency on four benchmark datasets.
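A minimal sketch of the two ingredients the abstract describes: retrieving top-k nearest neighbors of a positive from a memory bank, weighting them by similarity, and mixing images so a false neighbor only partially contaminates the positive signal. The weighting scheme and mixing ratio are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def topk_neighbors(query, bank, k=4):
    """query: (D,), bank: (M, D); both L2-normalized."""
    sims = bank @ query                      # (M,) cosine similarities
    vals, idx = sims.topk(k)
    return idx, vals

D, M, k = 128, 1024, 4
bank = F.normalize(torch.randn(M, D), dim=1)   # memory-bank embeddings
query = F.normalize(torch.randn(D), dim=0)     # embedding of one view

idx, sims = topk_neighbors(query, bank, k)
weights = sims.softmax(dim=0)                  # trust closer neighbors more

# Image mixture: blend the anchor image with a neighbor image so that a
# false neighbor only partially contaminates the positive pair.
anchor_img = torch.rand(3, 32, 32)
neighbor_img = torch.rand(3, 32, 32)           # image behind bank[idx[0]]
lam = 0.7                                      # assumed mixing ratio
mixed = lam * anchor_img + (1 - lam) * neighbor_img

# Weighted neighbor embedding used as an additional positive target:
nn_target = F.normalize((weights.unsqueeze(1) * bank[idx]).sum(0), dim=0)
loss = 1 - F.cosine_similarity(query, nn_target, dim=0)
```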

ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab

  • paper_url: http://arxiv.org/abs/2311.00556
  • repo_url: None
  • paper_authors: Jieming Cui, Ziren Gong, Baoxiong Jia, Siyuan Huang, Zilong Zheng, Jianzhu Ma, Yixin Zhu
  • for: tackling the difficulty of replicating research results in molecular biology.
  • methods: studies intelligent monitoring systems for activity understanding in BioLab settings, curating the multimodal ProBio dataset with fine-grained hierarchical annotations and devising two challenging benchmarks: transparent solution tracking and multimodal action recognition.
  • results: a thorough experimental evaluation of contemporary video understanding models highlights their limitations in this specialized domain and identifies avenues for future research.
    Abstract The challenge of replicating research results has posed a significant impediment to the field of molecular biology. The advent of modern intelligent systems has led to notable progress in various domains. Consequently, we embarked on an investigation of intelligent monitoring systems as a means of tackling the issue of the reproducibility crisis. Specifically, we first curate a comprehensive multimodal dataset, named ProBio, as an initial step towards this objective. This dataset comprises fine-grained hierarchical annotations intended for the purpose of studying activity understanding in BioLab. Next, we devise two challenging benchmarks, transparent solution tracking and multimodal action recognition, to emphasize the unique characteristics and difficulties associated with activity understanding in BioLab settings. Finally, we provide a thorough experimental evaluation of contemporary video understanding models and highlight their limitations in this specialized domain to identify potential avenues for future research. We hope ProBio with associated benchmarks may garner increased focus on modern AI techniques in the realm of molecular biology.

Continual atlas-based segmentation of prostate MRI

  • paper_url: http://arxiv.org/abs/2311.00548
  • repo_url: https://github.com/meclabtuda/atlas-replay
  • paper_authors: Amin Ranem, Camila González, Daniel Pinto dos Santos, Andreas Michael Bucher, Ahmed Ezzat Othman, Anirban Mukhopadhyay
  • for: continual learning (CL) methods designed for natural image classification often fail to meet basic quality standards for medical image segmentation; this work targets continual prostate MRI segmentation.
  • methods: an atlas-based segmentation approach, Atlas Replay, that exploits domain knowledge about the region of interest for semantically coherent predictions and uses privacy-preserving prototypes with image registration to generate high-quality masks that remain consistent as the training distribution changes.
  • results: across seven publicly available prostate segmentation datasets, Atlas Replay is robust, generalizes well to yet-unseen domains, and maintains knowledge, unlike end-to-end segmentation methods.
    Abstract Continual learning (CL) methods designed for natural image classification often fail to reach basic quality standards for medical image segmentation. Atlas-based segmentation, a well-established approach in medical imaging, incorporates domain knowledge on the region of interest, leading to semantically coherent predictions. This is especially promising for CL, as it allows us to leverage structural information and strike an optimal balance between model rigidity and plasticity over time. When combined with privacy-preserving prototypes, this process offers the advantages of rehearsal-based CL without compromising patient privacy. We propose Atlas Replay, an atlas-based segmentation approach that uses prototypes to generate high-quality segmentation masks through image registration that maintain consistency even as the training distribution changes. We explore how our proposed method performs compared to state-of-the-art CL methods in terms of knowledge transferability across seven publicly available prostate segmentation datasets. Prostate segmentation plays a vital role in diagnosing prostate cancer, however, it poses challenges due to substantial anatomical variations, benign structural differences in older age groups, and fluctuating acquisition parameters. Our results show that Atlas Replay is both robust and generalizes well to yet-unseen domains while being able to maintain knowledge, unlike end-to-end segmentation methods. Our code base is available under https://github.com/MECLabTUDA/Atlas-Replay.

Improving Cardiovascular Disease Prediction Through Comparative Analysis of Machine Learning Models: A Case Study on Myocardial Infarction

  • paper_url: http://arxiv.org/abs/2311.00517
  • repo_url: None
  • paper_authors: Jonayet Miah, Duc M Ca, Md Abu Sayed, Ehsanur Rashid Lipu, Fuad Mahmud, S M Yasir Arafat
  • for: predicting myocardial infarction, a formidable task in medical research whose accurate prediction is pivotal for refining healthcare strategies.
  • methods: a comparative analysis of six machine learning models: Logistic Regression, Support Vector Machine, Decision Tree, Bagging, XGBoost, and LightGBM.
  • results: XGBoost performs best with 92.72% accuracy; Logistic Regression, Support Vector Machine, and LightGBM also reach relatively high accuracy, indicating that advanced machine learning techniques can sharpen the prediction of cardiovascular disease.
    Abstract Cardiovascular disease remains a leading cause of mortality in the contemporary world. Its association with smoking, elevated blood pressure, and cholesterol levels underscores the significance of these risk factors. This study addresses the challenge of predicting myocardial illness, a formidable task in medical research. Accurate predictions are pivotal for refining healthcare strategies. This investigation conducts a comparative analysis of six distinct machine learning models: Logistic Regression, Support Vector Machine, Decision Tree, Bagging, XGBoost, and LightGBM. The attained outcomes exhibit promise, with accuracy rates as follows: Logistic Regression (81.00%), Support Vector Machine (75.01%), XGBoost (92.72%), LightGBM (90.60%), Decision Tree (82.30%), and Bagging (83.01%). Notably, XGBoost emerges as the top-performing model. These findings underscore its potential to enhance predictive precision for coronary infarction. As the prevalence of cardiovascular risk factors persists, incorporating advanced machine learning techniques holds the potential to refine proactive medical interventions.
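A minimal sketch of the comparative protocol, assuming the scikit-learn, XGBoost, and LightGBM packages; synthetic data stands in for the study's myocardial-infarction dataset, and default hyperparameters are used throughout.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Train the six model families on one split and compare accuracy.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "DecisionTree": DecisionTreeClassifier(),
    "Bagging": BaggingClassifier(),
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "LightGBM": LGBMClassifier(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name}: {accuracy_score(y_te, model.predict(X_te)):.4f}")
```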

Deep Neural Networks for Automatic Speaker Recognition Do Not Learn Supra-Segmental Temporal Features

  • paper_url: http://arxiv.org/abs/2311.00489
  • repo_url: None
  • paper_authors: Daniel Neururer, Volker Dellwo, Thilo Stadelmann
  • for: investigates what is responsible for the success of deep neural networks in automatic speaker recognition, in particular whether they model supra-segmental temporal (rhythmic-prosodic) features.
  • methods: a novel test quantifies to what extent the performance of state-of-the-art speaker recognition networks can be explained by modeling supra-segmental temporal information (SST); several means of forcing the networks to focus more on SST are also presented and evaluated.
  • results: a variety of CNN- and RNN-based architectures do not model SST to any sufficient degree, even when forced, providing a relevant basis for better exploiting the full speech signal and insight into the inner workings of such networks.
    Abstract While deep neural networks have shown impressive results in automatic speaker recognition and related tasks, it is dissatisfactory how little is understood about what exactly is responsible for these results. Part of the success has been attributed in prior work to their capability to model supra-segmental temporal information (SST), i.e., learn rhythmic-prosodic characteristics of speech in addition to spectral features. In this paper, we (i) present and apply a novel test to quantify to what extent the performance of state-of-the-art neural networks for speaker recognition can be explained by modeling SST; and (ii) present several means to force respective nets to focus more on SST and evaluate their merits. We find that a variety of CNN- and RNN-based neural network architectures for speaker recognition do not model SST to any sufficient degree, even when forced. The results provide a highly relevant basis for impactful future research into better exploitation of the full speech signal and give insights into the inner workings of such networks, enhancing explainability of deep learning for speech technologies.

DEFN: Dual-Encoder Fourier Group Harmonics Network for Three-Dimensional Macular Hole Reconstruction with Stochastic Retinal Defect Augmentation and Dynamic Weight Composition

  • paper_url: http://arxiv.org/abs/2311.00483
  • repo_url: https://github.com/iipl-hangzhoudianziuniversity/defn-pytorch
  • paper_authors: Xingru Huang, Yihao Guo, Jian Huang, Zhi Li, Tianyun Zhang, Kunyan Cai, Gaopeng Huang, Wenhao Chen, Zhaoyang Xu, Liangqiong Qu, Ji Hu, Tinyu Wang, Shaowei Jiang, Chenggang Yan, Yaoqi Sun, Xin Ye, Yaqi Wang
  • for: provides a deep-learning-based 3D reconstruction method to support the diagnosis and treatment of macular holes.
  • methods: the DEFN 3D segmentation network integrates three novel modules: Fourier Group Harmonics (FuGH), Simplified 3D Spatial Attention (S3DSA), and a Harmonic Squeeze-and-Excitation Module (HSE); a new data augmentation method, Stochastic Retinal Defect Injection (SRDI), and a network optimization strategy, DynamicWeightCompose (DWC), are also proposed.
  • results: DEFN performs best against 13 baselines and provides precise 3D retinal reconstruction and quantitative metrics, offering ophthalmologists new tools for diagnostic and therapeutic decision-making in hard-to-treat macular degeneration.
    Abstract The spatial and quantitative parameters of macular holes are vital for diagnosis, surgical choices, and post-op monitoring. Macular hole diagnosis and treatment rely heavily on spatial and quantitative data, yet the scarcity of such data has impeded the progress of deep learning techniques for effective segmentation and real-time 3D reconstruction. To address this challenge, we assembled the world's largest macular hole dataset, Retinal OCTfor Macular Hole Enhancement (ROME-3914), and a Comprehensive Archive for Retinal Segmentation (CARS-30k), both expertly annotated. In addition, we developed an innovative 3D segmentation network, the Dual-Encoder FuGH Network (DEFN), which integrates three innovative modules: Fourier Group Harmonics (FuGH), Simplified 3D Spatial Attention (S3DSA) and Harmonic Squeeze-and-Excitation Module (HSE). These three modules synergistically filter noise, reduce computational complexity, emphasize detailed features, and enhance the network's representation ability. We also proposed a novel data augmentation method, Stochastic Retinal Defect Injection (SRDI), and a network optimization strategy DynamicWeightCompose (DWC), to further improve the performance of DEFN. Compared with 13 baselines, our DEFN shows the best performance. We also offer precise 3D retinal reconstruction and quantitative metrics, bringing revolutionary diagnostic and therapeutic decision-making tools for ophthalmologists, and is expected to completely reshape the diagnosis and treatment patterns of difficult-to-treat macular degeneration. The source code is publicly available at: https://github.com/IIPL-HangzhouDianUniversity/DEFN-Pytorch.

Group Distributionally Robust Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2311.00476
  • repo_url: None
  • paper_authors: Konstantinos Vilouras, Xiao Liu, Pedro Sanchez, Alison Q. O’Neil, Sotirios A. Tsaftaris
  • for: addressing sub-population shift in medical imaging analysis, where groups/domains of data (e.g., from particular scanners or hospitals) are underrepresented in the training set and distillation objectives suffer.
  • methods: a distributionally robust optimization (DRO)-inspired, group-aware distillation loss that updates a set of group weights from the per-group losses at each iteration, dynamically focusing training on poorly performing groups.
  • results: the method, GroupDistil, yields consistent improvement in worst-group accuracy on two benchmark datasets (natural images and cardiac MRIs).
    Abstract Knowledge distillation enables fast and effective transfer of features learned from a bigger model to a smaller one. However, distillation objectives are susceptible to sub-population shifts, a common scenario in medical imaging analysis which refers to groups/domains of data that are underrepresented in the training set. For instance, training models on health data acquired from multiple scanners or hospitals can yield subpar performance for minority groups. In this paper, inspired by distributionally robust optimization (DRO) techniques, we address this shortcoming by proposing a group-aware distillation loss. During optimization, a set of weights is updated based on the per-group losses at a given iteration. This way, our method can dynamically focus on groups that have low performance during training. We empirically validate our method, GroupDistil on two benchmark datasets (natural images and cardiac MRIs) and show consistent improvement in terms of worst-group accuracy.
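A minimal sketch of a group-aware distillation loss in the group-DRO style the abstract describes: per-group distillation losses drive an exponentiated-gradient update of group weights, so poorly performing groups are emphasized. The KL-based distillation term, temperature, and step size are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def group_distill_loss(s_logits, t_logits, groups, w, eta=0.1, T=2.0):
    """s_logits/t_logits: (B, C); groups: (B,) int ids; w: (G,) weights."""
    # Per-sample distillation loss (soft-target KL, temperature-scaled):
    kl = F.kl_div(
        F.log_softmax(s_logits / T, dim=1),
        F.softmax(t_logits / T, dim=1),
        reduction="none",
    ).sum(dim=1) * (T * T)

    G = w.numel()
    group_loss = torch.zeros(G)
    for g in range(G):
        mask = groups == g
        if mask.any():
            group_loss[g] = kl[mask].mean()

    # Exponentiated-gradient update: up-weight the worst groups.
    w = w * torch.exp(eta * group_loss.detach())
    w = w / w.sum()
    return (w * group_loss).sum(), w

B, C, G = 32, 10, 3
student_logits, teacher_logits = torch.randn(B, C), torch.randn(B, C)
groups = torch.randint(0, G, (B,))
w = torch.full((G,), 1.0 / G)          # carried across training iterations
loss, w = group_distill_loss(student_logits, teacher_logits, groups, w)
```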

PET Tracer Conversion among Brain PET via Variable Augmented Invertible Network

  • paper_url: http://arxiv.org/abs/2311.00735
  • repo_url: None
  • paper_authors: Bohui Shen, Wei Zhang, Xubiao Liu, Pengfei Yu, Shirui Jiang, Xinchong Shi, Xiangsong Zhang, Xiaoyu Zhou, Weirui Zhang, Bingxuan Li, Qiegen Liu
  • for: brain disease diagnosis and brain science research with PET imaging, where the more effective tracer DOPA is far less widely used than FDG.
  • methods: a deep-learning tracer conversion invertible neural network (TC-INN) maps FDG images to DOPA images; training uses reference DOPA PET images as targets, variable augmentation techniques for better generation, and image registration to correct the angular deviation between the acquired FDG and DOPA data.
  • results: achieves FDG-to-DOPA image mapping, yielding additional diagnostic information and showing great potential for PET image conversion where tracer applications are limited.
    Abstract Positron emission tomography (PET), as an imaging technique with high biochemical sensitivity, has been widely used in diagnosis of encephalopathy and brain science research used in brain disease diagnosis and brain science research. Since different tracers present different effects on the same focal area, the choice of tracers is getting more significant for PET imaging. Nowadays, with the wide application of PET imaging in neuropsychiatric treatment, 6-18F-fluoro-3, 4-dihydroxy-L-phenylalanine (DOPA) has been found to be more effective than 18F-labeled fluorine-2-deoxyglucose (FDG) in this field. However, due to the complexity of its preparation and other limitations, DOPA is far less widely used than FDG. To address this issue, a tracer conversion invertible neural network (TC-INN) for image projection is developed to map FDG images to DOPA images through deep learning. More diagnostic information is obtained by generating PET images from FDG to DOPA. Specifically, the proposed TC-INN consists of two separate phases, one for training the traceable data, the other for re-building the new data. The reference DOPA PET image is used as the learning target for the corresponding network during the training process of tracer conversion. Mean-while, the invertible network iteratively estimates the resultant DOPA PET data and compares it to the reference DOPA PET data. Notably, the reversible model employed variable enhancement techniques to achieve better power generation. Moreover, image registration needs to be performed before training due to the angular deviation of the acquired FDG and DOPA data information. Experimental results show generative ability in mapping be-tween FDG images and DOPA images. It demonstrates great potential for PET image conversion in the case of limited tracer applications.
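    The abstract only sketches TC-INN at a high level; invertible image-to-image networks are commonly assembled from coupling blocks. The block below is a generic RealNVP-style affine coupling layer, offered purely as an illustration of how a forward pass (e.g., the FDG-to-DOPA direction) and its exact inverse share one set of weights; it is not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    def __init__(self, channels, hidden=64):
        super().__init__()
        self.half = channels // 2
        # Sub-network predicting scale and shift for the second half.
        self.net = nn.Sequential(
            nn.Conv2d(self.half, hidden, 3, padding=1), nn.ReLU(),
            nn.Conv2d(hidden, 2 * (channels - self.half), 3, padding=1))

    def forward(self, x):
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=1)
        y2 = x2 * torch.exp(torch.tanh(s)) + t   # tanh keeps scales bounded
        return torch.cat([x1, y2], dim=1)

    def inverse(self, y):                        # exact inversion, same weights
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(y1).chunk(2, dim=1)
        x2 = (y2 - t) * torch.exp(-torch.tanh(s))
        return torch.cat([y1, x2], dim=1)
```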

Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture

  • paper_url: http://arxiv.org/abs/2311.00457
  • repo_url: https://github.com/DaLi-Jack/SSR-code
  • paper_authors: Yixin Chen, Junfeng Ni, Nan Jiang, Yaowei Zhang, Yixin Zhu, Siyuan Huang
  • for: Improving the fidelity of scene reconstruction from single-view images, benefiting applications such as scene understanding and 3D scene editing.
  • methods: A Single-view neural implicit Shape and Radiance field (SSR) representation that exploits explicit 3D shape supervision and volume rendering of color, depth, and surface normals to recover high-fidelity object shapes and textures.
  • results: Improves the detail of reconstructed objects, supports rendering from novel viewpoints, and allows composing object-level representations into flexible scene representations for holistic scene understanding and 3D scene editing.
    Abstract Reconstructing detailed 3D scenes from single-view images remains a challenging task due to limitations in existing approaches, which primarily focus on geometric shape recovery, overlooking object appearances and fine shape details. To address these challenges, we propose a novel framework for simultaneous high-fidelity recovery of object shapes and textures from single-view images. Our approach utilizes the proposed Single-view neural implicit Shape and Radiance field (SSR) representations to leverage both explicit 3D shape supervision and volume rendering of color, depth, and surface normal images. To overcome shape-appearance ambiguity under partial observations, we introduce a two-stage learning curriculum incorporating both 3D and 2D supervisions. A distinctive feature of our framework is its ability to generate fine-grained textured meshes while seamlessly integrating rendering capabilities into the single-view 3D reconstruction model. This integration enables not only improved textured 3D object reconstruction by 27.7% and 11.6% on the 3D-FRONT and Pix3D datasets, respectively, but also supports the rendering of images from novel viewpoints. Beyond individual objects, our approach facilitates composing object-level representations into flexible scene representations, thereby enabling applications such as holistic scene understanding and 3D scene editing. We conduct extensive experiments to demonstrate the effectiveness of our method.

Progressive Recurrent Network for Shadow Removal

  • paper_url: http://arxiv.org/abs/2311.00455
  • repo_url: None
  • paper_authors: Yonghui Wang, Wengang Zhou, Hao Feng, Li Li, Houqiang Li
  • for: removes shadows from images in a coarse-to-fine fashion
  • methods: Progressive Recurrent Network (PRNet) with shadow feature extraction and progressive shadow removal
  • results: superior performance in removing shadows compared to existing deep learning-based approaches, while using only 29% of the network parameters of the best published method.
    Abstract Single-image shadow removal is a significant task that is still unresolved. Most existing deep learning-based approaches attempt to remove the shadow directly, which cannot handle shadows well. To handle this issue, we consider removing the shadow in a coarse-to-fine fashion and propose a simple but effective Progressive Recurrent Network (PRNet). The network aims to remove the shadow progressively, enabling us to flexibly adjust the number of iterations to strike a balance between performance and time. Our network comprises two parts: shadow feature extraction and progressive shadow removal. Specifically, the first part is a shallow ResNet which constructs the representations of the input shadow image at its original size, preventing the loss of high-frequency details caused by downsampling. The second part has two critical components: the re-integration module and the update module. The proposed re-integration module can fully use the outputs of the previous iteration, providing input for the update module for further shadow removal. In this way, the proposed PRNet makes the whole process more concise and uses only 29% of the network parameters of the best published method. Extensive experiments on three benchmarks, ISTD, ISTD+, and SRD, demonstrate that our method can effectively remove shadows and achieve superior performance.

CLIP-AD: A Language-Guided Staged Dual-Path Model for Zero-shot Anomaly Detection

  • paper_url: http://arxiv.org/abs/2311.00453
  • repo_url: None
  • paper_authors: Xuhai Chen, Jiangning Zhang, Guanzhong Tian, Haoyang He, Wuhao Zhang, Yabiao Wang, Chengjie Wang, Yunsheng Wu, Yong Liu
  • for: Zero-shot anomaly detection (AD), a valuable but under-studied task that performs AD without any reference images of the test objects.
  • methods: A language-guided strategy with a simple yet effective architecture, CLIP-AD, that leverages the zero-shot classification ability of the large vision-language model CLIP. Directly computing text/image feature similarity yields opposite predictions and irrelevant highlights, so a Staged Dual-Path model (SDP) is introduced that effectively uses features from multiple levels and applies architecture and feature surgery to resolve these issues.
  • results: SDP outperforms SOTA on VisA by +1.0/+1.2 in classification/segmentation F1 scores, while SDP+, which adds linear layers to fine-tune the text/image feature alignment, achieves +1.9/+11.7 improvements.
    Abstract This paper considers zero-shot Anomaly Detection (AD), a valuable yet under-studied task, which performs AD without any reference images of the test objects. Specifically, we employ a language-guided strategy and propose a simple-yet-effective architecture CLIP-AD, leveraging the superior zero-shot classification capabilities of the large vision-language model CLIP. A natural idea for anomaly segmentation is to directly calculate the similarity between text/image features, but we observe opposite predictions and irrelevant highlights in the results. Inspired by the phenomena, we introduce a Staged Dual-Path model (SDP) that effectively uses features from various levels and applies architecture and feature surgery to address these issues. Furthermore, delving beyond surface phenomena, we identify the problem arising from misalignment of text/image features in the joint embedding space. Thus, we introduce a fine-tuning strategy by adding linear layers and construct an extended model SDP+, further enhancing the performance. Abundant experiments demonstrate the effectiveness of our approach, e.g., on VisA, SDP outperforms SOTA by +1.0/+1.2 in classification/segmentation F1 scores, while SDP+ achieves +1.9/+11.7 improvements.
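    The naive baseline the paper starts from, scoring each image patch by its similarity to normal/anomalous text prompts, can be sketched as below. The prompt wording and the temperature `tau` are assumptions, and the SDP/SDP+ refinements that fix this baseline's failure modes are not reproduced here.

```python
import torch
import torch.nn.functional as F

def zero_shot_anomaly_map(patch_feats, text_normal, text_anom, tau=0.07):
    # patch_feats: (H*W, D) patch embeddings from a CLIP-like image encoder;
    # text_normal / text_anom: (D,) embeddings of prompts such as
    # "a photo of a flawless object" / "a photo of a damaged object".
    patch_feats = F.normalize(patch_feats, dim=-1)
    text = F.normalize(torch.stack([text_normal, text_anom]), dim=-1)
    logits = patch_feats @ text.t() / tau        # (H*W, 2) similarities
    return logits.softmax(dim=-1)[:, 1]          # anomaly probability per patch
```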

On Manipulating Scene Text in the Wild with Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.00734
  • repo_url: None
  • paper_authors: Joshua Santoso, Christian Simon, Williem Pao
  • for: A diffusion-based scene text manipulation method that improves the fidelity and stability of text editing in images.
  • methods: Two adaptation strategies, namely one-shot style adaptation and text-recognition guidance, that let a diffusion model replace the text in an image while preserving details.
  • results: Extensive comparisons and ablation studies on multiple scene text datasets confirm the effectiveness of the method, with competitive Optical Character Recognition (OCR) accuracy on the synthesized scene text.
    Abstract Diffusion models have gained attention for image editing, yielding impressive results in text-to-image tasks. On the downside, generated images from stable diffusion models can suffer from deteriorated details. This pitfall impacts image editing tasks that require information preservation, e.g., scene text editing. As a desired result, the model must be able to replace the text in the source image with the target text while preserving details such as color, font size, and background. To leverage the potential of diffusion models, in this work we introduce the Diffusion-BasEd Scene Text manipulation Network, DBEST. Specifically, we design two adaptation strategies, namely one-shot style adaptation and text-recognition guidance. In experiments, we thoroughly assess and compare our proposed method against state-of-the-art methods on various scene text datasets, then provide extensive ablation studies for each granularity to analyze our performance gain. We also demonstrate the effectiveness of our proposed method in synthesizing scene text, as indicated by competitive Optical Character Recognition (OCR) accuracy: our method achieves 94.15% and 98.12% on the COCO-text and ICDAR2013 datasets for character-level evaluation.

Enhancing Traffic Object Detection in Variable Illumination with RGB-Event Fusion

  • paper_url: http://arxiv.org/abs/2311.00436
  • repo_url: None
  • paper_authors: Zhanwen Liu, Nan Yang, Yang Wang, Yuke Li, Xiangmo Zhao, Fei-Yue Wang
  • for: Improving traffic object detection accuracy under variable illumination conditions.
  • methods: Bio-inspired event cameras and a Structure-aware Fusion Network (SFNet) that compensates for information loss in images through cross-modality fusion, yielding illumination-robust representations for traffic object detection.
  • results: SFNet overcomes the perceptual limits of conventional cameras, outperforming the frame-based method by 8.0% in mAP50 and 5.9% in mAP50:95.
    Abstract Traffic object detection under variable illumination is challenging due to the information loss caused by the limited dynamic range of conventional frame-based cameras. To address this issue, we introduce bio-inspired event cameras and propose a novel Structure-aware Fusion Network (SFNet) that extracts sharp and complete object structures from the event stream to compensate for the lost information in images through cross-modality fusion, enabling the network to obtain illumination-robust representations for traffic object detection. Specifically, to mitigate the sparsity or blurriness issues arising from diverse motion states of traffic objects in fixed-interval event sampling methods, we propose the Reliable Structure Generation Network (RSGNet) to generate Speed Invariant Frames (SIF), ensuring the integrity and sharpness of object structures. Next, we design a novel Adaptive Feature Complement Module (AFCM) which guides the adaptive fusion of two modality features to compensate for the information loss in the images by perceiving the global lightness distribution of the images, thereby generating illumination-robust representations. Finally, considering the lack of large-scale and high-quality annotations in the existing event-based object detection datasets, we build a DSEC-Det dataset, which consists of 53 sequences with 63,931 images and more than 208,000 labels for 8 classes. Extensive experimental results demonstrate that our proposed SFNet can overcome the perceptual boundaries of conventional cameras and outperform the frame-based method by 8.0% in mAP50 and 5.9% in mAP50:95. Our code and dataset will be available at https://github.com/YN-Yang/SFNet.

Event-based Background-Oriented Schlieren

  • paper_url: http://arxiv.org/abs/2311.00434
  • repo_url: https://github.com/tub-rip/event_based_bos
  • paper_authors: Shintaro Shiba, Friedhelm Hamann, Yoshimitsu Aoki, Guillermo Gallego
  • for: Exploring the use of event cameras for schlieren imaging, a technique used to observe the flow of transparent media such as air or water.
  • methods: A novel technique that combines event data and frames to perceive air convection, formulating the problem as a variational optimization problem.
  • results: The proposed method obtains results on par with existing frame-based optical flow techniques, works under dark conditions where frame-based schlieren fails, and additionally enables slow-motion analysis.
    Abstract Schlieren imaging is an optical technique to observe the flow of transparent media, such as air or water, without any particle seeding. However, conventional frame-based techniques require both high spatial and temporal resolution cameras, which impose bright illumination and expensive computation limitations. Event cameras offer potential advantages (high dynamic range, high temporal resolution, and data efficiency) to overcome such limitations due to their bio-inspired sensing principle. This paper presents a novel technique for perceiving air convection using events and frames by providing the first theoretical analysis that connects event data and schlieren. We formulate the problem as a variational optimization one combining the linearized event generation model with a physically-motivated parameterization that estimates the temporal derivative of the air density. The experiments with accurately aligned frame- and event camera data reveal that the proposed method enables event cameras to obtain on par results with existing frame-based optical flow techniques. Moreover, the proposed method works under dark conditions where frame-based schlieren fails, and also enables slow-motion analysis by leveraging the event camera's advantages. Our work pioneers and opens a new stack of event camera applications, as we publish the source code as well as the first schlieren dataset with high-quality frame and event data. https://github.com/tub-rip/event_based_bos

Feature-oriented Deep Learning Framework for Pulmonary Cone-beam CT (CBCT) Enhancement with Multi-task Customized Perceptual Loss

  • paper_url: http://arxiv.org/abs/2311.00412
  • repo_url: https://github.com/zhujiarui42/cfp-loss
  • paper_authors: Jiarui Zhu, Werxing Chen, Hongfei Sun, Shaohua Zhi, Jing Qin, Jing Cai, Ge Ren
  • for: Enhancing the quality of cone-beam computed tomography (CBCT) images for cancer treatment planning using a deep learning-based feature-oriented framework.
  • methods: Two main components: a multi-task learning feature-selection network (MTFS-Net) that customizes a perceptual loss function, and a CBCT-to-CT translation network guided by feature-to-feature perceptual loss that uses advanced generative models such as U-Net, GAN, and CycleGAN.
  • results: The framework generates synthesized CT (sCT) images of the lung with high similarity to CT images (average SSIM of 0.9869 and average PSNR of 39.9621), with effective artifact suppression, noise reduction, and preservation of distinctive anatomical details; it outperforms state-of-the-art models for pulmonary CBCT enhancement.
    Abstract Cone-beam computed tomography (CBCT) is routinely collected during image-guided radiation therapy (IGRT) to provide updated patient anatomy information for cancer treatments. However, CBCT images often suffer from streaking artifacts and noise caused by under-sampled projections and low-dose exposure, resulting in low clarity and information loss. While recent deep learning-based CBCT enhancement methods have shown promising results in suppressing artifacts, they have limited performance in preserving anatomical details, since conventional pixel-to-pixel loss functions are incapable of describing detailed anatomy. To address this issue, we propose a novel feature-oriented deep learning framework that translates low-quality CBCT images into high-quality CT-like imaging via a multi-task customized feature-to-feature perceptual loss function. The framework comprises two main components: a multi-task learning feature-selection network (MTFS-Net) for customizing the perceptual loss function; and a CBCT-to-CT translation network guided by feature-to-feature perceptual loss, which uses advanced generative models such as U-Net, GAN and CycleGAN. Our experiments showed that the proposed framework can generate synthesized CT (sCT) images for the lung that achieve a high similarity to CT images, with an average SSIM index of 0.9869 and an average PSNR index of 39.9621. The sCT images also achieved visually pleasing performance with effective artifact suppression, noise reduction, and distinctive anatomical detail preservation. Our experimental results indicate that the proposed framework outperforms the state-of-the-art models for pulmonary CBCT enhancement. This framework holds great promise for generating high-quality anatomical imaging from CBCT that is suitable for various clinical applications.
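    A generic feature-to-feature perceptual loss of the kind the framework customizes might look as follows; `feat_extractor` stands in for the trained MTFS-Net, and the per-layer weights are assumed.

```python
import torch
import torch.nn.functional as F

def feature_perceptual_loss(feat_extractor, sct, ct, weights=(1.0, 1.0, 1.0)):
    # feat_extractor is assumed to return a list of feature maps, one per
    # selected layer, for both the synthesized CT and the reference CT.
    loss = sct.new_zeros(())
    for w, f_s, f_c in zip(weights, feat_extractor(sct), feat_extractor(ct)):
        loss = loss + w * F.l1_loss(f_s, f_c)    # per-layer feature distance
    return loss
```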

Open-Set Face Recognition with Maximal Entropy and Objectosphere Loss

  • paper_url: http://arxiv.org/abs/2311.00400
  • repo_url: None
  • paper_authors: Rafael Henrique Vareto, Yu Linghu, Terrance E. Boult, William Robson Schwartz, Manuel Günther
  • for: Open-set face recognition, where unknown individuals unseen during the training and enrollment stages appear at operation time.
  • methods: A compact adapter network that benefits from additional negative face images when combined with dedicated cost functions such as the Objectosphere Loss and the proposed Maximal Entropy Loss (MEL).
  • results: Using pre-trained deep neural networks (DNNs) as feature extractors and replacing their output layer with the adapter network achieves strong performance under open-set protocols on LFW, IJB-C, and UCCS.
    Abstract Open-set face recognition characterizes a scenario where unknown individuals, unseen during the training and enrollment stages, appear on operation time. This work concentrates on watchlists, an open-set task that is expected to operate at a low False Positive Identification Rate and generally includes only a few enrollment samples per identity. We introduce a compact adapter network that benefits from additional negative face images when combined with distinct cost functions, such as Objectosphere Loss (OS) and the proposed Maximal Entropy Loss (MEL). MEL modifies the traditional Cross-Entropy loss in favor of increasing the entropy for negative samples and attaches a penalty to known target classes in pursuance of gallery specialization. The proposed approach adopts pre-trained deep neural networks (DNNs) for face recognition as feature extractors. Then, the adapter network takes deep feature representations and acts as a substitute for the output layer of the pre-trained DNN in exchange for an agile domain adaptation. Promising results have been achieved following open-set protocols for three different datasets: LFW, IJB-C, and UCCS as well as state-of-the-art performance when supplementary negative data is properly selected to fine-tune the adapter network.
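    The core idea behind MEL admits a short sketch: standard cross-entropy on known identities, while negative samples are driven toward a maximum-entropy (uniform) posterior. The extra penalty the paper attaches to known classes is omitted here, so the function below is an interpretation of the abstract, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def maximal_entropy_style_loss(logits, labels, neg_label=-1):
    # Known identities: ordinary cross-entropy. Negative samples: KL to the
    # uniform distribution, which is equivalent to maximizing their
    # predictive entropy (the "maximal entropy" idea of the abstract).
    log_p = F.log_softmax(logits, dim=1)
    known = labels != neg_label
    loss = logits.new_zeros(())
    if known.any():
        loss = loss + F.nll_loss(log_p[known], labels[known])
    if (~known).any():
        loss = loss - log_p[~known].mean()  # == CE against a uniform target
    return loss
```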

Towards Omni-supervised Referring Expression Segmentation

  • paper_url: http://arxiv.org/abs/2311.00397
  • repo_url: https://github.com/nineblu/omni-res
  • paper_authors: Minglang Huang, Yiyi Zhou, Gen Luo, Guannan Jiang, Weilin Zhuang, Xiaoshuai Sun
  • for: Improving the training efficiency of Referring Expression Segmentation (RES) by exploiting unlabeled, fully labeled, and weakly labeled data, e.g., referring points or grounding boxes.
  • methods: The proposed Omni-supervised Referring Expression Segmentation (Omni-RES) task, with a teacher-student baseline that uses weak labels to select and refine high-quality pseudo-masks rather than as direct supervision.
  • results: Extensive experiments on state-of-the-art RES models confirm the effectiveness of Omni-RES: with only 10% fully labeled data it matches fully supervised performance, outperforms the semi-supervised alternative by +14.93% on RefCOCO and +14.95% on RefCOCO+, and improves performance further with large-scale vision-language data.
    Abstract Referring Expression Segmentation (RES) is an emerging task in computer vision, which segments the target instances in images based on text descriptions. However, its development is hampered by expensive segmentation labels. To address this issue, we propose a new learning task for RES called Omni-supervised Referring Expression Segmentation (Omni-RES), which aims to make full use of unlabeled, fully labeled and weakly labeled data, e.g., referring points or grounding boxes, for efficient RES training. To accomplish this task, we also propose a novel yet strong baseline method for Omni-RES based on the recently popular teacher-student learning, where the weak labels are not directly transformed into supervision signals but used as a yardstick to select and refine high-quality pseudo-masks for teacher-student learning. To validate the proposed Omni-RES method, we apply it to a set of state-of-the-art RES models and conduct extensive experiments on a number of RES datasets. The experimental results show clear merits of Omni-RES over both fully-supervised and semi-supervised training schemes. For instance, with only 10% fully labeled data, Omni-RES helps the base model achieve 100% of fully supervised performance, and it also outperforms the semi-supervised alternative by a large margin, e.g., +14.93% on RefCOCO and +14.95% on RefCOCO+, respectively. More importantly, Omni-RES also enables the use of large-scale vision-language datasets like Visual Genome to facilitate low-cost RES training, achieving new SOTA performance for RES, e.g., 80.66 on RefCOCO.
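    One plausible reading of "weak labels as a yardstick" is to keep a teacher's pseudo-mask only when it agrees with the grounding box; the sketch below implements that filter under assumed names and an assumed IoU threshold.

```python
import torch

def select_pseudo_masks(teacher_masks, boxes, iou_thresh=0.5):
    # teacher_masks: (N, H, W) binary pseudo-masks from the teacher;
    # boxes: (N, 4) weak grounding boxes as x1, y1, x2, y2 in pixels.
    keep = []
    for mask, (x1, y1, x2, y2) in zip(teacher_masks, boxes.round().long()):
        box_mask = torch.zeros_like(mask)
        box_mask[y1:y2, x1:x2] = 1
        inter = (mask * box_mask).sum()
        union = ((mask + box_mask) > 0).sum().clamp(min=1)
        keep.append(inter / union >= iou_thresh)  # keep only agreeing masks
    return torch.stack(keep)  # boolean selector over the batch
```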

Fixation-based Self-calibration for Eye Tracking in VR Headsets

  • paper_url: http://arxiv.org/abs/2311.00391
  • repo_url: None
  • paper_authors: Ryusei Uramune, Sei Ikeda, Hiroki Ishizuka, Osamu Oshiro
  • for: A self-calibration method for eye tracking in VR headsets, based on the assumptions that the viewpoint can move freely and that points of regard (PoRs) from different viewpoints are distributed within a small area on an object surface during visual fixation.
  • methods: Fixations are first detected from the time-series of uncalibrated gaze directions using an extension of the I-VDT algorithm to 3D scenes; the calibration parameters are then optimized by minimizing a dispersion metric of the PoRs, identifying the user-dependent offset from the optical axis to the visual axis without explicit user calibration, image processing, or marker-substitute objects.
  • results: For 18 participants walking in two VR environments, the method achieved an accuracy of 2.1°, significantly lower than the average offset; it is the first self-calibration method with an average error below 3° in 3D environments, and refining the fixation detection or optimization algorithm could improve accuracy by up to 1.2°.
    Abstract This study proposes a novel self-calibration method for eye tracking in a virtual reality (VR) headset. The proposed method is based on the assumptions that the user's viewpoint can freely move and that the points of regard (PoRs) from different viewpoints are distributed within a small area on an object surface during visual fixation. In the method, fixations are first detected from the time-series data of uncalibrated gaze directions using an extension of the I-VDT (velocity and dispersion threshold identification) algorithm to a three-dimensional (3D) scene. Then, the calibration parameters are optimized by minimizing the sum of a dispersion metrics of the PoRs. The proposed method can potentially identify the optimal calibration parameters representing the user-dependent offset from the optical axis to the visual axis without explicit user calibration, image processing, or marker-substitute objects. For the gaze data of 18 participants walking in two VR environments with many occlusions, the proposed method achieved an accuracy of 2.1$^\circ$, which was significantly lower than the average offset. Our method is the first self-calibration method with an average error lower than 3$^\circ$ in 3D environments. Further, the accuracy of the proposed method can be improved by up to 1.2$^\circ$ by refining the fixation detection or optimization algorithm.

NeuralGF: Unsupervised Point Normal Estimation by Learning Neural Gradient Function

  • paper_url: http://arxiv.org/abs/2311.00389
  • repo_url: https://github.com/leoqli/neuralgf
  • paper_authors: Qing Li, Huifang Feng, Kanle Shi, Yue Gao, Yi Fang, Yu-Shen Liu, Zhizhong Han
  • for: A deep learning method that estimates oriented normals directly from point clouds, without using ground-truth normals as supervision.
  • methods: A new paradigm for learning neural gradient functions that encourages the network to fit the input point cloud and yield unit-norm gradients at the points, with loss functions that let query points iteratively reach moving targets and aggregate onto the approximated surface.
  • results: More accurate normal estimation that is robust to noise, outliers, and density variations, achieving the best performance on widely used benchmarks and surpassing the latest methods.
    Abstract Normal estimation for 3D point clouds is a fundamental task in 3D geometry processing. The state-of-the-art methods rely on priors of fitting local surfaces learned from normal supervision. However, normal supervision in benchmarks comes from synthetic shapes and is usually not available from real scans, thereby limiting the learned priors of these methods. In addition, normal orientation consistency across shapes remains difficult to achieve without a separate post-processing procedure. To resolve these issues, we propose a novel method for estimating oriented normals directly from point clouds without using ground truth normals as supervision. We achieve this by introducing a new paradigm for learning neural gradient functions, which encourages the neural network to fit the input point clouds and yield unit-norm gradients at the points. Specifically, we introduce loss functions to facilitate query points to iteratively reach the moving targets and aggregate onto the approximated surface, thereby learning a global surface representation of the data. Meanwhile, we incorporate gradients into the surface approximation to measure the minimum signed deviation of queries, resulting in a consistent gradient field associated with the surface. These techniques lead to our deep unsupervised oriented normal estimator that is robust to noise, outliers and density variations. Our excellent results on widely used benchmarks demonstrate that our method can learn more accurate normals for both unoriented and oriented normal estimation tasks than the latest methods. The source code and pre-trained model are publicly available at https://github.com/LeoQLi/NeuralGF.
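    The mechanism of reading oriented normals off a learned scalar field can be sketched directly with automatic differentiation; the paper's fitting losses that pull query points onto the surface are not reproduced here.

```python
import torch
import torch.nn.functional as F

def oriented_normals(net, points):
    # net: a neural scalar field f(x); points: (N, 3) query coordinates.
    points = points.clone().requires_grad_(True)
    f = net(points)                              # (N, 1) field values
    grad, = torch.autograd.grad(f.sum(), points, create_graph=True)
    return F.normalize(grad, dim=-1)             # unit-norm oriented normals
```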

Learning Cooperative Trajectory Representations for Motion Forecasting

  • paper_url: http://arxiv.org/abs/2311.00371
  • repo_url: https://github.com/air-thu/dair-v2x-seq
  • paper_authors: Hongzhi Ruan, Haibao Yu, Wenxian Yang, Siqi Fan, Yingjuan Tang, Zaiqing Nie
  • for: This paper focuses on motion forecasting for autonomous driving, specifically using cooperative information from infrastructure and other vehicles to enhance forecasting capabilities.
  • methods: The proposed method is called V2X-Graph, which is an interpretable and end-to-end learning framework that leverages cooperative motion and interaction contexts using an interpretable graph.
  • results: The paper demonstrates the effectiveness of V2X-Graph on the V2I motion forecasting dataset V2X-Seq, and also constructs a real-world V2X motion forecasting dataset V2X-Traj to further evaluate the method. The results show the advantage of the proposed method.
    Abstract Motion forecasting is an essential task for autonomous driving, and the effective use of information from infrastructure and other vehicles can enhance forecasting capabilities. Existing research has primarily focused on leveraging single-frame cooperative information to enhance the limited perception capability of the ego vehicle, while underutilizing the motion and interaction information of traffic participants observed from cooperative devices. In this paper, we first propose the cooperative trajectory representations learning paradigm. Specifically, we present V2X-Graph, the first interpretable and end-to-end learning framework for cooperative motion forecasting. V2X-Graph employs an interpretable graph to fully leverage the cooperative motion and interaction contexts. Experimental results on the vehicle-to-infrastructure (V2I) motion forecasting dataset, V2X-Seq, demonstrate the effectiveness of V2X-Graph. To further evaluate the V2X scenario, we construct the first real-world vehicle-to-everything (V2X) motion forecasting dataset, V2X-Traj, on which the performance shows the advantage of our method. We hope both V2X-Graph and V2X-Traj can facilitate the further development of cooperative motion forecasting. Find the project at https://github.com/AIR-THU/V2X-Graph, find data at https://github.com/AIR-THU/DAIR-V2X-Seq.

LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation

  • paper_url: http://arxiv.org/abs/2311.00353
  • repo_url: None
  • paper_authors: Yuxiang Bao, Di Qiu, Guoliang Kang, Baochang Zhang, Bo Jin, Kaiye Wang, Pengfei Yan
  • for: zero-shot video-to-video translation with temporal coherence
  • methods: incorporates warping operation in the latent space to constrain query tokens and improve temporal consistency
  • results: superior video-to-video translation with enhanced visual temporal coherence compared to previous methods
    Abstract Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across generated video frames by image diffusion models. Previous methods typically adopt cross-frame attention, i.e., sharing the key and value tokens across attentions of different frames, to encourage temporal consistency. However, in those works the temporal inconsistency issue may not be thoroughly solved, rendering the fidelity of generated videos limited. In this paper, we find the bottleneck lies in the unconstrained query tokens and propose a new zero-shot video-to-video translation framework, named LatentWarp. Our approach is simple: to constrain the query tokens to be temporally consistent, we further incorporate a warping operation in the latent space. Specifically, based on the optical flow obtained from the original video, we warp the generated latent features of the last frame to align with the current frame during the denoising process. As a result, the corresponding regions across adjacent frames can share closely-related query tokens and attention outputs, which further improves latent-level consistency and enhances the visual temporal coherence of generated videos. Extensive experimental results demonstrate the superiority of LatentWarp in achieving video-to-video translation with temporal coherence.
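    The latent warping step has a natural rendering as backward warping via grid sampling. The sketch below assumes the optical flow has already been estimated on the source video and resized to the latent resolution; the names and conventions (pixel-unit flow, x-first grids) are assumptions.

```python
import torch
import torch.nn.functional as F

def warp_latents(prev_latents, flow):
    # prev_latents: (1, C, h, w) latents of the previous frame;
    # flow: (1, 2, h, w) optical flow in pixel units at latent resolution.
    _, _, h, w = prev_latents.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float()      # (h, w, 2), x first
    grid = grid + flow[0].permute(1, 2, 0)            # displace by the flow
    grid[..., 0] = 2 * grid[..., 0] / (w - 1) - 1     # normalize to [-1, 1]
    grid[..., 1] = 2 * grid[..., 1] / (h - 1) - 1
    return F.grid_sample(prev_latents, grid.unsqueeze(0), align_corners=True)
```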

Analyzing Head Orientation of Neurotypical and Autistic Individuals in Triadic Conversations

  • paper_url: http://arxiv.org/abs/2311.00343
  • repo_url: None
  • paper_authors: Onur N. Tepencelik, Wenchuan Wei, Pamela C. Cosman, Sujit Dey
  • for: A system that estimates people's body and head orientations from low-resolution point cloud data.
  • methods: Two LiDAR sensors provide the point clouds; body orientation is estimated by ellipse fitting, while head orientation uses a pipeline of geometric feature extraction and an ensemble of neural network regressors.
  • results: Accuracy comparable to RGB-camera systems while preserving user privacy, with no required sensor placement in front of the subject; mean absolute errors of 5.2° (body) and 13.7° (head), used to quantify behavioral differences between neurotypical and autistic individuals in triadic conversations.
    Abstract We propose a system that estimates people's body and head orientations using low-resolution point cloud data from two LiDAR sensors. Our models make accurate estimations in real-world conversation settings where the subject moves naturally with varying head and body poses. The body orientation estimation model uses ellipse fitting while the head orientation estimation model is a pipeline of geometric feature extraction and an ensemble of neural network regressors. Compared with other body and head orientation estimation systems using RGB cameras, our proposed system uses LiDAR sensors to preserve user privacy, while achieving comparable accuracy. Unlike other body/head orientation estimation systems, our sensors do not require a specified placement in front of the subject. Our models achieve a mean absolute estimation error of 5.2 degrees for body orientation and 13.7 degrees for head orientation. We use our models to quantify behavioral differences between neurotypical and autistic individuals in triadic conversations. Tests of significance show that people with autism spectrum disorder display significantly different behavior compared to neurotypical individuals in terms of distributing attention between participants in a conversation, suggesting that the approach could be a component of a behavioral analysis or coaching system.

fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding

  • paper_url: http://arxiv.org/abs/2311.00342
  • repo_url: None
  • paper_authors: Xuelin Qian, Yun Wang, Jingyang Huo, Jianfeng Feng, Yanwei Fu
  • for: This paper aims to develop a novel approach for pre-training fMRI data, addressing the challenges of individual brain differences and improving the quality of brain activity decoding.
  • methods: The proposed approach, called fMRI-PTE, uses an auto-encoder to transform fMRI signals into unified 2D representations, leveraging a novel learning strategy and image generators to enhance the quality of reconstruction and facilitate downstream tasks.
  • results: The authors demonstrate the effectiveness of fMRI-PTE through extensive experiments, showing improved performance in brain activity decoding compared to traditional methods and offering a promising foundation for future research in this area.
    Abstract The exploration of brain activity and its decoding from fMRI data has been a longstanding pursuit, driven by its potential applications in brain-computer interfaces, medical diagnostics, and virtual reality. Previous approaches have primarily focused on individual subject analysis, highlighting the need for a more universal and adaptable framework, which is the core motivation behind our work. In this work, we propose fMRI-PTE, an innovative auto-encoder approach for fMRI pre-training, with a focus on addressing the challenges of varying fMRI data dimensions due to individual brain differences. Our approach involves transforming fMRI signals into unified 2D representations, ensuring consistency in dimensions and preserving distinct brain activity patterns. We introduce a novel learning strategy tailored for pre-training 2D fMRI images, enhancing the quality of reconstruction. fMRI-PTE's adaptability with image generators enables the generation of well-represented fMRI features, facilitating various downstream tasks, including within-subject and cross-subject brain activity decoding. Our contributions encompass introducing fMRI-PTE, innovative data transformation, efficient training, a novel learning strategy, and the universal applicability of our approach. Extensive experiments validate and support our claims, offering a promising foundation for further research in this domain.

Space Narrative: Generating Images and 3D Scenes of Chinese Garden from Text using Deep Learning

  • paper_url: http://arxiv.org/abs/2311.00339
  • repo_url: None
  • paper_authors: Jiaxi Shi, Hao Hua
  • for: Addressing the lack of firsthand material, a major obstacle to the study and restoration of traditional Chinese gardens.
  • methods: A deep-learning method that generates garden paintings from text descriptions, using a dataset of historical garden paintings and their inscriptions, with LoRA used to fine-tune a pre-trained diffusion model.
  • results: Generates garden paintings in the Ming Dynasty style from textual descriptions, with the results presented as a free-roam three-dimensional scene in Unity 3D.
    Abstract The consistent mapping from poems to paintings is essential for the research and restoration of traditional Chinese gardens. But the lack of firsthand material is a great challenge to the reconstruction work. In this paper, we propose a method to generate garden paintings based on text descriptions using a deep learning method. Our image-text pair dataset consists of more than one thousand Ming Dynasty garden paintings and their inscriptions and postscripts. A latent text-to-image diffusion model learns the mapping from descriptive texts to garden paintings of the Ming Dynasty, and then the text description of Jichang Garden guides the model to generate new garden paintings. The cosine similarity between the guide text and the generated image is the evaluation criterion for the generated images. Our dataset is used to fine-tune the pre-trained diffusion model using Low-Rank Adaptation of Large Language Models (LoRA). We also transformed the generated images into a panorama and created a free-roam scene in Unity 3D. Our post-trained model is capable of generating garden images in the style of Ming Dynasty landscape paintings based on textual descriptions. The generated images are compatible with three-dimensional presentation in Unity 3D.

SDF4CHD: Generative Modeling of Cardiac Anatomies with Congenital Heart Defects

  • paper_url: http://arxiv.org/abs/2311.00332
  • repo_url: None
  • paper_authors: Fanwei Kong, Sascha Stocker, Perry S. Choi, Michael Ma, Daniel B. Ennis, Alison Marsden
  • for: Improving the diagnosis and treatment planning of congenital heart disease (CHD) by generating virtual cardiac anatomies using deep learning (DL) methods.
  • methods: A type- and shape-disentangled generative model based on signed distance fields (SDF) that captures the wide spectrum of cardiac anatomies observed across CHD types, with learned invertible deformations that morph the CHD type-specific anatomies and reconstruct patient-specific shapes.
  • results: The approach can augment image-segmentation pairs for rarer CHD types for cardiac segmentation and generate cohorts of CHD cardiac meshes for computational simulation, supporting diagnosis and treatment planning for CHD patients.
    Abstract Congenital heart disease (CHD) encompasses a spectrum of cardiovascular structural abnormalities, often requiring customized treatment plans for individual patients. Computational modeling and analysis of these unique cardiac anatomies can improve diagnosis and treatment planning and may ultimately lead to improved outcomes. Deep learning (DL) methods have demonstrated the potential to enable efficient treatment planning by automating cardiac segmentation and mesh construction for patients with normal cardiac anatomies. However, CHDs are often rare, making it challenging to acquire sufficiently large patient cohorts for training such DL models. Generative modeling of cardiac anatomies has the potential to fill this gap via the generation of virtual cohorts; however, prior approaches were largely designed for normal anatomies and cannot readily capture the significant topological variations seen in CHD patients. Therefore, we propose a type- and shape-disentangled generative approach suitable to capture the wide spectrum of cardiac anatomies observed in different CHD types and synthesize differently shaped cardiac anatomies that preserve the unique topology for specific CHD types. Our DL approach represents generic whole heart anatomies with CHD type-specific abnormalities implicitly using signed distance fields (SDF) based on CHD type diagnosis, which conveniently captures divergent anatomical variations across different types and represents meaningful intermediate CHD states. To capture the shape-specific variations, we then learn invertible deformations to morph the learned CHD type-specific anatomies and reconstruct patient-specific shapes. Our approach has the potential to augment the image-segmentation pairs for rarer CHD types for cardiac segmentation and generate cohorts of CHD cardiac meshes for computational simulation.

Enhancing Clustering Representations with Positive Proximity and Cluster Dispersion Learning

  • paper_url: http://arxiv.org/abs/2311.00731
  • repo_url: None
  • paper_authors: Abhishek Kumar, Dong-Gyu Lee
  • for: A novel end-to-end deep clustering method, PIPCDR, addressing the limitations of contemporary contrastive and non-contrastive deep clustering approaches.
  • methods: A positive instance proximity loss combined with a cluster dispersion regularizer, harnessing the strengths of both families of methods while mitigating class collision and clustering collapse.
  • results: Well-separated clusters and uniform representations with improved within-cluster compactness, validated on a range of simulated and real datasets, with competitive performance on moderate-scale clustering benchmarks and new state-of-the-art results on large-scale datasets.
    Abstract Contemporary deep clustering approaches often rely on either contrastive or non-contrastive techniques to acquire effective representations for clustering tasks. Contrastive methods leverage negative pairs to achieve homogenous representations but can introduce class collision issues, potentially compromising clustering performance. On the contrary, non-contrastive techniques prevent class collisions but may produce non-uniform representations that lead to clustering collapse. In this work, we propose a novel end-to-end deep clustering approach named PIPCDR, designed to harness the strengths of both approaches while mitigating their limitations. PIPCDR incorporates a positive instance proximity loss and a cluster dispersion regularizer. The positive instance proximity loss ensures alignment between augmented views of instances and their sampled neighbors, enhancing within-cluster compactness by selecting genuinely positive pairs within the embedding space. Meanwhile, the cluster dispersion regularizer maximizes inter-cluster distances while minimizing within-cluster compactness, promoting uniformity in the learned representations. PIPCDR excels in producing well-separated clusters, generating uniform representations, avoiding class collision issues, and enhancing within-cluster compactness. We extensively validate the effectiveness of PIPCDR within an end-to-end Majorize-Minimization framework, demonstrating its competitive performance on moderate-scale clustering benchmark datasets and establishing new state-of-the-art results on large-scale datasets.
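    A minimal sketch of the two loss ingredients, under simplifying assumptions: the positive instance proximity term is reduced to aligning two augmented views rather than sampled neighbors, and `lam` is an assumed trade-off weight.

```python
import torch
import torch.nn.functional as F

def pipcdr_style_losses(z_a, z_b, centroids, lam=1.0):
    # z_a, z_b: (N, D) embeddings of two augmented views of the same
    # instances; centroids: (K, D) current cluster centroids.
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    proximity = (1 - (z_a * z_b).sum(dim=1)).mean()   # within-cluster pull
    c = F.normalize(centroids, dim=1)
    sim = c @ c.t()
    k = sim.size(0)
    off_diag = sim[~torch.eye(k, dtype=torch.bool, device=sim.device)]
    dispersion = off_diag.mean()                      # inter-cluster push
    return proximity + lam * dispersion
```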

Flooding Regularization for Stable Training of Generative Adversarial Networks

  • paper_url: http://arxiv.org/abs/2311.00318
  • repo_url: None
  • paper_authors: Iu Yahiro, Takashi Ishida, Naoto Yokoya
  • for: Improving the training stability of Generative Adversarial Networks (GANs).
  • methods: Directly regularizing the adversarial loss by applying flooding, an overfitting suppression method from supervised learning, to keep the discriminator's loss from becoming excessively low; the appropriate range of flood levels is characterized by a theoretical analysis of GANs with the binary cross-entropy loss.
  • results: Experiments show that flooding stabilizes GAN training and can be combined with other stabilization techniques; restricting the discriminator's loss to be no lower than the flood level keeps training stable even when the flood level is somewhat high.
    Abstract Generative Adversarial Networks (GANs) have shown remarkable performance in image generation. However, GAN training suffers from the problem of instability. One of the main approaches to address this problem is to modify the loss function, often using regularization terms in addition to changing the type of adversarial losses. This paper focuses on directly regularizing the adversarial loss function. We propose a method that applies flooding, an overfitting suppression method in supervised learning, to GANs to directly prevent the discriminator's loss from becoming excessively low. Flooding requires tuning the flood level, but when applied to GANs, we propose that the appropriate range of flood level settings is determined by the adversarial loss function, supported by theoretical analysis of GANs using the binary cross entropy loss. We experimentally verify that flooding stabilizes GAN training and can be combined with other stabilization techniques. We also reveal that by restricting the discriminator's loss to be no greater than flood level, the training proceeds stably even when the flood level is somewhat high.
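    Flooding itself is a one-line transformation of a scalar loss, here applied as the paper proposes to the discriminator objective; the flood level `b` must be tuned, and the example value below is illustrative only.

```python
import torch

def flooded(loss, b):
    # Once the scalar loss drops below the flood level b, the gradient of
    # |loss - b| flips sign, so training "floats" at b instead of driving
    # the loss toward zero.
    b = torch.as_tensor(b, dtype=loss.dtype, device=loss.device)
    return (loss - b).abs() + b

# Usage sketch for a GAN discriminator trained with BCE losses:
# d_loss = flooded(bce_real + bce_fake, b=0.3)  # flood level is illustrative
```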

An Empirical Study of Frame Selection for Text-to-Video Retrieval

  • paper_url: http://arxiv.org/abs/2311.00298
  • repo_url: None
  • paper_authors: Mengxia Wu, Min Cao, Yang Bai, Ziyin Zeng, Chen Chen, Liqiang Nie, Min Zhang
  • for: Text-to-video retrieval (TVR): finding the most relevant video in a large gallery given a text query.
  • methods: The first empirical study of frame selection for TVR, systematically classifying existing methods into text-free and text-guided selections and analyzing six variants (two newly developed) in terms of effectiveness and efficiency.
  • results: Comprehensive analysis on multiple TVR benchmarks shows that proper frame selection significantly improves retrieval efficiency without sacrificing retrieval performance.
    Abstract Text-to-video retrieval (TVR) aims to find the most relevant video in a large video gallery given a query text. The intricate and abundant context of the video challenges the performance and efficiency of TVR. To handle the serialized video contexts, existing methods typically select a subset of frames within a video to represent the video content for TVR. How to select the most representative frames is a crucial issue: the selected frames must not only retain the semantic information of the video but also promote retrieval efficiency by excluding temporally redundant frames. In this paper, we make the first empirical study of frame selection for TVR. We systematically classify existing frame selection methods into text-free and text-guided ones, and under this taxonomy we analyze six different frame selections in terms of effectiveness and efficiency, two of which are first developed in this paper. According to the comprehensive analysis on multiple TVR benchmarks, we empirically conclude that TVR with proper frame selection can significantly improve retrieval efficiency without sacrificing retrieval performance.
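    A text-guided selection of the kind the study compares can be sketched as a top-k filter on query-frame similarity; the upstream feature extractors (e.g., a CLIP backbone producing a joint embedding space) are assumed.

```python
import torch
import torch.nn.functional as F

def text_guided_frames(frame_feats, text_feat, k=8):
    # frame_feats: (N, D) per-frame embeddings; text_feat: (D,) query
    # embedding from the same joint space.
    sims = F.normalize(frame_feats, dim=1) @ F.normalize(text_feat, dim=0)
    topk = sims.topk(k=min(k, len(sims))).indices.sort().values
    return frame_feats[topk], topk   # keep selected frames in temporal order
```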

Graph Representation Learning for Infrared and Visible Image Fusion

  • paper_url: http://arxiv.org/abs/2311.00291
  • repo_url: None
  • paper_authors: Jing Li, Lu Bai, Bin Yang, Chang Li, Lingfei Ma, Edwin R. Hancock
  • for: A graph-representation approach to infrared and visible image fusion that improves the accuracy and efficiency of the fused result.
  • methods: Graph convolutional networks (GCNs) extract non-local self-similarity (NLss) features in a cascaded intra- and inter-modal pattern, using the fine structure of the graph to aggregate features and propagate information.
  • results: Ablation studies and experiments on three datasets show the method captures NLss more effectively than prior approaches and achieves superior fusion quality.
    Abstract Infrared and visible image fusion aims to extract complementary features to synthesize a single fused image. Many methods employ convolutional neural networks (CNNs) to extract local features due to their translation invariance and locality. However, CNNs fail to consider the image's non-local self-similarity (NLss); though pooling operations can expand the receptive field, they still inevitably lead to information loss. In addition, the transformer structure extracts long-range dependence by considering the correlativity among all image patches, which leads to information redundancy in transformer-based methods. However, graph representation is more flexible than grid (CNN) or sequence (transformer) representation for addressing irregular objects, and a graph can also model the relationships among spatially repeatable details or textures at far spatial distances. Therefore, to address the above issues, it is significant to convert images into the graph space and thus adopt graph convolutional networks (GCNs) to extract NLss. This is because the graph can provide a fine structure to aggregate features and propagate information across the nearest vertices without introducing redundant information. Concretely, we implement a cascaded NLss extraction pattern to extract intra- and inter-modal NLss by exploring interactions of different image pixels over intra- and inter-image positional distances. We commence by performing GCNs on each modality to aggregate features and propagate information, extracting independent intra-modal NLss. Then, GCNs are performed on the concatenated intra-modal NLss features of infrared and visible images, exploring the cross-domain NLss of inter-modal data to reconstruct the fused image. Ablation studies and extensive experiments illustrate the effectiveness and superiority of the proposed method on three datasets.

Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach

  • paper_url: http://arxiv.org/abs/2311.00285
  • repo_url: None
  • paper_authors: Zhenbang Du, Jiayu An, Jiahao Hong, Dongrui Wu
  • for: Open Set Domain Adaptation (OSDA): coping with distribution and label shifts between the source and target domains simultaneously, classifying known classes accurately while identifying unknown-class samples in the target domain.
  • methods: A Mixture-of-Experts (MoE) approach in which different experts handle different input features, producing distinct expert routing patterns per class; the proposed Dual-Space Detection exploits inconsistencies between the image feature space and the routing feature space to detect unknown-class samples without any threshold, aided by a Graph Router that leverages the spatial information among image patches.
  • results: Experiments on three datasets validate the effectiveness and superiority of the approach at identifying unknown-class samples.
    Abstract Open Set Domain Adaptation (OSDA) aims to cope with the distribution and label shifts between the source and target domains simultaneously, performing accurate classification for known classes while identifying unknown class samples in the target domain. Most existing OSDA approaches, depending on the final image feature space of deep models, require manually-tuned thresholds, and may easily misclassify unknown samples as known classes. Mixture-of-Expert (MoE) could be a remedy. Within an MoE, different experts address different input features, producing unique expert routing patterns for different classes in a routing feature space. As a result, unknown class samples may also display different expert routing patterns to known classes. This paper proposes Dual-Space Detection, which exploits the inconsistencies between the image feature space and the routing feature space to detect unknown class samples without any threshold. Graph Router is further introduced to better make use of the spatial information among image patches. Experiments on three different datasets validated the effectiveness and superiority of our approach. The code will come soon.
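A toy sketch of the threshold-free dual-space idea, under our assumptions (class prototypes and nearest-neighbor consistency stand in for the paper's exact detection rule): a target sample whose nearest known class differs between the image-feature space and the expert-routing space is flagged as unknown.

```python
# Sketch of threshold-free dual-space detection (illustrative, not the
# paper's exact rule): flag a sample "unknown" when its nearest
# known-class prototype in the image-feature space disagrees with its
# nearest prototype in the expert-routing space.
import numpy as np

def nearest_class(x, prototypes):
    # prototypes: dict mapping class id -> prototype vector
    return min(prototypes, key=lambda c: np.linalg.norm(x - prototypes[c]))

def detect_unknown(img_feat, route_feat, img_protos, route_protos):
    c_img = nearest_class(img_feat, img_protos)
    c_route = nearest_class(route_feat, route_protos)
    return c_img != c_route  # inconsistency across the two spaces => unknown

# toy example with two known classes
img_protos = {0: np.array([0.0, 0.0]), 1: np.array([1.0, 1.0])}
route_protos = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
print(detect_unknown(np.array([0.1, 0.1]), np.array([0.9, 0.1]),
                     img_protos, route_protos))  # False: both spaces say class 0
```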

TLMCM Network for Medical Image Hierarchical Multi-Label Classification

  • paper_url: http://arxiv.org/abs/2311.00282
  • repo_url: None
  • paper_authors: Meng Wu, Siyan Luo, Qiyu Wu, Wenbin Ouyang
  • for: Addresses the two main challenges of Medical Image Hierarchical Multi-Label Classification (MI-HMC) in modern healthcare: data imbalance and the hierarchy constraint. Existing solutions typically involve complex model architecture design or domain-specific preprocessing, demanding considerable expertise or implementation effort.
  • methods: Proposes the Transfer Learning with Maximum Constraint Module (TLMCM) network for the MI-HMC task, which outperforms existing methods on the Area Under the Average Precision and Recall Curve ($AU\overline{(PRC)}$) metric; also proposes two accuracy metrics, $EMR$ and $HammingAccuracy$, which have not been extensively explored for MI-HMC.
  • results: Experiments show that the TLMCM network achieves high multi-label prediction accuracy ($80\%$-$90\%$) on MI-HMC tasks, making it a valuable contribution to healthcare applications.
    Abstract Medical Image Hierarchical Multi-Label Classification (MI-HMC) is of paramount importance in modern healthcare, presenting two significant challenges: data imbalance and \textit{hierarchy constraint}. Existing solutions involve complex model architecture design or domain-specific preprocessing, demanding considerable expertise or effort in implementation. To address these limitations, this paper proposes Transfer Learning with Maximum Constraint Module (TLMCM) network for the MI-HMC task. The TLMCM network offers a novel approach to overcome the aforementioned challenges, outperforming existing methods based on the Area Under the Average Precision and Recall Curve($AU\overline{(PRC)}$) metric. In addition, this research proposes two novel accuracy metrics, $EMR$ and $HammingAccuracy$, which have not been extensively explored in the context of the MI-HMC task. Experimental results demonstrate that the TLMCM network achieves high multi-label prediction accuracy($80\%$-$90\%$) for MI-HMC tasks, making it a valuable contribution to healthcare domain applications.
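The two accuracy metrics named in the abstract, EMR (exact match ratio) and Hamming accuracy, are standard multi-label measures; the sketch below implements them, plus one common way to impose a hierarchy constraint (capping a child's score by its parent's). The capping rule is our assumption, not necessarily the paper's Maximum Constraint Module.

```python
# EMR and Hamming accuracy for multi-label prediction, plus a simple
# hierarchy-constraint post-processing. The capping rule is our
# assumption, not the paper's exact module.
import numpy as np

def exact_match_ratio(y_true, y_pred):
    # EMR: fraction of samples whose full label vector is predicted exactly
    return float(np.mean(np.all(y_true == y_pred, axis=1)))

def hamming_accuracy(y_true, y_pred):
    # fraction of individual label decisions that are correct
    return float(np.mean(y_true == y_pred))

def enforce_hierarchy(scores, parent):
    # parent[i] = index of label i's parent, or -1 for roots; a child's
    # score may not exceed its parent's (assumes parents precede children)
    scores = scores.copy()
    for i, p in enumerate(parent):
        if p >= 0:
            scores[:, i] = np.minimum(scores[:, i], scores[:, p])
    return scores

y_true = np.array([[1, 1, 0], [1, 0, 0]])
y_pred = np.array([[1, 1, 0], [1, 1, 0]])
print(exact_match_ratio(y_true, y_pred))                       # 0.5
print(hamming_accuracy(y_true, y_pred))                        # ~0.833
print(enforce_hierarchy(np.array([[0.9, 0.95, 0.2]]), [-1, 0, 1]))  # [[0.9 0.9 0.2]]
```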

OpenForest: A data catalogue for machine learning in forest monitoring

  • paper_url: http://arxiv.org/abs/2311.00277
  • repo_url: https://github.com/rolnicklab/openforest
  • paper_authors: Arthur Ouaknine, Teja Kattenborn, Etienne Laliberté, David Rolnick
  • for: Aims to provide open access data so that machine learning methods can be applied to large-scale forest monitoring.
  • methods: Reviews 86 open access forest datasets across spatial scales, encompassing inventories and ground-based, aerial-based, and satellite-based recordings, to study changes in forest ecosystems.
  • results: Provides OpenForest, a dynamic catalogue that collects and summarizes the available open access forest datasets to promote research on large-scale forest monitoring.
    Abstract Forests play a crucial role in Earth's system processes and provide a suite of social and economic ecosystem services, but are significantly impacted by human activities, leading to a pronounced disruption of the equilibrium within ecosystems. Advancing forest monitoring worldwide offers advantages in mitigating human impacts and enhancing our comprehension of forest composition, alongside the effects of climate change. While statistical modeling has traditionally found applications in forest biology, recent strides in machine learning and computer vision have reached important milestones using remote sensing data, such as tree species identification, tree crown segmentation and forest biomass assessments. For this, the significance of open access data remains essential in enhancing such data-driven algorithms and methodologies. Here, we provide a comprehensive and extensive overview of 86 open access forest datasets across spatial scales, encompassing inventories, ground-based, aerial-based, satellite-based recordings, and country or world maps. These datasets are grouped in OpenForest, a dynamic catalogue open to contributions that strives to reference all available open access forest datasets. Moreover, in the context of these datasets, we aim to inspire research in machine learning applied to forest biology by establishing connections between contemporary topics, perspectives and challenges inherent in both domains. We hope to encourage collaborations among scientists, fostering the sharing and exploration of diverse datasets through the application of machine learning methods for large-scale forest monitoring. OpenForest is available at https://github.com/RolnickLab/OpenForest .

Adaptive Latent Diffusion Model for 3D Medical Image to Image Translation: Multi-modal Magnetic Resonance Imaging Study

  • paper_url: http://arxiv.org/abs/2311.00265
  • repo_url: https://github.com/jongdory/aldm
  • paper_authors: Jonghun Kim, Hyunjin Park
  • for: This paper proposes a model for image-to-image translation in 3D medical images without patch cropping, which can be used for comprehensive evaluations in medical image analysis.
  • methods: The proposed model uses a latent diffusion model (LDM) with switchable blocks, specifically multiple switchable spatially adaptive normalization (MS-SPADE), to generate high-quality target modalities in 3D.
  • results: The model demonstrated successful image synthesis across different source-target modality scenarios and outperformed other models in quantitative evaluations tested on multi-modal brain magnetic resonance imaging datasets of four different modalities and an independent IXI dataset.
    Abstract Multi-modal images play a crucial role in comprehensive evaluations in medical image analysis providing complementary information for identifying clinically important biomarkers. However, in clinical practice, acquiring multiple modalities can be challenging due to reasons such as scan cost, limited scan time, and safety considerations. In this paper, we propose a model based on the latent diffusion model (LDM) that leverages switchable blocks for image-to-image translation in 3D medical images without patch cropping. The 3D LDM combined with conditioning using the target modality allows generating high-quality target modality in 3D overcoming the shortcoming of the missing out-of-slice information in 2D generation methods. The switchable block, noted as multiple switchable spatially adaptive normalization (MS-SPADE), dynamically transforms source latents to the desired style of the target latents to help with the diffusion process. The MS-SPADE block allows us to have one single model to tackle many translation tasks of one source modality to various targets removing the need for many translation models for different scenarios. Our model exhibited successful image synthesis across different source-target modality scenarios and surpassed other models in quantitative evaluations tested on multi-modal brain magnetic resonance imaging datasets of four different modalities and an independent IXI dataset. Our model demonstrated successful image synthesis across various modalities even allowing for one-to-many modality translations. Furthermore, it outperformed other one-to-one translation models in quantitative evaluations.
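A rough sketch of a switchable SPADE-style block, under our assumptions: source latents are normalized and then modulated by a per-target-modality scale and shift, so a single model can be steered toward several target modalities. Layer sizes and the switching mechanism are illustrative, not the paper's exact MS-SPADE.

```python
# Sketch of a switchable SPADE-style block for 3D latents: instance
# normalization followed by per-target-modality (gamma, beta)
# modulation. Illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class SwitchableSPADE(nn.Module):
    def __init__(self, channels, n_modalities):
        super().__init__()
        self.norm = nn.InstanceNorm3d(channels, affine=False)
        # one (gamma, beta) predictor per target modality ("switchable")
        self.gamma = nn.ModuleList(
            nn.Conv3d(channels, channels, 3, padding=1) for _ in range(n_modalities))
        self.beta = nn.ModuleList(
            nn.Conv3d(channels, channels, 3, padding=1) for _ in range(n_modalities))

    def forward(self, latent, target_modality: int):
        h = self.norm(latent)
        g = self.gamma[target_modality](latent)
        b = self.beta[target_modality](latent)
        return h * (1 + g) + b   # style the latent toward the target modality

block = SwitchableSPADE(channels=8, n_modalities=4)
z = torch.randn(1, 8, 16, 16, 16)          # toy 3D latent
print(block(z, target_modality=2).shape)   # torch.Size([1, 8, 16, 16, 16])
```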

Solutions to Elliptic and Parabolic Problems via Finite Difference Based Unsupervised Small Linear Convolutional Neural Networks

  • paper_url: http://arxiv.org/abs/2311.00259
  • repo_url: None
  • paper_authors: Adrian Celaya, Keegan Kirk, David Fuentes, Beatrice Riviere
  • for: Solving partial differential equations (PDEs), particularly with deep learning and neural networks.
  • methods: Proposes a fully unsupervised approach that requires no training data or labeled input-output pairs.
  • results: Achieves accuracy comparable to the finite difference method on selected elliptic and parabolic problems.
    Abstract In recent years, there has been a growing interest in leveraging deep learning and neural networks to address scientific problems, particularly in solving partial differential equations (PDEs). However, current neural network-based PDE solvers often rely on extensive training data or labeled input-output pairs, making them prone to challenges in generalizing to out-of-distribution examples. To mitigate the generalization gap encountered by conventional neural network-based methods in estimating PDE solutions, we formulate a fully unsupervised approach, requiring no training data, to estimate finite difference solutions for PDEs directly via small convolutional neural networks. Our proposed algorithms demonstrate a comparable accuracy to the true solution for several selected elliptic and parabolic problems compared to the finite difference method.
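A minimal sketch of the unsupervised idea for a 2D Poisson problem $-\Delta u = f$: a small CNN outputs the solution grid and is trained to minimize the finite-difference residual, with no labeled data. The architecture and problem setup are illustrative assumptions.

```python
# Unsupervised finite-difference training sketch for -Laplace(u) = f on
# a unit square with zero boundary: a small CNN predicts the grid
# solution and the loss is the squared FD residual (no labels).
import torch
import torch.nn as nn

n = 32
h = 1.0 / (n - 1)
f = torch.ones(1, 1, n, n)                      # source term f = 1

# 5-point finite-difference Laplacian as a fixed convolution kernel
lap_kernel = torch.tensor([[[[0., 1., 0.],
                             [1., -4., 1.],
                             [0., 1., 0.]]]]) / h**2

net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.Tanh(),
                    nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for _ in range(200):
    u = net(f)
    # enforce u = 0 on the boundary by zero-padding the interior
    u = nn.functional.pad(u[:, :, 1:-1, 1:-1], (1, 1, 1, 1))
    residual = -nn.functional.conv2d(u, lap_kernel) - f[:, :, 1:-1, 1:-1]
    loss = (residual ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))  # residual loss shrinks as training proceeds
```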

RAUNE-Net: A Residual and Attention-Driven Underwater Image Enhancement Method

  • paper_url: http://arxiv.org/abs/2311.00246
  • repo_url: https://github.com/fansuregrin/raune-net
  • paper_authors: Wangzhen Peng, Chenghao Zhou, Runze Hu, Jingchao Cao, Yutao Liu
  • for: Improving the clarity and quality of underwater images.
  • methods: A deep learning strategy combining residual learning of high-level features at the network bottleneck with two kinds of attention manipulations in the down-sampling procedure.
  • results: Improves underwater image restoration and enhancement, maintaining good visual results across diverse underwater conditions.
    Abstract Underwater image enhancement (UIE) poses challenges due to distinctive properties of the underwater environment, including low contrast, high turbidity, visual blurriness, and color distortion. In recent years, the application of deep learning has quietly revolutionized various areas of scientific research, including UIE. However, existing deep learning-based UIE methods generally suffer from issues of weak robustness and limited adaptability. In this paper, inspired by residual and attention mechanisms, we propose a more reliable and reasonable UIE network called RAUNE-Net by employing residual learning of high-level features at the network's bottle-neck and two aspects of attention manipulations in the down-sampling procedure. Furthermore, we collect and create two datasets specifically designed for evaluating UIE methods, which contains different types of underwater distortions and degradations. The experimental validation demonstrates that our method obtains promising objective performance and consistent visual results across various real-world underwater images compared to other eight UIE methods. Our example code and datasets are publicly available at https://github.com/fansuregrin/RAUNE-Net.

1DFormer: Learning 1D Landmark Representations via Transformer for Facial Landmark Tracking

  • paper_url: http://arxiv.org/abs/2311.00241
  • repo_url: None
  • paper_authors: Shi Yin, Shijie Huan, Defu Lian, Shangfei Wang, Jinshui Hu, Tao Guo, Bing Yin, Baocai Yin, Cong Liu
  • for: Targets improving facial landmark tracking performance by exploring the potential of 1D landmark representations.
  • methods: Proposes 1DFormer, a Transformer-based method that captures the dynamic and geometric patterns of facial landmarks through token communications along the temporal and spatial dimensions, using a recurrent token mixing mechanism and a confidence-enhanced multi-head attention mechanism to adapt to long-term landmark dynamics.
  • results: Experiments show that 1DFormer models the long-range sequential patterns and inherent facial structure of landmark sequences, achieving state-of-the-art performance in facial landmark tracking.
    Abstract Recently, heatmap regression methods based on 1D landmark representations have shown prominent performance on locating facial landmarks. However, previous methods have not deeply explored the potential of 1D landmark representations for sequential and structural modeling of multiple landmarks in facial landmark tracking. To address this limitation, we propose a Transformer architecture, namely 1DFormer, which learns informative 1D landmark representations by capturing the dynamic and geometric patterns of landmarks via token communications in both temporal and spatial dimensions for facial landmark tracking. For temporal modeling, we propose a recurrent token mixing mechanism, an axis-landmark-positional embedding mechanism, as well as a confidence-enhanced multi-head attention mechanism to adaptively and robustly embed long-term landmark dynamics into their 1D representations; for structure modeling, we design intra-group and inter-group structure modeling mechanisms to encode the component-level as well as global-level facial structure patterns as a refinement for the 1D representations of landmarks through token communications in the spatial dimension via 1D convolutional layers. Experimental results on the 300VW and the TF databases show that 1DFormer successfully models the long-range sequential patterns as well as the inherent facial structures to learn informative 1D representations of landmark sequences, and achieves state-of-the-art performance on facial landmark tracking.

DINO-Mix: Enhancing Visual Place Recognition with Foundational Vision Model and Feature Mixing

  • paper_url: http://arxiv.org/abs/2311.00230
  • repo_url: None
  • paper_authors: Gaoshuang Huang, Yang Zhou, Xiaofei Hu, Chenglong Zhang, Luying Zhao, Wenjian Gan, Mingbo Hou
  • for: Aims to improve the accuracy and robustness of visual place recognition (VPR) in complex real-world environments with lighting variations, seasonal changes, and occlusions.
  • methods: Uses the DINOv2 model as the backbone network, with trimming and fine-tuning to extract image features, and proposes DINO-Mix, a VPR architecture that combines a foundational vision model with feature aggregation; an MLP-Mixer-based mix module aggregates image features into globally robust and generalizable descriptors for high-precision VPR.
  • results: On test sets with lighting variations, seasonal changes, and occlusions (Tokyo24/7, Nordland, SF-XL-Testv1), DINO-Mix achieves Top-1 accuracy rates of 91.75%, 80.18%, and 82%, respectively, an average accuracy improvement of 5.14% over state-of-the-art methods.
    Abstract Utilizing visual place recognition (VPR) technology to ascertain the geographical location of publicly available images is a pressing issue for real-world VPR applications. Although most current VPR methods achieve favorable results under ideal conditions, their performance in complex environments, characterized by lighting variations, seasonal changes, and occlusions caused by moving objects, is generally unsatisfactory. In this study, we utilize the DINOv2 model as the backbone network for trimming and fine-tuning to extract robust image features. We propose a novel VPR architecture called DINO-Mix, which combines a foundational vision model with feature aggregation. This architecture relies on the powerful image feature extraction capabilities of foundational vision models. We employ an MLP-Mixer-based mix module to aggregate image features, resulting in globally robust and generalizable descriptors that enable high-precision VPR. We experimentally demonstrate that the proposed DINO-Mix architecture significantly outperforms current state-of-the-art (SOTA) methods. In test sets having lighting variations, seasonal changes, and occlusions (Tokyo24/7, Nordland, SF-XL-Testv1), our proposed DINO-Mix architecture achieved Top-1 accuracy rates of 91.75%, 80.18%, and 82%, respectively. Compared with SOTA methods, our architecture exhibited an average accuracy improvement of 5.14%.
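A sketch of MLP-Mixer-style aggregation of frozen backbone patch tokens into a single global place descriptor; the dimensions, pooling, and projection are our assumptions rather than the exact DINO-Mix module.

```python
# Sketch: aggregate frozen ViT patch tokens into one L2-normalized
# global descriptor with MLP-Mixer-style token and channel mixing.
# Dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class MixAggregator(nn.Module):
    def __init__(self, n_tokens=256, dim=384, out_dim=512):
        super().__init__()
        self.token_mlp = nn.Sequential(     # mixes information across tokens
            nn.Linear(n_tokens, n_tokens), nn.GELU(), nn.Linear(n_tokens, n_tokens))
        self.channel_mlp = nn.Sequential(   # mixes information across channels
            nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.proj = nn.Linear(dim, out_dim)

    def forward(self, tokens):              # tokens: (B, N, D)
        x = tokens + self.token_mlp(tokens.transpose(1, 2)).transpose(1, 2)
        x = x + self.channel_mlp(x)
        desc = self.proj(x.mean(dim=1))     # pool tokens -> global descriptor
        return nn.functional.normalize(desc, dim=-1)

patch_tokens = torch.randn(2, 256, 384)     # e.g. frozen backbone patch features
print(MixAggregator()(patch_tokens).shape)  # torch.Size([2, 512])
```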

cs.AI - 2023-11-01

Generate and Pray: Using SALLMS to Evaluate the Security of LLM Generated Code

  • paper_url: http://arxiv.org/abs/2311.00889
  • repo_url: None
  • paper_authors: Mohammed Latif Siddiq, Joanna C. S. Santos
  • for: Aims to evaluate the security of code generated by Large Language Models (LLMs), ensuring that it is not only functionally correct but also free of vulnerabilities.
  • methods: Introduces SALLM, a framework comprising a novel security-centric dataset of Python prompts, an environment for testing the generated code, and new metrics for evaluating secure code generation.
  • results: Finds that existing evaluation metrics focus mainly on functional correctness and neglect security considerations, so LLM-generated code may contain vulnerabilities; SALLM enables systematic evaluation of LLMs' secure code generation, helping developers use LLMs more safely in software development.
    Abstract With the growing popularity of Large Language Models (e.g. GitHub Copilot, ChatGPT, etc.) in software engineers' daily practices, it is important to ensure that the code generated by these tools is not only functionally correct but also free of vulnerabilities. Although LLMs can help developers to be more productive, prior empirical studies have shown that LLMs can generate insecure code. There are two contributing factors to the insecure code generation. First, existing datasets used to evaluate Large Language Models (LLMs) do not adequately represent genuine software engineering tasks sensitive to security. Instead, they are often based on competitive programming challenges or classroom-type coding tasks. In real-world applications, the code produced is integrated into larger codebases, introducing potential security risks. There's a clear absence of benchmarks that focus on evaluating the security of the generated code. Second, existing evaluation metrics primarily focus on the functional correctness of the generated code while ignoring security considerations. Metrics such as pass@k gauge the probability of obtaining the correct code in the top k suggestions. Other popular metrics like BLEU, CodeBLEU, ROUGE, and METEOR similarly emphasize functional accuracy, neglecting security implications. In light of these research gaps, in this paper, we described SALLM, a framework to benchmark LLMs' abilities to generate secure code systematically. This framework has three major components: a novel dataset of security-centric Python prompts, an evaluation environment to test the generated code, and novel metrics to evaluate the models' performance from the perspective of secure code generation.
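For reference, the pass@k metric the abstract criticizes is typically computed with the unbiased estimator popularized by the HumanEval benchmark; reusing it with c counting samples that are both correct and secure gives a security-aware variant in the spirit of the paper (our reading, not necessarily SALLM's exact metric).

```python
# Unbiased pass@k estimator: given n samples of which c pass, estimate
# the probability that at least one of k random draws passes.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=20, c=5, k=1))   # 0.25
print(pass_at_k(n=20, c=5, k=10))  # ~0.984
```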

SCPO: Safe Reinforcement Learning with Safety Critic Policy Optimization

  • paper_url: http://arxiv.org/abs/2311.00880
  • repo_url: None
  • paper_authors: Jaafar Mhamed, Shangding Gu
  • for: Aims to improve the safety of reinforcement learning environments in real-world applications.
  • methods: Works with Constrained Markov Decision Processes (CMDPs), using the Lagrangian relaxation technique to convert the constrained optimization problem into an unconstrained dual problem.
  • results: Proposes Safety Critic Policy Optimization (SCPO), a new safe reinforcement learning algorithm that automatically balances satisfying safety constraints against maximizing reward; experiments show it outperforms strong baselines.
    Abstract Incorporating safety is an essential prerequisite for broadening the practical applications of reinforcement learning in real-world scenarios. To tackle this challenge, Constrained Markov Decision Processes (CMDPs) are leveraged, which introduce a distinct cost function representing safety violations. In CMDPs' settings, Lagrangian relaxation technique has been employed in previous algorithms to convert constrained optimization problems into unconstrained dual problems. However, these algorithms may inaccurately predict unsafe behavior, resulting in instability while learning the Lagrange multiplier. This study introduces a novel safe reinforcement learning algorithm, Safety Critic Policy Optimization (SCPO). In this study, we define the safety critic, a mechanism that nullifies rewards obtained through violating safety constraints. Furthermore, our theoretical analysis indicates that the proposed algorithm can automatically balance the trade-off between adhering to safety constraints and maximizing rewards. The effectiveness of the SCPO algorithm is empirically validated by benchmarking it against strong baselines.
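A toy contrast between the classic Lagrangian relaxation the abstract mentions and the reward-nullification idea behind the safety critic; the exact critic and update rule are the paper's, and these functions are only our schematic reading.

```python
# Two ways of coupling reward and safety cost, schematically.

def shaped_reward(reward: float, cost: float, cost_limit: float) -> float:
    # safety-critic flavor: nullify reward obtained through unsafe behavior
    return 0.0 if cost > cost_limit else reward

def lagrangian_reward(reward: float, cost: float, lam: float) -> float:
    # classic Lagrangian relaxation: a multiplier trades reward against cost
    return reward - lam * cost

def update_multiplier(lam, episode_cost, cost_limit, lr=0.01):
    # dual ascent: grow lambda when the constraint is violated
    return max(0.0, lam + lr * (episode_cost - cost_limit))

print(shaped_reward(1.0, cost=2.0, cost_limit=1.0))              # 0.0
print(lagrangian_reward(1.0, cost=2.0, lam=0.5))                 # 0.0
print(update_multiplier(0.5, episode_cost=2.0, cost_limit=1.0))  # 0.51
```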

Selectively Sharing Experiences Improves Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00865
  • repo_url: https://github.com/mgerstgrasser/super
  • paper_authors: Matthias Gerstgrasser, Tom Danino, Sarah Keren
  • for: Presents a new multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share a limited number of observed transitions during training; the intuition is that even a few relevant experiences from other agents can help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows largely decentralized training, requiring only a limited communication channel between agents.
  • methods: Each agent collects experiences from the environment and prioritizes them; it then shares a selected subset of highly relevant experiences with the other agents rather than all of them; finally, each agent learns from the shared experiences.
  • results: Shows that the approach outperforms baseline no-sharing decentralized training and state-of-the-art multi-agent RL algorithms; sharing only a small number of highly relevant experiences outperforms sharing all experiences, and the performance uplift is robust across a range of hyperparameters and DQN variants. A reference implementation is available at https://github.com/mgerstgrasser/super.
    Abstract We present a novel multi-agent RL approach, Selective Multi-Agent Prioritized Experience Relay, in which agents share with other agents a limited number of transitions they observe during training. The intuition behind this is that even a small number of relevant experiences from other agents could help each agent learn. Unlike many other multi-agent RL algorithms, this approach allows for largely decentralized training, requiring only a limited communication channel between agents. We show that our approach outperforms baseline no-sharing decentralized training and state-of-the art multi-agent RL algorithms. Further, sharing only a small number of highly relevant experiences outperforms sharing all experiences between agents, and the performance uplift from selective experience sharing is robust across a range of hyperparameters and DQN variants. A reference implementation of our algorithm is available at https://github.com/mgerstgrasser/super.
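A toy sketch of the selective-sharing idea: each agent broadcasts only its highest-priority transitions (here, by TD-error magnitude, a common priority signal) to its peers' replay buffers. The buffer layout and priority choice are our assumptions.

```python
# Selective experience sharing sketch: each agent shares only its
# `budget` highest-priority transitions with every other agent.
import heapq

def select_to_share(transitions, td_errors, budget):
    """Return the `budget` transitions with the largest TD errors."""
    ranked = heapq.nlargest(budget, zip(td_errors, range(len(transitions))))
    return [transitions[i] for _, i in ranked]

def share_experiences(agents, budget=8):
    for sender in agents:
        shared = select_to_share(sender["transitions"], sender["td_errors"], budget)
        for receiver in agents:
            if receiver is not sender:
                receiver["replay_buffer"].extend(shared)

agents = [{"transitions": list(range(100)),
           "td_errors": [abs(50 - t) / 50 for t in range(100)],  # toy priorities
           "replay_buffer": []} for _ in range(3)]
share_experiences(agents)
print(len(agents[0]["replay_buffer"]))  # 16: 8 transitions from each of 2 peers
```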

Training Dynamics of Contextual N-Grams in Language Models

  • paper_url: http://arxiv.org/abs/2311.00863
  • repo_url: https://github.com/luciaquirke/contextual-ngrams
  • paper_authors: Lucia Quirke, Lovis Heindrich, Wes Gurnee, Neel Nanda
  • for: Presents evidence for contextual neurons in language models, including a German-text neuron, and shows how such neurons form.
  • methods: Studies what the authors call a second-order circuit: during training, the constituent n-gram circuits and the German detection circuit first form independently, and only afterwards fit together into a broader contextual n-gram circuit.
  • results: Reports a range of anomalous observations, such as simultaneous phase transitions across many tasks coinciding with the learning-rate warm-up, and evidence that many context neurons form early in training but are later unlearned; contrary to prior hypotheses, the contextual n-gram circuit forms gradually rather than in a sudden phase transition.
    Abstract Prior work has shown the existence of contextual neurons in language models, including a neuron that activates on German text. We show that this neuron exists within a broader contextual n-gram circuit: we find late layer neurons which recognize and continue n-grams common in German text, but which only activate if the German neuron is active. We investigate the formation of this circuit throughout training and find that it is an example of what we call a second-order circuit. In particular, both the constituent n-gram circuits and the German detection circuit which culminates in the German neuron form with independent functions early in training - the German detection circuit partially through modeling German unigram statistics, and the n-grams by boosting appropriate completions. Only after both circuits have already formed do they fit together into a second-order circuit. Contrary to the hypotheses presented in prior work, we find that the contextual n-gram circuit forms gradually rather than in a sudden phase transition. We further present a range of anomalous observations such as a simultaneous phase transition in many tasks coinciding with the learning rate warm-up, and evidence that many context neurons form simultaneously early in training but are later unlearned.

Zero Coordinate Shift: Whetted Automatic Differentiation for Physics-informed Operator Learning

  • paper_url: http://arxiv.org/abs/2311.00860
  • repo_url: https://github.com/stfc-sciml/zerocoordinateshift
  • paper_authors: Kuangdai Leng, Mallikarjun Shankar, Jeyan Thiyagalingam
  • for: physics-informed machine learning, particularly for computing high-order derivatives of network output w.r.t. coordinates.
  • methods: introduces a novel and lightweight algorithm called Zero Coordinate Shift (ZCS), which simplifies the wanted derivatives from “many-roots-many-leaves” to “one-root-many-leaves” by introducing only one scalar-valued leaf variable for each spatial or temporal dimension.
  • results: persistently brings down GPU memory consumption and wall time for training by an order of magnitude, with the savings increasing with problem scale (i.e., number of functions, number of points, and order of PDE).
    Abstract Automatic differentiation (AD) is a critical step in physics-informed machine learning, required for computing the high-order derivatives of network output w.r.t. coordinates. In this paper, we present a novel and lightweight algorithm to conduct such AD for physics-informed operator learning, as we call the trick of Zero Coordinate Shift (ZCS). Instead of making all sampled coordinates leaf variables, ZCS introduces only one scalar-valued leaf variable for each spatial or temporal dimension, leading to a game-changing performance leap by simplifying the wanted derivatives from "many-roots-many-leaves" to "one-root-many-leaves". ZCS is easy to implement with current deep learning libraries; our own implementation is by extending the DeepXDE package. We carry out a comprehensive benchmark analysis and several case studies, training physics-informed DeepONets to solve partial differential equations (PDEs) without data. The results show that ZCS has persistently brought down GPU memory consumption and wall time for training by an order of magnitude, with the savings increasing with problem scale (i.e., number of functions, number of points and order of PDE). As a low-level optimisation, ZCS entails no restrictions on data, physics (PDEs) or network architecture and does not compromise training results from any aspect.
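A minimal one-dimensional reading of the trick (our sketch; the paper's implementation extends DeepXDE and handles high-order derivatives): instead of making every sampled coordinate a leaf variable, introduce one scalar z, evaluate u(x + z), and differentiate with respect to z; forward-mode AD then yields du/dx at all points from a single scalar tangent.

```python
# "One-root-many-leaves" sketch of the zero-coordinate-shift idea in 1D,
# using forward-mode AD (torch.func.jvp, PyTorch >= 2.0). Our reading of
# the trick, not the paper's implementation.
import torch
from torch.func import jvp

net = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 1))
x = torch.linspace(0.0, 1.0, 128).unsqueeze(1)   # sampled points, not leaves

def u_of_z(z):
    return net(x + z)                            # one scalar leaf variable z

z0 = torch.zeros(())
# d/dz u(x + z) at z = 0 equals du/dx at every point simultaneously
_, du_dx = jvp(u_of_z, (z0,), (torch.ones(()),))
print(du_dx.shape)                               # torch.Size([128, 1])
```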

Optimal Cost Constrained Adversarial Attacks For Multiple Agent Systems

  • paper_url: http://arxiv.org/abs/2311.00859
  • repo_url: None
  • paper_authors: Ziqing Lu, Guanlin Liu, Lifeng Cai, Weiyu Xu
  • for: Focuses on finding optimal attack strategies for distributed attack agents in multi-agent systems.
  • methods: Proposes an optimal method that integrates within-step static constrained attack-resource allocation optimization with between-step dynamic programming, imposing distinct cost constraints on each attacker-victim pair.
  • results: Numerical results show the proposed attack strategies significantly reduce the rewards received by the attacked agents.
    Abstract Finding optimal adversarial attack strategies is an important topic in reinforcement learning and the Markov decision process. Previous studies usually assume one all-knowing coordinator (attacker) for whom attacking different recipient (victim) agents incurs uniform costs. However, in reality, instead of using one limitless central attacker, the attacks often need to be performed by distributed attack agents. We formulate the problem of performing optimal adversarial agent-to-agent attacks using distributed attack agents, in which we impose distinct cost constraints on each different attacker-victim pair. We propose an optimal method integrating within-step static constrained attack-resource allocation optimization and between-step dynamic programming to achieve the optimal adversarial attack in a multi-agent system. Our numerical results show that the proposed attacks can significantly reduce the rewards received by the attacked agents.

A Multi-Agent Reinforcement Learning Framework for Evaluating the U.S. Ending the HIV Epidemic Plan

  • paper_url: http://arxiv.org/abs/2311.00855
  • repo_url: None
  • paper_authors: Dinesh Sharma, Ankit Shah, Chaitra Gopalappa
  • for: Aims to inform public health policy by using a multi-agent reinforcement learning (MARL) model to analyze and optimize HIV treatment and prevention interventions, helping to reduce new HIV infections.
  • methods: Uses a MARL model that accounts for cross-jurisdictional epidemiological interactions while enabling jurisdiction-specific analysis and optimization of HIV interventions.
  • results: Experimental analyses show that the optimal policies from the MARL model differ significantly from those generated by single-agent RL, highlighting the influence of jurisdictional variations and interactions and providing a robust way to analyze and optimize HIV interventions.
    Abstract Human immunodeficiency virus (HIV) is a major public health concern in the United States, with about 1.2 million people living with HIV and 35,000 newly infected each year. There are considerable geographical disparities in HIV burden and care access across the U.S. The 2019 Ending the HIV Epidemic (EHE) initiative aims to reduce new infections by 90% by 2030, by improving coverage of diagnoses, treatment, and prevention interventions and prioritizing jurisdictions with high HIV prevalence. Identifying optimal scale-up of intervention combinations will help inform resource allocation. Existing HIV decision analytic models either evaluate specific cities or the overall national population, thus overlooking jurisdictional interactions or differences. In this paper, we propose a multi-agent reinforcement learning (MARL) model, that enables jurisdiction-specific decision analyses but in an environment with cross-jurisdictional epidemiological interactions. In experimental analyses, conducted on jurisdictions within California and Florida, optimal policies from MARL were significantly different than those generated from single-agent RL, highlighting the influence of jurisdictional variations and interactions. By using comprehensive modeling of HIV and formulations of state space, action space, and reward functions, this work helps demonstrate the strengths and applicability of MARL for informing public health policies, and provides a framework for expanding to the national-level to inform the EHE.

healthAIChain: Improving security and safety using Blockchain Technology applications in AI-based healthcare systems

  • paper_url: http://arxiv.org/abs/2311.00842
  • repo_url: None
  • paper_authors: Naresh Kshetri, James Hutson, Revathy G
  • for: Explores using blockchain technology to improve the security and safety of healthcare systems, and the potential applications of blockchain in healthcare and related fields.
  • methods: Applies blockchain technology to the security and reliability problems of healthcare systems, and evaluates and analyzes its applications there.
  • results: Shows that blockchain can improve the security, reliability, performance, and scalability of healthcare systems; also proposes an AI-based healthcare blockchain model (healthAIChain) to improve the security of patient data.
    Abstract Blockchain, a digital ledger for keeping records of digital transactions and other information, is a secure and decentralized technology. The globally growing digital population poses a significant threat to online data, including medical and patient data. After bitcoin, blockchain technology has emerged as a general-purpose technology with applications in the medical industries and healthcare. Blockchain can promote highly configurable openness while retaining the highest security standards for critical patient data. This distributed record keeping for healthcare systems makes digital assets unalterable and transparent via cryptographic hashes and a decentralized network. The study delves into the security and safety improvements associated with implementing blockchain in AI-based healthcare systems. Blockchain-enabled AI tackles existing issues related to security, performance efficiency, and safety in healthcare systems. We also examine artificial intelligence in the healthcare and medical industry, potential application areas, and open questions concerning blockchain in healthcare systems. Finally, the article proposes an AI-based healthcare blockchain model (healthAIChain) to improve patient data security.

Constant-time Motion Planning with Anytime Refinement for Manipulation

  • paper_url: http://arxiv.org/abs/2311.00837
  • repo_url: None
  • paper_authors: Itamar Mishani, Hayden Feddock, Maxim Likhachev
  • for: Aims to improve the autonomy and reliability of robotic manipulation systems.
  • methods: Combines a constant-time motion planning (CTMP) algorithm with an anytime refinement algorithm.
  • results: The approach rapidly generates an initial solution within a user-defined time bound and then iteratively refines it within the allocated time budget, striking a balance between guaranteed fast plan generation and solution optimization.
    Abstract Robotic manipulators are essential for future autonomous systems, yet limited trust in their autonomy has confined them to rigid, task-specific systems. The intricate configuration space of manipulators, coupled with the challenges of obstacle avoidance and constraint satisfaction, often makes motion planning the bottleneck for achieving reliable and adaptable autonomy. Recently, a class of constant-time motion planners (CTMP) was introduced. These planners employ a preprocessing phase to compute data structures that enable online planning to provably guarantee the generation of a (potentially sub-optimal) motion plan within a user-defined time bound. This framework has been demonstrated to be effective in a number of time-critical tasks. However, robotic systems often have more time allotted for planning than the online portion of CTMP requires, time that can be used to improve the solution. To this end, we propose an anytime refinement approach that works in combination with CTMP algorithms. Our proposed framework, as it operates as a constant time algorithm, rapidly generates an initial solution within a user-defined time threshold. Furthermore, functioning as an anytime algorithm, it iteratively refines the solution's quality within the allocated time budget. This enables our approach to strike a balance between guaranteed fast plan generation and the pursuit of optimization over time. We support our approach by elucidating its analytical properties, showing the convergence of the anytime component towards optimal solutions. Additionally, we provide empirical validation through simulation and real-world demonstrations on a 6 degree-of-freedom robot manipulator, applied to an assembly domain.
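A schematic of the constant-time-plus-anytime combination described in the abstract: return a guaranteed initial plan within the constant-time bound, then keep refining until the full budget is spent. `ctmp_query` and `shortcut` are hypothetical stand-ins, and the toy "plans" are lists of waypoints.

```python
# Anytime refinement on top of a constant-time planner, schematically.
import random, time

def plan_anytime(start, goal, ctmp_query, shortcut, cost, budget_s=0.05):
    t0 = time.monotonic()
    best = ctmp_query(start, goal)          # constant-time: provably returns a plan
    while time.monotonic() - t0 < budget_s:
        candidate = shortcut(best)          # local refinement attempt
        if candidate is not None and cost(candidate) < cost(best):
            best = candidate                # anytime: quality never degrades
    return best

# toy stand-ins: a "plan" is a list of waypoints; shortcutting drops one
cost = lambda plan: len(plan)
ctmp_query = lambda s, g: [s] + [random.random() for _ in range(20)] + [g]
shortcut = lambda p: p[:1] + p[2:] if len(p) > 2 else None
print(len(plan_anytime(0.0, 1.0, ctmp_query, shortcut, cost)))  # shrinks toward 2
```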

Beyond Still Images: Robust Multi-Stream Spatiotemporal Networks

  • paper_url: http://arxiv.org/abs/2311.00800
  • repo_url: None
  • paper_authors: AmirHosein Fadaei, Mohammad-Reza A. Dehaqani
  • for: A defining property of natural vision is robustness to input variations, which yields invariant representations of the environment; in deep neural networks, certain forms of spatial and temporal input variation can cause significant changes in the representations of video content.
  • methods: Employs a simple multi-stream model, including video and temporal streams during training, to explore its robustness to spatial and temporal variations in image and video understanding tasks.
  • results: Including videos and the temporal stream during training mitigates the decline in accuracy and mAP on image and video understanding tasks by 1.36% and 3.14%, respectively.
    Abstract A defining characteristic of natural vision is its ability to withstand a variety of input alterations, resulting in the creation of an invariant representation of the surroundings. While convolutional neural networks exhibit resilience to certain forms of spatial input variation, modifications in the spatial and temporal aspects can significantly affect the representations of video content in deep neural networks. Inspired by the resilience of natural vision to input variations, we employ a simple multi-stream model to explore its potential to address spatiotemporal changes by including temporal features. Our primary goal is to introduce a video-trained model and evaluate its robustness to diverse image and video inputs, with a particular focus on exploring the role of temporal features in invariant recognition. Results show that including videos and the temporal stream during training mitigates the decline in accuracy and mAP in image and video understanding tasks by 1.36% and 3.14%, respectively.

Tipping Points of Evolving Epidemiological Networks: Machine Learning-Assisted, Data-Driven Effective Modeling

  • paper_url: http://arxiv.org/abs/2311.00797
  • repo_url: None
  • paper_authors: Nikolaos Evangelou, Tianqi Cui, Juan M. Bello-Rivas, Alexei Makeev, Ioannis G. Kevrekidis
  • for: The paper studies the tipping point collective dynamics of an adaptive susceptible-infected-susceptible (SIS) epidemiological network using a data-driven, machine learning-assisted approach.
  • methods: The paper identifies a parameter-dependent effective stochastic differential equation (eSDE) in terms of physically meaningful coarse mean-field variables through a deep-learning ResNet architecture inspired by numerical stochastic integrators. It also constructs an approximate effective bifurcation diagram based on the identified drift term of the eSDE and compares it with the mean-field SIS model bifurcation diagram.
  • results: The paper observes a subcritical Hopf bifurcation in the evolving network's effective SIS dynamics, which causes the tipping point behavior and leads to large amplitude collective oscillations that spontaneously arise from the neighborhood of a (noisy) stationary state. It studies the statistics of these rare events using repeated brute force simulations and established mathematical/computational tools, and demonstrates that the collective SDE can also be identified, and the rare events computations performed, in terms of data-driven coarse observables obtained through manifold learning techniques, such as Diffusion Maps.
    Abstract We study the tipping point collective dynamics of an adaptive susceptible-infected-susceptible (SIS) epidemiological network in a data-driven, machine learning-assisted manner. We identify a parameter-dependent effective stochastic differential equation (eSDE) in terms of physically meaningful coarse mean-field variables through a deep-learning ResNet architecture inspired by numerical stochastic integrators. We construct an approximate effective bifurcation diagram based on the identified drift term of the eSDE and contrast it with the mean-field SIS model bifurcation diagram. We observe a subcritical Hopf bifurcation in the evolving network's effective SIS dynamics, that causes the tipping point behavior; this takes the form of large amplitude collective oscillations that spontaneously -- yet rarely -- arise from the neighborhood of a (noisy) stationary state. We study the statistics of these rare events both through repeated brute force simulations and by using established mathematical/computational tools exploiting the right-hand-side of the identified SDE. We demonstrate that such a collective SDE can also be identified (and the rare events computations also performed) in terms of data-driven coarse observables, obtained here via manifold learning techniques, in particular Diffusion Maps. The workflow of our study is straightforwardly applicable to other complex dynamics problems exhibiting tipping point dynamics.
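Once an eSDE's drift and diffusion terms have been identified, rare-event statistics like those in the abstract can be gathered by brute-force simulation, e.g. with Euler-Maruyama; the sketch below uses a toy bistable drift purely for illustration.

```python
# Euler-Maruyama simulation of an identified effective SDE. drift() and
# diffusion() stand in for the learned terms (toy double-well drift here).
import numpy as np

def euler_maruyama(x0, drift, diffusion, dt=1e-3, n_steps=10_000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1); x[0] = x0
    for i in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))   # Brownian increment
        x[i + 1] = x[i] + drift(x[i]) * dt + diffusion(x[i]) * dw
    return x

drift = lambda x: x - x**3          # toy bistable drift
diffusion = lambda x: 0.3           # constant noise amplitude
path = euler_maruyama(x0=0.9, drift=drift, diffusion=diffusion)
print(path.min(), path.max())       # rare transitions between the wells
```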

SAGE: Smart home Agent with Grounded Execution

  • paper_url: http://arxiv.org/abs/2311.00772
  • repo_url: None
  • paper_authors: Dmitriy Rivkin, Francois Hogan, Amal Feriani, Abhisek Konar, Adam Sigal, Steve Liu, Greg Dudek
  • for: Aims to maximize the flexibility of smart home assistants so they can better meet user needs.
  • methods: Uses an LLM-powered autonomous agent system that reasons about user preferences, device states, and external factors (such as weather and TV schedules) by orchestrating a collection of tools, including reading device API documentation and automatically writing code to continuously monitor devices.
  • results: On a benchmark of 43 challenging smart home tasks, SAGE succeeds on 23, outperforming existing LLM-enabled baselines (5/43).
    Abstract This article introduces SAGE (Smart home Agent with Grounded Execution), a framework designed to maximize the flexibility of smart home assistants by replacing manually-defined inference logic with an LLM-powered autonomous agent system. SAGE integrates information about user preferences, device states, and external factors (such as weather and TV schedules) through the orchestration of a collection of tools. SAGE's capabilities include learning user preferences from natural-language utterances, interacting with devices by reading their API documentation, writing code to continuously monitor devices, and understanding natural device references. To evaluate SAGE, we develop a benchmark of 43 highly challenging smart home tasks, where SAGE successfully achieves 23 tasks, significantly outperforming existing LLM-enabled baselines (5/43).

Hand Gesture Classification on Praxis Dataset: Trading Accuracy for Expense

  • paper_url: http://arxiv.org/abs/2311.00767
  • repo_url: None
  • paper_authors: Rahat Islam, Kenneth Lai, Svetlana Yanushkevich
  • for: Aims to develop an efficient, accurate, and inexpensive approach to diagnosing cortical pathologies for multiple healthcare applications.
  • methods: Uses 'skeletal' body joint coordinate data recorded with an RGB-Depth sensor from the Praxis dataset, applying windowing techniques with deep learning architectures such as RNNs and LSTMs for hand gesture recognition.
  • results: Achieves an overall accuracy of 70.8% using only body joint data, and, by analyzing the movement of the joints through time with an LSTM, gesture recognition rates of 74.3% and 67.3% for static and dynamic gestures, respectively.
    Abstract In this paper, we investigate hand gesture classifiers that rely upon the abstracted 'skeletal' data recorded using the RGB-Depth sensor. We focus on 'skeletal' data represented by the body joint coordinates, from the Praxis dataset. The PRAXIS dataset contains recordings of patients with cortical pathologies such as Alzheimer's disease, performing a Praxis test under the direction of a clinician. In this paper, we propose hand gesture classifiers that are more effective with the PRAXIS dataset than previously proposed models. Body joint data offers a compressed form of data that can be analyzed specifically for hand gesture recognition. Using a combination of windowing techniques with deep learning architecture such as a Recurrent Neural Network (RNN), we achieved an overall accuracy of 70.8% using only body joint data. In addition, we investigated a long-short-term-memory (LSTM) to extract and analyze the movement of the joints through time to recognize the hand gestures being performed and achieved a gesture recognition rate of 74.3% and 67.3% for static and dynamic gestures, respectively. The proposed approach contributed to the task of developing an automated, accurate, and inexpensive approach to diagnosing cortical pathologies for multiple healthcare applications.

Learning to Design and Use Tools for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2311.00754
  • repo_url: None
  • paper_authors: Ziang Liu, Stephen Tian, Michelle Guo, C. Karen Liu, Jiajun Wu
  • for: Aims to design manipulation tools that can be rapidly specialized for multiple goals and deployed in real environments.
  • methods: Uses deep reinforcement learning to jointly optimize tool design and control, learning a designer policy conditioned on task information together with a design-conditioned controller policy.
  • results: On simulated manipulation tasks, the framework is more sample efficient and scalable than prior methods in multi-goal or multi-variant settings, supports zero-shot interpolation or fine-tuning for previously unseen goals, and allows trade-offs between the complexity of the design and control policies; the learned policies are also deployed on a real robot.
    Abstract When limited by their own morphologies, humans and some species of animals have the remarkable ability to use objects from the environment toward accomplishing otherwise impossible tasks. Robots might similarly unlock a range of additional capabilities through tool use. Recent techniques for jointly optimizing morphology and control via deep learning are effective at designing locomotion agents. But while outputting a single morphology makes sense for locomotion, manipulation involves a variety of strategies depending on the task goals at hand. A manipulation agent must be capable of rapidly prototyping specialized tools for different goals. Therefore, we propose learning a designer policy, rather than a single design. A designer policy is conditioned on task information and outputs a tool design that helps solve the task. A design-conditioned controller policy can then perform manipulation using these tools. In this work, we take a step towards this goal by introducing a reinforcement learning framework for jointly learning these policies. Through simulated manipulation tasks, we show that this framework is more sample efficient than prior methods in multi-goal or multi-variant settings, can perform zero-shot interpolation or fine-tuning to tackle previously unseen goals, and allows tradeoffs between the complexity of design and control policies under practical constraints. Finally, we deploy our learned policies onto a real robot. Please see our supplementary video and website at https://robotic-tool-design.github.io/ for visualizations.

Are These the Same Apple? Comparing Images Based on Object Intrinsics

  • paper_url: http://arxiv.org/abs/2311.00750
  • repo_url: https://github.com/s-tian/cute
  • paper_authors: Klemen Kotar, Stephen Tian, Hong-Xing Yu, Daniel L. K. Yamins, Jiajun Wu
  • for: measure image similarity purely based on intrinsic object properties that define object identity, especially for general object categories.
  • methods: combine deep features learned from contrastive self-supervised learning with foreground filtering.
  • results: a strong baseline that best measures intrinsic object-centric image similarity among current methods, and can aid in downstream applications such as acting as an analog for human subjects and improving generalizable re-identification.
    Abstract The human visual system can effortlessly recognize an object under different extrinsic factors such as lighting, object poses, and background, yet current computer vision systems often struggle with these variations. An important step to understanding and improving artificial vision systems is to measure image similarity purely based on intrinsic object properties that define object identity. This problem has been studied in the computer vision literature as re-identification, though mostly restricted to specific object categories such as people and cars. We propose to extend it to general object categories, exploring an image similarity metric based on object intrinsics. To benchmark such measurements, we collect the Common paired objects Under differenT Extrinsics (CUTE) dataset of $18,000$ images of $180$ objects under different extrinsic factors such as lighting, poses, and imaging conditions. While existing methods such as LPIPS and CLIP scores do not measure object intrinsics well, we find that combining deep features learned from contrastive self-supervised learning with foreground filtering is a simple yet effective approach to approximating the similarity. We conduct an extensive survey of pre-trained features and foreground extraction methods to arrive at a strong baseline that best measures intrinsic object-centric image similarity among current methods. Finally, we demonstrate that our approach can aid in downstream applications such as acting as an analog for human subjects and improving generalizable re-identification. Please see our project website at https://s-tian.github.io/projects/cute/ for visualizations of the data and demos of our metric.

Unleashing the Creative Mind: Language Model As Hierarchical Policy For Improved Exploration on Challenging Problem Solving

  • paper_url: http://arxiv.org/abs/2311.00694
  • repo_url: https://github.com/lz1oceani/llm-as-hierarchical-policy
  • paper_authors: Zhan Ling, Yunhao Fang, Xuanlin Li, Tongzhou Mu, Mingu Lee, Reza Pourreza, Roland Memisevic, Hao Su
  • for: This paper aims to improve the ability of large language models (LLMs) to solve challenging reasoning problems by framing LLMs as hierarchical policies via in-context learning.
  • methods: The proposed approach uses a visionary leader to propose multiple diverse high-level problem-solving tactics as hints, accompanied by a follower that executes detailed problem-solving processes following each of the high-level instructions. The follower samples multiple reasoning chains to tackle the problem, generating a solution group for each leader proposal.
  • results: The approach improves the final answer accuracy on challenging problems in the MATH dataset and produces meaningful and inspiring hints that enhance problem-solving strategy exploration.
    Abstract Large Language Models (LLMs) have achieved tremendous progress, yet they still often struggle with challenging reasoning problems. Current approaches address this challenge by sampling or searching detailed and low-level reasoning chains. However, these methods are still limited in their exploration capabilities, making it challenging for correct solutions to stand out in the huge solution space. In this work, we unleash LLMs' creative potential for exploring multiple diverse problem solving strategies by framing an LLM as a hierarchical policy via in-context learning. This policy comprises of a visionary leader that proposes multiple diverse high-level problem-solving tactics as hints, accompanied by a follower that executes detailed problem-solving processes following each of the high-level instruction. The follower uses each of the leader's directives as a guide and samples multiple reasoning chains to tackle the problem, generating a solution group for each leader proposal. Additionally, we propose an effective and efficient tournament-based approach to select among these explored solution groups to reach the final answer. Our approach produces meaningful and inspiring hints, enhances problem-solving strategy exploration, and improves the final answer accuracy on challenging problems in the MATH dataset. Code will be released at https://github.com/lz1oceani/LLM-As-Hierarchical-Policy.

On Task-personalized Multimodal Few-shot Learning for Visually-rich Document Entity Retrieval

  • paper_url: http://arxiv.org/abs/2311.00693
  • repo_url: None
  • paper_authors: Jiayi Chen, Hanjun Dai, Bo Dai, Aidong Zhang, Wei Wei
  • for: This paper studies how to perform visually-rich document entity retrieval (VDER) as new document types keep emerging, in particular when the target entity types are personalized per task and entity occurrences vary across documents.
  • methods: A task-aware meta-learning framework comprising a hierarchical decoder (HC) and a contrastive learning strategy (ContrastProtoNet) to achieve effective task personalization.
  • results: Experimental results show that these methods substantially improve the robustness of popular meta-learning baselines.
    Abstract Visually-rich document entity retrieval (VDER), which extracts key information (e.g. date, address) from document images like invoices and receipts, has become an important topic in industrial NLP applications. The emergence of new document types at a constant pace, each with its unique entity types, presents a unique challenge: many documents contain unseen entity types that occur only a couple of times. Addressing this challenge requires models to have the ability of learning entities in a few-shot manner. However, prior works for Few-shot VDER mainly address the problem at the document level with a predefined global entity space, which doesn't account for the entity-level few-shot scenario: target entity types are locally personalized by each task and entity occurrences vary significantly among documents. To address this unexplored scenario, this paper studies a novel entity-level few-shot VDER task. The challenges lie in the uniqueness of the label space for each task and the increased complexity of out-of-distribution (OOD) contents. To tackle this novel task, we present a task-aware meta-learning based framework, with a central focus on achieving effective task personalization that distinguishes between in-task and out-of-task distribution. Specifically, we adopt a hierarchical decoder (HC) and employ contrastive learning (ContrastProtoNet) to achieve this goal. Furthermore, we introduce a new dataset, FewVEX, to boost future research in the field of entity-level few-shot VDER. Experimental results demonstrate our approaches significantly improve the robustness of popular meta-learning baselines.

Improving Interpersonal Communication by Simulating Audiences with Language Models

  • paper_url: http://arxiv.org/abs/2311.00687
  • repo_url: https://github.com/theryanl/egs
  • paper_authors: Ryan Liu, Howard Yen, Raja Marjieh, Thomas L. Griffiths, Ranjay Krishna
  • for: Improving the effectiveness and outcomes of goal-oriented communication and decision-making.
  • methods: Leveraging Large Language Model (LLM) simulations of audiences to help people communicate and decide more effectively.
  • results: Across eight scenarios, the candidates and advice chosen by the EGS framework were preferred by human raters, and the audience simulations agreed well with human raters in five of the eight scenarios; EGS also performed well when applied to real-world scenarios posted by users on web forums.
    Abstract How do we communicate with others to achieve our goals? We use our prior experience or advice from others, or construct a candidate utterance by predicting how it will be received. However, our experiences are limited and biased, and reasoning about potential outcomes can be difficult and cognitively challenging. In this paper, we explore how we can leverage Large Language Model (LLM) simulations to help us communicate better. We propose the Explore-Generate-Simulate (EGS) framework, which takes as input any scenario where an individual is communicating to an audience with a goal they want to achieve. EGS (1) explores the solution space by producing a diverse set of advice relevant to the scenario, (2) generates communication candidates conditioned on subsets of the advice, and (3) simulates the reactions from various audiences to determine both the best candidate and advice to use. We evaluate the framework on eight scenarios spanning the ten fundamental processes of interpersonal communication. For each scenario, we collect a dataset of human evaluations across candidates and baselines, and showcase that our framework's chosen candidate is preferred over popular generation mechanisms including Chain-of-Thought. We also find that audience simulations achieve reasonably high agreement with human raters across 5 of the 8 scenarios. Finally, we demonstrate the generality of our framework by applying it to real-world scenarios described by users on web forums. Through evaluations and demonstrations, we show that EGS enhances the effectiveness and outcomes of goal-oriented communication across a variety of situations, thus opening up new possibilities for the application of large language models in revolutionizing communication and decision-making processes.
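The three EGS stages map naturally onto a small orchestration loop. The sketch below again uses a hypothetical `llm(prompt)` completion call; the personas, prompts, and numeric-rating parsing are illustrative assumptions, not the authors' templates.

```python
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM call here")

def egs(scenario: str, goal: str, n_advice: int = 5, n_candidates: int = 3) -> str:
    # Explore: produce diverse advice relevant to the scenario.
    advice = [llm(f"Give one distinct piece of advice for: {scenario}\n"
                  f"Goal: {goal} (advice #{i})") for i in range(n_advice)]
    # Generate: draft candidate utterances conditioned on the advice.
    candidates = [llm(f"Write the message for: {scenario}\nFollow this advice: {a}")
                  for a in advice[:n_candidates]]
    # Simulate: score each candidate by simulated audience reactions.
    personas = ["a skeptical colleague", "a supportive friend"]  # assumed personas

    def score(candidate: str) -> float:
        # Assumes the simulated rater answers with a leading number.
        ratings = [llm(f"You are {p}. On a 1-10 scale, rate how well this "
                       f"message achieves '{goal}':\n{candidate}")
                   for p in personas]
        return sum(float(r.strip().split()[0]) for r in ratings) / len(ratings)

    return max(candidates, key=score)
```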

Emergence of Collective Open-Ended Exploration from Decentralized Meta-Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00651
  • repo_url: None
  • paper_authors: Richard Bornemann, Gautier Hamon, Eleni Nisioti, Clément Moulin-Frier
  • for: This work studies how multiple agents that meta-learn independently over an open-ended task distribution can develop collective exploration strategies.
  • methods: Multiple agents are trained with decentralized meta-reinforcement learning over an open-ended, procedurally generated distribution of tasks.
  • results: The decentralized agents show strong generalization when confronted with novel objects at test time; despite never being forced to cooperate during training, they learn collective exploration strategies that solve novel tasks, and these strategies extend to an open-ended setting, solving task trees twice as deep as those seen during training.
    Abstract Recent works have proven that intricate cooperative behaviors can emerge in agents trained using meta reinforcement learning on open ended task distributions using self-play. While the results are impressive, we argue that self-play and other centralized training techniques do not accurately reflect how general collective exploration strategies emerge in the natural world: through decentralized training and over an open-ended distribution of tasks. In this work we therefore investigate the emergence of collective exploration strategies, where several agents meta-learn independent recurrent policies on an open ended distribution of tasks. To this end we introduce a novel environment with an open ended procedurally generated task space which dynamically combines multiple subtasks sampled from five diverse task types to form a vast distribution of task trees. We show that decentralized agents trained in our environment exhibit strong generalization abilities when confronted with novel objects at test time. Additionally, despite never being forced to cooperate during training the agents learn collective exploration strategies which allow them to solve novel tasks never encountered during training. We further find that the agents learned collective exploration strategies extend to an open ended task setting, allowing them to solve task trees of twice the depth compared to the ones seen during training. Our open source code as well as videos of the agents can be found on our companion website.

FAIRLABEL: Correcting Bias in Labels

  • paper_url: http://arxiv.org/abs/2311.00638
  • repo_url: None
  • paper_authors: Srinivasan H Sengamedu, Hien Pham
  • for: Detecting and correcting bias in the labels used to train machine learning models.
  • methods: The FAIRLABEL algorithm, which detects and corrects biases in labels so as to reduce the Disparate Impact (DI) across groups while maintaining high prediction accuracy.
  • results: On synthetic datasets, FAIRLABEL corrects labels correctly 86.7% of the time versus 71.9% for a baseline model, a 14.8% improvement; on the UCI Adult, German Credit Risk, and Compas datasets it improves the Disparate Impact Ratio by as much as 54.2%.
    Abstract There are several algorithms for measuring fairness of ML models. A fundamental assumption in these approaches is that the ground truth is fair or unbiased. In real-world datasets, however, the ground truth often contains data that is a result of historical and societal biases and discrimination. Models trained on these datasets will inherit and propagate the biases to the model outputs. We propose FAIRLABEL, an algorithm which detects and corrects biases in labels. The goal of FAIRLABEL is to reduce the Disparate Impact (DI) across groups while maintaining high accuracy in predictions. We propose metrics to measure the quality of bias correction and validate FAIRLABEL on synthetic datasets and show that the label correction is correct 86.7% of the time vs. 71.9% for a baseline model. We also apply FAIRLABEL on benchmark datasets such as UCI Adult, German Credit Risk, and Compas datasets and show that the Disparate Impact Ratio increases by as much as 54.2%.
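The Disparate Impact metric that FAIRLABEL targets has a standard definition: the ratio of positive-outcome rates between the unprivileged and privileged groups, where a ratio of 1 indicates parity. The snippet below computes the metric only; it is not the FAIRLABEL correction algorithm itself.

```python
import numpy as np

def disparate_impact_ratio(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Ratio of positive rates: P(y=1 | unprivileged) / P(y=1 | privileged).

    `group` is 1 for the privileged group, 0 otherwise. A value of 1.0
    indicates parity; the common "80% rule" flags ratios below 0.8.
    """
    p_unpriv = y_pred[group == 0].mean()
    p_priv = y_pred[group == 1].mean()
    return p_unpriv / p_priv

# Toy example: 40% vs. 60% positive rates -> DI ratio of about 0.67.
y = np.array([1, 0, 1, 0, 0, 1, 1, 1, 0, 0])
g = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
print(disparate_impact_ratio(y, g))
```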

A Bi-level Framework for Traffic Accident Duration Prediction: Leveraging Weather and Road Condition Data within a Practical Optimum Pipeline

  • paper_url: http://arxiv.org/abs/2311.00634
  • repo_url: None
  • paper_authors: Rafat Tabassum Sukonna, Soham Irtiza Swapnil
  • for: Predicting the duration of a traffic accident is challenging because of the stochastic nature of incidents and the many factors involved, such as accident severity, road conditions, and weather. This study examines whether accident duration can be predicted using only static features from a traffic accident database, without contextual information such as accident severity or textual descriptions.
  • methods: Multiple machine learning models are used to classify whether an accident's impact on traffic is short-term or long-term, followed by a bimodal approach to predict the precise duration. A Random Forest classifier separates short-term from long-term impacts with 83% accuracy, while LightGBM regression models outperform other regressors with MAE values of 26.15 and 13.3 and RMSE values of 32.91 and 28.91 for short- and long-term duration prediction, respectively.
  • results: The results show that accident duration can be predicted accurately from static features alone. Using the best classification and regression models, an end-to-end prediction pipeline is constructed whose results are comparable with previous work. A SHAP value analysis identifies weather conditions, wind chill, and wind speed as the most influential factors in determining accident duration.
    Abstract Due to the stochastic nature of events, predicting the duration of a traffic incident presents a formidable challenge. Accurate duration estimation can result in substantial advantages for commuters in selecting optimal routes and for traffic management personnel in addressing non-recurring congestion issues. In this study, we gathered accident duration, road conditions, and meteorological data from a database of traffic accidents to check the feasibility of a traffic accident duration pipeline without accident contextual information data like accident severity and textual description. Multiple machine learning models were employed to predict whether an accident's impact on road traffic would be of a short-term or long-term nature, and then utilizing a bimodal approach the precise duration of the incident's effect was determined. Our binary classification random forest model distinguished between short-term and long-term effects with an 83% accuracy rate, while the LightGBM regression model outperformed other machine learning regression models with Mean Average Error (MAE) values of 26.15 and 13.3 and RMSE values of 32.91 and 28.91 for short and long-term accident duration prediction, respectively. Using the optimal classification and regression model identified in the preceding section, we then construct an end-to-end pipeline to incorporate the entire process. The results of both separate and combined approaches were comparable with previous works, which shows the applicability of only using static features for predicting traffic accident duration. The SHAP value analysis identified weather conditions, wind chill and wind speed as the most influential factors in determining the duration of an accident.
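The bi-level design, a classifier that routes each accident to a short-term or long-term regressor, can be sketched directly with the model families named in the abstract. The 60-minute cutoff and the data handling are assumptions; only the choice of Random Forest for classification and LightGBM for regression comes from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lightgbm import LGBMRegressor

def fit_bilevel(X: np.ndarray, duration_min: np.ndarray, threshold: float = 60.0):
    """Fit the two-stage model; `threshold` is a hypothetical cutoff (minutes)."""
    is_long = (duration_min > threshold).astype(int)
    clf = RandomForestClassifier(n_estimators=200).fit(X, is_long)
    reg_short = LGBMRegressor().fit(X[is_long == 0], duration_min[is_long == 0])
    reg_long = LGBMRegressor().fit(X[is_long == 1], duration_min[is_long == 1])
    return clf, reg_short, reg_long

def predict_bilevel(model, X: np.ndarray) -> np.ndarray:
    clf, reg_short, reg_long = model
    route = clf.predict(X)          # stage 1: short- vs. long-term impact
    out = np.empty(len(X))
    out[route == 0] = reg_short.predict(X[route == 0])   # stage 2: duration
    out[route == 1] = reg_long.predict(X[route == 1])
    return out
```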

Loss Modeling for Multi-Annotator Datasets

  • paper_url: http://arxiv.org/abs/2311.00619
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Uthman Jinadu, Jesse Annan, Shanshan Wen, Yi Ding
  • for: Improving dataset fairness by accounting for the opinions of all annotators, even when individual annotators contribute thousands of ratings.
  • methods: Multitask learning combined with loss-based label correction to learn a more accurate representation of diverse annotator opinions.
  • results: The approach cleanly separates agreeing and disagreeing annotations and improves prediction performance in both single- and multi-annotator settings.
    Abstract Accounting for the opinions of all annotators of a dataset is critical for fairness. However, when annotating large datasets, individual annotators will frequently provide thousands of ratings which can lead to fatigue. Additionally, these annotation processes can occur over multiple days which can lead to an inaccurate representation of an annotator's opinion over time. To combat this, we propose to learn a more accurate representation of diverse opinions by utilizing multitask learning in conjunction with loss-based label correction. We show that using our novel formulation, we can cleanly separate agreeing and disagreeing annotations. Furthermore, we demonstrate that this modification can improve prediction performance in a single or multi-annotator setting. Lastly, we show that this method remains robust to additional label noise that is applied to subjective data.
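Loss-based label correction typically treats the per-annotation training loss as a noise signal: annotations whose loss stays high after warm-up are candidates for down-weighting or relabeling to the model's own prediction. The sketch below illustrates that generic idea only; the paper's specific multitask formulation is not reproduced.

```python
import torch
import torch.nn.functional as F

def correct_labels(logits: torch.Tensor, labels: torch.Tensor,
                   loss_quantile: float = 0.9) -> torch.Tensor:
    """Relabel the highest-loss annotations to the model's own prediction.

    logits: (n, c) model outputs; labels: (n,) annotator labels.
    Annotations above the loss quantile are treated as noisy (an
    assumption, not a guarantee) and replaced by argmax predictions.
    """
    losses = F.cross_entropy(logits, labels, reduction="none")
    cutoff = torch.quantile(losses, loss_quantile)
    suspect = losses > cutoff
    corrected = labels.clone()
    corrected[suspect] = logits.argmax(dim=1)[suspect]
    return corrected

logits = torch.randn(100, 3)
labels = torch.randint(0, 3, (100,))
print((correct_labels(logits, labels) != labels).sum().item(), "labels changed")
```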

Rethinking Variational Inference for Probabilistic Programs with Stochastic Support

  • paper_url: http://arxiv.org/abs/2311.00594
  • repo_url: https://github.com/treigerm/sdvi_neurips
  • paper_authors: Tim Reichelt, Luke Ong, Tom Rainforth
  • for: A new variational inference approach for probabilistic programs with stochastic support.
  • methods: The method decomposes the program into sub-programs with static support and automatically builds a separate sub-guide for each.
  • results: The decomposition aids the construction of suitable variational families, leading to substantial improvements in inference performance.
    Abstract We introduce Support Decomposition Variational Inference (SDVI), a new variational inference (VI) approach for probabilistic programs with stochastic support. Existing approaches to this problem rely on designing a single global variational guide on a variable-by-variable basis, while maintaining the stochastic control flow of the original program. SDVI instead breaks the program down into sub-programs with static support, before automatically building separate sub-guides for each. This decomposition significantly aids in the construction of suitable variational families, enabling, in turn, substantial improvements in inference performance.

Coop: Memory is not a Commodity

  • paper_url: http://arxiv.org/abs/2311.00591
  • repo_url: None
  • paper_authors: Jianhao Zhang, Shihan Ma, Peihong Liu, Jinhui Yuan
  • for: Improving the efficiency of training neural networks under limited memory budgets in deep learning frameworks.
  • methods: A tensor rematerialization strategy that evicts tensors within a sliding window so that all evictions are contiguous and immediately used, together with cheap tensor partitioning and recomputable in-place operations to further reduce rematerialization cost.
  • results: Experiments on eight representative neural networks show that Coop achieves up to $2\times$ memory saving and greatly reduces compute overhead, search latency, and memory fragmentation compared to state-of-the-art baselines.
    Abstract Tensor rematerialization allows the training of deep neural networks (DNNs) under limited memory budgets by checkpointing the models and recomputing the evicted tensors as needed. However, the existing tensor rematerialization techniques overlook the memory system in deep learning frameworks and implicitly assume that free memory blocks at different addresses are identical. Under this flawed assumption, discontiguous tensors are evicted, among which some are not used to allocate the new tensor. This leads to severe memory fragmentation and increases the cost of potential rematerializations. To address this issue, we propose to evict tensors within a sliding window to ensure all evictions are contiguous and are immediately used. Furthermore, we proposed cheap tensor partitioning and recomputable in-place to further reduce the rematerialization cost by optimizing the tensor allocation. We named our method Coop as it is a co-optimization of tensor allocation and tensor rematerialization. We evaluated Coop on eight representative DNNs. The experimental results demonstrate that Coop achieves up to $2\times$ memory saving and hugely reduces compute overhead, search latency, and memory fragmentation compared to the state-of-the-art baselines.
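The sliding-window idea can be illustrated with a toy search: among windows of tensors that are adjacent in memory, pick the cheapest contiguous window whose freed bytes cover the allocation request, so the new tensor reuses exactly the freed region. Real systems track addresses and recompute costs at runtime; here both are given directly, so this is a sketch of the selection logic only.

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    size: int             # bytes
    recompute_cost: float

def choose_eviction_window(pool: list[Tensor], request: int) -> tuple[int, int]:
    """pool is ordered by memory address; returns (start, end) indices."""
    best, best_cost = None, float("inf")
    for start in range(len(pool)):
        size, cost = 0, 0.0
        for end in range(start, len(pool)):
            size += pool[end].size
            cost += pool[end].recompute_cost
            if size >= request:            # window now covers the request
                if cost < best_cost:
                    best, best_cost = (start, end + 1), cost
                break                      # extending further only adds cost
    if best is None:
        raise MemoryError("no contiguous window can satisfy the request")
    return best

pool = [Tensor(4, 1.0), Tensor(8, 0.2), Tensor(4, 0.3), Tensor(16, 5.0)]
print(choose_eviction_window(pool, 10))  # (1, 3): 12 bytes freed, cost 0.5
```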

Boosting Summarization with Normalizing Flows and Aggressive Training

  • paper_url: http://arxiv.org/abs/2311.00588
  • repo_url: https://github.com/yuyangstat/flowsum
  • paper_authors: Yu Yang, Xiaotong Shen
  • for: This paper proposes a normalizing-flows-based framework for Transformer summarization that addresses two main challenges in variational summarization: insufficient semantic information in latent representations and posterior collapse during training.
  • methods: Normalizing flows are used for flexible latent posterior modeling, together with a controlled alternate aggressive training (CAAT) strategy and an improved gate mechanism.
  • results: Experiments show that FlowSUM significantly improves the quality of generated summaries and enables knowledge distillation with minimal impact on inference time. The paper also investigates posterior collapse in normalizing flows and analyzes how summary quality is affected by the training strategy, gate initialization, and the type and number of normalizing flows used, offering insights for future research.
    Abstract This paper presents FlowSUM, a normalizing flows-based variational encoder-decoder framework for Transformer-based summarization. Our approach tackles two primary challenges in variational summarization: insufficient semantic information in latent representations and posterior collapse during training. To address these challenges, we employ normalizing flows to enable flexible latent posterior modeling, and we propose a controlled alternate aggressive training (CAAT) strategy with an improved gate mechanism. Experimental results show that FlowSUM significantly enhances the quality of generated summaries and unleashes the potential for knowledge distillation with minimal impact on inference time. Furthermore, we investigate the issue of posterior collapse in normalizing flows and analyze how the summary quality is affected by the training strategy, gate initialization, and the type and number of normalizing flows used, offering valuable insights for future research.

Minimally Modifying a Markov Game to Achieve Any Nash Equilibrium and Value

  • paper_url: http://arxiv.org/abs/2311.00582
  • repo_url: None
  • paper_authors: Young Wu, Jeremy McMahan, Yiding Chen, Yudong Chen, Xiaojin Zhu, Qiaomin Xie
  • for: This paper studies the game modification problem, in which a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium with a value within a target range, at minimal modification cost.
  • methods: A characterization of the set of policy profiles that can be installed as the unique equilibrium of some game, with sufficient and necessary conditions for successful installation, plus an algorithm that solves a convex optimization problem with linear constraints and then applies random perturbation.
  • results: An efficient procedure that produces a modification plan with near-optimal cost.
    Abstract We study the game modification problem, where a benevolent game designer or a malevolent adversary modifies the reward function of a zero-sum Markov game so that a target deterministic or stochastic policy profile becomes the unique Markov perfect Nash equilibrium and has a value within a target range, in a way that minimizes the modification cost. We characterize the set of policy profiles that can be installed as the unique equilibrium of some game, and establish sufficient and necessary conditions for successful installation. We propose an efficient algorithm, which solves a convex optimization problem with linear constraints and then performs random perturbation, to obtain a modification plan with a near-optimal cost.

Can Foundation Models Watch, Talk and Guide You Step by Step to Make a Cake?

  • paper_url: http://arxiv.org/abs/2311.00738
  • repo_url: None
  • paper_authors: Yuwei Bao, Keunwoo Peter Yu, Yichi Zhang, Shane Storks, Itamar Bar-Yossef, Alexander De La Iglesia, Megan Su, Xiao Lin Zheng, Joyce Chai
  • for: Developing AI systems that can provide situated, personalized task guidance to help humans complete a variety of tasks.
  • methods: A new multimodal benchmark dataset, Watch, Talk and Guide (WTaG), based on natural interaction between a human user and a human instructor, together with two tasks: User and Environment Understanding, and Instructor Decision Making. Several foundation models are leveraged to study how quickly they can be adapted to perceptually enabled task guidance.
  • results: Quantitative, qualitative, and human evaluations show that these models achieve fair performance in some cases without task-specific training, but fast and reliable adaptation remains a significant challenge. The benchmark and baselines provide a stepping stone for future work on situated task guidance.
    Abstract Despite tremendous advances in AI, it remains a significant challenge to develop interactive task guidance systems that can offer situated, personalized guidance and assist humans in various tasks. These systems need to have a sophisticated understanding of the user as well as the environment, and make timely accurate decisions on when and what to say. To address this issue, we created a new multimodal benchmark dataset, Watch, Talk and Guide (WTaG) based on natural interaction between a human user and a human instructor. We further proposed two tasks: User and Environment Understanding, and Instructor Decision Making. We leveraged several foundation models to study to what extent these models can be quickly adapted to perceptually enabled task guidance. Our quantitative, qualitative, and human evaluation results show that these models can demonstrate fair performances in some cases with no task-specific training, but a fast and reliable adaptation remains a significant challenge. Our benchmark and baselines will provide a stepping stone for future work on situated task guidance.

LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing

  • paper_url: http://arxiv.org/abs/2311.00571
  • repo_url: None
  • paper_authors: Wei-Ge Chen, Irina Spiridonova, Jianwei Yang, Jianfeng Gao, Chunyuan Li
  • for: This paper describes a research prototype (LLaVA-Interactive) for multimodal human-AI interaction that can hold multi-turn dialogues with human users, taking multimodal user inputs and generating multimodal responses.
  • methods: The system combines the multimodal skills of three pre-built AI models without additional model training: visual chat from LLaVA, image segmentation from SEEM, and image generation and editing from GLIGEN.
  • results: A diverse set of application scenarios demonstrates the promise of LLaVA-Interactive and is presented to inspire future research on multimodal interactive systems.
    Abstract LLaVA-Interactive is a research prototype for multimodal human-AI interaction. The system can have multi-turn dialogues with human users by taking multimodal user inputs and generating multimodal responses. Importantly, LLaVA-Interactive goes beyond language prompt, where visual prompt is enabled to align human intents in the interaction. The development of LLaVA-Interactive is extremely cost-efficient as the system combines three multimodal skills of pre-built AI models without additional model training: visual chat of LLaVA, image segmentation from SEEM, as well as image generation and editing from GLIGEN. A diverse set of application scenarios is presented to demonstrate the promises of LLaVA-Interactive and to inspire future research in multimodal interactive systems.

Detecting Visual Cues in the Intensive Care Unit and Association with Patient Clinical Status

  • paper_url: http://arxiv.org/abs/2311.00565
  • repo_url: None
  • paper_authors: Subhash Nerella, Ziyuan Guan, Andrea Davidson, Yuanfang Ren, Tezcan Baslanti, Brooke Armfield, Patrick Tighe, Azra Bihorac, Parisa Rashidi
  • for: This work aims to develop AI-based assessment tools that help healthcare providers perform more objective and granular patient monitoring in the Intensive Care Unit.
  • methods: A new "masked loss computation" technique addresses the data imbalance problem, and a SWIN Transformer model is trained to detect facial action units (AUs).
  • results: Detecting 18 facial action units reveals statistically significant associations with patient acuity status, acute brain dysfunction, and pain. The SWIN Transformer achieves a mean F1-score of 0.57 and a mean accuracy of 0.89 on the test set.
    Abstract Intensive Care Units (ICU) provide close supervision and continuous care to patients with life-threatening conditions. However, continuous patient assessment in the ICU is still limited due to time constraints and the workload on healthcare providers. Existing patient assessments in the ICU such as pain or mobility assessment are mostly sporadic and administered manually, thus introducing the potential for human errors. Developing Artificial intelligence (AI) tools that can augment human assessments in the ICU can be beneficial for providing more objective and granular monitoring capabilities. For example, capturing the variations in a patient's facial cues related to pain or agitation can help in adjusting pain-related medications or detecting agitation-inducing conditions such as delirium. Additionally, subtle changes in visual cues during or prior to adverse clinical events could potentially aid in continuous patient monitoring when combined with high-resolution physiological signals and Electronic Health Record (EHR) data. In this paper, we examined the association between visual cues and patient condition including acuity status, acute brain dysfunction, and pain. We leveraged our AU-ICU dataset with 107,064 frames collected in the ICU annotated with facial action units (AUs) labels by trained annotators. We developed a new "masked loss computation" technique that addresses the data imbalance problem by maximizing data resource utilization. We trained the model using our AU-ICU dataset in conjunction with three external datasets to detect 18 AUs. The SWIN Transformer model achieved 0.57 mean F1-score and 0.89 mean accuracy on the test set. Additionally, we performed AU inference on 634,054 frames to evaluate the association between facial AUs and clinically important patient conditions such as acuity status, acute brain dysfunction, and pain.
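The "masked loss computation" addresses the fact that not every frame is annotated with every action unit: a natural way to realize it is to compute a per-label loss, zero out entries whose labels are missing, and normalize by the number of observed labels. The sketch below is one reading of that idea, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def masked_bce_loss(logits: torch.Tensor, labels: torch.Tensor,
                    observed: torch.Tensor) -> torch.Tensor:
    """logits, labels, observed: (batch, n_aus); observed is 1 where the
    AU label exists for that frame and 0 where it is missing."""
    per_label = F.binary_cross_entropy_with_logits(
        logits, labels, reduction="none")   # (batch, n_aus)
    per_label = per_label * observed        # drop missing labels from the loss
    return per_label.sum() / observed.sum().clamp(min=1)

# Toy batch: 2 frames, 3 AUs, one missing label per frame.
logits = torch.randn(2, 3)
labels = torch.tensor([[1., 0., 0.], [0., 1., 1.]])
observed = torch.tensor([[1., 1., 0.], [0., 1., 1.]])
print(masked_bce_loss(logits, labels, observed))
```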

Tackling the Abstraction and Reasoning Corpus (ARC) with Object-centric Models and the MDL Principle

  • paper_url: http://arxiv.org/abs/2311.00545
  • repo_url: https://github.com/sebferre/arc-mdl
  • paper_authors: Sébastien Ferré
  • for: Fostering AI research towards human-level intelligence by tackling the Abstraction and Reasoning Corpus (ARC).
  • methods: Object-centric models, in line with the natural programs produced by humans, searched efficiently with the Minimum Description Length (MDL) principle.
  • results: A diverse range of tasks is solved, the learned models resemble natural programs, and the approach generalizes to a different domain.
    Abstract The Abstraction and Reasoning Corpus (ARC) is a challenging benchmark, introduced to foster AI research towards human-level intelligence. It is a collection of unique tasks about generating colored grids, specified by a few examples only. In contrast to the transformation-based programs of existing work, we introduce object-centric models that are in line with the natural programs produced by humans. Our models can not only perform predictions, but also provide joint descriptions for input/output pairs. The Minimum Description Length (MDL) principle is used to efficiently search the large model space. A diverse range of tasks are solved, and the learned models are similar to the natural programs. We demonstrate the generality of our approach by applying it to a different domain.
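The MDL search scores each candidate model by a two-part code length: the bits needed to describe the model itself plus the bits needed to describe the data given the model. The snippet below is a didactic illustration of that scoring on a colored grid; the paper's actual models are object-centric grid programs, which are not reproduced here.

```python
import math

def description_length(model_bits: float, n_cells: int, n_errors: int,
                       n_colors: int = 10) -> float:
    """Two-part MDL score: L(model) + L(data | model).

    Residuals: each mispredicted cell is encoded by its index and true color.
    """
    residual_bits = n_errors * (math.log2(n_cells) + math.log2(n_colors))
    return model_bits + residual_bits

# Two candidate "models" of a 10x10 grid (100 cells, 10 colors):
# (a) a cheap uniform-rectangle model that gets 12 cells wrong,
# (b) an exhaustive cell-by-cell listing that gets everything right.
print(description_length(model_bits=30, n_cells=100, n_errors=12))        # ~149.6
print(description_length(model_bits=100 * math.log2(10), n_cells=100,
                         n_errors=0))                                     # ~332.2
# MDL prefers (a): a compact, slightly imperfect model beats rote memorization.
```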

The Development of LLMs for Embodied Navigation

  • paper_url: http://arxiv.org/abs/2311.00530
  • repo_url: https://github.com/rongtao-xu/awesome-llm-en
  • paper_authors: Jinzhou Lin, Han Gao, Rongtao Xu, Changwei Wang, Li Guo, Shibiao Xu
  • for: This survey explores the symbiosis between Large Language Models (LLMs) and embodied intelligence, with a focus on navigation tasks.
  • methods: A review of state-of-the-art models and research methodologies, accompanied by a comprehensive list of studies on applying LLMs to embodied intelligence.
  • results: The survey assesses the advantages and disadvantages of existing embodied navigation models and datasets, clarifies the role of LLMs in embodied intelligence based on current research, and forecasts future directions in the field.
    Abstract In recent years, the rapid advancement of Large Language Models (LLMs) such as the Generative Pre-trained Transformer (GPT) has attracted increasing attention due to their potential in a variety of practical applications. The application of LLMs with Embodied Intelligence has emerged as a significant area of focus. Among the myriad applications of LLMs, navigation tasks are particularly noteworthy because they demand a deep understanding of the environment and quick, accurate decision-making. LLMs can augment embodied intelligence systems with sophisticated environmental perception and decision-making support, leveraging their robust language and image-processing capabilities. This article offers an exhaustive summary of the symbiosis between LLMs and embodied intelligence with a focus on navigation. It reviews state-of-the-art models, research methodologies, and assesses the advantages and disadvantages of existing embodied navigation models and datasets. Finally, the article elucidates the role of LLMs in embodied intelligence, based on current research, and forecasts future directions in the field. A comprehensive list of studies in this survey is available at https://github.com/Rongtao-Xu/Awesome-LLM-EN

Learning impartial policies for sequential counterfactual explanations using Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00523
  • repo_url: None
  • paper_authors: E. Panagiotou, E. Ntoutsi
  • for: Improving the quality of sequential counterfactual (SCF) examples in Explainable Artificial Intelligence (XAI).
  • methods: Reinforcement Learning (RL) methods that learn policies for discovering SCFs, improving scalability over per-instance optimization.
  • results: The paper identifies shortcomings in existing formulations that can yield policies with undesired properties, such as a bias towards specific actions, and proposes using the classifier's output probabilities to construct a more informative reward that mitigates this effect.
    Abstract In the field of explainable Artificial Intelligence (XAI), sequential counterfactual (SCF) examples are often used to alter the decision of a trained classifier by implementing a sequence of modifications to the input instance. Although certain test-time algorithms aim to optimize for each new instance individually, recently Reinforcement Learning (RL) methods have been proposed that seek to learn policies for discovering SCFs, thereby enhancing scalability. As is typical in RL, the formulation of the RL problem, including the specification of state space, actions, and rewards, can often be ambiguous. In this work, we identify shortcomings in existing methods that can result in policies with undesired properties, such as a bias towards specific actions. We propose to use the output probabilities of the classifier to create a more informative reward, to mitigate this effect.
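One natural reading of the proposed probability-based reward is a dense signal equal to the gain in the target-class probability after each modification, minus a small per-step cost, rather than a sparse "flipped / not flipped" signal. The sketch below follows that reading; the scikit-learn classifier, the synthetic data, and the cost weight are illustrative stand-ins, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy black-box classifier to explain.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 4)), rng.integers(0, 2, 200)
clf = LogisticRegression().fit(X, y)

def step_reward(x_before: np.ndarray, x_after: np.ndarray,
                target: int = 1, action_cost: float = 0.01) -> float:
    """Reward for one sequential edit: gain in P(target) minus a step cost."""
    p_before = clf.predict_proba(x_before.reshape(1, -1))[0, target]
    p_after = clf.predict_proba(x_after.reshape(1, -1))[0, target]
    return (p_after - p_before) - action_cost

x = X[0].copy()
x2 = x.copy()
x2[2] += 0.5                 # one counterfactual modification
print(step_reward(x, x2))
```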

Efficient LLM Inference on CPUs

  • paper_url: http://arxiv.org/abs/2311.00502
  • repo_url: https://github.com/intel/intel-extension-for-transformers
  • paper_authors: Haihao Shen, Hanwen Chang, Bo Dong, Yu Luo, Hengyu Meng
  • for: This paper proposes an effective approach to make the deployment of large language models (LLMs) more efficient.
  • methods: An automatic INT4 weight-only quantization flow and a specially designed LLM runtime with highly optimized kernels to accelerate LLM inference on CPUs.
  • results: Efficient CPU inference is demonstrated on popular LLMs including Llama2, Llama, and GPT-NeoX. The code is available at: https://github.com/intel/intel-extension-for-transformers.
    Abstract Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the astronomical amount of model parameters, which requires a demand for large memory capacity and high memory bandwidth. In this paper, we propose an effective approach that can make the deployment of LLMs more efficiently. We support an automatic INT4 weight-only quantization flow and design a special LLM runtime with highly-optimized kernels to accelerate the LLM inference on CPUs. We demonstrate the general applicability of our approach on popular LLMs including Llama2, Llama, GPT-NeoX, and showcase the extreme inference efficiency on CPUs. The code is publicly available at: https://github.com/intel/intel-extension-for-transformers.
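INT4 weight-only quantization keeps activations in higher precision and stores each weight group as 4-bit integers plus a per-group scale. The snippet below shows the arithmetic of symmetric group-wise quantization in NumPy as a generic illustration; the actual runtime packs two 4-bit values per byte and uses optimized kernels, which this sketch omits.

```python
import numpy as np

def quantize_int4(w: np.ndarray, group_size: int = 32):
    """Symmetric group-wise INT4 quantization of a 1-D weight slice.

    Assumes len(w) is divisible by group_size. INT4 range is [-8, 7];
    the scale maps the per-group max magnitude to 7.
    """
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)

w = np.random.randn(128).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s)
print("max abs error:", np.abs(w - w_hat).max())
```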

Intriguing Properties of Data Attribution on Diffusion Models

  • paper_url: http://arxiv.org/abs/2311.00500
  • repo_url: https://github.com/sail-sg/d-trak
  • paper_authors: Xiaosen Zheng, Tianyu Pang, Chao Du, Jing Jiang, Min Lin
  • for: Tracing model outputs back to training data so that contributors of high-quality or copyrighted training samples can be fairly compensated or credited.
  • methods: Extensive experiments and ablation studies on several theoretically motivated data attribution methods, evaluating the trade-off between computational scalability and effectiveness on DDPMs trained on CIFAR-10 and CelebA and a Stable Diffusion model LoRA-finetuned on ArtBench.
  • results: Counter-intuitively, theoretically unjustified design choices empirically outperform previous baselines by a large margin on both the linear datamodeling score and counterfactual evaluation, yielding a significantly more efficient attribution approach and suggesting that, at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. Code is available at https://github.com/sail-sg/D-TRAK.
    Abstract Data attribution seeks to trace model outputs back to training data. With the recent development of diffusion models, data attribution has become a desired module to properly assign valuations for high-quality or copyrighted training samples, ensuring that data contributors are fairly compensated or credited. Several theoretically motivated methods have been proposed to implement data attribution, in an effort to improve the trade-off between computational scalability and effectiveness. In this work, we conduct extensive experiments and ablation studies on attributing diffusion models, specifically focusing on DDPMs trained on CIFAR-10 and CelebA, as well as a Stable Diffusion model LoRA-finetuned on ArtBench. Intriguingly, we report counter-intuitive observations that theoretically unjustified design choices for attribution empirically outperform previous baselines by a large margin, in terms of both linear datamodeling score and counterfactual evaluation. Our work presents a significantly more efficient approach for attributing diffusion models, while the unexpected findings suggest that at least in non-convex settings, constructions guided by theoretical assumptions may lead to inferior attribution performance. The code is available at https://github.com/sail-sg/D-TRAK.

Bayes-enhanced Multi-view Attention Networks for Robust POI Recommendation

  • paper_url: http://arxiv.org/abs/2311.00491
  • repo_url: None
  • paper_authors: Jiangnan Xia, Yu Yang, Senzhang Wang, Hongzhi Yin, Jiannong Cao, Philip S. Yu
  • for: Improving the accuracy and robustness of POI recommendation in Location-Based Social Network services, since available POI check-in data can be unreliable due to subjective and objective factors such as positioning errors and privacy concerns, degrading recommendation performance.
  • methods: A Bayes-enhanced Multi-view Attention Network that builds a personal POI transition graph, a semantic-based POI graph, and a distance-based POI graph to comprehensively model dependencies among POIs. Because the personal transition graph is sparse and noise-sensitive, a Bayes-enhanced spatial dependency learning module performs data augmentation from the local view, and a multi-view attention-based user preference learning module refines POI representations.
  • results: BayMAN significantly outperforms state-of-the-art POI recommendation methods, especially when the available check-in data are incomplete or noisy.
    Abstract POI recommendation is practically important to facilitate various Location-Based Social Network services, and has attracted rising research attention recently. Existing works generally assume the available POI check-ins reported by users are the ground-truth depiction of user behaviors. However, in real application scenarios, the check-in data can be rather unreliable due to both subjective and objective causes including positioning error and user privacy concerns, leading to significant negative impacts on the performance of the POI recommendation. To this end, we investigate a novel problem of robust POI recommendation by considering the uncertainty factors of the user check-ins, and proposes a Bayes-enhanced Multi-view Attention Network. Specifically, we construct personal POI transition graph, the semantic-based POI graph and distance-based POI graph to comprehensively model the dependencies among the POIs. As the personal POI transition graph is usually sparse and sensitive to noise, we design a Bayes-enhanced spatial dependency learning module for data augmentation from the local view. A Bayesian posterior guided graph augmentation approach is adopted to generate a new graph with collaborative signals to increase the data diversity. Then both the original and the augmented graphs are used for POI representation learning to counteract the data uncertainty issue. Next, the POI representations of the three view graphs are input into the proposed multi-view attention-based user preference learning module. By incorporating the semantic and distance correlations of POIs, the user preference can be effectively refined and finally robust recommendation results are achieved. The results of extensive experiments show that BayMAN significantly outperforms the state-of-the-art methods in POI recommendation when the available check-ins are incomplete and noisy.

Dual Conditioned Diffusion Models for Out-Of-Distribution Detection: Application to Fetal Ultrasound Videos

  • paper_url: http://arxiv.org/abs/2311.00469
  • repo_url: None
  • paper_authors: Divyanshu Mishra, He Zhao, Pramit Saha, Aris T. Papageorghiou, J. Alison Noble
  • for: Improving the reliability of machine learning models by detecting samples that fall outside the training distribution.
  • methods: Dual-conditioned diffusion models (DCDM) that condition the model on in-distribution class information and latent features of the input image for reconstruction-based OOD detection.
  • results: Compared with reference methods, the approach improves accuracy by 12%, precision by 22%, and F1 score by 8%.
    Abstract Out-of-distribution (OOD) detection is essential to improve the reliability of machine learning models by detecting samples that do not belong to the training distribution. Detecting OOD samples effectively in certain tasks can pose a challenge because of the substantial heterogeneity within the in-distribution (ID), and the high structural similarity between ID and OOD classes. For instance, when detecting heart views in fetal ultrasound videos there is a high structural similarity between the heart and other anatomies such as the abdomen, and large in-distribution variance as a heart has 5 distinct views and structural variations within each view. To detect OOD samples in this context, the resulting model should generalise to the intra-anatomy variations while rejecting similar OOD samples. In this paper, we introduce dual-conditioned diffusion models (DCDM) where we condition the model on in-distribution class information and latent features of the input image for reconstruction-based OOD detection. This constrains the generative manifold of the model to generate images structurally and semantically similar to those within the in-distribution. The proposed model outperforms reference methods with a 12% improvement in accuracy, 22% higher precision, and an 8% better F1 score.
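Once the generative model is trained, reconstruction-based OOD detection reduces to a simple decision rule: reconstruct the input under in-distribution conditioning, score it by reconstruction error, and flag inputs whose error exceeds a threshold calibrated on held-out in-distribution data. The sketch below shows that generic rule; the diffusion model itself is abstracted behind a `reconstruct` stub.

```python
import numpy as np

def reconstruct(x: np.ndarray) -> np.ndarray:
    """Stand-in for the dual-conditioned diffusion model's reconstruction."""
    raise NotImplementedError("plug in the trained generative model here")

def ood_scores(batch: np.ndarray) -> np.ndarray:
    """Mean squared reconstruction error per sample."""
    recon = np.stack([reconstruct(x) for x in batch])
    return ((batch - recon) ** 2).mean(axis=tuple(range(1, batch.ndim)))

def calibrate_threshold(id_scores: np.ndarray, fpr: float = 0.05) -> float:
    # Threshold at the (1 - fpr) quantile of in-distribution scores, so
    # roughly `fpr` of ID samples are wrongly flagged as OOD.
    return float(np.quantile(id_scores, 1.0 - fpr))

def is_ood(scores: np.ndarray, threshold: float) -> np.ndarray:
    return scores > threshold
```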

Leveraging Hyperbolic Embeddings for Coarse-to-Fine Robot Design

  • paper_url: http://arxiv.org/abs/2311.00462
  • repo_url: None
  • paper_authors: Heng Dong, Junyu Zhang, Chongjie Zhang
  • for: Designing multi-cellular robots composed of numerous cells that can be efficiently controlled to perform diverse tasks.
  • methods: A novel coarse-to-fine approach that first seeks optimal coarse-grained robots and progressively refines them. To mitigate the difficulty of deciding when to refine, the Hyperbolic Embeddings for Robot Design (HERD) framework unifies robots of various granularity in a shared hyperbolic space and optimizes with a refined Cross-Entropy Method.
  • results: Extensive empirical studies on various challenging tasks from EvoGym demonstrate the approach's superior efficiency and generalization capability compared with other methods.
    Abstract Multi-cellular robot design aims to create robots comprised of numerous cells that can be efficiently controlled to perform diverse tasks. Previous research has demonstrated the ability to generate robots for various tasks, but these approaches often optimize robots directly in the vast design space, resulting in robots with complicated morphologies that are hard to control. In response, this paper presents a novel coarse-to-fine method for designing multi-cellular robots. Initially, this strategy seeks optimal coarse-grained robots and progressively refines them. To mitigate the challenge of determining the precise refinement juncture during the coarse-to-fine transition, we introduce the Hyperbolic Embeddings for Robot Design (HERD) framework. HERD unifies robots of various granularity within a shared hyperbolic space and leverages a refined Cross-Entropy Method for optimization. This framework enables our method to autonomously identify areas of exploration in hyperbolic space and concentrate on regions demonstrating promise. Finally, the extensive empirical studies on various challenging tasks sourced from EvoGym show our approach's superior efficiency and generalization capability.
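Hyperbolic space suits the coarse-to-fine setting because it embeds tree-like hierarchies with low distortion. In the Poincaré ball model, the distance between points $u, v$ with $\|u\|, \|v\| < 1$ is $d(u,v) = \mathrm{arcosh}\!\left(1 + \frac{2\|u-v\|^2}{(1-\|u\|^2)(1-\|v\|^2)}\right)$. Below is a sketch of this standard metric only; the robot-to-embedding encoding is the paper's contribution and is not reproduced here.

```python
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Geodesic distance in the Poincare ball model (points with norm < 1)."""
    sq = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq / denom))

# Points near the boundary are far apart even when Euclidean-close, which
# is what lets fine-grained designs spread out toward the boundary while
# coarse designs sit near the origin.
a, b = np.array([0.0, 0.0]), np.array([0.1, 0.0])
c, d = np.array([0.9, 0.0]), np.array([0.9, 0.1])
print(poincare_distance(a, b))  # ~0.20
print(poincare_distance(c, d))  # ~1.03, despite equal Euclidean distance
```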

On the Opportunities of Green Computing: A Survey

  • paper_url: http://arxiv.org/abs/2311.00447
  • repo_url: None
  • paper_authors: You Zhou, Xiujing Lin, Xiang Zhang, Maolin Wang, Gangwei Jiang, Huakang Lu, Yupeng Wu, Kai Zhang, Zhe Yang, Kehang Wang, Yongduo Sui, Fengwei Jia, Zuoli Tang, Yao Zhao, Hongxuan Zhang, Tiannuo Yang, Weibo Chen, Yunong Mao, Yi Li, De Bao, Yu Li, Hongrui Liao, Ting Liu, Jingwen Liu, Jinchi Guo, Jin Zhao, Xiangyu Zhao, Ying WEI, Hong Qian, Qi Liu, Xiang Wang, Wai Kin, Chan, Chenliang Li, Yusen Li, Shiyu Yang, Jining Yan, Chao Mou, Shuai Han, Wuxia Jin, Guannan Zhang, Xiaodong Zeng
  • for: This survey explores the applications and development of green computing techniques in artificial intelligence.
  • methods: A systematic analysis organized around four key components of green computing: "Measures of Greenness", "Energy-Efficient AI", "Energy-Efficient Computing Systems", and "AI Use Cases for Sustainability".
  • results: The survey concludes that green computing has the potential to resolve the conflict between resource constraints and AI development, and encourages more researchers to focus on this direction to make AI more environmentally friendly.
    Abstract Artificial Intelligence (AI) has achieved significant advancements in technology and research over several decades of development, and is widely used in many areas including computer vision, natural language processing, time-series analysis, speech synthesis, etc. During the age of deep learning, especially with the rise of Large Language Models, a large majority of researchers' attention is paid to pursuing new state-of-the-art (SOTA) results, resulting in ever-increasing model size and computational complexity. The need for high computing power brings higher carbon emissions and undermines research fairness by preventing small or medium-sized research institutions and companies with limited funding from participating in research. To tackle the challenges of computing resources and the environmental impact of AI, Green Computing has become a hot research topic. In this survey, we give a systematic overview of the technologies used in Green Computing. We propose the framework of Green Computing and divide it into four key components: (1) Measures of Greenness, (2) Energy-Efficient AI, (3) Energy-Efficient Computing Systems and (4) AI Use Cases for Sustainability. For each component, we discuss the research progress made and the commonly used techniques to optimize AI efficiency. We conclude that this new research direction has the potential to address the conflicts between resource constraints and AI development. We encourage more researchers to pay attention to this direction and make AI more environmentally friendly.

A Systematic Comparison of Syllogistic Reasoning in Humans and Language Models

  • paper_url: http://arxiv.org/abs/2311.00445
  • repo_url: None
  • paper_authors: Tiwalayo Eisape, MH Tessler, Ishita Dasgupta, Fei Sha, Sjoerd van Steenkiste, Tal Linzen
  • for: Investigate whether language models replicate human reasoning biases in logical inference.
  • methods: Use syllogisms to test the logical reasoning abilities of language models, comparing larger and smaller models with each other and with humans.
  • results: Larger models are more logical than smaller ones, and also more logical than humans, but all models make systematic errors, some of which mimic human reasoning biases such as ordering effects and logical fallacies.
    Abstract A central component of rational behavior is logical inference: the process of determining which conclusions follow from a set of premises. Psychologists have documented several ways in which humans' inferences deviate from the rules of logic. Do language models, which are trained on text generated by humans, replicate these biases, or are they able to overcome them? Focusing on the case of syllogisms -- inferences from two simple premises, which have been studied extensively in psychology -- we show that larger models are more logical than smaller ones, and also more logical than humans. At the same time, even the largest models make systematic errors, some of which mirror human reasoning biases such as ordering effects and logical fallacies. Overall, we find that language models mimic the human biases included in their training data, but are able to overcome them in some cases.

Improving Robustness for Vision Transformer with a Simple Dynamic Scanning Augmentation

  • paper_url: http://arxiv.org/abs/2311.00441
  • repo_url: None
  • paper_authors: Shashank Kotyan, Danilo Vasconcellos Vargas
  • for: Improving the accuracy and robustness of Vision Transformers (ViT) in computer vision tasks.
  • methods: An augmentation technique called "Dynamic Scanning Augmentation" that leverages dynamic input sequences to adaptively focus on different patches, maintaining performance and robustness. Four variations of the technique are proposed.
  • results: Detailed tests against various adversarial attacks and on natural images show that this adaptability improves ViT's robustness from $17\%$ to $92\%$ across different attack types, while also improving accuracy on natural images.
    Abstract Vision Transformer (ViT) has demonstrated promising performance in computer vision tasks, comparable to state-of-the-art neural networks. Yet, this new type of deep neural network architecture is vulnerable to adversarial attacks limiting its capabilities in terms of robustness. This article presents a novel contribution aimed at further improving the accuracy and robustness of ViT, particularly in the face of adversarial attacks. We propose an augmentation technique called `Dynamic Scanning Augmentation' that leverages dynamic input sequences to adaptively focus on different patches, thereby maintaining performance and robustness. Our detailed investigations reveal that this adaptability to the input sequence induces significant changes in the attention mechanism of ViT, even for the same image. We introduce four variations of Dynamic Scanning Augmentation, outperforming ViT in terms of both robustness to adversarial attacks and accuracy against natural images, with one variant showing comparable results. By integrating our augmentation technique, we observe a substantial increase in ViT's robustness, improving it from $17\%$ to $92\%$ measured across different types of adversarial attacks. These findings, together with other comprehensive tests, indicate that Dynamic Scanning Augmentation enhances accuracy and robustness by promoting a more adaptive type of attention. In conclusion, this work contributes to the ongoing research on Vision Transformers by introducing Dynamic Scanning Augmentation as a technique for improving the accuracy and robustness of ViT. The observed results highlight the potential of this approach in advancing computer vision tasks and merit further exploration in future studies.

Enhanced Generalization through Prioritization and Diversity in Self-Imitation Reinforcement Learning over Procedural Environments with Sparse Rewards

  • paper_url: http://arxiv.org/abs/2311.00426
  • repo_url: None
  • paper_authors: Alain Andres, Daochen Zha, Javier Del Ser
  • for: The paper is written to address the challenge of exploration in Reinforcement Learning (RL) with sparse rewards, specifically in procedurally-generated (PCG) environments.
  • methods: The paper proposes tailored self-Imitation Learning (self-IL) sampling strategies that prioritize transitions based on different criteria and address diversity loss through modifications to counteract the impact of generalization requirements and bias introduced by prioritization techniques.
  • results: The paper achieves a new state-of-the-art performance in the MiniGrid-MultiRoom-N12-S10 environment through experimental analysis conducted over three PCG sparse-reward environments, including MiniGrid and ProcGen.
    Abstract Exploration poses a fundamental challenge in Reinforcement Learning (RL) with sparse rewards, limiting an agent's ability to learn optimal decision-making due to a lack of informative feedback signals. Self-Imitation Learning (self-IL) has emerged as a promising approach for exploration, leveraging a replay buffer to store and reproduce successful behaviors. However, traditional self-IL methods, which rely on high-return transitions and assume singleton environments, face challenges in generalization, especially in procedurally-generated (PCG) environments. Therefore, new self-IL methods have been proposed to rank which experiences to persist, but they replay transitions uniformly regardless of their significance, and do not address the diversity of the stored demonstrations. In this work, we propose tailored self-IL sampling strategies by prioritizing transitions in different ways and extending prioritization techniques to PCG environments. We also address diversity loss through modifications to counteract the impact of generalization requirements and bias introduced by prioritization techniques. Our experimental analysis, conducted over three PCG sparse reward environments, including MiniGrid and ProcGen, highlights the benefits of our proposed modifications, achieving a new state-of-the-art performance in the MiniGrid-MultiRoom-N12-S10 environment.
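The core mechanism, replaying stored experience proportionally to a priority rather than uniformly, can be sketched with a minimal buffer. Here the priority is the episodic return; the paper's PCG-specific prioritization criteria and the diversity-preserving corrections are not reproduced, so this is a generic self-imitation baseline, not the proposed method.

```python
import random
from typing import Any

class PrioritizedSelfILBuffer:
    """Stores trajectories with a priority and samples proportionally to it."""

    def __init__(self, capacity: int = 1000):
        self.capacity = capacity
        self.episodes: list[tuple[float, Any]] = []  # (priority, trajectory)

    def add(self, trajectory: Any, episodic_return: float) -> None:
        self.episodes.append((episodic_return, trajectory))
        if len(self.episodes) > self.capacity:
            # Drop the lowest-priority episode when the buffer is full.
            worst = min(self.episodes, key=lambda e: e[0])
            self.episodes.remove(worst)

    def sample(self, k: int) -> list[Any]:
        # Prioritized (not uniform) replay; epsilon avoids zero weights.
        weights = [max(p, 1e-8) for p, _ in self.episodes]
        picks = random.choices(self.episodes, weights=weights, k=k)
        return [traj for _, traj in picks]
```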

Neural Implicit Field Editing Considering Object-environment Interaction

  • paper_url: http://arxiv.org/abs/2311.00425
  • repo_url: None
  • paper_authors: Zhihong Zeng, Zongji Wang, Yuanben Zhang, Weinan Cai, Zehao Cao, Lili Zhang, Yan Guo, Yanhong Zhang, Junyi Liu
  • for: This paper proposes a neural-implicit-field-based 3D scene editing method that addresses the failure of existing methods to account for the interaction between objects and the scene environment, such as shadows not appearing in the rendered view.
  • methods: A two-stream neural rendering system that separates object-environment interaction via intrinsic decomposition to recover illumination conditions, together with a depth-map-guided scene inpainting method and a point-matching-based shadow rendering method for object-level editing.
  • results: The method produces reasonable appearance changes in scene editing tasks and achieves competitive rendering quality in novel-view synthesis.
    Abstract The 3D scene editing method based on neural implicit field has gained wide attention. It has achieved excellent results in 3D editing tasks. However, existing methods often blend the interaction between objects and scene environment. The change of scene appearance like shadows is failed to be displayed in the rendering view. In this paper, we propose an Object and Scene environment Interaction aware (OSI-aware) system, which is a novel two-stream neural rendering system considering object and scene environment interaction. To obtain illuminating conditions from the mixture soup, the system successfully separates the interaction between objects and scene environment by intrinsic decomposition method. To study the corresponding changes to the scene appearance from object-level editing tasks, we introduce a depth map guided scene inpainting method and shadow rendering method by point matching strategy. Extensive experiments demonstrate that our novel pipeline produce reasonable appearance changes in scene editing tasks. It also achieve competitive performance for the rendering quality in novel-view synthesis tasks.
    摘要 基于神经隐式场的3D场景编辑方法获得了广泛关注,并在3D编辑任务中表现出色。然而,现有方法往往混淆对象与场景环境之间的交互,场景外观的变化(如阴影)无法在渲染视图中正确显示。在这篇论文中,我们提出了一个对象与场景环境交互感知(OSI-aware)系统,这是一种考虑对象与场景环境交互的新型双流神经渲染系统。为了从混合场景中获得光照条件,该系统通过内在分解方法成功地分离了对象与场景环境之间的交互。为了研究对象级编辑任务对场景外观的相应影响,我们引入了深度图引导的场景修复方法和基于点匹配策略的阴影渲染方法。大量实验表明,我们提出的新管线在场景编辑任务中产生了合理的外观变化,并在新视图合成任务中达到了有竞争力的渲染质量。

Couples can be tractable: New algorithms and hardness results for the Hospitals / Residents problem with Couples

  • paper_url: http://arxiv.org/abs/2311.00405
  • repo_url: None
  • paper_authors: Gergely Csáji, David Manlove, Iain McBride, James Trimble
  • for: 这个论文研究的是{\sc Hospitals / Residents problem with Couples}({\sc hrc}),其解是一个稳定匹配,或一份表明不存在稳定匹配的报告。
  • methods: 我们提出了一种新的多项式时间算法:当夫妻的偏好为子响应(即若一方换到更好的医院,则夫妻整体也得到改善)且子完全(即两名成员均可单独接受的每对医院,对夫妻而言也是共同可接受的)时,通过将问题归约到{\sc Stable Fixtures}问题,可在{\sc hrc}实例中找到一个近可行的稳定匹配(医院容量至多调整1)。
  • results: 我们的算法还可用于子响应、子完全且为Dual Market的实例,或所有夫妻均属于若干可能类型之一的实例;并且该算法蕴含了一个稳定b匹配问题(基础图为带自环的多重图)的多项式时间可解性。此外,我们还给出了多个NP困难性结果,表明{\sc hrc}即使在若干强限制下仍是NP困难的。
    Abstract In this paper we study the {\sc Hospitals / Residents problem with Couples} ({\sc hrc}), where a solution is a stable matching or a report that none exists. We present a novel polynomial-time algorithm that can find a near-feasible stable matching (adjusting the hospitals' capacities by at most 1) in an {\sc hrc} instance where the couples' preferences are sub-responsive (i.e., if one member switches to a better hospital, than the couple also improves) and sub-complete (i.e., each pair of hospitals that are individually acceptable to both members are jointly acceptable for the couple) by reducing it to an instance of the {\sc Stable Fixtures} problem. We also present a polynomial-time algorithm for {\sc hrc} in a sub-responsive, sub-complete instance that is a Dual Market, or where all couples are one of several possible types. We show that our algorithm also implies the polynomial-time solvability of a stable b-matching problem, where the underlying graph is a multigraph with loops. We complement our algorithms with several hardness results. We show that {\sc hrc} with sub-responsive and sub-complete couples is NP-hard, even with other strong restrictions. We also show that {\sc hrc} with a Dual Market is NP-hard under several simultaneous restrictions. Finally, we show that the problem of finding a matching with the minimum number of blocking pairs in {\sc hrc} is not approximable within $m^{1-\varepsilon}$, for any $\varepsilon>0$, where $m$ is the total length of the hospitals' preference lists, unless P=NP, even if each couple applies to only one pair of hospitals. Our polynomial-time solvability results greatly expand the class of known tractable instances of {\sc hrc} and provide additional evidence as to why long-standing entry-level labour markets that allow couples such as the National Resident Matching Program remain successful to this day.
    摘要 在本文中,我们研究带夫妻的医院/住院医生匹配问题(hrc),其解是一个稳定匹配,或一份表明不存在稳定匹配的报告。我们提出了一种新的多项式时间算法:当夫妻的偏好是子响应的(即若一方换到更好的医院,则夫妻整体也得到改善)且子完全的(即两名成员均可单独接受的每对医院,对夫妻而言也是共同可接受的)时,通过将问题归约到{\sc Stable Fixtures}问题,可以找到一个近可行的稳定匹配(医院容量至多调整1)。我们还给出了另一种多项式时间算法,适用于子响应、子完全且为Dual Market的实例,或所有夫妻均属于若干可能类型之一的实例。我们证明该算法同时蕴含了一个稳定b匹配问题(基础图为带自环的多重图)的多项式时间可解性。我们还给出了若干困难性结果:即使附加其他强限制,带子响应、子完全夫妻的hrc仍是NP困难的;在若干同时成立的限制下,Dual Market情形的hrc也是NP困难的。最后,我们证明,除非P=NP,在hrc中寻找阻塞对数量最少的匹配不可在$m^{1-\varepsilon}$内近似(对任意$\varepsilon>0$,其中$m$为医院偏好列表的总长度),即使每对夫妻只申请一对医院。我们的多项式时间可解性结果大大扩展了已知的hrc可解实例类,并进一步解释了为什么允许夫妻参加的长期入门级劳动力市场(如National Resident Matching Program)至今依然运作良好。
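
Couples make stability checking subtle, and the paper shows the general problem is NP-hard. As a reference point, here is a sketch of the blocking-pair test in the basic, couple-free Hospitals/Residents setting, which is the notion of stability the paper generalizes; all names and data layouts are our own illustrative choices, and mutual acceptability is assumed.

```python
def blocking_pairs(res_pref, hosp_pref, capacity, matching):
    """Find (resident, hospital) pairs blocking `matching` in the
    couple-free Hospitals/Residents problem.

    res_pref[r]  : hospitals acceptable to r, most-preferred first
    hosp_pref[h] : residents acceptable to h, most-preferred first
    matching[r]  : hospital assigned to r, or None if unassigned
    """
    assigned = {h: [r for r, m in matching.items() if m == h] for h in hosp_pref}
    rank = {h: {r: i for i, r in enumerate(prefs)} for h, prefs in hosp_pref.items()}
    blocks = []
    for r, prefs in res_pref.items():
        current = matching.get(r)
        for h in prefs:
            if h == current:
                break  # r weakly prefers the current assignment from here on
            # r strictly prefers h; (r, h) blocks if h is under-subscribed
            # or h prefers r to one of its currently assigned residents.
            if len(assigned[h]) < capacity[h]:
                blocks.append((r, h))
            elif any(rank[h][r] < rank[h][x] for x in assigned[h]):
                blocks.append((r, h))
    return blocks
```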

A Spatial-Temporal Transformer based Framework For Human Pose Assessment And Correction in Education Scenarios

  • paper_url: http://arxiv.org/abs/2311.00401
  • repo_url: None
  • paper_authors: Wenyang Hu, Kai Liu, Libin Liu, Huiliang Shang
  • for: 这篇论文是为了提供一个基于空间-时间转换器的框架,用于在教育场景中评估和修正学生的人体姿势。
  • methods: 该框架包括骨骼跟踪、姿势估计、姿势评估和姿势修正模块,为学生提供专业、快速可改的反馈。
  • results: 我们的模型可以有效地评估和修正学生的动作质量。STTF利用转换器模型捕捉人体姿势的空间和时间相关性,实现了准确的评估和有效的修正。
    Abstract Human pose assessment and correction play a crucial role in applications across various fields, including computer vision, robotics, sports analysis, healthcare, and entertainment. In this paper, we propose a Spatial-Temporal Transformer based Framework (STTF) for human pose assessment and correction in education scenarios such as physical exercises and science experiment. The framework comprising skeletal tracking, pose estimation, posture assessment, and posture correction modules to educate students with professional, quick-to-fix feedback. We also create a pose correction method to provide corrective feedback in the form of visual aids. We test the framework with our own dataset. It comprises (a) new recordings of five exercises, (b) existing recordings found on the internet of the same exercises, and (c) corrective feedback on the recordings by professional athletes and teachers. Results show that our model can effectively measure and comment on the quality of students' actions. The STTF leverages the power of transformer models to capture spatial and temporal dependencies in human poses, enabling accurate assessment and effective correction of students' movements.
    摘要 人体姿势评估和修正在计算机视觉、机器人学、运动分析、医疗和娱乐等多个领域中扮演着关键角色。在这篇论文中,我们提出了一个基于空间-时间变换器的框架(STTF),用于在体育锻炼和科学实验等教育场景中评估和修正学生的姿势。该框架包括骨骼跟踪、姿势估计、姿势评估和姿势修正模块,以提供专业、快速可改的反馈。我们还开发了一种姿势修正方法,以可视化辅助的形式提供修正反馈。我们在自建数据集上测试了该框架,该数据集包括(a)新录制的五种锻炼动作,(b)互联网上已有的同类动作录像,以及(c)由专业运动员和教师对这些录像给出的修正反馈。结果表明,我们的模型可以有效地评估和点评学生动作的质量。STTF利用变换器模型捕捉人体姿势中的空间和时间依赖关系,从而实现对学生动作的准确评估和有效修正。

Augmenting deep neural networks with symbolic knowledge: Towards trustworthy and interpretable AI for education

  • paper_url: http://arxiv.org/abs/2311.00393
  • repo_url: None
  • paper_authors: Danial Hooshyar, Roger Azevedo, Yeongwook Yang
  • for: 该研究旨在探讨人工神经网络(ANN)在教育应用中的局限,并提出一种基于神经符号AI的解决方案,以增强ANN的教育潜力。
  • methods: 该研究采用一种神经符号AI框架,并据此开发了名为NSAI的方法,可向深度神经网络注入教育知识,并从中提取教育知识,用于建模学习者的计算思维。
  • results: 研究发现,NSAI方法比仅在训练数据上训练的深度神经网络、以及在经SMOTE和自动编码器方法增强的数据上训练的模型具有更好的泛化性。此外,NSAI方法优先学习刻画输入特征与输出标签之间因果关系的稳健表示,可避免伪相关并控制训练数据中的偏差,并且能够从所学网络中提取可解释的规则。
    Abstract Artificial neural networks (ANNs) have shown to be amongst the most important artificial intelligence (AI) techniques in educational applications, providing adaptive educational services. However, their educational potential is limited in practice due to three major challenges: i) difficulty in incorporating symbolic educational knowledge (e.g., causal relationships, and practitioners' knowledge) in their development, ii) learning and reflecting biases, and iii) lack of interpretability. Given the high-risk nature of education, the integration of educational knowledge into ANNs becomes crucial for developing AI applications that adhere to essential educational restrictions, and provide interpretability over the predictions. This research argues that the neural-symbolic family of AI has the potential to address the named challenges. To this end, it adapts a neural-symbolic AI framework and accordingly develops an approach called NSAI, that injects and extracts educational knowledge into and from deep neural networks, for modelling learners computational thinking. Our findings reveal that the NSAI approach has better generalizability compared to deep neural networks trained merely on training data, as well as training data augmented by SMOTE and autoencoder methods. More importantly, unlike the other models, the NSAI approach prioritises robust representations that capture causal relationships between input features and output labels, ensuring safety in learning to avoid spurious correlations and control biases in training data. Furthermore, the NSAI approach enables the extraction of rules from the learned network, facilitating interpretation and reasoning about the path to predictions, as well as refining the initial educational knowledge. These findings imply that neural-symbolic AI can overcome the limitations of ANNs in education, enabling trustworthy and interpretable applications.
    摘要 Artificial neural networks (ANNs) are among the most important AI techniques in educational applications, but their educational potential is limited in practice by three major challenges:
    1. Difficulty in incorporating symbolic educational knowledge (e.g., causal relationships, practitioners' knowledge) in their development.
    2. Learning and reflecting biases.
    3. Lack of interpretability.
    To address these challenges, this research advocates for the use of neural-symbolic AI, which has the potential to integrate educational knowledge into ANNs and provide interpretability over the predictions. The proposed approach, called NSAI, injects and extracts educational knowledge into and from deep neural networks, enabling the modelling of learners' computational thinking. The results show that the NSAI approach has better generalizability than deep neural networks trained merely on training data, as well as on training data augmented by SMOTE and autoencoder methods. Additionally, the NSAI approach prioritizes robust representations that capture causal relationships between input features and output labels, ensuring safety in learning by avoiding spurious correlations and controlling biases in the training data. Moreover, the NSAI approach enables the extraction of rules from the learned network, facilitating interpretation and reasoning about the path to predictions, as well as refining the initial educational knowledge. These findings suggest that neural-symbolic AI can overcome the limitations of ANNs in education, enabling trustworthy and interpretable applications.
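
One common way to inject symbolic knowledge of the kind described above is to add a differentiable penalty for predictions that violate known rules. The sketch below is a generic example of that pattern under our own assumptions; the paper's actual NSAI injection/extraction mechanism may differ.

```python
import torch

def rule_penalty(probs, x, rules):
    """Penalize predicted class probabilities that violate symbolic rules.

    Each rule is a pair (condition, implied_class): `condition(x)` returns
    a boolean mask over the batch, and samples covered by the rule should
    be predicted as `implied_class`. Violations contribute the probability
    mass assigned to other classes.
    """
    penalty = torch.zeros((), dtype=probs.dtype)
    for condition, implied_class in rules:
        mask = condition(x)  # which samples the rule applies to
        if mask.any():
            penalty = penalty + (1.0 - probs[mask, implied_class]).mean()
    return penalty

# Combined objective (lambda balances data fit against rule consistency):
# loss = F.cross_entropy(logits, y) + lam * rule_penalty(logits.softmax(-1), x, rules)
```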

Will Code Remain a Relevant User Interface for End-User Programming with Generative AI Models?

  • paper_url: http://arxiv.org/abs/2311.00382
  • repo_url: None
  • paper_authors: Advait Sarkar
  • for: 本研究探讨了在生成AI时,传统编程语言仍然对非专业程序员有用性的问题。
  • methods: 本文以论述的形式,探讨了生成式AI对非专业终端用户程序员的影响。
  • results: 本文提出了"生成转移假设",即生成式AI将使终端用户编程的传统范围发生质与量上的扩展;同时,文章还探讨了传统编程语言对非专业程序员可能仍然有用的原因。
    Abstract The research field of end-user programming has largely been concerned with helping non-experts learn to code sufficiently well in order to achieve their tasks. Generative AI stands to obviate this entirely by allowing users to generate code from naturalistic language prompts. In this essay, we explore the extent to which "traditional" programming languages remain relevant for non-expert end-user programmers in a world with generative AI. We posit the "generative shift hypothesis": that generative AI will create qualitative and quantitative expansions in the traditional scope of end-user programming. We outline some reasons that traditional programming languages may still be relevant and useful for end-user programmers. We speculate whether each of these reasons might be fundamental and enduring, or whether they may disappear with further improvements and innovations in generative AI. Finally, we articulate a set of implications for end-user programming research, including the possibility of needing to revisit many well-established core concepts, such as Ko's learning barriers and Blackwell's attention investment model.
    摘要 终端用户编程研究领域长期以来主要关注于帮助非专业人员充分学习编程,以便完成他们的任务。生成式AI允许用户直接通过自然语言提示生成代码,从而有可能使这一目标不再必要。在这篇文章中,我们探讨了在生成式AI时代,"传统"编程语言对非专业终端用户程序员是否仍然有意义。我们提出了"生成转移假设":生成式AI将使终端用户编程的传统范围发生质与量上的扩展。我们列举了传统编程语言对终端用户程序员可能仍然有用的一些原因,并推测这些原因究竟是根本而持久的,还是会随着生成式AI的进一步改进和创新而消失。最后,我们阐述了对终端用户编程研究的一系列影响,包括可能需要重新审视许多已确立的核心概念,如Ko的学习障碍和Blackwell的注意力投资模型。

Architecture of Data Anomaly Detection-Enhanced Decentralized Expert System for Early-Stage Alzheimer’s Disease Prediction

  • paper_url: http://arxiv.org/abs/2311.00373
  • repo_url: None
  • paper_authors: Stefan Kambiz Behfar, Qumars Behfar, Marzie Hosseinpour
  • for: 这个研究旨在早期检测阿尔茨海默病,以提高病人结果。
  • methods: 这个研究使用了去中心化专家系统,结合区块链技术和人工智能,对患者提交的数据实现稳健的异常检测。
  • results: 这个系统可以提供更精确的早期阿尔茨海默病预测,并保护数据完整性和患者隐私。
    Abstract Alzheimer's Disease is a global health challenge that requires early and accurate detection to improve patient outcomes. Magnetic Resonance Imaging (MRI) holds significant diagnostic potential, but its effective analysis remains a formidable task. This study introduces a groundbreaking decentralized expert system that cleverly combines blockchain technology with Artificial Intelligence (AI) to integrate robust anomaly detection for patient-submitted data. Traditional diagnostic methods often lead to delayed and imprecise predictions, especially in the early stages of the disease. Centralized data repositories struggle to manage the immense volumes of MRI data, and persistent privacy concerns hinder collaborative efforts. Our innovative solution harnesses decentralization to protect data integrity and patient privacy, facilitated by blockchain technology. It not only emphasizes AI-driven MRI analysis but also incorporates a sophisticated data anomaly detection architecture. These mechanisms scrutinize patient-contributed data for various issues, including data quality problems and atypical findings within MRI images. Conducting an exhaustive check of MRI image correctness and quality directly on the blockchain is impractical due to computational complexity and cost constraints. Typically, such checks are performed off-chain, and the blockchain securely records the results. This comprehensive approach empowers our decentralized app to provide more precise early-stage Alzheimer's Disease predictions. By merging the strengths of blockchain, AI, and anomaly detection, our system represents a pioneering step towards revolutionizing disease diagnostics.
    摘要 阿尔茨海默病是一项全球性健康挑战,早期而准确的检测是改善患者预后的关键。核磁共振成像(MRI)具有重要的诊断潜力,但其有效分析仍然是一项艰巨任务。这项研究提出了一种创新的去中心化专家系统,巧妙地结合区块链技术与人工智能(AI),为患者提交的数据集成了稳健的异常检测。传统诊断方法往往导致延迟且不精确的预测,特别是在疾病早期。中心化的数据存储库难以管理海量的MRI数据,而持续存在的隐私顾虑也阻碍了协作。我们的创新方案借助区块链技术实现去中心化,以保护数据完整性和患者隐私。它不仅强调AI驱动的MRI分析,还包含一套精密的数据异常检测架构,用于审查患者提交数据中的各类问题,包括数据质量问题以及MRI图像中的非典型发现。由于计算复杂性和成本限制,直接在区块链上对MRI图像的正确性和质量进行全面检查并不现实;通常这类检查在链下进行,区块链则安全地记录检查结果。这种全面的方法使我们的去中心化应用能够提供更精确的早期阿尔茨海默病预测。通过融合区块链、AI与异常检测的优势,我们的系统朝着革新疾病诊断迈出了开创性的一步。

Prompt-based Logical Semantics Enhancement for Implicit Discourse Relation Recognition

  • paper_url: http://arxiv.org/abs/2311.00367
  • repo_url: https://github.com/lalalamdbf/plse_idrr
  • paper_authors: Chenxu Wang, Ping Jian, Mu Huang
  • for: 本文主要针对隐式语篇关系识别(IDRR)进行研究,提出一种基于提示的逻辑语义增强方法(PLSE),以提高IDRR的性能和鲁棒性。
  • methods: 本文通过基于提示的连接词预测,将与语篇关系相关的知识注入预训练语言模型。此外,针对掩码语言模型难以捕捉全局语义而导致的局部依赖问题,本文提出了一种基于互信息最大化的自监督学习目标,以获得增强的逻辑语义表示。
  • results: 在 PDTB 2.0 和 CoNLL16 数据集上的实验表明,PLSE 方法相对当前最先进模型取得了突出且一致的性能。
    Abstract Implicit Discourse Relation Recognition (IDRR), which infers discourse relations without the help of explicit connectives, is still a crucial and challenging task for discourse parsing. Recent works tend to exploit the hierarchical structure information from the annotated senses, which demonstrate enhanced discourse relation representations can be obtained by integrating sense hierarchy. Nevertheless, the performance and robustness for IDRR are significantly constrained by the availability of annotated data. Fortunately, there is a wealth of unannotated utterances with explicit connectives, that can be utilized to acquire enriched discourse relation features. In light of such motivation, we propose a Prompt-based Logical Semantics Enhancement (PLSE) method for IDRR. Essentially, our method seamlessly injects knowledge relevant to discourse relation into pre-trained language models through prompt-based connective prediction. Furthermore, considering the prompt-based connective prediction exhibits local dependencies due to the deficiency of masked language model (MLM) in capturing global semantics, we design a novel self-supervised learning objective based on mutual information maximization to derive enhanced representations of logical semantics for IDRR. Experimental results on PDTB 2.0 and CoNLL16 datasets demonstrate that our method achieves outstanding and consistent performance against the current state-of-the-art models.
    摘要 隐式语篇关系识别(IDRR)无需显式连接词即可推断语篇关系,至今仍是语篇解析中一项关键且具有挑战性的任务。近期工作倾向于利用标注语义的层次结构信息,表明融合语义层次可以获得更好的语篇关系表示。然而,IDRR的性能和鲁棒性显著受限于标注数据的可获得性。幸运的是,存在大量带有显式连接词的未标注话语,可用于获取更丰富的语篇关系特征。基于这一动机,我们提出了一种面向IDRR的基于提示的逻辑语义增强(PLSE)方法。本质上,我们的方法通过基于提示的连接词预测,将与语篇关系相关的知识无缝注入预训练语言模型。此外,鉴于掩码语言模型(MLM)难以捕捉全局语义,基于提示的连接词预测表现出局部依赖性,我们设计了一种基于互信息最大化的新型自监督学习目标,以获得增强的逻辑语义表示。在 PDTB 2.0 和 CoNLL16 数据集上的实验结果表明,我们的方法相对当前最先进模型取得了突出且一致的性能。
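
To make the prompt-based connective prediction idea concrete, here is a small sketch using a generic masked language model via the Hugging Face `fill-mask` pipeline. The connective-to-relation mapping and the prompt template are illustrative assumptions, not the paper's actual design.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="roberta-base")

# Hypothetical mapping from explicit connectives to PDTB-style relations.
CONNECTIVE_TO_RELATION = {
    "because": "Contingency", "so": "Contingency",
    "but": "Comparison", "however": "Comparison",
    "then": "Temporal", "also": "Expansion",
}

def predict_relation(arg1, arg2, top_k=20):
    # Join the two discourse arguments with a masked connective slot.
    prompt = f"{arg1} {fill.tokenizer.mask_token} {arg2}"
    for cand in fill(prompt, top_k=top_k):
        token = cand["token_str"].strip().lower()
        if token in CONNECTIVE_TO_RELATION:
            return CONNECTIVE_TO_RELATION[token], token
    return None, None

print(predict_relation("It was raining hard,", "the match was cancelled."))
```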

Rethinking Samples Selection for Contrastive Learning: Mining of Potential Samples

  • paper_url: http://arxiv.org/abs/2311.00358
  • repo_url: None
  • paper_authors: Hengkui Dong, Xianzhong Long, Yun Li
  • for: 本研究旨在改进对比学习中的样本选择方法,以提高模型的自监督学习能力。
  • methods: 我们的方法包括两个方面:首先,对于正样本,我们同时考虑数据增强得到的增强样本视图和数据挖掘得到的挖掘样本视图,并使用软、硬两种加权策略将它们加权组合。其次,我们从梯度角度分析负样本,挖掘既不过难也不过易的适度困难负样本(即与正样本相近的负样本)作为潜在负样本。
  • results: 我们的方法在CIFAR10、CIFAR100和TinyImagenet等数据集上进行了实验,相比一些传统自监督学习方法显示出明显优势,分别取得了88.57%、61.10%和36.69%的top-1准确率。
    Abstract Contrastive learning predicts whether two images belong to the same category by training a model to make their feature representations as close or as far away as possible. In this paper, we rethink how to mine samples in contrastive learning, unlike other methods, our approach is more comprehensive, taking into account both positive and negative samples, and mining potential samples from two aspects: First, for positive samples, we consider both the augmented sample views obtained by data augmentation and the mined sample views through data mining. Then, we weight and combine them using both soft and hard weighting strategies. Second, considering the existence of uninformative negative samples and false negative samples in the negative samples, we analyze the negative samples from the gradient perspective and finally mine negative samples that are neither too hard nor too easy as potential negative samples, i.e., those negative samples that are close to positive samples. The experiments show the obvious advantages of our method compared with some traditional self-supervised methods. Our method achieves 88.57%, 61.10%, and 36.69% top-1 accuracy on CIFAR10, CIFAR100, and TinyImagenet, respectively.
    摘要 对比学习通过训练模型使两张图像的特征表示尽可能接近或尽可能远离,来预测它们是否属于同一类别。在这篇论文中,我们重新思考了对比学习中的样本挖掘方式。与其他方法不同,我们的方法更加全面,同时考虑正样本和负样本,并从两个方面挖掘潜在样本:首先,对于正样本,我们同时考虑数据增强得到的增强样本视图和数据挖掘得到的挖掘样本视图,并使用软、硬两种加权策略将它们加权组合。其次,考虑到负样本中存在无信息量的负样本和假负样本,我们从梯度角度分析负样本,最终挖掘既不过难也不过易的负样本(即与正样本相近的负样本)作为潜在负样本。实验显示,我们的方法相比一些传统自监督方法具有明显优势,在CIFAR10、CIFAR100和TinyImagenet上分别取得了88.57%、61.10%和36.69%的top-1准确率。
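
A minimal sketch of the two sampling ideas above: soft-weighting several positive views by their similarity to the anchor, and keeping only moderately hard negatives whose similarity falls in a middle band. The similarity thresholds and the exact weighting scheme are our assumptions.

```python
import torch
import torch.nn.functional as F

def mine_and_weight_loss(anchor, positives, negatives, tau=0.5,
                         neg_low=0.3, neg_high=0.8):
    a = F.normalize(anchor, dim=-1)        # (d,)
    pos = F.normalize(positives, dim=-1)   # (P, d) augmented + mined views
    neg = F.normalize(negatives, dim=-1)   # (N, d)

    pos_sim = pos @ a
    w = torch.softmax(pos_sim / tau, dim=0)            # soft positive weights
    pos_term = (w * torch.exp(pos_sim / tau)).sum()

    neg_sim = neg @ a
    keep = (neg_sim > neg_low) & (neg_sim < neg_high)  # potential negatives:
    neg_term = torch.exp(neg_sim[keep] / tau).sum()    # neither too hard nor too easy

    return -torch.log(pos_term / (pos_term + neg_term + 1e-8))
```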

QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00356
  • repo_url: None
  • paper_authors: Rizhong Wang, Huiping Li, Di Cui, Demin Xu
  • for: 本文旨在提出一种满足个体-全局-最大(IGM)原理、且不对IGM函数类施加额外限制的多智能体强化学习(MARL)通用价值函数分解方法。
  • methods: 本文基于优势函数推导出IGM原理的数学等价条件,并设计了一种表达能力更强的混合网络架构来实现该等价分解。新的损失函数在MARL算法的策略评估阶段将等价条件作为正则项纳入。
  • results: QFree在一个非单调矩阵博弈场景中验证了其有效性,并在通用的复杂MARL基准环境星际争霸多智能体挑战(SMAC)中达到了当前最佳性能。
    Abstract Centralized training is widely utilized in the field of multi-agent reinforcement learning (MARL) to assure the stability of training process. Once a joint policy is obtained, it is critical to design a value function factorization method to extract optimal decentralized policies for the agents, which needs to satisfy the individual-global-max (IGM) principle. While imposing additional limitations on the IGM function class can help to meet the requirement, it comes at the cost of restricting its application to more complex multi-agent environments. In this paper, we propose QFree, a universal value function factorization method for MARL. We start by developing mathematical equivalent conditions of the IGM principle based on the advantage function, which ensures that the principle holds without any compromise, removing the conservatism of conventional methods. We then establish a more expressive mixing network architecture that can fulfill the equivalent factorization. In particular, the novel loss function is developed by considering the equivalent conditions as regularization term during policy evaluation in the MARL algorithm. Finally, the effectiveness of the proposed method is verified in a nonmonotonic matrix game scenario. Moreover, we show that QFree achieves the state-of-the-art performance in a general-purpose complex MARL benchmark environment, Starcraft Multi-Agent Challenge (SMAC).
    摘要 中心化训练在多智能体强化学习(MARL)领域被广泛应用,以确保训练过程的稳定性。一旦获得联合策略,关键在于设计一种价值函数分解方法,以提取各智能体的最优分布式策略,而这需要满足个体-全局-最大(IGM)原理。虽然对IGM函数类施加额外限制有助于满足该要求,但代价是限制了其在更复杂多智能体环境中的应用。在这篇论文中,我们提出了QFree,一种面向MARL的通用价值函数分解方法。我们首先基于优势函数推导出IGM原理的数学等价条件,确保该原理在不做任何妥协的情况下成立,从而消除传统方法的保守性。随后,我们构建了一种表达能力更强的混合网络架构,以实现该等价分解。特别地,我们开发了一种新的损失函数,在MARL算法的策略评估阶段将等价条件作为正则项纳入。最后,我们在一个非单调矩阵博弈场景中验证了所提方法的有效性,并进一步表明QFree在通用的复杂MARL基准环境星际争霸多智能体挑战(SMAC)中达到了当前最佳性能。
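
For reference, the IGM principle that QFree targets can be written in the standard notation of the value-factorization literature (a general statement, not a formula taken from the paper):

```latex
% IGM: the greedy joint action of the global value function coincides with
% the per-agent greedy actions, for every joint observation history \tau.
\arg\max_{\mathbf{a}} Q_{\mathrm{tot}}(\boldsymbol{\tau}, \mathbf{a})
  \;=\;
  \Bigl( \arg\max_{a_1} Q_1(\tau_1, a_1),\; \ldots,\; \arg\max_{a_n} Q_n(\tau_n, a_n) \Bigr)
```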

tmn at #SMM4H 2023: Comparing Text Preprocessing Techniques for Detecting Tweets Self-reporting a COVID-19 Diagnosis

  • paper_url: http://arxiv.org/abs/2311.00732
  • repo_url: None
  • paper_authors: Anna Glazkova
  • for: 本文描述了在SMM4H 2023年度任务1中开发的一种系统,用于自动分类报告COVID-19诊断的推特。
  • methods: 本文研究了多种推特预处理技术,并微调了四种基于Transformer的模型。
  • results: 微调语言模型的集成取得了84.5%的F1分数,比平均值高出4.1个百分点。
    Abstract The paper describes a system developed for Task 1 at SMM4H 2023. The goal of the task is to automatically distinguish tweets that self-report a COVID-19 diagnosis (for example, a positive test, clinical diagnosis, or hospitalization) from those that do not. We investigate the use of different techniques for preprocessing tweets using four transformer-based models. The ensemble of fine-tuned language models obtained an F1-score of 84.5%, which is 4.1% higher than the average value.
    摘要 本文描述了为SMM4H 2023任务1开发的系统。该任务的目标是自动区分自我报告COVID-19诊断(例如阳性检测、临床诊断或住院)的推特与未报告的推特。我们研究了多种推特预处理技术,并微调了四种基于Transformer的模型。微调语言模型的集成取得了84.5%的F1分数,比平均值高出4.1个百分点。

A Definition of Open-Ended Learning Problems for Goal-Conditioned Agents

  • paper_url: http://arxiv.org/abs/2311.00344
  • repo_url: None
  • paper_authors: Olivier Sigaud, Gianluca Baldassarre, Cedric Colas, Stephane Doncieux, Richard Duro, Nicolas Perrin-Gilbert, Vieri Giuliano Santucci
  • for: 本研究旨在厘清开放式学习概念的各种定义及其与相关概念(如持续学习、终身学习和自驱学习)之间的差异,并提出其一个关键基本属性的定义,以便更好地理解开放式学习的本质。
  • methods: 本研究采用论述与分析的方法,描述了开放式学习的概念谱系和近期观点,并提出了一种基于无限时间范围的开放式学习问题定义方法。
  • results: 本研究分离出了开放式过程的一个关键基本属性,即在无限时间范围内不时产生新元素,并据此提出了开放式学习问题的定义。此外,本研究还指出,要弥合这一基本定义与发展型人工智能研究者所设想的更复杂开放式学习概念之间的差距,仍需进一步研究。
    Abstract A lot of recent machine learning research papers have "Open-ended learning" in their title. But very few of them attempt to define what they mean when using the term. Even worse, when looking more closely there seems to be no consensus on what distinguishes open-ended learning from related concepts such as continual learning, lifelong learning or autotelic learning. In this paper, we contribute to fixing this situation. After illustrating the genealogy of the concept and more recent perspectives about what it truly means, we outline that open-ended learning is generally conceived as a composite notion encompassing a set of diverse properties. In contrast with these previous approaches, we propose to isolate a key elementary property of open-ended processes, which is to always produce novel elements from time to time over an infinite horizon. From there, we build the notion of open-ended learning problems and focus in particular on the subset of open-ended goal-conditioned reinforcement learning problems, as this framework facilitates the definition of learning a growing repertoire of skills. Finally, we highlight the work that remains to be performed to fill the gap between our elementary definition and the more involved notions of open-ended learning that developmental AI researchers may have in mind.
    摘要 很多最近的机器学习研究论文标题中都有"开放式学习"一词,但很少有人尝试定义使用该术语时的含义。更糟糕的是,仔细观察后会发现,对于开放式学习与持续学习、终身学习或自驱学习等相关概念的区别,似乎并不存在共识。在这篇论文中,我们致力于改善这一状况。在梳理了该概念的谱系以及关于其真正含义的较新观点之后,我们指出开放式学习通常被视为一个涵盖多种不同性质的复合概念。与以往方法不同,我们提出分离出开放式过程的一个关键基本属性:在无限时间范围内不时产生新的元素。在此基础上,我们建立了开放式学习问题的概念,并特别关注开放式目标条件强化学习问题这一子类,因为该框架便于定义学习不断增长的技能库。最后,我们强调了为弥合这一基本定义与发展型人工智能研究者心目中更复杂的开放式学习概念之间的差距,仍有待完成的工作。
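
One way to write down the elementary property isolated above (novel elements keep appearing over an infinite horizon); the notation below is ours, intended only as a schematic reading of the definition:

```latex
% A process emitting elements o_1, o_2, \ldots is open-ended if novelty
% never stops: past any time t, some later element falls outside the set
% of everything produced before it.
\forall t \in \mathbb{N}, \;\; \exists\, t' > t \;\;\text{such that}\;\;
o_{t'} \notin \{\, o_1, o_2, \ldots, o_{t'-1} \,\}
```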

MetisFL: An Embarrassingly Parallelized Controller for Scalable & Efficient Federated Learning Workflows

  • paper_url: http://arxiv.org/abs/2311.00334
  • repo_url: None
  • paper_authors: Dimitris Stripelis, Chrysovalantis Anastasiou, Patrick Toral, Armaghan Asghar, Jose Luis Ambite
  • for: 这个研究旨在提高联邦学习(FL)系统中联邦控制器的可扩展性。
  • methods: 这个研究提出了名为 MetisFL 的新型 FL 系统,将联邦控制器视为"一等公民",并重新设计了联邦控制器的各项操作,以加速大规模 FL 工作流程的训练。
  • results: 通过与其他最先进的 FL 系统进行定量比较,这个研究证明了 MetisFL 在模型规模和联邦站点数量不断增加的各种高难度 FL 工作流程中,可以获得 10 倍的实际运行时间加速。
    Abstract A Federated Learning (FL) system typically consists of two core processing entities: the federation controller and the learners. The controller is responsible for managing the execution of FL workflows across learners and the learners for training and evaluating federated models over their private datasets. While executing an FL workflow, the FL system has no control over the computational resources or data of the participating learners. Still, it is responsible for other operations, such as model aggregation, task dispatching, and scheduling. These computationally heavy operations generally need to be handled by the federation controller. Even though many FL systems have been recently proposed to facilitate the development of FL workflows, most of these systems overlook the scalability of the controller. To meet this need, we designed and developed a novel FL system called MetisFL, where the federation controller is the first-class citizen. MetisFL re-engineers all the operations conducted by the federation controller to accelerate the training of large-scale FL workflows. By quantitatively comparing MetisFL against other state-of-the-art FL systems, we empirically demonstrate that MetisFL leads to a 10-fold wall-clock time execution boost across a wide range of challenging FL workflows with increasing model sizes and federation sites.
    摘要 一个联邦学习(FL)系统通常包括两个核心处理实体:联邦控制器和学习者。控制器负责管理跨学习者执行的FL工作流程,学习者则负责在各自的私有数据集上训练和评估联邦模型。在执行FL工作流程时,FL系统无法控制参与学习者的计算资源或数据,但仍需负责模型聚合、任务分发和调度等其他操作。这些计算量大的操作通常需要由联邦控制器处理。尽管最近提出了许多旨在促进FL工作流程开发的FL系统,但其中大多数忽略了控制器的可扩展性。为满足这一需求,我们设计并开发了一个名为MetisFL的新型FL系统,其中联邦控制器是"一等公民"。MetisFL重新设计了联邦控制器执行的所有操作,以加速大规模FL工作流程的训练。通过将MetisFL与其他最先进的FL系统进行定量比较,我们的实验表明,在模型规模和联邦站点数量不断增加的各种高难度FL工作流程中,MetisFL可带来10倍的实际运行时间加速。

Robust Graph Clustering via Meta Weighting for Noisy Graphs

  • paper_url: http://arxiv.org/abs/2311.00322
  • repo_url: https://github.com/hyeonsoojo/metagc
  • paper_authors: Hyeonsoo Jo, Fanchen Bu, Kijung Shin
  • for: robustly clustering graphs with noise edges
  • methods: using a decomposable clustering loss function and meta-weighting to adaptively adjust node pair weights
  • results: outperforms state-of-the-art GNN-based competitors on five real-world graphs under varying levels of noise
    Abstract How can we find meaningful clusters in a graph robustly against noise edges? Graph clustering (i.e., dividing nodes into groups of similar ones) is a fundamental problem in graph analysis with applications in various fields. Recent studies have demonstrated that graph neural network (GNN) based approaches yield promising results for graph clustering. However, we observe that their performance degenerates significantly on graphs with noise edges, which are prevalent in practice. In this work, we propose MetaGC for robust GNN-based graph clustering. MetaGC employs a decomposable clustering loss function, which can be rephrased as a sum of losses over node pairs. We add a learnable weight to each node pair, and MetaGC adaptively adjusts the weights of node pairs using meta-weighting so that the weights of meaningful node pairs increase and the weights of less-meaningful ones (e.g., noise edges) decrease. We show empirically that MetaGC learns weights as intended and consequently outperforms the state-of-the-art GNN-based competitors, even when they are equipped with separate denoising schemes, on five real-world graphs under varying levels of noise. Our code and datasets are available at https://github.com/HyeonsooJo/MetaGC.
    摘要 如何在含噪声边的图中稳健地找到有意义的聚类?图聚类(即将节点划分为由相似节点组成的组)是图分析中的一个基本问题,在诸多领域都有应用。近期研究表明,基于图神经网络(GNN)的方法在图聚类上取得了可喜的成果。然而,我们观察到,在实践中普遍存在的含噪声边的图上,这些方法的性能会显著下降。在这项工作中,我们提出了用于稳健GNN图聚类的MetaGC。MetaGC采用可分解的聚类损失函数,该函数可改写为节点对损失之和。我们为每个节点对引入可学习的权重,并通过元加权自适应地调整节点对的权重,使有意义节点对的权重上升、意义较小的节点对(如噪声边)的权重下降。实验表明,MetaGC能够按预期学习权重,因而在五个真实世界图上、不同噪声水平下均优于最先进的GNN竞争方法,即使这些方法配备了单独的去噪方案。我们的代码和数据集可在 https://github.com/HyeonsooJo/MetaGC 获取。
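
The decomposable, per-pair weighting described above might look like the following sketch; the sigmoid parameterization and the simplified meta step in the comments are our assumptions rather than MetaGC's exact formulation.

```python
import torch

def weighted_clustering_loss(pair_losses, pair_logits):
    """Decomposable clustering loss with one learnable weight per node pair.
    `pair_losses` (P,) are per-pair losses; `pair_logits` (P,) are learnable."""
    w = torch.sigmoid(pair_logits)          # keep weights in (0, 1)
    return (w * pair_losses).sum() / (w.sum() + 1e-8)

# Simplified meta-weighting loop (schematic):
# 1. take a virtual GNN update using the current weighted loss;
# 2. evaluate a meta objective after the virtual update;
# 3. backpropagate through both steps to update `pair_logits`,
#    so weights on noisy pairs (e.g. noise edges) are pushed down.
```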

Unsupervised Lexical Simplification with Context Augmentation

  • paper_url: http://arxiv.org/abs/2311.00310
  • repo_url: https://github.com/twadada/lexsub_decontextualised
  • paper_authors: Takashi Wada, Timothy Baldwin, Jey Han Lau
  • for: 这篇论文提出了一种新的无监督词汇简化方法,仅使用单语数据和预训练语言模型。
  • methods: 该方法以目标词及其上下文为输入,基于目标上下文以及从单语数据中额外采样的上下文来生成替换词。
  • results: 在英语、葡萄牙语和西班牙语的TSAR-2022共享任务上,该模型显著优于其他无监督系统;与GPT-3.5集成后进一步创造了新的最先进成绩。此外,在SWORDS词汇替换数据集上的评估中,该模型同样达到了最先进水平。
    Abstract We propose a new unsupervised lexical simplification method that uses only monolingual data and pre-trained language models. Given a target word and its context, our method generates substitutes based on the target context and also additional contexts sampled from monolingual data. We conduct experiments in English, Portuguese, and Spanish on the TSAR-2022 shared task, and show that our model substantially outperforms other unsupervised systems across all languages. We also establish a new state-of-the-art by ensembling our model with GPT-3.5. Lastly, we evaluate our model on the SWORDS lexical substitution data set, achieving a state-of-the-art result.
    摘要 我们提出了一种新的无监督词汇简化方法,仅使用单语数据和预训练语言模型。给定目标词及其上下文,我们的方法基于目标上下文以及从单语数据中采样的其他上下文生成替换词。我们在英语、葡萄牙语和西班牙语的TSAR-2022共享任务上进行了实验,结果显示我们的模型在所有语言上都明显优于其他无监督系统。通过将我们的模型与GPT-3.5集成,我们还创造了新的最先进成绩。最后,我们在SWORDS词汇替换数据集上评估了我们的模型,同样取得了最先进的结果。
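
A rough sketch of the context-augmentation idea: score substitutes for the target word in its original context and in extra monolingual contexts containing the same word, then aggregate. The summed-score aggregation and the BERT model choice are our assumptions.

```python
from collections import defaultdict
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def substitutes(target, context, extra_contexts, top_k=10):
    # Assumes `target` occurs verbatim in every provided context.
    scores = defaultdict(float)
    for ctx in [context] + extra_contexts:
        masked = ctx.replace(target, fill.tokenizer.mask_token, 1)
        for cand in fill(masked, top_k=top_k):
            word = cand["token_str"].strip().lower()
            if word != target:
                scores[word] += cand["score"]
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(substitutes("arduous",
                  "The arduous climb took six hours.",
                  ["It was an arduous journey across the desert."]))
```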

From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities

  • paper_url: http://arxiv.org/abs/2311.00308
  • repo_url: None
  • paper_authors: Md Farhan Ishmam, Md Sakib Hossain Shovon, M. F. Mridha, Nilanjan Dey
  • for: 本论文旨在探讨视觉问答(VQA)领域的多模态任务,包括计算机视觉(CV)和自然语言处理(NLP)等方面,并且旨在根据任何视觉输入生成问题的答案。
  • methods: 本论文提出了一个详细的分类体系来归纳VQA的各个方面,并总结了VQA的范围如何随时间从原始的自然图像数据集扩展到合成图像、视频、3D环境等多种视觉输入。此外,本论文还探讨了大型预训练网络的出现对VQA的影响:传统的特征提取和融合方法由此被视觉语言预训练(VLP)技术所取代。
  • results: 本论文总结了VQA的近期趋势、挑战与改进空间,探讨了从VQA视角出发的VLP挑战,提出了若干尚未解决的开放问题,并将VQA推广到多模态问答,探讨了相关任务与未来研究方向。
    Abstract The multimodal task of Visual Question Answering (VQA) encompassing elements of Computer Vision (CV) and Natural Language Processing (NLP), aims to generate answers to questions on any visual input. Over time, the scope of VQA has expanded from datasets focusing on an extensive collection of natural images to datasets featuring synthetic images, video, 3D environments, and various other visual inputs. The emergence of large pre-trained networks has shifted the early VQA approaches relying on feature extraction and fusion schemes to vision language pre-training (VLP) techniques. However, there is a lack of comprehensive surveys that encompass both traditional VQA architectures and contemporary VLP-based methods. Furthermore, the VLP challenges in the lens of VQA haven't been thoroughly explored, leaving room for potential open problems to emerge. Our work presents a survey in the domain of VQA that delves into the intricacies of VQA datasets and methods over the field's history, introduces a detailed taxonomy to categorize the facets of VQA, and highlights the recent trends, challenges, and scopes for improvement. We further generalize VQA to multimodal question answering, explore tasks related to VQA, and present a set of open problems for future investigation. The work aims to navigate both beginners and experts by shedding light on the potential avenues of research and expanding the boundaries of the field.
    摘要 视觉问答(VQA)这一多模态任务涵盖计算机视觉(CV)和自然语言处理(NLP)的多个方面,旨在针对任意视觉输入回答问题。随着时间的推移,VQA的范围已从以大量自然图像为主的数据集扩展到包含合成图像、视频、3D环境及其他多种视觉输入的数据集。大型预训练网络的出现,使早期依赖特征提取和融合方案的VQA方法转向了视觉语言预训练(VLP)技术。然而,目前还没有同时涵盖传统VQA架构和当代基于VLP方法的全面综述;并且从VQA视角出发的VLP挑战也尚未得到充分探讨,留下了若干潜在的开放问题。我们的工作对VQA领域进行了综述,深入梳理了该领域历史上的VQA数据集和方法,提出了一个细致的分类体系来归纳VQA的各个方面,并着重介绍了近期趋势、挑战与改进空间。我们进一步将VQA推广到多模态问答,探讨了与VQA相关的任务,并提出了一组有待未来研究的开放问题。本工作旨在为新手和专家指明潜在的研究方向,拓展该领域的边界。

Inference of CO2 flow patterns – a feasibility study

  • paper_url: http://arxiv.org/abs/2311.00290
  • repo_url: None
  • paper_authors: Abhinav Prakash Gahlot, Huseyin Tuna Erdinc, Rafael Orozco, Ziyi Yin, Felix J. Herrmann
  • for: 本研究旨在为碳捕集与封存(CCS)技术建立稳健的监测与检测机制,以探测潜在的地下CO2泄漏,特别是经由储层封盖中先存或诱发断层发生的泄漏。
  • methods: 本研究使用条件正规化流(conditional normalizing flow)技术来推断CO2的流动模式,并通过数值实验分析其性能。
  • results: 研究结果表明,条件正规化流可以对有泄漏或无泄漏情形下的CO2流动模式生成高保真度的推断,且推断出的不确定性是合理的,其主要来源于地震数据中的噪声以及对储层流体流动特性的不精确认知。
    Abstract As the global deployment of carbon capture and sequestration (CCS) technology intensifies in the fight against climate change, it becomes increasingly imperative to establish robust monitoring and detection mechanisms for potential underground CO2 leakage, particularly through pre-existing or induced faults in the storage reservoir's seals. While techniques such as history matching and time-lapse seismic monitoring of CO2 storage have been used successfully in tracking the evolution of CO2 plumes in the subsurface, these methods lack principled approaches to characterize uncertainties related to the CO2 plumes' behavior. Inclusion of systematic assessment of uncertainties is essential for risk mitigation for the following reasons: (i) CO2 plume-induced changes are small and seismic data is noisy; (ii) changes between regular and irregular (e.g., caused by leakage) flow patterns are small; and (iii) the reservoir properties that control the flow are strongly heterogeneous and typically only available as distributions. To arrive at a formulation capable of inferring flow patterns for regular and irregular flow from well and seismic data, the performance of conditional normalizing flow will be analyzed on a series of carefully designed numerical experiments. While the inferences presented are preliminary in the context of an early CO2 leakage detection system, the results do indicate that inferences with conditional normalizing flows can produce high-fidelity estimates for CO2 plumes with or without leakage. We are also confident that the inferred uncertainty is reasonable because it correlates well with the observed errors. This uncertainty stems from noise in the seismic data and from the lack of precise knowledge of the reservoir's fluid flow properties.
    摘要 随着碳捕集与封存(CCS)技术在全球范围内加速部署以应对气候变化,建立稳健的监测与检测机制以发现潜在的地下CO2泄漏变得愈发重要,特别是经由储层封盖中先存或诱发断层发生的泄漏。尽管历史拟合和时移地震监测等技术已成功用于追踪地下CO2羽流的演化,但这些方法缺乏刻画CO2羽流行为相关不确定性的系统化手段。纳入对不确定性的系统评估对于风险缓解至关重要,原因在于:(i) CO2羽流引起的变化很小,而地震数据充满噪声;(ii) 正常流动模式与异常流动模式(例如由泄漏引起)之间的差异很小;(iii) 控制流动的储层性质具有强烈的非均质性,且通常只能以分布的形式获得。为了得到能够从井数据和地震数据推断正常与异常流动模式的表述,我们在一系列精心设计的数值实验中分析了条件正规化流的性能。尽管就早期CO2泄漏检测系统而言,这里给出的推断还只是初步结果,但结果确实表明,条件正规化流的推断可以对有泄漏或无泄漏的CO2羽流给出高保真度的估计。我们也确信推断出的不确定性是合理的,因为它与观测误差高度相关。这种不确定性来源于地震数据中的噪声,以及对储层流体流动性质缺乏精确认知。
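
For readers unfamiliar with conditional normalizing flows, the underlying training objective is the standard conditional change-of-variables likelihood (general background, not a formula from the paper); here x would be the quantity of interest (e.g. the CO2 plume) and y the seismic/well observations:

```latex
% An invertible network f_\theta maps x to Gaussian noise z, conditioned on y:
\log p_\theta(x \mid y)
  = \log p_Z\!\bigl(f_\theta(x;\, y)\bigr)
  + \log \left| \det \frac{\partial f_\theta(x;\, y)}{\partial x} \right|
% Posterior sampling: draw z \sim \mathcal{N}(0, I), then x = f_\theta^{-1}(z;\, y).
```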

Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks

  • paper_url: http://arxiv.org/abs/2311.00288
  • repo_url: https://github.com/pluslabnlp/active-it
  • paper_authors: Po-Nien Kung, Fan Yin, Di Wu, Kai-Wei Chang, Nanyun Peng
  • for: 这篇论文旨在提出一种新的主动指令微调方法,以便在海量任务上对大型语言模型(LLM)进行指令微调时,更好地选择有价值的新任务。
  • methods: 这篇论文提出了基于提示不确定性的主动指令微调框架:以扰动提示下模型输出的不一致程度来衡量新任务的信息量,选出信息量大的任务后再对模型进行微调。
  • results: 在NIV2和Self-Instruct数据集上的实验表明,该方法持续优于其他基线任务选择策略,用更少的训练任务取得了更好的分布外泛化能力;论文还提出了一张任务地图,依据提示不确定性和预测概率对任务进行归类与诊断。
    Abstract Instruction tuning (IT) achieves impressive zero-shot generalization results by training large language models (LLMs) on a massive amount of diverse tasks with instructions. However, how to select new tasks to improve the performance and generalizability of IT models remains an open question. Training on all existing tasks is impractical due to prohibiting computation requirements, and randomly selecting tasks can lead to suboptimal performance. In this work, we propose active instruction tuning based on prompt uncertainty, a novel framework to identify informative tasks, and then actively tune the models on the selected tasks. We represent the informativeness of new tasks with the disagreement of the current model outputs over perturbed prompts. Our experiments on NIV2 and Self-Instruct datasets demonstrate that our method consistently outperforms other baseline strategies for task selection, achieving better out-of-distribution generalization with fewer training tasks. Additionally, we introduce a task map that categorizes and diagnoses tasks based on prompt uncertainty and prediction probability. We discover that training on ambiguous (prompt-uncertain) tasks improves generalization while training on difficult (prompt-certain and low-probability) tasks offers no benefit, underscoring the importance of task selection for instruction tuning.
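
The disagreement-over-perturbed-prompts measure described above is easy to sketch; the 1-minus-majority-share statistic and the prompt format below are illustrative choices, not necessarily the paper's exact definition.

```python
from collections import Counter

def prompt_uncertainty(model, instruction, instance, paraphrases):
    """Disagreement of a model's outputs when the task instruction is
    perturbed. `model` is any callable mapping a prompt string to an
    output string (e.g. a greedy-decoded LLM)."""
    prompts = [f"{p}\n\n{instance}" for p in [instruction] + paraphrases]
    outputs = [model(p) for p in prompts]
    majority = Counter(outputs).most_common(1)[0][1]
    return 1.0 - majority / len(outputs)

# Active selection: rank candidate tasks by this score and instruction-tune
# on the most prompt-uncertain (i.e. most informative) ones first.
```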

Knowledge-Infused Prompting: Assessing and Advancing Clinical Text Data Generation with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.00287
  • repo_url: https://github.com/ritaranx/clingen
  • paper_authors: Ran Xu, Hejie Cui, Yue Yu, Xuan Kan, Wenqi Shi, Yuchen Zhuang, Wei Jin, Joyce Ho, Carl Yang
  • for: 这个论文是为了提高临床自然语言处理领域中的方法,以便更好地处理复杂的医疗术语和临床背景。
  • methods: 该论文使用了大型自然语言模型(LLM)来解决这些问题,并提出了一种新的、资源有效的方法,即ClinGen,它将知识注入到过程中。
  • results: 该论文的实验表明,ClinGen在7种临床自然语言处理任务和16个数据集上持续提升性能,并能有效对齐真实数据集的分布、显著丰富所生成训练样本的多样性。
    Abstract Clinical natural language processing requires methods that can address domain-specific challenges, such as complex medical terminology and clinical contexts. Recently, large language models (LLMs) have shown promise in this domain. Yet, their direct deployment can lead to privacy issues and are constrained by resources. To address this challenge, we delve into synthetic clinical text generation using LLMs for clinical NLP tasks. We propose an innovative, resource-efficient approach, ClinGen, which infuses knowledge into the process. Our model involves clinical knowledge extraction and context-informed LLM prompting. Both clinical topics and writing styles are drawn from external domain-specific knowledge graphs and LLMs to guide data generation. Our extensive empirical study across 7 clinical NLP tasks and 16 datasets reveals that ClinGen consistently enhances performance across various tasks, effectively aligning the distribution of real datasets and significantly enriching the diversity of generated training instances. We will publish our code and all the generated data in \url{https://github.com/ritaranx/ClinGen}.
    摘要 临床自然语言处理需要能够应对领域特有挑战(如复杂的医学术语和临床语境)的方法。最近,大型语言模型(LLM)在该领域展现出潜力。然而,直接部署LLM可能引发隐私问题,并受到资源限制。为应对这一挑战,我们探索了利用LLM为临床NLP任务合成临床文本。我们提出了一种创新且资源高效的方法ClinGen,将知识注入生成过程:该模型包含临床知识提取和上下文感知的LLM提示,临床主题和写作风格均取自外部领域知识图谱和LLM,用以引导数据生成。我们在7个临床NLP任务和16个数据集上开展的大量实证研究显示,ClinGen在各类任务上持续提升性能,有效对齐真实数据集的分布,并显著丰富了所生成训练样本的多样性。我们将在 https://github.com/ritaranx/ClinGen 发布代码和所有生成的数据。

JADE: A Linguistics-based Safety Evaluation Platform for LLM

  • paper_url: http://arxiv.org/abs/2311.00286
  • repo_url: https://github.com/whitzard-ai/jade-db
  • paper_authors: Mi Zhang, Xudong Pan, Min Yang
  • for: 本论文提出了一个名为JADE的针对性语言模糊测试平台,可同时且稳定地攻破三类广泛使用的LLM:八个开源中文LLM、六个商用中文LLM和四个商用英文LLM。
  • methods: JADE基于诺姆·乔姆斯基的转换-生成语法理论,对种子问题不断提升语言复杂度,直至突破LLM的安全防护。
  • results: JADE能够同时且稳定地攻破多款中英文LLM,生成的高威胁性不安全问题可同时触发多个LLM的有害生成,平均不安全生成率达70%,且这些问题依然自然、流畅,并保留了核心的不安全语义。
    Abstract In this paper, we present JADE, a targeted linguistic fuzzing platform which strengthens the linguistic complexity of seed questions to simultaneously and consistently break a wide range of widely-used LLMs categorized in three groups: eight open-sourced Chinese, six commercial Chinese and four commercial English LLMs. JADE generates three safety benchmarks for the three groups of LLMs, which contain unsafe questions that are highly threatening: the questions simultaneously trigger harmful generation of multiple LLMs, with an average unsafe generation ratio of $70\%$ (please see the table below), while are still natural questions, fluent and preserving the core unsafe semantics. We release the benchmark demos generated for commercial English LLMs and open-sourced English LLMs in the following link: https://github.com/whitzard-ai/jade-db. For readers who are interested in evaluating on more questions generated by JADE, please contact us. JADE is based on Noam Chomsky's seminal theory of transformational-generative grammar. Given a seed question with unsafe intention, JADE invokes a sequence of generative and transformational rules to increment the complexity of the syntactic structure of the original question, until the safety guardrail is broken. Our key insight is: Due to the complexity of human language, most of the current best LLMs can hardly recognize the invariant evil from the infinite number of different syntactic structures which form an unbound example space that can never be fully covered. Technically, the generative/transformative rules are constructed by native speakers of the languages, and, once developed, can be used to automatically grow and transform the parse tree of a given question, until the guardrail is broken. For more evaluation results and demo, please check our website: https://whitzard-ai.github.io/jade.html.
    摘要 在本文中,我们提出了JADE,一个针对性语言模糊测试平台,它通过增强种子问题的语言复杂度,同时且稳定地攻破三类广泛使用的LLM:八个开源中文LLM、六个商用中文LLM和四个商用英文LLM。JADE为这三类LLM生成了三套安全基准,其中包含高威胁性的不安全问题:这些问题可同时触发多个LLM的有害生成,平均不安全生成率达70%(请参考下面的表),同时依然是自然、流畅且保留核心不安全语义的问题。我们在以下链接发布了针对商用英文LLM和开源英文LLM生成的基准示例:https://github.com/whitzard-ai/jade-db 。如需评估更多由JADE生成的问题,请与我们联系。JADE基于诺姆·乔姆斯基开创性的转换-生成语法理论。给定一个带有不安全意图的种子问题,JADE调用一系列生成规则和转换规则,逐步增加原始问题句法结构的复杂度,直至突破安全防护。我们的关键洞见是:由于人类语言的复杂性,当前大多数最优秀的LLM都难以从无穷多种不同句法结构中识别出不变的恶意,这些句法结构构成一个永远无法被完全覆盖的无界样本空间。在技术上,生成/转换规则由相应语言的母语者构建,一经开发,即可自动地对给定问题的解析树进行生长和变换,直至突破防护。更多评估结果和演示请访问我们的网站:https://whitzard-ai.github.io/jade.html 。
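
A toy illustration of the generative/transformational idea: each rule rewrites a question into a syntactically more complex one while trying to preserve its core semantics. Real JADE rules are crafted by native speakers and operate on parse trees; the string templates below are only a sketch.

```python
import random

RULES = [  # hypothetical complexity-increasing rewrites
    lambda s: f"My friend asked me whether {s[0].lower()}{s[1:]}",
    lambda s: f"{s} That, at least, is what someone once claimed.",
    lambda s: f"Setting aside the obvious objections, {s[0].lower()}{s[1:]}",
]

def complexify(seed, max_steps=3):
    """Iteratively grow the syntactic structure of a seed question,
    returning every intermediate variant to probe the target LLM with."""
    variants, s = [seed], seed
    for _ in range(max_steps):
        s = random.choice(RULES)(s)
        variants.append(s)
    return variants

for v in complexify("How would someone do X?"):
    print(v)
```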

Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection

  • paper_url: http://arxiv.org/abs/2311.00278
  • repo_url: None
  • paper_authors: Min Jae Jung, Seung Dae Han, Joohee Kim
  • for: 本研究旨在提高仅有少量标注数据时的目标检测性能,特别是对新类别目标的检测。
  • methods: 本研究利用对比语言-图像预训练(CLIP)和困难负样本分类损失,来改进低数据场景下的目标检测性能。
  • results: 实验表明,所提出的RISF方法在MS-COCO和PASCAL VOC上显著优于现有方法。
    Abstract Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modified loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and hard negative classification loss in low data setting. Specifically, we propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF) which extends Faster R-CNN by introducing Calibration Module using CLIP (CM-CLIP) and Background Negative Re-scale Loss (BNRL). The former adapts CLIP, which performs zero-shot classification, to re-score the classification scores of a detector using image-class similarities, the latter is modified classification loss considering the punishment for fake backgrounds as well as confusing categories on a generalized few-shot object detection dataset. Extensive experiments on MS-COCO and PASCAL VOC show that the proposed RISF substantially outperforms the state-of-the-art approaches. The code will be available.
    摘要 少样本目标检测旨在仅凭少量标签检测新类别目标,是学界新兴的挑战。近期研究表明,调整预训练模型或修改损失函数可以提升性能。在本文中,我们探索在低数据场景下利用对比语言-图像预训练(CLIP)和困难负样本分类损失的能力。具体而言,我们提出了基于图像-语言相似度重打分的少样本目标检测方法(RISF),它在Faster R-CNN的基础上引入了基于CLIP的校准模块(CM-CLIP)和背景负样本重缩放损失(BNRL):前者将执行零样本分类的CLIP用于依据图像-类别相似度对检测器的分类分数重新打分;后者是在广义少样本目标检测数据集上同时考虑对虚假背景的惩罚以及易混淆类别的改进分类损失。在MS-COCO和PASCAL VOC上的大量实验表明,所提出的RISF显著优于最先进的方法。代码即将发布。
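
A sketch of CLIP-based re-scoring in the spirit of the calibration module above, using the OpenAI `clip` package. The prompt template and the geometric-mean fusion of detector and CLIP scores are our assumptions.

```python
import clip
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def rescore(crop_pil, class_names, det_scores):
    """Fuse a detector's class confidences for one box crop with CLIP
    image-text similarities over the class names."""
    image = preprocess(crop_pil).unsqueeze(0).to(device)
    text = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    with torch.no_grad():
        logits_per_image, _ = model(image, text)
        clip_probs = logits_per_image.softmax(dim=-1).squeeze(0)
    return (det_scores.to(device) * clip_probs).sqrt()  # geometric-mean fusion
```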

ChatCoder: Chat-based Refine Requirement Improves LLMs’ Code Generation

  • paper_url: http://arxiv.org/abs/2311.00272
  • repo_url: None
  • paper_authors: Zejun Wang, Jia Li, Ge Li, Zhi Jin
  • for: 提高大型自然语言处理模型对人类需求的理解和代码生成性能
  • methods: 通过人类与大型自然语言处理模型的对话方式,引导人类用户修改需求表达,使其更加精确、不ambiguous和完整
  • results: 实验显示,ChatCoder大幅提升了现有大型语言模型的代码生成性能;同时,相比基于需求改写的方法和通过人类反馈微调的LLM,ChatCoder也更具优势。
    Abstract Large language models have shown good performances in generating code to meet human requirements. However, human requirements expressed in natural languages can be vague, incomplete, and ambiguous, leading large language models to misunderstand human requirements and make mistakes. Worse, it is difficult for a human user to refine the requirement. To help human users refine their requirements and improve large language models' code generation performances, we propose ChatCoder: a method to refine the requirements via chatting with large language models. We design a chat scheme in which the large language models will guide the human users to refine their expression of requirements to be more precise, unambiguous, and complete than before. Experiments show that ChatCoder has improved existing large language models' performance by a large margin. Besides, ChatCoder has the advantage over refine-based methods and LLMs fine-tuned via human response.
    摘要 大型语言模型在按照人类需求生成代码方面已表现出良好性能。然而,以自然语言表达的人类需求往往模糊、不完整且有歧义,导致大型语言模型误解需求并产生错误;更糟的是,人类用户自身也难以对需求进行细化。为帮助人类用户细化需求并提升大型语言模型的代码生成性能,我们提出了ChatCoder:一种通过与大型语言模型对话来细化需求的方法。我们设计了一种对话机制,由大型语言模型引导人类用户将需求表述修改得比之前更精确、无歧义且完整。实验表明,ChatCoder大幅提升了现有大型语言模型的表现。此外,相比基于需求改写的方法和通过人类反馈微调的LLM,ChatCoder也更具优势。

Rethinking Decision Transformer via Hierarchical Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00267
  • repo_url: None
  • paper_authors: Yi Ma, Chenjun Xiao, Hebin Liang, Jianye Hao
  • for: 这篇论文旨在重新审视决策变换器(DT)算法在强化学习(RL)中的应用。
  • methods: 该论文提出了一种基于Transformer架构的通用序列建模框架,从分层强化学习的视角研究序列决策:做决策时,高层策略先为当前状态提出理想的提示,低层策略再根据给定提示生成动作。研究表明,DT是该框架在特定高层与低层策略选择下的特例,并讨论了这些选择可能的失效情形。
  • results: 实验结果显示,所提算法在多个控制与导航基准上显著优于DT。
    Abstract Decision Transformer (DT) is an innovative algorithm leveraging recent advances of the transformer architecture in reinforcement learning (RL). However, a notable limitation of DT is its reliance on recalling trajectories from datasets, losing the capability to seamlessly stitch sub-optimal trajectories together. In this work we introduce a general sequence modeling framework for studying sequential decision making through the lens of Hierarchical RL. At the time of making decisions, a high-level policy first proposes an ideal prompt for the current state, a low-level policy subsequently generates an action conditioned on the given prompt. We show DT emerges as a special case of this framework with certain choices of high-level and low-level policies, and discuss the potential failure of these choices. Inspired by these observations, we study how to jointly optimize the high-level and low-level policies to enable the stitching ability, which further leads to the development of new offline RL algorithms. Our empirical results clearly show that the proposed algorithms significantly surpass DT on several control and navigation benchmarks. We hope our contributions can inspire the integration of transformer architectures within the field of RL.
    摘要 决策变换器(DT)是一种利用Transformer架构最新进展的创新强化学习(RL)算法。然而,DT的一个显著局限在于它依赖从数据集中回忆轨迹,失去了将次优轨迹无缝拼接起来的能力。在这项工作中,我们通过分层强化学习的视角,提出了一个研究序列决策的通用序列建模框架:做决策时,高层策略先为当前状态提出理想的提示,低层策略再根据给定提示生成动作。我们证明,DT是该框架在特定高层与低层策略选择下的特例,并讨论了这些选择可能的失效情形。受此启发,我们研究了如何联合优化高层与低层策略以实现拼接能力,并由此开发出新的离线RL算法。实验结果清楚地表明,所提算法在多个控制与导航基准上显著优于DT。我们希望这一工作能够推动Transformer架构在RL领域的融合应用。

Plug-and-Play Policy Planner for Large Language Model Powered Dialogue Agents

  • paper_url: http://arxiv.org/abs/2311.00262
  • repo_url: None
  • paper_authors: Yang Deng, Wenxuan Zhang, Wai Lam, See-Kiong Ng, Tat-Seng Chua
  • for: 该论文旨在提高语言模型(LLM)的对话政策规划能力,以便在对话中更加积极地与人类交互。
  • methods: 该论文提出了一种新的对话策略规划范式PPDPP:以可调的语言模型插件作为即插即用的对话策略规划器,通过有监督微调和基于目标导向AI反馈的强化学习,帮助LLM适配不同应用场景。
  • results: 实验结果表明,在谈判、情感支持和辅导对话这三种不同的主动对话应用中,PPDPP持续且显著优于现有方法。
    Abstract Proactive dialogues serve as a practical yet challenging dialogue problem in the era of large language models (LLMs), where the dialogue policy planning is the key to improving the proactivity of LLMs. Most existing studies enable the dialogue policy planning of LLMs using various prompting schemes or iteratively enhance this capability in handling the given case with verbal AI feedback. However, these approaches are either bounded by the policy planning capability of the frozen LLMs or hard to be transferred to new cases. In this work, we introduce a new dialogue policy planning paradigm to strategize LLMs for proactive dialogue problems with a tunable language model plug-in as a plug-and-play dialogue policy planner, named PPDPP. Specifically, we develop a novel training framework to facilitate supervised fine-tuning over available human-annotated data as well as reinforcement learning from goal-oriented AI feedback with dynamic interaction data collected by the LLM-based self-play simulation. In this manner, the LLM-powered dialogue agent can not only be generalized to different cases after the training, but also be applicable to different applications by just substituting the learned plug-in. In addition, we propose to evaluate the policy planning capability of dialogue systems under the interactive setting. Experimental results demonstrate that PPDPP consistently and substantially outperforms existing approaches on three different proactive dialogue applications, including negotiation, emotional support, and tutoring dialogues.
    摘要 在大型语言模型(LLM)时代,主动对话是一个实用而又具有挑战性的对话问题,而对话策略规划是提升LLM主动性的关键。现有研究大多通过各种提示方案来实现LLM的对话策略规划,或借助口头化的AI反馈迭代增强其处理给定案例的能力。然而,这些方法要么受限于冻结参数的LLM本身的策略规划能力,要么难以迁移到新案例。在这项工作中,我们提出了一种新的对话策略规划范式,以一个可调的语言模型插件作为即插即用的对话策略规划器(命名为PPDPP),为LLM在主动对话问题上制定策略。具体而言,我们开发了一个新的训练框架,既支持在已有人工标注数据上进行有监督微调,也支持基于目标导向的AI反馈、利用LLM自博弈仿真收集的动态交互数据进行强化学习。如此一来,经训练后的LLM对话智能体不仅可以泛化到不同案例,还只需替换所学插件即可适配不同应用。此外,我们提出在交互式设定下评估对话系统的策略规划能力。实验结果表明,在谈判、情感支持和辅导对话这三种不同的主动对话应用中,PPDPP持续且显著优于现有方法。

Implicit biases in multitask and continual learning from a backward error analysis perspective

  • paper_url: http://arxiv.org/abs/2311.00235
  • repo_url: None
  • paper_authors: Benoit Dherin
  • for: 这篇论文是关于使用回溯错误分析计算神经网络在多任务和继续学习 Setting 中的隐式训练偏好的研究。
  • methods: 这篇论文使用了 Stochastic Gradient Descent 训练神经网络,并 derive 了一些修改后的损失函数,其中包括原始损失函数、 converge 损失函数、隐式平滑化正则化项以及 conflict 项。
  • results: 研究发现,在多任务 Setting 中,conflict 项是一个已知的量,度量任务梯度的吸引力,而在继续学习 Setting 中,conflict 项是一个新的深度学习优化中的量,它是 differential geometry 中的 Lie 括号 between 任务梯度。
    Abstract Using backward error analysis, we compute implicit training biases in multitask and continual learning settings for neural networks trained with stochastic gradient descent. In particular, we derive modified losses that are implicitly minimized during training. They have three terms: the original loss, accounting for convergence, an implicit flatness regularization term proportional to the learning rate, and a last term, the conflict term, which can theoretically be detrimental to both convergence and implicit regularization. In multitask, the conflict term is a well-known quantity, measuring the gradient alignment between the tasks, while in continual learning the conflict term is a new quantity in deep learning optimization, although a basic tool in differential geometry: The Lie bracket between the task gradients.
    摘要 我们利用后向误差分析,计算以随机梯度下降训练的神经网络在多任务与持续学习设置中的隐式训练偏差。特别地,我们推导了训练过程中被隐式最小化的修正损失函数。它包含三项:对应收敛的原始损失、与学习率成正比的隐式平坦度正则项,以及最后一个冲突项,该项在理论上可能同时不利于收敛和隐式正则化。在多任务设置中,冲突项是一个众所周知的量,度量各任务梯度之间的对齐程度;而在持续学习设置中,冲突项虽是深度学习优化中的新量,却是微分几何中的基本工具:任务梯度之间的李括号。
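
Schematically, the modified loss has the shape below. The flatness term follows standard backward-error-analysis results for (stochastic) gradient descent with learning rate \eta; the conflict term C is only indicated abstractly here, since its exact form in the multitask and continual settings is given in the paper.

```latex
\tilde{\mathcal{L}}(\theta)
  = \underbrace{\mathcal{L}(\theta)}_{\text{convergence}}
  \;+\; \underbrace{\frac{\eta}{4}\,\bigl\|\nabla \mathcal{L}(\theta)\bigr\|^{2}}_{\text{implicit flatness regularization}}
  \;+\; \underbrace{\mathcal{C}(\theta)}_{\text{conflict term}}
% Multitask: C measures gradient alignment between tasks.
% Continual: C involves the Lie bracket of the task gradient fields,
% schematically [\nabla \mathcal{L}_1, \nabla \mathcal{L}_2].
```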

StableFDG: Style and Attention Based Learning for Federated Domain Generalization

  • paper_url: http://arxiv.org/abs/2311.00227
  • repo_url: None
  • paper_authors: Jungwuk Park, Dong-Jun Han, Jinho Kim, Shiqiang Wang, Christopher G. Brinton, Jaekyun Moon
  • for: This paper proposes a solution to the domain generalization (DG) problem in federated learning (FL) settings, improving the robustness and generalizability of FL.
  • methods: The paper makes two key contributions. The first is a style-based learning strategy that lets each client explore novel styles beyond the source domains in its local dataset, improving domain diversity through the proposed style sharing, shifting, and exploration strategies. The second is an attention-based feature highlighter that captures the similarities between features of data samples in the same class and emphasizes the important/common characteristics, to better learn domain-invariant features in data-poor FL scenarios.
  • results: Experimental results show that StableFDG outperforms existing baselines on various DG benchmark datasets, demonstrating its effectiveness.
    Abstract Traditional federated learning (FL) algorithms operate under the assumption that the data distributions at training (source domains) and testing (target domain) are the same. The fact that domain shifts often occur in practice necessitates equipping FL methods with a domain generalization (DG) capability. However, existing DG algorithms face fundamental challenges in FL setups due to the lack of samples/domains in each client's local dataset. In this paper, we propose StableFDG, a style and attention based learning strategy for accomplishing federated domain generalization, introducing two key contributions. The first is style-based learning, which enables each client to explore novel styles beyond the original source domains in its local dataset, improving domain diversity based on the proposed style sharing, shifting, and exploration strategies. Our second contribution is an attention-based feature highlighter, which captures the similarities between the features of data samples in the same class, and emphasizes the important/common characteristics to better learn the domain-invariant characteristics of each class in data-poor FL scenarios. Experimental results show that StableFDG outperforms existing baselines on various DG benchmark datasets, demonstrating its efficacy.
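    In this line of work, "style" is typically the per-channel mean and standard deviation of intermediate features, so style sharing and shifting reduce to exchanging and mixing low-dimensional statistics rather than raw data. The sketch below shows that generic AdaIN-style mechanism; it illustrates the idea, not StableFDG's exact procedure.

```python
# Hedged sketch: style = per-channel (mean, std) of intermediate CNN features.
# Clients exchange these low-dimensional statistics (not raw data) and shift
# their own features toward received styles to increase domain diversity.
# This mirrors common AdaIN-style augmentation, not StableFDG's exact recipe.
import torch

def style_stats(feat: torch.Tensor):
    """Per-channel mean/std over spatial dims; feat is (N, C, H, W)."""
    mu = feat.mean(dim=(2, 3), keepdim=True)
    sigma = feat.std(dim=(2, 3), keepdim=True) + 1e-6
    return mu, sigma

def style_shift(feat: torch.Tensor, new_mu, new_sigma, alpha=0.5):
    """Renormalize features toward a shared style, interpolated by alpha."""
    mu, sigma = style_stats(feat)
    normalized = (feat - mu) / sigma
    mix_mu = alpha * new_mu + (1 - alpha) * mu
    mix_sigma = alpha * new_sigma + (1 - alpha) * sigma
    return normalized * mix_sigma + mix_mu

# Client A computes its style and "shares" it; client B shifts toward it.
feat_a = torch.randn(8, 64, 32, 32)
feat_b = torch.randn(8, 64, 32, 32)
mu_a, sigma_a = style_stats(feat_a)
feat_b_shifted = style_shift(feat_b, mu_a, sigma_a, alpha=0.7)
```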

Domain decomposition-based coupling of physics-informed neural networks via the Schwarz alternating method

  • paper_url: http://arxiv.org/abs/2311.00224
  • repo_url: None
  • paper_authors: Will Snyder, Irina Tezaur, Christopher Wentland
  • for: Physics-informed neural networks (PINNs) as data-driven tools for solving and inferring solutions to nonlinear partial differential equations (PDEs).
  • methods: The Schwarz alternating method is used to couple PINNs with each other and with conventional numerical models (full order models, or FOMs) following a decomposition of the physical domain.
  • results: A numerical study on the one-dimensional steady-state advection-diffusion equation shows that the convergence of the Schwarz method is strongly linked to how boundary conditions are imposed within the coupled PINNs; while it is not clear that PINN-PINN coupling accelerates PINN convergence in the advection-dominated regime, PINN-FOM coupling substantially improves PINN training for Peclet numbers as high as 1e6.
    Abstract Physics-informed neural networks (PINNs) are appealing data-driven tools for solving and inferring solutions to nonlinear partial differential equations (PDEs). Unlike traditional neural networks (NNs), which train only on solution data, a PINN incorporates a PDE's residual into its loss function and trains to minimize the said residual at a set of collocation points in the solution domain. This paper explores the use of the Schwarz alternating method as a means to couple PINNs with each other and with conventional numerical models (i.e., full order models, or FOMs, obtained via the finite element, finite difference or finite volume methods) following a decomposition of the physical domain. It is well-known that training a PINN can be difficult when the PDE solution has steep gradients. We investigate herein the use of domain decomposition and the Schwarz alternating method as a means to accelerate the PINN training phase. Within this context, we explore different approaches for imposing Dirichlet boundary conditions within each subdomain PINN: weakly through the loss and/or strongly through a solution transformation. As a numerical example, we consider the one-dimensional steady state advection-diffusion equation in the advection-dominated (high Peclet) regime. Our results suggest that the convergence of the Schwarz method is strongly linked to the choice of boundary condition implementation within the PINNs being coupled. Surprisingly, strong enforcement of the Schwarz boundary conditions does not always lead to a faster convergence of the method. While it is not clear from our preliminary study that the PINN-PINN coupling via the Schwarz alternating method accelerates PINN convergence in the advection-dominated regime, it reveals that PINN training can be improved substantially for Peclet numbers as high as 1e6 by performing a PINN-FOM coupling.
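    The Schwarz alternating method itself is a simple fixed-point iteration over overlapping subdomains: each subdomain solve takes its Dirichlet data at the artificial interface from the latest neighboring solution. The skeleton below is a generic two-subdomain version with placeholder solver callables (a PINN trained with the given boundary data on one side and a FOM on the other would each implement one); it illustrates the coupling pattern, not the paper's implementation.

```python
# Generic alternating Schwarz iteration on [0, 1] split into overlapping
# subdomains [0, b] and [a, 1] with a < b. Each `solve_*` stands in for any
# subdomain solver (a PINN retrained with the given Dirichlet data, or a
# finite-difference/element FOM) and returns a callable u(x).
def schwarz_alternating(solve_left, solve_right, a, b, u0, u1,
                        tol=1e-8, max_iters=100):
    """u0, u1: physical Dirichlet data at x=0 and x=1."""
    g_b = 0.0  # initial guess for u at x=b (right boundary of left subdomain)
    for it in range(max_iters):
        u_left = solve_left(bc_lo=u0, bc_hi=g_b)       # left subdomain solve
        g_a = u_left(a)                                # transmit value at x=a
        u_right = solve_right(bc_lo=g_a, bc_hi=u1)     # right subdomain solve
        g_b_new = u_right(b)                           # transmit value at x=b
        if abs(g_b_new - g_b) < tol:                   # interface values settled
            return u_left, u_right, it
        g_b = g_b_new
    return u_left, u_right, max_iters
```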

Can Large Language Models Capture Public Opinion about Global Warming? An Empirical Assessment of Algorithmic Fidelity and Bias

  • paper_url: http://arxiv.org/abs/2311.00217
  • repo_url: None
  • paper_authors: S. Lee, T. Q. Peng, M. H. Goldberg, S. A. Rosenthal, J. E. Kotcher, E. W. Maibach, A. Leiserowitz
  • for: This study assesses the algorithmic fidelity and bias of large language models (LLMs) in simulating survey responses, specifically in relation to climate change perspectives.
  • methods: The study uses two nationally representative climate change surveys and conditions LLMs on demographics and/or psychological covariates to simulate survey responses. GPT-4 is used as one of the LLMs and is found to perform better when conditioned on both demographics and covariates.
  • results: The study finds that LLMs can effectively capture presidential voting behaviors, but encounter challenges in accurately representing global warming perspectives when relevant covariates are not included. The study also identifies disparities in LLM estimations of the views of certain groups, with LLMs tending to underestimate worry about global warming among Black Americans.
    Abstract Large language models (LLMs) have demonstrated their potential in social science research by emulating human perceptions and behaviors, a concept referred to as algorithmic fidelity. This study assesses the algorithmic fidelity and bias of LLMs by utilizing two nationally representative climate change surveys. The LLMs were conditioned on demographics and/or psychological covariates to simulate survey responses. The findings indicate that LLMs can effectively capture presidential voting behaviors but encounter challenges in accurately representing global warming perspectives when relevant covariates are not included. GPT-4 exhibits improved performance when conditioned on both demographics and covariates. However, disparities emerge in LLM estimations of the views of certain groups, with LLMs tending to underestimate worry about global warming among Black Americans. While highlighting the potential of LLMs to aid social science research, these results underscore the importance of meticulous conditioning, model selection, survey question format, and bias assessment when employing LLMs for survey simulation. Further investigation into prompt engineering and algorithm auditing is essential to harness the power of LLMs while addressing their inherent limitations.

Consistent Video-to-Video Transfer Using Synthetic Dataset

  • paper_url: http://arxiv.org/abs/2311.00213
  • repo_url: None
  • paper_authors: Jiaxin Cheng, Tianjun Xiao, Tong He
  • for: The paper proposes a novel and efficient text-based video-to-video editing method that eliminates the need for resource-intensive per-video, per-model finetuning.
  • methods: At the core of the approach is a synthetic paired video dataset tailored for video-to-video transfer tasks. Drawing inspiration from Instruct Pix2Pix's image transfer via editing instructions, the paradigm is adapted to the video domain, and a Long Video Sampling Correction is introduced to keep long videos consistent across batches.
  • results: The method surpasses existing approaches such as Tune-A-Video, marking substantial progress in text-based video-to-video editing and suggesting promising avenues for further exploration and deployment.
    Abstract We introduce a novel and efficient approach for text-based video-to-video editing that eliminates the need for resource-intensive per-video-per-model finetuning. At the core of our approach is a synthetic paired video dataset tailored for video-to-video transfer tasks. Inspired by Instruct Pix2Pix's image transfer via editing instruction, we adapt this paradigm to the video domain. Extending the Prompt-to-Prompt to videos, we efficiently generate paired samples, each with an input video and its edited counterpart. Alongside this, we introduce the Long Video Sampling Correction during sampling, ensuring consistent long videos across batches. Our method surpasses current methods like Tune-A-Video, heralding substantial progress in text-based video-to-video editing and suggesting exciting avenues for further exploration and deployment.

Magmaw: Modality-Agnostic Adversarial Attacks on Machine Learning-Based Wireless Communication Systems

  • paper_url: http://arxiv.org/abs/2311.00207
  • repo_url: None
  • paper_authors: Jung-Woo Chang, Ke Sun, Nasimeh Heydaribeni, Seira Hidano, Xinyu Zhang, Farinaz Koushanfar
  • for: This paper proposes a black-box attack methodology called Magmaw that can generate universal adversarial perturbations for any multimodal signal transmitted over a wireless channel, targeting ML-based wireless systems.
  • methods: Magmaw uses a combination of optimization techniques and machine learning algorithms to generate perturbations that are resilient to existing defense methods such as adversarial training and perturbation signal subtraction.
  • results: The paper demonstrates the effectiveness of Magmaw through experiments on a real-time wireless attack platform built with a software-defined radio system, showing significant performance degradation even in the presence of defense mechanisms; Magmaw is also effective against encrypted communication channels and conventional communications.
    Abstract Machine Learning (ML) has been instrumental in enabling joint transceiver optimization by merging all physical layer blocks of the end-to-end wireless communication systems. Although there have been a number of adversarial attacks on ML-based wireless systems, the existing methods do not provide a comprehensive view including multi-modality of the source data, common physical layer components, and wireless domain constraints. This paper proposes Magmaw, the first black-box attack methodology capable of generating universal adversarial perturbations for any multimodal signal transmitted over a wireless channel. We further introduce new objectives for adversarial attacks on ML-based downstream applications. The resilience of the attack to the existing widely used defense methods of adversarial training and perturbation signal subtraction is experimentally verified. For proof-of-concept evaluation, we build a real-time wireless attack platform using a software-defined radio system. Experimental results demonstrate that Magmaw causes significant performance degradation even in the presence of the defense mechanisms. Surprisingly, Magmaw is also effective against encrypted communication channels and conventional communications.
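    Universal perturbations of this kind are commonly found by accumulating gradient steps on a single shared perturbation across many inputs, then projecting it back onto a power budget. The PGD-style loop below illustrates that generic recipe under stated assumptions (a differentiable victim model and an L2 power constraint); Magmaw itself operates in a black-box, wireless-domain-aware setting, so this white-box loop only conveys the universal-perturbation objective.

```python
# Generic universal-adversarial-perturbation loop (PGD-style), shown only to
# illustrate one shared perturbation degrading a model across inputs. The
# victim model, loss, and L2 "transmit power" budget are assumptions here.
import torch

def universal_perturbation(model, loss_fn, loader, power_budget=0.1,
                           step_size=0.01, epochs=5):
    delta = None
    for _ in range(epochs):
        for x, y in loader:                          # batches of signals/labels
            if delta is None:
                delta = torch.zeros_like(x[0], requires_grad=True)
            out = model(x + delta)                   # same delta for every input
            loss = loss_fn(out, y)
            loss.backward()
            with torch.no_grad():
                delta += step_size * delta.grad.sign()   # maximize the loss
                norm = delta.norm()                      # project onto L2 ball
                if norm > power_budget:
                    delta *= power_budget / norm
            delta.grad.zero_()
    return delta.detach()
```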

ChatGPT-Powered Hierarchical Comparisons for Image Classification

  • paper_url: http://arxiv.org/abs/2311.00206
  • repo_url: https://github.com/zhiyuan-r/chatgpt-powered-hierarchical-comparisons-for-image-classification
  • paper_authors: Zhiyuan Ren, Yiyang Su, Xiaoming Liu
  • for: Proposes a novel image classification framework for the zero-shot open-vocabulary image classification task.
  • methods: Uses the CLIP pretrained vision-language model and leverages large language models (LLMs) such as ChatGPT to provide class-specific knowledge.
  • results: The proposed hierarchical-comparison approach to image classification is intuitive, effective, and explainable.
    Abstract The zero-shot open-vocabulary challenge in image classification is tackled by pretrained vision-language models like CLIP, which benefit from incorporating class-specific knowledge from large language models (LLMs) like ChatGPT. However, biases in CLIP lead to similar descriptions for distinct but related classes, prompting our novel image classification framework via hierarchical comparisons: using LLMs to recursively group classes into hierarchies and classifying images by comparing image-text embeddings at each hierarchy level, resulting in an intuitive, effective, and explainable approach.
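    The inference loop — recursively grouping classes into a hierarchy with an LLM, then comparing image-text embeddings level by level — can be sketched as below. The toy hierarchy and the random stand-in embeddings are assumptions; in the actual method the groups come from ChatGPT and the embeddings from CLIP.

```python
# Hedged sketch of hierarchy-guided zero-shot classification: descend a
# class hierarchy, scoring the image against each branch's text description
# at every level. Hierarchy and embeddings are toy stand-ins here.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(image_emb, node, embed_text):
    """node is either a class name (str) or {branch_description: subtree}."""
    while isinstance(node, dict):
        scores = {desc: cosine(image_emb, embed_text(desc)) for desc in node}
        best = max(scores, key=scores.get)     # most similar branch this level
        node = node[best]
    return node                                # leaf = predicted class

# Toy example with random "embeddings" standing in for CLIP features.
rng = np.random.default_rng(0)
embed_text = lambda s: rng.standard_normal(16)  # hypothetical text encoder
hierarchy = {
    "an animal": {"a dog": "dog", "a cat": "cat"},
    "a vehicle": {"a car": "car", "a bicycle": "bicycle"},
}
print(classify(rng.standard_normal(16), hierarchy, embed_text))
```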

Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering

  • paper_url: http://arxiv.org/abs/2311.00204
  • repo_url: None
  • paper_authors: Zhen Guo, Yining Hua
  • for: This work aims to turn a base large language model into a medical-domain expert model, enabling a range of applications without prohibitive training costs.
  • methods: Continuous training and instruction fine-tuning are used to rapidly adapt Llama 2 base models to the Chinese medical domain: first, continuous training on 1B tokens of Chinese medical references teaches the model relevant vocabulary and knowledge; the model is then fine-tuned on 54K examples sourced from the Chinese National Medical Licensing Examination.
  • results: Experiments on Chinese medical data confirm the effectiveness of this approach, producing a model comparable to GPT-3.5-turbo at a fraction of the training time and compute. The resulting domain-specific model is useful for various Chinese medical applications and, more broadly, provides a template for domain-specific training of large language models in areas where pre-trained models lack the required expertise, such as law, science, and engineering.
    Abstract Large language models exhibit promising general capabilities but often lack specialized knowledge for domain-specific tasks. Developing domain experts from a base model enables a range of applications without prohibitive training costs. This work demonstrates a method using continuous training and instruction fine-tuning to rapidly adapt Llama 2 base models to the Chinese medical domain. We first conduct continuous training on 1B tokens from Chinese medical references to teach relevant vocabulary and knowledge. The models are then fine-tuned on 54K examples sourced from the Chinese National Medical Licensing Examination. Experiments on Chinese medical data confirm the effectiveness of this approach, producing a model comparable to GPT-3.5-turbo while using far fewer computational resources. The resulting domain-specific model could be useful for various Chinese medical applications. More broadly, this provides a template for domain-specific training of large language models in areas where pre-trained models lack the required expertise, such as law, science, and engineering.

ZEETAD: Adapting Pretrained Vision-Language Model for Zero-Shot End-to-End Temporal Action Detection

  • paper_url: http://arxiv.org/abs/2311.00729
  • repo_url: https://github.com/UARK-AICV/ZEETAD
  • paper_authors: Thinh Phan, Khoa Vo, Duy Le, Gianfranco Doretto, Donald Adjeroh, Ngan Le
  • for: This work aims to improve zero-shot temporal action detection (TAD), particularly in the absence of large amounts of annotated data.
  • methods: Two modules are used: a Transformer-based dual-localization module and a CLIP-based zero-shot proposal classification module. The dual-localization module detects action events in video while selectively collecting crucial semantic embeddings for later recognition; the CLIP module generates semantic embeddings from text and frame inputs for each temporal unit.
  • results: Extensive experiments on the THUMOS14 and ActivityNet-1.3 datasets demonstrate superior zero-shot TAD performance and effective transfer of knowledge from ViL models to unseen action categories.
    Abstract Temporal action detection (TAD) involves the localization and classification of action instances within untrimmed videos. While standard TAD follows fully supervised learning with closed-set setting on large training data, recent zero-shot TAD methods showcase the promising of open-set setting by leveraging large-scale contrastive visual-language (ViL) pretrained models. However, existing zero-shot TAD methods have limitations on how to properly construct the strong relationships between two interdependent tasks of localization and classification and adapt ViL model to video understanding. In this work, we present ZEETAD, featuring two modules: dual-localization and zero-shot proposal classification. The former is a Transformer-based module that detects action events while selectively collecting crucial semantic embeddings for later recognition. The latter one, CLIP-based module, generates semantic embeddings from text and frame inputs for each temporal unit. Additionally, we enhance discriminative capability on unseen classes by minimally updating the frozen CLIP encoder with lightweight adapters. Extensive experiments on THUMOS14 and ActivityNet-1.3 datasets demonstrate our approach's superior performance in zero-shot TAD and effective knowledge transfer from ViL models to unseen action categories.
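    "Minimally updating the frozen CLIP encoder with lightweight adapters" usually means inserting small residual bottleneck MLPs while keeping the backbone weights frozen, so only a few parameters are trained. The module below is a generic adapter of that kind, offered as a plausible sketch rather than ZEETAD's exact design.

```python
# Generic bottleneck adapter: a small residual MLP attached to a frozen
# encoder so that only the adapter's few parameters are trained. A common
# recipe, sketched here as an assumption about ZEETAD's adapters.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, dim=512, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)       # start as an identity mapping
        nn.init.zeros_(self.up.bias)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual bottleneck

# Usage: freeze the encoder, train only the adapter parameters.
encoder = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
for p in encoder.parameters():               # stand-in for a frozen CLIP layer
    p.requires_grad_(False)
adapter = Adapter(dim=512)
tokens = torch.randn(16, 10, 512)            # (batch, seq, dim) features
adapted = adapter(encoder(tokens))
```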

Modeling subjectivity (by Mimicking Annotator Annotation) in toxic comment identification across diverse communities

  • paper_url: http://arxiv.org/abs/2311.00203
  • repo_url: None
  • paper_authors: Senjuti Dutta, Sid Mittal, Sherol Chen, Deepak Ramachandran, Ravi Rajakumar, Ian Kivlichan, Sunny Mak, Alena Butryna, Praveen Paritosh
  • for: This work aims to make automated content moderation systems more reliable by modeling the viewpoints of diverse communities, reducing the reliance on human moderation.
  • methods: The study uses a newly published dataset with expert annotators' annotations together with two existing public datasets, and evaluates a Large Language Model (LLM) on its ability to mimic diverse viewpoints on toxicity.
  • results: Subjectivity is evident across all annotator groups, demonstrating the shortcomings of majority-rule voting; going forward, subjective annotations should serve as ground-truth labels for training models to identify toxic comments across diverse communities.
    Abstract The prevalence and impact of toxic discussions online have made content moderation crucial. Automated systems can play a vital role in identifying toxicity and reducing the reliance on human moderation. Nevertheless, identifying toxic comments for diverse communities continues to present challenges that are addressed in this paper. The two-part goal of this study is to (1) identify intuitive variances from annotator disagreement using quantitative analysis and (2) model the subjectivity of these viewpoints. To achieve our goal, we published a new dataset\footnote{\url{https://github.com/XXX}} with expert annotators' annotations and used two other public datasets to identify the subjectivity of toxicity. Then, leveraging the Large Language Model (LLM), we evaluate the model's ability to mimic diverse viewpoints on toxicity by varying the size of the training data, using the same set of annotators as the test set used during model training and a separate set of annotators as the test set. We conclude that subjectivity is evident across all annotator groups, demonstrating the shortcomings of majority-rule voting. Moving forward, subjective annotations should serve as ground truth labels for training models for domains like toxicity in diverse communities.
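    One standard way to quantify the annotator disagreement described above is the entropy of the label distribution each item receives within an annotator group; items where groups diverge are exactly the subjective cases that majority voting erases. A minimal sketch, with hypothetical group names and votes:

```python
# Minimal sketch: per-item label entropy as a disagreement measure, computed
# separately for each annotator group. Group names and votes are hypothetical.
from collections import Counter
from math import log2

def label_entropy(votes):
    """Shannon entropy of a list of labels; 0 = full agreement."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * log2(c / total) for c in counts.values())

# votes_by_group[item_id][group] -> list of toxic(1)/not-toxic(0) labels
votes_by_group = {
    "comment_1": {"group_A": [1, 1, 1, 0], "group_B": [0, 0, 1, 0]},
    "comment_2": {"group_A": [1, 1, 1, 1], "group_B": [1, 1, 1, 1]},
}
for item, groups in votes_by_group.items():
    entropies = {g: round(label_entropy(v), 3) for g, v in groups.items()}
    print(item, entropies)   # high entropy => subjective / contested item
```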

Federated Natural Policy Gradient Methods for Multi-task Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.00201
  • repo_url: None
  • paper_authors: Tong Yang, Shicong Cen, Yuting Wei, Yuxin Chen, Yuejie Chi
  • for: This paper studies collaborative decision making by multiple distributed agents that do not share their local data trajectories.
  • methods: Federated reinforcement learning (RL) enables such collaboration; the paper develops federated vanilla and entropy-regularized natural policy gradient (NPG) methods under softmax parameterization, with gradient tracking applied to the global Q-function to mitigate imperfect information sharing.
  • results: The proposed methods learn a globally optimal policy in the decentralized setting, with non-asymptotic global convergence guarantees that are nearly independent of the size of the state-action space and that illuminate the impacts of network size and connectivity.
    Abstract Federated reinforcement learning (RL) enables collaborative decision making of multiple distributed agents without sharing local data trajectories. In this work, we consider a multi-task setting, in which each agent has its own private reward function corresponding to different tasks, while sharing the same transition kernel of the environment. Focusing on infinite-horizon tabular Markov decision processes, the goal is to learn a globally optimal policy that maximizes the sum of the discounted total rewards of all the agents in a decentralized manner, where each agent only communicates with its neighbors over some prescribed graph topology. We develop federated vanilla and entropy-regularized natural policy gradient (NPG) methods under softmax parameterization, where gradient tracking is applied to the global Q-function to mitigate the impact of imperfect information sharing. We establish non-asymptotic global convergence guarantees under exact policy evaluation, which are nearly independent of the size of the state-action space and illuminate the impacts of network size and connectivity. To the best of our knowledge, this is the first time that global convergence is established for federated multi-task RL using policy optimization. Moreover, the convergence behavior of the proposed algorithms is robust against inexactness of policy evaluation.
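    For context, under softmax parameterization the exact NPG update has a well-known multiplicative closed form (Agarwal et al., 2021), shown below; in the federated variant described above, each agent substitutes its gradient-tracking estimate of the network-wide average Q-function. This is the standard textbook update, not a transcription of the paper's algorithm.

```latex
% Softmax NPG with learning rate \eta and discount factor \gamma; Z_t(s) is
% the per-state normalization. Federated agents replace Q^{(t)} with their
% gradient-tracking estimate of the average Q-function across agents.
\pi^{(t+1)}(a \mid s) \;=\; \frac{1}{Z_t(s)}\,\pi^{(t)}(a \mid s)\,
\exp\!\left(\frac{\eta\, Q^{(t)}(s,a)}{1-\gamma}\right).
```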

cs.CL - 2023-11-01

On The Open Prompt Challenge In Conditional Audio Generation

  • paper_url: http://arxiv.org/abs/2311.00897
  • repo_url: None
  • paper_authors: Ernie Chang, Sidd Srinivasan, Mahi Luthra, Pin-Jie Lin, Varun Nagaraja, Forrest Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, Vikas Chandra
  • for: This paper studies how to improve the quality of audio generated by text-to-audio (TTA) models from user input prompts.
  • methods: Two key insights address the user prompt challenge: first, user prompts are generally under-specified compared with training prompts, leading to a large alignment gap between the two; second, there is a distribution of audio descriptions ("audionese") for which TTA models generate higher-quality audio. Prompts are rewritten with instruction-tuned models, and text-audio alignment is used as a feedback signal via margin ranking learning.
  • results: Marked improvements in audio quality are observed in both objective and subjective human evaluations.
    Abstract Text-to-audio generation (TTA) produces audio from a text description, learning from pairs of audio samples and hand-annotated text. However, commercializing audio generation is challenging as user-input prompts are often under-specified when compared to text descriptions used to train TTA models. In this work, we treat TTA models as a ``blackbox'' and address the user prompt challenge with two key insights: (1) User prompts are generally under-specified, leading to a large alignment gap between user prompts and training prompts. (2) There is a distribution of audio descriptions for which TTA models are better at generating higher quality audio, which we refer to as ``audionese''. To this end, we rewrite prompts with instruction-tuned models and propose utilizing text-audio alignment as feedback signals via margin ranking learning for audio improvements. On both objective and subjective human evaluations, we observed marked improvements in both text-audio alignment and music audio quality.
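    Using text-audio alignment as a feedback signal via margin ranking learning can be sketched as follows: for two candidate generations of the same prompt, an alignment score (e.g., a CLAP-style similarity, assumed given here) decides which should rank higher, and a margin ranking loss pushes the model's own scores apart accordingly. A generic illustration, not the paper's training code.

```python
# Hedged sketch of margin ranking from text-audio alignment feedback: the
# generation with the higher (assumed) alignment score becomes the positive
# example, and a margin ranking loss separates the model's own scores.
import torch
import torch.nn as nn

rank_loss = nn.MarginRankingLoss(margin=0.2)

def ranking_step(model_score_a, model_score_b, align_a, align_b):
    """model_score_*: differentiable scores from the model being tuned.
    align_*: alignment feedback (e.g., CLAP similarity), no gradient."""
    # target = +1 means "a should outrank b", -1 the opposite.
    target = torch.where(align_a >= align_b,
                         torch.ones_like(align_a), -torch.ones_like(align_a))
    return rank_loss(model_score_a, model_score_b, target)

# Toy usage with fake scores for a batch of 4 prompt/audio pairs.
s_a = torch.randn(4, requires_grad=True)
s_b = torch.randn(4, requires_grad=True)
feedback_a, feedback_b = torch.rand(4), torch.rand(4)
loss = ranking_step(s_a, s_b, feedback_a, feedback_b)
loss.backward()
```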

In-Context Prompt Editing For Conditional Audio Generation

  • paper_url: http://arxiv.org/abs/2311.00895
  • repo_url: None
  • paper_authors: Ernie Chang, Pin-Jie Lin, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra
  • for: Improving the deployment of text-to-audio generation models on real-world data, where distribution shift between user prompts and training prompts degrades performance.
  • methods: A retrieval-based in-context prompt editing framework that uses training captions as demonstrative exemplars to revise user prompts.
  • results: Improved audio quality across the set of collected user prompts.
    Abstract Distributional shift is a central challenge in the deployment of machine learning models as they can be ill-equipped for real-world data. This is particularly evident in text-to-audio generation where the encoded representations are easily undermined by unseen prompts, which leads to the degradation of generated audio -- the limited set of the text-audio pairs remains inadequate for conditional audio generation in the wild as user prompts are under-specified. In particular, we observe a consistent audio quality degradation in generated audio samples with user prompts, as opposed to training set prompts. To this end, we present a retrieval-based in-context prompt editing framework that leverages the training captions as demonstrative exemplars to revisit the user prompts. We show that the framework enhanced the audio quality across the set of collected user prompts, which were edited with reference to the training captions as exemplars.
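    The retrieval-based framework reduces to two steps: find the training captions closest to the user prompt in some embedding space, then hand them to a model as in-context exemplars for rewriting. The encoder and rewrite template below are assumptions for illustration.

```python
# Hedged sketch of retrieval-augmented prompt editing: nearest training
# captions (by cosine similarity over assumed sentence embeddings) become
# in-context exemplars for rewriting an under-specified user prompt.
import numpy as np

def top_k_captions(prompt_emb, caption_embs, captions, k=3):
    sims = caption_embs @ prompt_emb / (
        np.linalg.norm(caption_embs, axis=1) * np.linalg.norm(prompt_emb))
    return [captions[i] for i in np.argsort(-sims)[:k]]

def build_edit_prompt(user_prompt, exemplars):
    shots = "\n".join(f"- {c}" for c in exemplars)
    return (f"Rewrite the prompt in the style of these training captions:\n"
            f"{shots}\nPrompt: {user_prompt}\nRewritten prompt:")

# Toy data: random vectors stand in for a real sentence encoder.
rng = np.random.default_rng(1)
captions = ["a dog barking in the distance with light rain",
            "upbeat jazz drums with a walking bassline",
            "ocean waves crashing on a pebble beach"]
caption_embs = rng.standard_normal((3, 32))
user_emb = rng.standard_normal(32)
print(build_edit_prompt("dog sound", top_k_captions(user_emb, caption_embs, captions)))
```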

Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models

  • paper_url: http://arxiv.org/abs/2311.00871
  • repo_url: None
  • paper_authors: Steve Yadlowsky, Lyric Doshi, Nilesh Tripuraneni
  • for: This work investigates how effectively transformer models can bridge from their pretraining data mixture to identify and learn new tasks in-context, both inside and outside the pretraining distribution.
  • methods: In a controlled setting, transformer models are trained on sequences of (x, f(x)) pairs rather than natural language, with the pretraining mixture composed of multiple distinct task families.
  • results: Transformers demonstrate near-optimal unsupervised model selection when the task families are well represented in the pretraining data, but exhibit various failure modes and degraded generalization when presented with out-of-domain tasks or functions, even for simple extrapolation.
    Abstract Transformer models, notably large language models (LLMs), have the remarkable ability to perform in-context learning (ICL) -- to perform new tasks when prompted with unseen input-output examples without any explicit model training. In this work, we study how effectively transformers can bridge between their pretraining data mixture, comprised of multiple distinct task families, to identify and learn new tasks in-context which are both inside and outside the pretraining distribution. Building on previous work, we investigate this question in a controlled setting, where we study transformer models trained on sequences of $(x, f(x))$ pairs rather than natural language. Our empirical results show transformers demonstrate near-optimal unsupervised model selection capabilities, in their ability to first in-context identify different task families and in-context learn within them when the task families are well-represented in their pretraining data. However when presented with tasks or functions which are out-of-domain of their pretraining data, we demonstrate various failure modes of transformers and degradation of their generalization for even simple extrapolation tasks. Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities.
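    The controlled setup is straightforward to reproduce in outline: sample a function from one of several task families, evaluate it at random inputs, and serialize the resulting (x, f(x)) pairs as one training sequence. The two families below (linear and sinusoidal) are illustrative choices, not necessarily the paper's exact mixture.

```python
# Sketch of pretraining-mixture data for in-context learning experiments:
# each sequence comes from one task family, realized as (x, f(x)) pairs.
import numpy as np

rng = np.random.default_rng(0)

def sample_task(family):
    if family == "linear":
        w = rng.standard_normal()
        return lambda x: w * x
    if family == "sinusoid":
        a, phase = rng.uniform(0.5, 2.0), rng.uniform(0, np.pi)
        return lambda x: a * np.sin(x + phase)
    raise ValueError(family)

def make_sequence(n_pairs=16, families=("linear", "sinusoid")):
    family = rng.choice(families)            # mixture over task families
    f = sample_task(family)
    xs = rng.uniform(-2, 2, size=n_pairs)
    return np.stack([xs, f(xs)], axis=1), family   # (n_pairs, 2) tokens

seq, family = make_sequence()
print(family, seq[:3])
```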

Automatic Disfluency Detection from Untranscribed Speech

  • paper_url: http://arxiv.org/abs/2311.00867
  • repo_url: None
  • paper_authors: Amrit Romana, Kazuhito Koishida, Emily Mower Provost
  • for: This work aims to improve automatic detection and categorization of speech disfluencies.
  • methods: Language-based, acoustic, and multimodal methods are investigated for frame-level automatic disfluency detection and categorization, all taking audio as input.
  • results: An acoustic-based approach that does not require transcription as an intermediate step outperforms the ASR-transcript-based language approach, and multimodal architectures further improve disfluency detection performance.
    Abstract Speech disfluencies, such as filled pauses or repetitions, are disruptions in the typical flow of speech. Stuttering is a speech disorder characterized by a high rate of disfluencies, but all individuals speak with some disfluencies and the rates of disfluencies may by increased by factors such as cognitive load. Clinically, automatic disfluency detection may help in treatment planning for individuals who stutter. Outside of the clinic, automatic disfluency detection may serve as a pre-processing step to improve natural language understanding in downstream applications. With this wide range of applications in mind, we investigate language, acoustic, and multimodal methods for frame-level automatic disfluency detection and categorization. Each of these methods relies on audio as an input. First, we evaluate several automatic speech recognition (ASR) systems in terms of their ability to transcribe disfluencies, measured using disfluency error rates. We then use these ASR transcripts as input to a language-based disfluency detection model. We find that disfluency detection performance is largely limited by the quality of transcripts and alignments. We find that an acoustic-based approach that does not require transcription as an intermediate step outperforms the ASR language approach. Finally, we present multimodal architectures which we find improve disfluency detection performance over the unimodal approaches. Ultimately, this work introduces novel approaches for automatic frame-level disfluency and categorization. In the long term, this will help researchers incorporate automatic disfluency detection into a range of applications.

Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing

  • paper_url: http://arxiv.org/abs/2311.00835
  • repo_url: https://github.com/yanlinf/casent
  • paper_authors: Yanlin Feng, Adithya Pratapa, David R Mortensen
  • for: This paper proposes a seq2seq model for ultra-fine entity typing.
  • methods: The model takes an entity mention as input and employs constrained beam search to generate multiple types autoregressively; the raw sequence probabilities associated with the predicted types are then transformed into confidence scores using a novel calibration method.
  • results: Extensive experiments on the UFET dataset (over 10k types) show state-of-the-art F1 score and calibration error with an inference speedup of over 50 times. The model also generalizes well in zero-shot and few-shot settings on five specialized-domain entity typing datasets unseen during training, outperforming large language models with 10 times more parameters in the zero-shot setting and, when fine-tuned on 50 examples, significantly outperforming ChatGPT on all datasets.
    Abstract Ultra-fine entity typing plays a crucial role in information extraction by predicting fine-grained semantic types for entity mentions in text. However, this task poses significant challenges due to the massive number of entity types in the output space. The current state-of-the-art approaches, based on standard multi-label classifiers or cross-encoder models, suffer from poor generalization performance or inefficient inference. In this paper, we present CASENT, a seq2seq model designed for ultra-fine entity typing that predicts ultra-fine types with calibrated confidence scores. Our model takes an entity mention as input and employs constrained beam search to generate multiple types autoregressively. The raw sequence probabilities associated with the predicted types are then transformed into confidence scores using a novel calibration method. We conduct extensive experiments on the UFET dataset which contains over 10k types. Our method outperforms the previous state-of-the-art in terms of F1 score and calibration error, while achieving an inference speedup of over 50 times. Additionally, we demonstrate the generalization capabilities of our model by evaluating it in zero-shot and few-shot settings on five specialized domain entity typing datasets that are unseen during training. Remarkably, our model outperforms large language models with 10 times more parameters in the zero-shot setting, and when fine-tuned on 50 examples, it significantly outperforms ChatGPT on all datasets. Our code, models and demo are available at https://github.com/yanlinf/CASENT.
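    Transforming raw sequence probabilities into calibrated confidence scores is commonly done by fitting a small monotone map on held-out (score, correctness) pairs so that predicted confidence matches empirical precision. The sketch below uses Platt-style logistic calibration on log-probabilities as one plausible choice; the paper's own calibration method may differ.

```python
# Hedged sketch: Platt-style calibration of raw seq2seq type probabilities.
# A logistic regression on held-out (log-prob, correct?) pairs maps raw
# scores to calibrated confidences. One plausible calibrator, not CASENT's.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic held-out data: raw log-probs of generated types and correctness.
rng = np.random.default_rng(0)
log_probs = rng.uniform(-6, 0, size=500)
correct = (rng.random(500) < 1 / (1 + np.exp(-(log_probs + 3)))).astype(int)

calibrator = LogisticRegression()
calibrator.fit(log_probs.reshape(-1, 1), correct)

def confidence(raw_log_prob: float) -> float:
    """Calibrated probability that a predicted type is correct."""
    return float(calibrator.predict_proba([[raw_log_prob]])[0, 1])

print(confidence(-0.5), confidence(-5.0))   # high vs low raw score
```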

Construction Artifacts in Metaphor Identification Datasets

  • paper_url: http://arxiv.org/abs/2311.00790
  • repo_url: None
  • paper_authors: Joanne Boisson, Luis Espinosa-Anke, Jose Camacho-Collados
  • for: This study examines whether existing metaphor identification datasets can be gamed.
  • methods: The authors test this hypothesis with language models that fully ignore the potential metaphorical expression or the context in which it occurs, across a variety of datasets and settings.
  • results: Systems without complete information are competitive with those using the full context, a consequence of biases introduced by the datasets' construction procedures; on datasets carefully sampled from natural corpora, where this bias is absent, the task is more challenging and the evaluation more reliable.
    Abstract Metaphor identification aims at understanding whether a given expression is used figuratively in context. However, in this paper we show how existing metaphor identification datasets can be gamed by fully ignoring the potential metaphorical expression or the context in which it occurs. We test this hypothesis in a variety of datasets and settings, and show that metaphor identification systems based on language models without complete information can be competitive with those using the full context. This is due to the construction procedures to build such datasets, which introduce unwanted biases for positive and negative classes. Finally, we test the same hypothesis on datasets that are carefully sampled from natural corpora and where this bias is not present, making these datasets more challenging and reliable.

Language Model Training Paradigms for Clinical Feature Embeddings

  • paper_url: http://arxiv.org/abs/2311.00768
  • repo_url: https://github.com/yuroeth/icu_benchmarks
  • paper_authors: Yurong Hu, Manuel Burger, Gunnar Rätsch, Rita Kuznetsova
  • for: In research areas with scarce data, representation learning plays a significant role; this work aims to enhance representation learning for clinical time series by deriving universal embeddings for clinical features such as heart rate and blood pressure.
  • methods: Self-supervised training paradigms for language models are used to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step and patient-level representation learning.
  • results: The learnt embeddings, visualized via unsupervised dimension reduction techniques, show a high degree of consistency with prior clinical knowledge; evaluation on the MIMIC-III benchmark demonstrates the effectiveness of using clinical feature embeddings.
    Abstract In research areas with scarce data, representation learning plays a significant role. This work aims to enhance representation learning for clinical time series by deriving universal embeddings for clinical features, such as heart rate and blood pressure. We use self-supervised training paradigms for language models to learn high-quality clinical feature embeddings, achieving a finer granularity than existing time-step and patient-level representation learning. We visualize the learnt embeddings via unsupervised dimension reduction techniques and observe a high degree of consistency with prior clinical knowledge. We also evaluate the model performance on the MIMIC-III benchmark and demonstrate the effectiveness of using clinical feature embeddings. We publish our code online for replication.

Challenges for Linguistically-Driven Computer-Based Sign Recognition from Continuous Signing for American Sign Language

  • paper_url: http://arxiv.org/abs/2311.00762
  • repo_url: None
  • paper_authors: Carol Neidle
  • for: This paper addresses computer-based recognition of isolated, citation-form signs from video, and the additional challenges posed by recognition from continuous signing.
  • methods: The paper surveys these challenges, including naturally occurring inter- and intra-signer synchronic variation in sign production and sociolinguistic variation in American Sign Language (ASL), based in part on findings from a large corpus of linguistically annotated video data.
  • results: The paper also discusses linguistic regularities in the structure of signs that can boost handshape and sign recognition.
    Abstract There have been recent advances in computer-based recognition of isolated, citation-form signs from video. There are many challenges for such a task, not least the naturally occurring inter- and intra- signer synchronic variation in sign production, including sociolinguistic variation in the realization of certain signs. However, there are several significant factors that make recognition of signs from continuous signing an even more difficult problem. This article presents an overview of such challenges, based in part on findings from a large corpus of linguistically annotated video data for American Sign Language (ASL). Some linguistic regularities in the structure of signs that can boost handshape and sign recognition are also discussed.

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

  • paper_url: http://arxiv.org/abs/2311.00697
  • repo_url: https://github.com/amazon-science/stac-speech-translation
  • paper_authors: Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico
  • for: This paper addresses the generalization problem in translating single-channel multi-speaker conversational speech.
  • methods: An end-to-end, multi-task model named Speaker-Turn Aware Conversational Speech Translation combines automatic speech recognition, speech translation, and speaker turn detection using special tokens in a serialized labeling format.
  • results: On the Fisher-CALLHOME corpus, adapted by merging the two single-speaker channels into one multi-speaker channel to represent the more realistic and challenging scenario with speaker turns and cross-talk, the model outperforms the reference systems in the multi-speaker condition while attaining comparable performance in the single-speaker condition; scripts for data processing and model training are released.
    Abstract Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers. In this paper, we tackle single-channel multi-speaker conversational ST with an end-to-end and multi-task training model, named Speaker-Turn Aware Conversational Speech Translation, that combines automatic speech recognition, speech translation and speaker turn detection using special tokens in a serialized labeling format. We run experiments on the Fisher-CALLHOME corpus, which we adapted by merging the two single-speaker channels into one multi-speaker channel, thus representing the more realistic and challenging scenario with multi-speaker turns and cross-talk. Experimental results across single- and multi-speaker conditions and against conventional ST systems, show that our model outperforms the reference systems on the multi-speaker condition, while attaining comparable performance on the single-speaker condition. We release scripts for data processing and model training.
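    Serialized labeling with special tokens typically flattens a multi-speaker segment into a single target string, with a token marking each speaker turn. The token names and builder below are illustrative assumptions, not the paper's exact output vocabulary.

```python
# Hedged sketch of a serialized labeling format for multi-speaker speech
# translation: turns are flattened into one target string with special
# speaker-change tokens. Token names here are assumptions for illustration.
def serialize_turns(turns):
    """turns: list of (speaker_id, translated_text) in temporal order."""
    pieces = []
    prev_speaker = None
    for speaker, text in turns:
        if speaker != prev_speaker:
            pieces.append(f"[spk{speaker}]")   # speaker-turn token
            prev_speaker = speaker
        pieces.append(text)
    return " ".join(pieces)

turns = [(1, "hello how are you"), (2, "fine thanks"), (1, "great")]
print(serialize_turns(turns))
# -> "[spk1] hello how are you [spk2] fine thanks [spk1] great"
```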

Little Giants: Exploring the Potential of Small LLMs as Evaluation Metrics in Summarization in the Eval4NLP 2023 Shared Task

  • paper_url: http://arxiv.org/abs/2311.00686
  • repo_url: None
  • paper_authors: Neema Kotonya, Saran Krishnasamy, Joel Tetreault, Alejandro Jaimes
  • for: This paper describes the authors' participation in the Eval4NLP 2023 shared task, which assesses the effectiveness of prompt-based techniques for empowering large language models to handle quality estimation, particularly for evaluating machine translations and summaries.
  • methods: Systematic experiments with various prompting techniques, including standard prompting, prompts informed by annotator instructions, and innovative chain-of-thought prompting, integrated with zero-shot and one-shot learning methods to maximize the efficacy of the evaluation procedures.
  • results: Combining these approaches using a "small" open-source model (orca_mini_v3_7B) yields competitive results.
    Abstract This paper describes and analyzes our participation in the 2023 Eval4NLP shared task, which focuses on assessing the effectiveness of prompt-based techniques to empower Large Language Models to handle the task of quality estimation, particularly in the context of evaluating machine translations and summaries. We conducted systematic experiments with various prompting techniques, including standard prompting, prompts informed by annotator instructions, and innovative chain-of-thought prompting. In addition, we integrated these approaches with zero-shot and one-shot learning methods to maximize the efficacy of our evaluation procedures. Our work reveals that combining these approaches using a "small", open source model (orca_mini_v3_7B) yields competitive results.

Attention Alignment and Flexible Positional Embeddings Improve Transformer Length Extrapolation

  • paper_url: http://arxiv.org/abs/2311.00684
  • repo_url: None
  • paper_authors: Ta-Chung Chi, Ting-Han Fan, Alexander I. Rudnicky
  • for: This paper explores how a Transformer language model can handle sequences longer than the training length without any long-sequence fine-tuning.
  • methods: Large pre-trained language models of the T5 family are studied, focusing on the flexibility of their positional embeddings.
  • results: The T5 family's positional embeddings capture rich and flexible attention patterns, but suffer from the dispersed attention issue: the longer the input sequence, the flatter the attention distribution. Two attention alignment strategies via temperature scaling alleviate the issue, improving T5's long-context utilization on language modeling, retrieval, and multi-document question answering without any fine-tuning.
    Abstract An ideal length-extrapolatable Transformer language model can handle sequences longer than the training length without any long sequence fine-tuning. Such long-context utilization capability highly relies on a flexible positional embedding design. Upon investigating the flexibility of existing large pre-trained Transformer language models, we find that the T5 family deserves a closer look, as its positional embeddings capture rich and flexible attention patterns. However, T5 suffers from the dispersed attention issue: the longer the input sequence, the flatter the attention distribution. To alleviate the issue, we propose two attention alignment strategies via temperature scaling. Our findings improve the long-context utilization capability of T5 on language modeling, retrieval, and multi-document question answering without any fine-tuning, suggesting that a flexible positional embedding design and attention alignment go a long way toward Transformer length extrapolation.\footnote{\url{https://github.com/chijames/Attention-Alignment-Transformer-Length-Extrapolation}
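    Temperature scaling counteracts dispersed attention by multiplying the pre-softmax logits by a factor tau > 1, which sharpens the distribution. The snippet below demonstrates the generic mechanism; the paper's two alignment strategies choose the scaling in specific ways not reproduced here.

```python
# Generic temperature-scaled attention: multiplying pre-softmax logits by
# tau > 1 sharpens a dispersed attention distribution over long inputs.
import numpy as np

def attention(q, K, V, tau=1.0):
    """Single-query scaled dot-product attention with logit temperature."""
    logits = tau * (K @ q) / np.sqrt(q.shape[-1])
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ V, weights

rng = np.random.default_rng(0)
q = rng.standard_normal(64)
K = rng.standard_normal((512, 64))            # long input: 512 keys
V = rng.standard_normal((512, 8))
_, flat = attention(q, K, V, tau=1.0)
_, sharp = attention(q, K, V, tau=2.0)
print(flat.max(), sharp.max())   # higher peak (sharper) with larger tau
```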

Are Large Language Models Reliable Judges? A Study on the Factuality Evaluation Capabilities of LLMs

  • paper_url: http://arxiv.org/abs/2311.00681
  • repo_url: None
  • paper_authors: Xue-Yong Fu, Md Tahmid Rahman Laskar, Cheng Chen, Shashi Bhushan TN
  • for: This study examines Large Language Models (LLMs) as assessors of factual consistency in summaries generated by text-generation models.
  • methods: A new approach is introduced that employs a single LLM for the entire question-answering-based factuality scoring process; various LLMs are then examined on direct factuality scoring, benchmarked against traditional measures and human annotations.
  • results: Contrary to initial expectations, there is a lack of significant correlation between factuality metrics and human evaluations for GPT-4 and PaLM-2, with notable correlations observed only for GPT-3.5 across two factuality subcategories. These consistent findings across factual error categories suggest a fundamental limitation in current LLMs' ability to accurately gauge factuality.
    Abstract In recent years, Large Language Models (LLMs) have gained immense attention due to their notable emergent capabilities, surpassing those seen in earlier language models. A particularly intriguing application of LLMs is their role as evaluators for texts produced by various generative models. In this study, we delve into the potential of LLMs as reliable assessors of factual consistency in summaries generated by text-generation models. Initially, we introduce an innovative approach for factuality assessment using LLMs. This entails employing a singular LLM for the entirety of the question-answering-based factuality scoring process. Following this, we examine the efficacy of various LLMs in direct factuality scoring, benchmarking them against traditional measures and human annotations. Contrary to initial expectations, our results indicate a lack of significant correlations between factuality metrics and human evaluations, specifically for GPT-4 and PaLM-2. Notable correlations were only observed with GPT-3.5 across two factuality subcategories. These consistent findings across various factual error categories suggest a fundamental limitation in the current LLMs' capability to accurately gauge factuality.

Emotion Detection for Misinformation: A Review

  • paper_url: http://arxiv.org/abs/2311.00671
  • repo_url: None
  • paper_authors: Zhiwei Liu, Tianlin Zhang, Kailai Yang, Paul Thompson, Zeping Yu, Sophia Ananiadou
  • for: The paper focuses on the detection of misinformation (e.g., fake news and rumors) in social media, with a particular emphasis on the role of emotions and sentiments in distinguishing between genuine and false information.
  • methods: The paper reviews a range of emotion-based methods for misinformation detection, including the use of emotion, sentiment, and stance-based features, analyzed in terms of their strengths and weaknesses.
  • results: The paper discusses ongoing challenges in emotion-based misinformation detection, including the need for large, high-quality datasets, accurate annotation, and benchmarking, and suggests future research directions such as incorporating multimodality and improving interpretability.
    Abstract With the advent of social media, an increasing number of netizens are sharing and reading posts and news online. However, the huge volumes of misinformation (e.g., fake news and rumors) that flood the internet can adversely affect people's lives, and have resulted in the emergence of rumor and fake news detection as a hot research topic. The emotions and sentiments of netizens, as expressed in social media posts and news, constitute important factors that can help to distinguish fake news from genuine news and to understand the spread of rumors. This article comprehensively reviews emotion-based methods for misinformation detection. We begin by explaining the strong links between emotions and misinformation. We subsequently provide a detailed analysis of a range of misinformation detection methods that employ a variety of emotion, sentiment and stance-based features, and describe their strengths and weaknesses. Finally, we discuss a number of ongoing challenges in emotion-based misinformation detection based on large language models and suggest future research directions, including data collection (multi-platform, multilingual), annotation, benchmark, multimodality, and interpretability.

Explicit Morphological Knowledge Improves Pre-training of Language Models for Hebrew

  • paper_url: http://arxiv.org/abs/2311.00658
  • repo_url: None
  • paper_authors: Eylon Gueta, Omer Goldman, Reut Tsarfaty
  • for: Investigates whether incorporating explicit morphological knowledge in the pre-training phase can improve the performance of pre-trained language models (PLMs) for morphologically-rich languages (MRLs).
  • methods: Proposes various morphologically driven tokenization methods that enable the model to leverage morphological cues beyond raw text.
  • results: Experiments on Hebrew, a language with complex and highly ambiguous morphology, show that morphologically driven tokenization improves results over standard language-agnostic tokenization on a benchmark of both semantic and morphological tasks.
    Abstract Pre-trained language models (PLMs) have shown remarkable successes in acquiring a wide range of linguistic knowledge, relying solely on self-supervised training on text streams. Nevertheless, the effectiveness of this language-agnostic approach has been frequently questioned for its sub-optimal performance when applied to morphologically-rich languages (MRLs). We investigate the hypothesis that incorporating explicit morphological knowledge in the pre-training phase can improve the performance of PLMs for MRLs. We propose various morphologically driven tokenization methods enabling the model to leverage morphological cues beyond raw text. We pre-train multiple language models utilizing the different methods and evaluate them on Hebrew, a language with complex and highly ambiguous morphology. Our experiments show that morphologically driven tokenization demonstrates improved results compared to a standard language-agnostic tokenization, on a benchmark of both semantic and morphologic tasks. These findings suggest that incorporating morphological knowledge holds the potential for further improving PLMs for morphologically rich languages.
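    Morphologically driven tokenization typically segments a word into morphemes with an analyzer before applying a statistical subword step, so that subword boundaries respect morpheme boundaries. The sketch below uses a hypothetical analyzer and a toy subword step (with an English example for readability); neither is the paper's actual pipeline.

```python
# Hedged sketch: morphology-aware tokenization segments words into morphemes
# first, then applies subword tokenization within each morpheme, so subword
# boundaries never cross morpheme boundaries. The analyzer is hypothetical.
def analyze(word: str) -> list[str]:
    """Stand-in morphological analyzer (a real one is rule/lexicon based)."""
    toy_lexicon = {"uncommonly": ["un", "common", "ly"]}
    return toy_lexicon.get(word, [word])

def subword_tokenize(morpheme: str, max_len: int = 4) -> list[str]:
    """Toy subword step: greedy fixed-length chunks within one morpheme."""
    return [morpheme[i:i + max_len] for i in range(0, len(morpheme), max_len)]

def morph_tokenize(word: str) -> list[str]:
    pieces = []
    for m in analyze(word):
        pieces.extend(subword_tokenize(m))
    return pieces

print(morph_tokenize("uncommonly"))   # ['un', 'comm', 'on', 'ly']
```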

Formal Translation from Reversing Petri Nets to Coloured Petri Nets

  • paper_url: http://arxiv.org/abs/2311.00629
  • repo_url: None
  • paper_authors: Kamila Barylska, Anna Gogolinska, Lukasz Mikulski, Anna Philippou, Marcin Piatkowski, Kyriaki Psara
  • for: This paper concerns reversible computation, an emerging computing paradigm with applications in areas such as chemical reactions, quantum computation, robotics, and distributed systems.
  • methods: Reversing Petri nets, a recently proposed extension of Petri nets, implement the three main forms of reversibility, namely backtracking, causal reversing, and out-of-causal-order reversing; their distinguishing feature is the use of named tokens that can be combined together to form bonds.
  • results: The paper extends a structural translation from reversing Petri nets to Coloured Petri Nets (CPNs) to handle token multiplicity under the individual-token interpretation, and reports on a tool that implements the translation, paving the way for automated translation and analysis of reversible systems using CPN Tools.
    Abstract Reversible computation is an emerging computing paradigm that allows any sequence of operations to be executed in reverse order at any point during computation. Its appeal lies in its potential for lowpower computation and its relevance to a wide array of applications such as chemical reactions, quantum computation, robotics, and distributed systems. Reversing Petri nets are a recently-proposed extension of Petri nets that implements the three main forms of reversibility, namely, backtracking, causal reversing, and out-of-causal-order reversing. Their distinguishing feature is the use of named tokens that can be combined together to form bonds. Named tokens along with a history function, constitute the means of remembering past behaviour, thus, enabling reversal. In recent work, we have proposed a structural translation from a subclass of RPNs to the model of Coloured Petri Nets (CPNs), an extension of traditional Petri nets where tokens carry data values. In this paper, we extend the translation to handle RPNs with token multiplicity under the individual-token interpretation, a model which allows multiple tokens of the same type to exist in a system. To support the three types of reversibility, tokens are associated with their causal history and, while tokens of the same type are equally eligible to fire a transition when going forward, when going backwards they are able to reverse only the transitions they have previously fired. The new translation, in addition to lifting the restriction on token uniqueness, presents a refined approach for transforming RPNs to CPNs through a unifying approach that allows instantiating each of the three types of reversibility. The paper also reports on a tool that implements this translation, paving the way for automated translations and analysis of reversible systems using CPN Tools.

Crosslingual Retrieval Augmented In-context Learning for Bangla

  • paper_url: http://arxiv.org/abs/2311.00587
  • repo_url: None
  • paper_authors: Xiaoqian Li, Ercong Nie, Sheng Liang
  • for: 提高孟加拉语等低资源语言的自然语言处理性能
  • methods: 利用跨语言检索增强的上下文学习(in-context learning)
  • results: 跨语言检索增强的提示使多语言预训练语言模型(MPLMs)在孟加拉语任务上的性能得到稳定提升
    Abstract The promise of Large Language Models (LLMs) in Natural Language Processing has often been overshadowed by their limited performance in low-resource languages such as Bangla. To address this, our paper presents a pioneering approach that utilizes cross-lingual retrieval augmented in-context learning. By strategically sourcing semantically similar prompts from high-resource language, we enable multilingual pretrained language models (MPLMs), especially the generative model BLOOMZ, to successfully boost performance on Bangla tasks. Our extensive evaluation highlights that the cross-lingual retrieval augmented prompts bring steady improvements to MPLMs over the zero-shot performance.
    摘要 大型语言模型(LLMs)在自然语言处理方面的前景,常常被其在孟加拉语等低资源语言上的有限表现所掩盖。为解决这一问题,我们的论文提出了一种开创性的方法,利用跨语言检索增强的上下文学习。通过有策略地从高资源语言中检索语义相似的提示,我们使多语言预训练语言模型(MPLMs),特别是生成模型 BLOOMZ,在孟加拉语任务上的性能得到提升。我们的大量评估表明,跨语言检索增强的提示为 MPLMs 带来了相对零样本性能的稳定提升。
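
A rough sketch of the retrieval step, under the assumption that exemplars and queries share a multilingual embedding space; the encoder, toy data, and prompt format below are illustrative, not the paper's exact setup.

```python
import numpy as np

def retrieve_prompts(query_vec, exemplar_vecs, exemplars, k=3):
    # Cosine similarity between the query and each high-resource exemplar.
    sims = exemplar_vecs @ query_vec / (
        np.linalg.norm(exemplar_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [exemplars[i] for i in top]

def build_prompt(retrieved, bangla_input):
    demos = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in retrieved)
    return f"{demos}\n\nInput: {bangla_input}\nLabel:"

# Toy 2-d "embeddings"; a real system would use a multilingual sentence encoder.
exemplars = [("I love this film.", "positive"), ("Terrible service.", "negative")]
vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
print(build_prompt(retrieve_prompts(np.array([0.9, 0.1]), vecs, exemplars, k=1),
                   "ছবিটা দারুণ!"))
```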

Can Large Language Models Design Accurate Label Functions?

  • paper_url: http://arxiv.org/abs/2311.00739
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Naiqing Guan, Kaiwen Chen, Nick Koudas
  • for: 这篇论文主要探讨使用预训练语言模型(PLM)自动生成高精度标签函数(LF)的可能性。
  • methods: 本研究使用了数据雕刻框架(DataSculpt),这是一种基于PLM的交互式框架,可以自动生成LF。研究者采用了多种提示技术、实例选择策略和LF筛选方法来探索广泛的设计空间。
  • results: 研究者在12个实际数据集上进行了广泛的评估,包括多种任务。评估结果显示了当前PLM在LF设计中的优势和局限性。
    Abstract Programmatic weak supervision methodologies facilitate the expedited labeling of extensive datasets through the use of label functions (LFs) that encapsulate heuristic data sources. Nonetheless, the creation of precise LFs necessitates domain expertise and substantial endeavors. Recent advances in pre-trained language models (PLMs) have exhibited substantial potential across diverse tasks. However, the capacity of PLMs to autonomously formulate accurate LFs remains an underexplored domain. In this research, we address this gap by introducing DataSculpt, an interactive framework that harnesses PLMs for the automated generation of LFs. Within DataSculpt, we incorporate an array of prompting techniques, instance selection strategies, and LF filtration methods to explore the expansive design landscape. Ultimately, we conduct a thorough assessment of DataSculpt's performance on 12 real-world datasets, encompassing a range of tasks. This evaluation unveils both the strengths and limitations of contemporary PLMs in LF design.
    摘要
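
As a hedged illustration of LF filtration, one of the design choices DataSculpt explores, the sketch below keeps a PLM-proposed label function only if its accuracy and coverage on a small validation set clear thresholds; the function, data, and thresholds are invented for this example, not DataSculpt's API.

```python
ABSTAIN = -1

def lf_mentions_refund(text):
    """An example PLM-proposed label function: vote 1 if 'refund' appears."""
    return 1 if "refund" in text.lower() else ABSTAIN

def filter_lfs(lfs, val_texts, val_labels, min_acc=0.7, min_cov=0.1):
    kept = []
    for lf in lfs:
        votes = [lf(t) for t in val_texts]
        fired = [(v, y) for v, y in zip(votes, val_labels) if v != ABSTAIN]
        coverage = len(fired) / len(val_texts)
        accuracy = sum(v == y for v, y in fired) / len(fired) if fired else 0.0
        if accuracy >= min_acc and coverage >= min_cov:
            kept.append(lf)
    return kept

texts = ["please refund my order", "great product", "refund now!"]
labels = [1, 0, 1]
print(filter_lfs([lf_mentions_refund], texts, labels))  # the LF survives
```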

An Embedded Diachronic Sense Change Model with a Case Study from Ancient Greek

  • paper_url: http://arxiv.org/abs/2311.00541
  • repo_url: https://github.com/schyanzafar/edisc
  • paper_authors: Schyan Zafar, Geoff K. Nicholls
  • for: 这个论文的目的是分析古希腊文本集的词语意思变化。
  • methods: 这个论文使用了无监督学习的GASC和DiSC生成模型,对目标词("kosmos")的多个词义进行分析,并使用MCMC方法来衡量这些词义随时间的变化。
  • results: 该论文提出了EDiSC模型,它将词嵌入与DiSC模型相结合,可以提供更高的预测精度、更好的真值恢复和不确定性量化,以及更好的MCMC采样效率和可扩展性。
    Abstract Word meanings change over time, and word senses evolve, emerge or die out in the process. For ancient languages, where the corpora are often small, sparse and noisy, modelling such changes accurately proves challenging, and quantifying uncertainty in sense-change estimates consequently becomes important. GASC and DiSC are existing generative models that have been used to analyse sense change for target words from an ancient Greek text corpus, using unsupervised learning without the help of any pre-training. These models represent the senses of a given target word such as "kosmos" (meaning decoration, order or world) as distributions over context words, and sense prevalence as a distribution over senses. The models are fitted using MCMC methods to measure temporal changes in these representations. In this paper, we introduce EDiSC, an embedded version of DiSC, which combines word embeddings with DiSC to provide superior model performance. We show empirically that EDiSC offers improved predictive accuracy, ground-truth recovery and uncertainty quantification, as well as better sampling efficiency and scalability properties with MCMC methods. We also discuss the challenges of fitting these models.
    摘要

Text Rendering Strategies for Pixel Language Models

  • paper_url: http://arxiv.org/abs/2311.00522
  • repo_url: None
  • paper_authors: Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott
  • for: 这篇论文主要针对的是开放词汇语言模型Pixel模型中的文本渲染方法。
  • methods: 论文研究了PIXEL模型中四种不同的文本渲染方法,其中包括简单的字符二元组(character bigram)渲染。
  • results: 研究发现,简单的字符二元组渲染方法可以提高句子级任务的性能,而不会损害token级任务或多语言任务的表现。此外,这种渲染策略还使模型参数可以从86M降至22M,且性能保持在同等水平。
    Abstract Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling. However, recent approaches use text renderers that produce a large set of almost-equivalent input patches, which may prove sub-optimal for downstream tasks, due to redundancy in the input representations. In this paper, we investigate four approaches to rendering text in the PIXEL model (Rust et al., 2023), and find that simple character bigram rendering brings improved performance on sentence-level tasks without compromising performance on token-level or multilingual tasks. This new rendering strategy also makes it possible to train a more compact model with only 22M parameters that performs on par with the original 86M parameter model. Our analyses show that character bigram rendering leads to a consistently better model but with an anisotropic patch embedding space, driven by a patch frequency bias, highlighting the connections between image patch- and tokenization-based language models.
    摘要 基于像素的语言模型处理渲染为图像的文本,因此可以处理任何文字系统,是开放词表语言建模的一种有前景的方法。然而,近期方法所用的文本渲染器会产生大量几乎等价的输入图块(patch),由于输入表示的冗余,这可能不利于下游任务。在这篇论文中,我们研究了PIXEL模型(Rust et al., 2023)中四种文本渲染方法,发现简单的字符二元组渲染可以提高句子级任务的性能,而不损害token级或多语言任务的表现。这种新的渲染策略还使得训练仅有22M参数的更紧凑模型成为可能,其性能与原始的86M参数模型相当。我们的分析表明,字符二元组渲染能够得到持续更好的模型,但其图块嵌入空间呈各向异性,由图块频率偏差驱动,这揭示了基于图像图块的语言模型与基于分词的语言模型之间的联系。
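
A toy rendering of the character-bigram strategy using PIL; the patch size, font, and layout below are illustrative, and PIXEL's real renderer differs in detail.

```python
from PIL import Image, ImageDraw, ImageFont

def render_bigram_patches(text, patch=(16, 16)):
    """Render each character bigram into its own fixed-size grayscale patch."""
    font = ImageFont.load_default()
    bigrams = [text[i:i + 2] for i in range(0, len(text), 2)]
    patches = []
    for bg in bigrams:
        img = Image.new("L", patch, color=255)       # white background
        ImageDraw.Draw(img).text((1, 2), bg, fill=0, font=font)
        patches.append(img)
    return patches

patches = render_bigram_patches("hello world")
print(len(patches), patches[0].size)  # 6 patches of 16x16 pixels
```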

Rule-Based Error Classification for Analyzing Differences in Frequent Errors

  • paper_url: http://arxiv.org/abs/2311.00513
  • repo_url: None
  • paper_authors: Atsushi Shirafuji, Taku Matsumoto, Md Faizul Ibne Amin, Yutaka Watanobe
  • for: 本研究旨在揭示新手(novice)与专家(expert)程序员之间常见错误的差异。
  • methods: 我们提出了一种基于规则的错误分类工具,用于对由错误程序和正确程序组成的 code pairs 中的错误进行分类。
  • results: 我们对 95,631 个 code pairs 进行了分类,平均每对错误数为 3.47。分析结果表明,新手程序员的错误主要源于编程知识的缺乏,而专家程序员的错误则主要源于审题时的疏忽,或以不同于常规的方式解题所带来的挑战。
    Abstract Finding and fixing errors is a time-consuming task not only for novice programmers but also for expert programmers. Prior work has identified frequent error patterns among various levels of programmers. However, the differences in the tendencies between novices and experts have yet to be revealed. From the knowledge of the frequent errors in each level of programmers, instructors will be able to provide helpful advice for each level of learners. In this paper, we propose a rule-based error classification tool to classify errors in code pairs consisting of wrong and correct programs. We classify errors for 95,631 code pairs and identify 3.47 errors on average, which are submitted by various levels of programmers on an online judge system. The classified errors are used to analyze the differences in frequent errors between novice and expert programmers. The analyzed results show that, as for the same introductory problems, errors made by novices are due to the lack of knowledge in programming, and the mistakes are considered an essential part of the learning process. On the other hand, errors made by experts are due to misunderstandings caused by the carelessness of reading problems or the challenges of solving problems differently than usual. The proposed tool can be used to create error-labeled datasets and for further code-related educational research.
    摘要 发现和修复错误是一项时间消耗的任务,不仅对于新手程序员而言,也对于专家程序员来说。先前的工作已经确定了不同级别程序员的错误模式的频繁性。然而,新手和专家之间的差异仍未得到揭示。通过了解每个级别程序员的错误频率,教师将能提供有用的建议。在这篇论文中,我们提议一种基于规则的错误分类工具,用于分类代码对中的错误和正确代码。我们对95631个代码对进行分类,并发现每个代码对的平均错误数为3.47。分类后的错误被用来分析新手和专家之间的错误差异。分析结果显示,对于同一些入门问题,新手的错误是由于缺乏编程知识,这些错误被视为学习过程中的必要部分。而专家的错误则是由于阅读问题不够仔细或解决问题不同于常见方式所致。我们的工具可以用于创建错误标注数据集和进一步的代码相关教育研究。
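
A minimal sketch of the rule-based idea: diff the wrong and correct programs, then apply named predicate rules to the removed/added lines. The two rules below are invented for illustration; the paper's tool defines a much richer rule set.

```python
import difflib

def comparison_operator_changed(removed, added):
    """Fires when a comparison operator appears on only one side of the diff."""
    ops = ["<=", ">=", "<", ">"]
    return any((op in removed) != (op in added) for op in ops)

def off_by_one_bound(removed, added):
    """Fires when the only change is adding '+ 1' to a range bound."""
    return ("range(" in removed and "range(" in added
            and removed.replace("+ 1", "") == added.replace("+ 1", ""))

RULES = [("comparison_operator", comparison_operator_changed),
         ("off_by_one", off_by_one_bound)]

def classify(wrong_code, correct_code):
    diff = difflib.unified_diff(wrong_code.splitlines(),
                                correct_code.splitlines(), lineterm="")
    diff = list(diff)
    removed = "\n".join(l[1:] for l in diff
                        if l.startswith("-") and not l.startswith("---"))
    added = "\n".join(l[1:] for l in diff
                      if l.startswith("+") and not l.startswith("+++"))
    return [name for name, rule in RULES if rule(removed, added)]

print(classify("for i in range(n):\n    if a[i] < x:",
               "for i in range(n + 1):\n    if a[i] <= x:"))
# ['comparison_operator']
```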

Robustness Tests for Automatic Machine Translation Metrics with Adversarial Attacks

  • paper_url: http://arxiv.org/abs/2311.00508
  • repo_url: https://github.com/i-need-sleep/eval_attack
  • paper_authors: Yichen Huang, Timothy Baldwin
  • for: 本研究考察机器翻译评价指标在对抗生成文本上的表现,以探究评价指标的鲁棒性。
  • methods: 我们对三种流行的机器翻译指标BERTScore、BLEURT和COMET进行了词级和字符级攻击实验。
  • results: 我们的人工实验表明,自动指标往往会对对抗性劣化的译文施加过度惩罚。此外,我们发现BERTScore存在不一致:它判定原句与对抗性劣化后的句子相似,却又判定劣化后的译文相对参考译文明显差于原译文。这些脆弱性模式为开发更鲁棒的指标提供了动机。
    Abstract We investigate MT evaluation metric performance on adversarially-synthesized texts, to shed light on metric robustness. We experiment with word- and character-level attacks on three popular machine translation metrics: BERTScore, BLEURT, and COMET. Our human experiments validate that automatic metrics tend to overpenalize adversarially-degraded translations. We also identify inconsistencies in BERTScore ratings, where it judges the original sentence and the adversarially-degraded one as similar, while judging the degraded translation as notably worse than the original with respect to the reference. We identify patterns of brittleness that motivate more robust metric development.
    摘要 我们研究机器翻译评价指标在对抗生成文本上的表现,以探究指标的鲁棒性。我们对三种流行的机器翻译指标BERTScore、BLEURT和COMET进行了词级和字符级攻击实验。我们的人工实验证实,自动指标往往对对抗性劣化的译文施加过度惩罚。我们还发现BERTScore评分存在不一致:它判定原句与对抗性劣化后的句子相似,但相对参考译文,它又判定劣化后的译文明显更差。我们识别出这些脆弱性模式,为开发更鲁棒的指标提供了动机。
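
The experimental recipe can be sketched as: perturb the hypothesis, re-score it, and measure how much the metric drops. The toy metric below stands in for BERTScore/BLEURT/COMET, and the character-swap attack is one illustrative perturbation.

```python
import random

def typo_perturb(sentence, rate=0.1, seed=0):
    """Character-level attack: swap adjacent characters at a given rate."""
    rng = random.Random(seed)
    chars = list(sentence)
    for i in range(len(chars) - 1):
        if rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

def robustness_gap(metric, hyp, ref, attack=typo_perturb):
    """How much the metric penalizes the perturbed hypothesis vs. the original.
    `metric` is any callable (hypothesis, reference) -> score."""
    return metric(hyp, ref) - metric(attack(hyp), ref)

# Toy unigram-F1 metric for demonstration only.
def unigram_f1(h, r):
    hs, rs = set(h.split()), set(r.split())
    return 2 * len(hs & rs) / (len(hs) + len(rs))

print(robustness_gap(unigram_f1, "the cat sat on the mat", "the cat is on the mat"))
```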

Comparing Optimization Targets for Contrast-Consistent Search

  • paper_url: http://arxiv.org/abs/2311.00488
  • repo_url: None
  • paper_authors: Hugo Fry, Seamus Fallows, Ian Fan, Jamie Wright, Nandi Schoots
  • for: 研究对比一致搜索(CCS)的优化目标,该目标旨在恢复大语言模型内部对真值的表示。
  • methods: 提出了新的中点位移(Midpoint-Displacement, MD)损失函数,并证明在某个超参数取值下,MD损失函数得到的探测器权重与CCS非常相似。
  • results: 进一步表明该超参数并非最优;使用更好的超参数时,MD损失函数可以取得比CCS更高的测试准确率。
    Abstract We investigate the optimization target of Contrast-Consistent Search (CCS), which aims to recover the internal representations of truth of a large language model. We present a new loss function that we call the Midpoint-Displacement (MD) loss function. We demonstrate that for a certain hyper-parameter value this MD loss function leads to a prober with very similar weights to CCS. We further show that this hyper-parameter is not optimal and that with a better hyper-parameter the MD loss function attains a higher test accuracy than CCS.
    摘要 我们研究对比一致搜索(CCS)的优化目标,该目标旨在恢复大语言模型内部对真值的表示。我们提出了一个新的损失函数,称为中点位移(MD)损失函数。我们证明,在某个特定的超参数取值下,MD损失函数得到的探测器权重与CCS非常相似。此外,我们还证明这个超参数并非最优:使用更好的超参数时,MD损失函数在测试准确率上超过了CCS。
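
For context, the CCS objective that the paper starts from can be written in a few lines of PyTorch. Note that the paper's Midpoint-Displacement loss is a modification of this objective whose exact form is not reproduced here; this sketch shows only the baseline being compared against.

```python
import torch

def ccs_loss(p_pos, p_neg):
    """Contrast-Consistent Search objective (Burns et al.): a probe should be
    consistent (p+ close to 1 - p-) and confident (not both near 0.5) on a
    statement x+ and its negation x-."""
    consistency = (p_pos - (1.0 - p_neg)) ** 2
    confidence = torch.minimum(p_pos, p_neg) ** 2
    return (consistency + confidence).mean()

p_pos = torch.sigmoid(torch.randn(8))  # probe outputs on x+
p_neg = torch.sigmoid(torch.randn(8))  # probe outputs on x-
print(ccs_loss(p_pos, p_neg))
```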

Style Locality for Controllable Generation with kNN Language Models

  • paper_url: http://arxiv.org/abs/2311.00475
  • repo_url: None
  • paper_authors: Gilles Nawezi, Lucie Flek, Charles Welch
  • for: 这个论文主要是为了控制文本的风格和语言表达而研究的(control the style and language expression of text)
  • methods: 该论文使用了外部记忆和最近邻居语言模型(external memory and nearest neighbor language models),并在这些模型中添加了地域层次(locality levels)来学习如何对文本中的词语进行权重调整(weighting of words in text),以提高模型的性能。
  • results: 研究发现,这种新方法可以成功地控制文本风格,并且在流畅性与风格之间提供了比以往工作更好的权衡(fluency-style trade-off)。
    Abstract Recent language models have been improved by the addition of external memory. Nearest neighbor language models retrieve similar contexts to assist in word prediction. The addition of locality levels allows a model to learn how to weight neighbors based on their relative location to the current text in source documents, and have been shown to further improve model performance. Nearest neighbor models have been explored for controllable generation but have not examined the use of locality levels. We present a novel approach for this purpose and evaluate it using automatic and human evaluation on politeness, formality, supportiveness, and toxicity textual data. We find that our model is successfully able to control style and provides a better fluency-style trade-off than previous work.
    摘要 近期的语言模型通过加入外部记忆得到了改进。最近邻语言模型检索相似的上下文来辅助词预测。引入地域层次(locality levels)使模型能够根据邻居在源文档中相对当前文本的位置来学习其权重,已被证明能进一步提升模型性能。最近邻模型曾被用于可控生成,但尚未考察地域层次的使用。我们为此提出了一种新方法,并在礼貌性、正式性、支持性和毒性文本数据上进行了自动和人工评估。我们发现,我们的模型能够成功地控制风格,并提供了比以往工作更好的流畅性与风格之间的权衡。
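
A hedged sketch of the interpolation at the core of kNN language models, with a stand-in per-level locality weighting: the paper learns these weights, so the values, neighbor format, and decay function below are assumptions made for illustration.

```python
import numpy as np

def knn_lm_prob(p_lm, neighbors, vocab_size, lam=0.25, locality_weights=None):
    """Interpolate an LM distribution with a kNN distribution.
    `neighbors` is a list of (token_id, distance, locality_level)."""
    if locality_weights is None:
        # Assumed weighting: neighbors at closer locality levels count more.
        locality_weights = {0: 1.0, 1: 0.5, 2: 0.25}
    scores = np.zeros(vocab_size)
    for tok, dist, level in neighbors:
        scores[tok] += locality_weights.get(level, 0.1) * np.exp(-dist)
    if scores.sum() > 0:
        p_knn = scores / scores.sum()
    else:
        p_knn = np.full(vocab_size, 1.0 / vocab_size)
    return lam * p_knn + (1.0 - lam) * p_lm   # standard kNN-LM interpolation

p_lm = np.full(5, 0.2)
print(knn_lm_prob(p_lm, [(1, 0.3, 0), (2, 0.8, 2)], vocab_size=5))
```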

Discourse Relations Classification and Cross-Framework Discourse Relation Classification Through the Lens of Cognitive Dimensions: An Empirical Investigation

  • paper_url: http://arxiv.org/abs/2311.00451
  • repo_url: None
  • paper_authors: Yingxue Fu
  • for: 本研究旨在用简单的、受认知启发的维度来刻画不同框架下的话语关系。
  • methods: 本研究使用Sanders等人(2018)提出的简单认知维度来刻画话语关系,并进行跨框架的话语关系分类(PDTB与RST)。
  • results: 研究发现,借助这些维度可以将一个框架中的话语关系知识迁移到另一个框架,并且不同的维度对不同类型的话语关系的影响各不相同。
    Abstract Existing discourse formalisms use different taxonomies of discourse relations, which require expert knowledge to understand, posing a challenge for annotation and automatic classification. We show that discourse relations can be effectively captured by some simple cognitively inspired dimensions proposed by Sanders et al.(2018). Our experiments on cross-framework discourse relation classification (PDTB & RST) demonstrate that it is possible to transfer knowledge of discourse relations for one framework to another framework by means of these dimensions, in spite of differences in discourse segmentation of the two frameworks. This manifests the effectiveness of these dimensions in characterizing discourse relations across frameworks. Ablation studies reveal that different dimensions influence different types of discourse relations. The patterns can be explained by the role of dimensions in characterizing and distinguishing different relations. We also report our experimental results on automatic prediction of these dimensions.
    摘要 现有的话语形式体系使用不同的话语关系分类体系,理解它们需要专家知识,这给标注和自动分类带来了挑战。我们表明,Sanders等人(2018)提出的一些简单的、受认知启发的维度可以有效地刻画话语关系。我们在跨框架话语关系分类(PDTB与RST)上的实验表明,借助这些维度,即使两个框架在话语切分上存在差异,也可以将一个框架的话语关系知识迁移到另一个框架。这体现了这些维度在跨框架刻画话语关系上的有效性。消融研究表明,不同的维度影响不同类型的话语关系;这些模式可以由维度在刻画和区分不同关系中的作用来解释。我们还报告了自动预测这些维度的实验结果。

Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling

  • paper_url: http://arxiv.org/abs/2311.00430
  • repo_url: https://github.com/huggingface/distil-whisper
  • paper_authors: Sanchit Gandhi, Patrick von Platen, Alexander M. Rush
  • for: 这项研究旨在使大型预训练语音识别模型能够在低延迟或资源受限的环境中运行。
  • methods: 研究使用伪标注(pseudo-labelling)技术构建了一个大规模开源数据集,并用其将Whisper模型蒸馏为更小的Distil-Whisper;通过简单的词错误率(WER)启发式,只选取质量最高的伪标签用于训练。
  • results: 与Whisper模型相比,Distil-Whisper速度快5.8倍,参数少51%,在零样本迁移设置下对分布外测试数据的词错误率(WER)与Whisper相差不到1%。Distil-Whisper保持了Whisper模型在困难声学条件下的鲁棒性,同时在长音频上更不易产生幻觉错误。Distil-Whisper可与Whisper配合进行推测解码(speculative decoding),在数学上保证输出与原始模型相同的同时实现2倍加速。
    Abstract As the size of pre-trained speech recognition models increases, running these large models in low-latency or resource-constrained environments becomes challenging. In this work, we leverage pseudo-labelling to assemble a large-scale open-source dataset which we use to distill the Whisper model into a smaller variant, called Distil-Whisper. Using a simple word error rate (WER) heuristic, we select only the highest quality pseudo-labels for training. The distilled model is 5.8 times faster with 51% fewer parameters, while performing to within 1% WER on out-of-distribution test data in a zero-shot transfer setting. Distil-Whisper maintains the robustness of the Whisper model to difficult acoustic conditions, while being less prone to hallucination errors on long-form audio. Distil-Whisper is designed to be paired with Whisper for speculative decoding, yielding a 2 times speed-up while mathematically ensuring the same outputs as the original model. To facilitate further research in this domain, we make our training code, inference code and models publicly accessible.
    摘要 随着预训练语音识别模型规模的增大,在低延迟或资源受限的环境中运行这些大模型变得困难。在这项工作中,我们利用伪标注构建了一个大规模的开源数据集,并用它将Whisper模型蒸馏为一个更小的变体,称为Distil-Whisper。我们使用简单的词错误率(WER)启发式,只选择质量最高的伪标签用于训练。蒸馏后的模型速度快5.8倍,参数少51%,在零样本迁移设置下对分布外测试数据的WER与原模型相差不到1%。Distil-Whisper保持了Whisper模型在困难声学条件下的鲁棒性,同时在长音频上更不易产生幻觉错误。Distil-Whisper设计为可与Whisper配合进行推测解码,在数学上保证输出与原始模型相同的同时实现2倍加速。为了促进该领域的进一步研究,我们公开了训练代码、推理代码和模型。
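
The WER heuristic itself is simple to sketch: compare each Whisper pseudo-label against the dataset's ground-truth transcript and keep only close matches. The threshold below is illustrative, and the jiwer library is used for WER.

```python
import jiwer  # pip install jiwer

def filter_pseudo_labels(pairs, max_wer=0.1):
    """Keep (ground_truth, whisper_pseudo_label) pairs whose WER is below a
    threshold; the 10% threshold here is an illustrative choice."""
    kept = []
    for truth, pseudo in pairs:
        if jiwer.wer(truth, pseudo) <= max_wer:
            kept.append((truth, pseudo))
    return kept

pairs = [("the cat sat", "the cat sat"), ("hello world", "hello word there")]
print(filter_pseudo_labels(pairs))  # only the exact match survives at 10% WER
```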

Efficient Human-AI Coordination via Preparatory Language-based Convention

  • paper_url: http://arxiv.org/abs/2311.00416
  • repo_url: None
  • paper_authors: Cong Guan, Lichao Zhang, Chunpeng Fan, Yichen Li, Feng Chen, Lihe Li, Yunjia Tian, Lei Yuan, Yang Yu
  • for: 本研究旨在开发智能代理人,以实现人工通用智能的目标。
  • methods: 我们利用大语言模型(LLM)来开发行动计划,以便指导人类和AI进行合作。
  • results: 我们的方法在实验环境中比现有的学习方法表现出更高的性能,并且在协调实际人类时达到了更好的人类偏好的对齐和15%的性能提升。
    Abstract Developing intelligent agents capable of seamless coordination with humans is a critical step towards achieving artificial general intelligence. Existing methods for human-AI coordination typically train an agent to coordinate with a diverse set of policies or with human models fitted from real human data. However, the massively diverse styles of human behavior present obstacles for AI systems with constrained capacity, while high quality human data may not be readily available in real-world scenarios. In this study, we observe that prior to coordination, humans engage in communication to establish conventions that specify individual roles and actions, making their coordination proceed in an orderly manner. Building upon this observation, we propose employing the large language model (LLM) to develop an action plan (or equivalently, a convention) that effectively guides both human and AI. By inputting task requirements, human preferences, the number of agents, and other pertinent information into the LLM, it can generate a comprehensive convention that facilitates a clear understanding of tasks and responsibilities for all parties involved. Furthermore, we demonstrate that decomposing the convention formulation problem into sub-problems with multiple new sessions being sequentially employed and human feedback, will yield a more efficient coordination convention. Experimental evaluations conducted in the Overcooked-AI environment, utilizing a human proxy model, highlight the superior performance of our proposed method compared to existing learning-based approaches. When coordinating with real humans, our method achieves better alignment with human preferences and an average performance improvement of 15% compared to the state-of-the-art.
    摘要 开发能够与人类无缝协同的智能体是迈向通用人工智能的关键一步。现有的人机协同方法通常训练智能体与多样化的策略集合或由真实人类数据拟合的人类模型进行协同。然而,人类行为风格的巨大多样性给容量受限的AI系统带来了障碍,而高质量的人类数据在现实场景中往往难以获得。在这项研究中,我们观察到:在协同之前,人类会先通过交流建立约定(convention),明确各自的角色和行动,使协同得以有序进行。基于这一观察,我们提议利用大语言模型(LLM)来制定行动计划(即约定),以有效地同时指导人类和AI。将任务需求、人类偏好、智能体数量等相关信息输入LLM,即可生成一份全面的约定,使所有参与方都能清楚地理解任务与职责。此外,我们还表明,将约定制定问题分解为若干子问题、依次使用多个新会话并结合人类反馈,可以得到更高效的协同约定。在Overcooked-AI环境中使用人类代理模型进行的实验评估表明,我们提出的方法优于现有的基于学习的方法。在与真实人类协同时,我们的方法与人类偏好更一致,并且相比最优方法平均性能提升15%。

AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification

  • paper_url: http://arxiv.org/abs/2311.00408
  • repo_url: https://github.com/ukplab/adasent
  • paper_authors: Yongxin Huang, Kexin Wang, Sourav Dutta, Raj Nath Patel, Goran Glavaš, Iryna Gurevych
  • for: investigate strategies for domain-specialization in the context of few-shot sentence classification with Pre-trained Sentence Encoders (SEs)
  • methods: unsupervised Domain-Adaptive Pre-Training (DAPT) of a base Pre-trained Language Model (PLM), training a SEPT adapter on the base PLM to decouple SEPT from DAPT
  • results: substantially improves the accuracy of few-shot sentence classification, matches or surpasses the performance of full SEPT on DAPT-ed PLM, while substantially reducing the training costs
    Abstract Recent work has found that few-shot sentence classification based on pre-trained Sentence Encoders (SEs) is efficient, robust, and effective. In this work, we investigate strategies for domain-specialization in the context of few-shot sentence classification with SEs. We first establish that unsupervised Domain-Adaptive Pre-Training (DAPT) of a base Pre-trained Language Model (PLM) (i.e., not an SE) substantially improves the accuracy of few-shot sentence classification by up to 8.4 points. However, applying DAPT on SEs, on the one hand, disrupts the effects of their (general-domain) Sentence Embedding Pre-Training (SEPT). On the other hand, applying general-domain SEPT on top of a domain-adapted base PLM (i.e., after DAPT) is effective but inefficient, since the computationally expensive SEPT needs to be executed on top of a DAPT-ed PLM of each domain. As a solution, we propose AdaSent, which decouples SEPT from DAPT by training a SEPT adapter on the base PLM. The adapter can be inserted into DAPT-ed PLMs from any domain. We demonstrate AdaSent's effectiveness in extensive experiments on 17 different few-shot sentence classification datasets. AdaSent matches or surpasses the performance of full SEPT on DAPT-ed PLM, while substantially reducing the training costs. The code for AdaSent is available.
    摘要 最近的研究发现,基于预训练句子编码器(SE)的少样本句子分类高效、鲁棒且有效。在这项工作中,我们研究少样本句子分类场景下SE的领域特化策略。我们首先证明,对基础预训练语言模型(PLM,而非SE)进行无监督领域自适应预训练(DAPT)可以将少样本句子分类的准确率最多提升8.4个百分点。然而,一方面,对SE直接进行DAPT会破坏其(通用领域)句子嵌入预训练(SEPT)的效果;另一方面,在经过DAPT的基础PLM之上再进行通用领域SEPT虽然有效,但效率低下,因为计算代价高昂的SEPT需要在每个领域的DAPT后PLM上分别执行。作为解决方案,我们提出了AdaSent,它通过在基础PLM上训练一个SEPT适配器(adapter),将SEPT与DAPT解耦。该适配器可以插入任何领域的DAPT后PLM中。我们在17个不同的少样本句子分类数据集上的大量实验证明了AdaSent的有效性:AdaSent在DAPT后PLM上达到或超过完整SEPT的性能,同时大幅降低训练成本。AdaSent的代码已公开。
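
The adapter at the heart of this recipe can be pictured as a standard bottleneck module with a residual connection. The sketch below is a generic adapter, with dimensions chosen for illustration rather than taken from the paper.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project, nonlinearity, up-project, residual: the standard adapter
    shape. AdaSent trains such a module for SEPT on the base PLM so it can
    later be inserted into DAPT-ed PLMs; details here are a generic sketch."""
    def __init__(self, hidden=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))

h = torch.randn(2, 16, 768)          # (batch, tokens, hidden)
print(BottleneckAdapter()(h).shape)  # torch.Size([2, 16, 768])
```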

Enhanced Knowledge Injection for Radiology Report Generation

  • paper_url: http://arxiv.org/abs/2311.00399
  • repo_url: None
  • paper_authors: Qingqiu Li, Jilan Xu, Runtian Yuan, Mohan Chen, Yuejie Zhang, Rui Feng, Xiaobo Zhang, Shang Gao
  • for: automated radiology report generation
  • methods: utilizes two branches (Weighted Concept Knowledge and Multimodal Retrieval Knowledge) to extract different types of knowledge and integrate with current image
  • results: achieves superior performance over other state-of-the-art methods, with effective knowledge injection and well-structured knowledge gain
    Abstract Automatic generation of radiology reports holds crucial clinical value, as it can alleviate substantial workload on radiologists and remind less experienced ones of potential anomalies. Despite the remarkable performance of various image captioning methods in the natural image field, generating accurate reports for medical images still faces challenges, i.e., disparities in visual and textual data, and lack of accurate domain knowledge. To address these issues, we propose an enhanced knowledge injection framework, which utilizes two branches to extract different types of knowledge. The Weighted Concept Knowledge (WCK) branch is responsible for introducing clinical medical concepts weighted by TF-IDF scores. The Multimodal Retrieval Knowledge (MRK) branch extracts triplets from similar reports, emphasizing crucial clinical information related to entity positions and existence. By integrating this finer-grained and well-structured knowledge with the current image, we are able to leverage the multi-source knowledge gain to ultimately facilitate more accurate report generation. Extensive experiments have been conducted on two public benchmarks, demonstrating that our method achieves superior performance over other state-of-the-art methods. Ablation studies further validate the effectiveness of two extracted knowledge sources.
    摘要
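
The Weighted Concept Knowledge branch can be illustrated with scikit-learn's TF-IDF: weight a small clinical concept vocabulary over a report corpus and keep the top-weighted concepts per report. The reports and concept list below are invented toy data, not the paper's vocabulary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

reports = ["mild pleural effusion on the left",
           "no effusion, clear lungs",
           "left lower lobe pneumonia with effusion"]
concepts = ["effusion", "pneumonia", "pneumothorax"]  # illustrative vocabulary

# Restrict TF-IDF to the concept vocabulary so weights are per-concept.
vec = TfidfVectorizer(vocabulary=concepts)
weights = vec.fit_transform(reports).toarray()

for report, row in zip(reports, weights):
    scored = sorted(zip(concepts, row), key=lambda kv: -kv[1])
    print(report, "->", [(c, round(w, 2)) for c, w in scored if w > 0])
```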

HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning

  • paper_url: http://arxiv.org/abs/2311.00321
  • repo_url: https://github.com/joonkeekim/hare-hate-speech
  • paper_authors: Yongjin Yang, Joonkee Kim, Yujin Kim, Namgyu Ho, James Thorne, Se-young Yun
  • for: 本研究旨在改进社交媒体上的仇恨言论检测,以确保在线安全。
  • methods: 本研究利用大语言模型(LLM)的推理能力,填补现有仇恨言论标注中解释的空白,为检测模型提供有效的监督。
  • results: 在SBIC和Implicit Hate基准上的实验表明,使用模型生成的数据训练,我们的方法始终优于使用现有自由文本人工标注的基线,并能提高模型的解释质量和泛化能力。
    Abstract With the proliferation of social media, accurate detection of hate speech has become critical to ensure safety online. To combat nuanced forms of hate speech, it is important to identify and thoroughly explain hate speech to help users understand its harmful effects. Recent benchmarks have attempted to tackle this issue by training generative models on free-text annotations of implications in hateful text. However, we find significant reasoning gaps in the existing annotations schemes, which may hinder the supervision of detection models. In this paper, we introduce a hate speech detection framework, HARE, which harnesses the reasoning capabilities of large language models (LLMs) to fill these gaps in explanations of hate speech, thus enabling effective supervision of detection models. Experiments on SBIC and Implicit Hate benchmarks show that our method, using model-generated data, consistently outperforms baselines, using existing free-text human annotations. Analysis demonstrates that our method enhances the explanation quality of trained models and improves generalization to unseen datasets. Our code is available at https://github.com/joonkeekim/hare-hate-speech.git.
    摘要 随着社交媒体的普及,准确检测仇恨言论已成为保障在线安全的关键。为了应对形式微妙的仇恨言论,需要识别并充分解释仇恨言论,帮助用户理解其危害。近期的基准尝试通过在仇恨文本隐含含义的自由文本标注上训练生成模型来解决这一问题。然而,我们发现现有标注方案中存在明显的推理缺口,这可能妨碍对检测模型的监督。在这篇论文中,我们提出了一个仇恨言论检测框架HARE,它利用大语言模型(LLM)的推理能力来填补仇恨言论解释中的这些缺口,从而为检测模型提供有效的监督。在SBIC和Implicit Hate基准上的实验表明,使用模型生成的数据,我们的方法始终优于使用现有自由文本人工标注的基线。分析表明,我们的方法提高了训练所得模型的解释质量,并改善了对未见数据集的泛化能力。我们的代码可在 https://github.com/joonkeekim/hare-hate-speech.git 获取。

Data Augmentation for Code Translation with Comparable Corpora and Multiple References

  • paper_url: http://arxiv.org/abs/2311.00317
  • repo_url: https://github.com/Veronicium/CMTrans
  • paper_authors: Yiqing Xie, Atharva Naik, Daniel Fried, Carolyn Rose
  • for: 本文是关于编程语言之间代码翻译的研究,具体来说是使用数据扩充技术来解决翻译数据的限制问题。
  • methods: 本文提出了两种数据扩充技术,一种是建立可比较的代码对,另一种是对已有的平行数据进行多个参考翻译的扩充。特别是,使用自然语言文档生成代码的方法来建立可比较的代码对,并对可用的平行数据进行多个参考翻译的扩充,以增加翻译目标的多样性。
  • results: 实验结果表明,使用本文提出的数据增强技术可以使CodeT5在Java、Python和C++之间的翻译上平均提升7.5%的计算准确率(CA@1),该指标通过执行来验证翻译的正确性。代码可在 https://github.com/Veronicium/CMTrans 下载。
    Abstract One major challenge of translating code between programming languages is that parallel training data is often limited. To overcome this challenge, we present two data augmentation techniques, one that builds comparable corpora (i.e., code pairs with similar functionality), and another that augments existing parallel data with multiple reference translations. Specifically, we build and analyze multiple types of comparable corpora, including programs generated from natural language documentation using a code generation model. Furthermore, to reduce overfitting to a single reference translation, we automatically generate additional translation references for available parallel data and filter the translations by unit tests, which increases variation in target translations. Experiments show that our data augmentation techniques significantly improve CodeT5 for translation between Java, Python, and C++ by an average of 7.5% Computational Accuracy (CA@1), which verifies the correctness of translations by execution. The code is available at https://github.com/Veronicium/CMTrans.
    摘要 在编程语言之间进行代码翻译的一大挑战是平行训练数据往往有限。为克服这一挑战,我们提出了两种数据增强技术:一种构建可比语料(即功能相似的代码对),另一种用多个参考译文扩充已有的平行数据。具体而言,我们构建并分析了多种类型的可比语料,包括利用代码生成模型从自然语言文档生成的程序。此外,为减少对单一参考译文的过拟合,我们为已有的平行数据自动生成额外的参考译文,并用单元测试对译文进行过滤,从而增加目标译文的多样性。实验表明,我们的数据增强技术使CodeT5在Java、Python和C++之间的翻译上平均提升7.5%的计算准确率(CA@1),该指标通过执行来验证翻译的正确性。代码可在 https://github.com/Veronicium/CMTrans 获取。
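
The unit-test filter can be sketched in a few lines: execute each candidate translation and keep it only if it passes the source function's tests. This toy version uses exec on trusted strings; a real pipeline would sandbox execution.

```python
def passes_tests(candidate_src, fn_name, test_cases):
    """Run a candidate translation against (args, expected) test cases."""
    namespace = {}
    try:
        exec(candidate_src, namespace)
        fn = namespace[fn_name]
        return all(fn(*args) == expected for args, expected in test_cases)
    except Exception:
        return False

candidates = [
    "def add(a, b):\n    return a + b",
    "def add(a, b):\n    return a - b",   # wrong translation, filtered out
]
tests = [((1, 2), 3), ((0, 5), 5)]
kept = [c for c in candidates if passes_tests(c, "add", tests)]
print(len(kept))  # 1
```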

Probing Explicit and Implicit Gender Bias through LLM Conditional Text Generation

  • paper_url: http://arxiv.org/abs/2311.00306
  • repo_url: None
  • paper_authors: Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
  • for: 本文旨在检测语言模型中的性别偏见,并提出一种基于三种输入策略的Conditional Text Generation机制,以检测LLMs中的显式和隐式性别偏见。
  • methods: 本文通过三种不同策略生成三类输入来探测LLMs的性别偏见,并使用显式和隐式评价指标来评估不同策略下LLMs的性别偏见。
  • results: 实验结果表明,增大模型规模并不一定带来更好的公平性;即使输入中不含显式的性别刻板印象,所有被测LLMs也都表现出显式和/或隐式的性别偏见。
    Abstract Large Language Models (LLMs) can generate biased and toxic responses. Yet most prior work on LLM gender bias evaluation requires predefined gender-related phrases or gender stereotypes, which are challenging to be comprehensively collected and are limited to explicit bias evaluation. In addition, we believe that instances devoid of gender-related language or explicit stereotypes in inputs can still induce gender bias in LLMs. Thus, in this work, we propose a conditional text generation mechanism without the need for predefined gender phrases and stereotypes. This approach employs three types of inputs generated through three distinct strategies to probe LLMs, aiming to show evidence of explicit and implicit gender biases in LLMs. We also utilize explicit and implicit evaluation metrics to evaluate gender bias in LLMs under different strategies. Our experiments demonstrate that an increased model size does not consistently lead to enhanced fairness and all tested LLMs exhibit explicit and/or implicit gender bias, even when explicit gender stereotypes are absent in the inputs.
    摘要

Detecting Syllable-Level Pronunciation Stress with A Self-Attention Model

  • paper_url: http://arxiv.org/abs/2311.00301
  • repo_url: https://github.com/wangweiying303/stress-detection-model
  • paper_authors: Wang Weiying, Nakajima Akinori
  • for: 本研究旨在开发一种自注意力模型,用于检测英语口语中每个音节的重音等级。
  • methods: 本研究将多种韵律与类别特征,包括音节及其音节核(元音)的音高、强度、时长和类型,输入自注意力模型,对每个音节的重音进行预测。
  • results: 研究发现,最简单的模型即可在不同数据集上分别达到88%和93%以上的准确率,而更先进的模型可以提供更高的准确率。这些模型可应用于在线会议、英语学习等场景。
    Abstract One precondition of effective oral communication is that words should be pronounced clearly, especially for non-native speakers. Word stress is the key to clear and correct English, and misplacement of syllable stress may lead to misunderstandings. Thus, knowing the stress level is important for English speakers and learners. This paper presents a self-attention model to identify the stress level for each syllable of spoken English. Various prosodic and categorical features, including the pitch level, intensity, duration and type of the syllable and its nuclei (the vowel of the syllable), are explored. These features are input to the self-attention model, and syllable-level stresses are predicted. The simplest model yields an accuracy of over 88% and 93% on different datasets, while more advanced models provide higher accuracy. Our study suggests that the self-attention model can be promising in stress-level detection. These models could be applied to various scenarios, such as online meetings and English learning.
    摘要 有效口头交流的一个前提是单词发音清晰,对非母语者尤其如此。词重音是清晰、正确英语的关键,音节重音位置错误可能导致误解。因此,了解重音等级对英语使用者和学习者都很重要。本文提出了一种自注意力模型,用于识别英语口语中每个音节的重音等级。我们考察了多种韵律和类别特征,包括音节及其音节核(即音节中的元音)的音高、强度、时长和类型。这些特征被输入自注意力模型,用于预测音节级重音。最简单的模型在不同数据集上的准确率分别超过88%和93%,更先进的模型则能提供更高的准确率。我们的研究表明,自注意力模型在重音等级检测中很有前景,这些模型可应用于在线会议、英语学习等多种场景。
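
A minimal PyTorch sketch of the architecture idea: per-syllable prosodic features attend over the whole sequence, then a linear head predicts a stress level per syllable. Feature and class counts below are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class StressTagger(nn.Module):
    """Per-syllable features (e.g. pitch, intensity, duration, nucleus type)
    pass through self-attention, then a linear head tags each syllable with
    a stress level."""
    def __init__(self, n_feats=8, d_model=32, n_levels=3):
        super().__init__()
        self.proj = nn.Linear(n_feats, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Linear(d_model, n_levels)

    def forward(self, x):                 # x: (batch, n_syllables, n_feats)
        h = self.proj(x)
        h, _ = self.attn(h, h, h)         # self-attention across syllables
        return self.head(h)               # (batch, n_syllables, n_levels)

feats = torch.randn(1, 4, 8)              # one 4-syllable word
print(StressTagger()(feats).argmax(-1))   # predicted stress level per syllable
```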

Entity Alignment Method of Science and Technology Patent based on Graph Convolution Network and Information Fusion

  • paper_url: http://arxiv.org/abs/2311.00300
  • repo_url: None
  • paper_authors: Runze Fang, Yawen Li, Yingxia Shao, Zeli Guan, Zhe Xue
  • for: 提高科技专利知识图库中实体匹配的性能
  • methods: 基于图卷积网络和BERT模型,利用图结构信息和实体属性信息进行多信息融合,以提高实体对齐的精度
  • results: 在三个 Referenced 数据集上进行实验,评估指标都高于现有方法
    Abstract The entity alignment of science and technology patents aims to link the equivalent entities in the knowledge graph of different science and technology patent data sources. Most entity alignment methods only use graph neural network to obtain the embedding of graph structure or use attribute text description to obtain semantic representation, ignoring the process of multi-information fusion in science and technology patents. In order to make use of the graphic structure and auxiliary information such as the name, description and attribute of the patent entity, this paper proposes an entity alignment method based on the graph convolution network for science and technology patent information fusion. Through the graph convolution network and BERT model, the structure information and entity attribute information of the science and technology patent knowledge graph are embedded and represented to achieve multi-information fusion, thus improving the performance of entity alignment. Experiments on three benchmark data sets show that the proposed method outperforms existing methods on the Hit@K evaluation metrics.
    摘要 科技专利的实体对齐旨在将不同科技专利数据源的知识图中等价的实体链接起来。大多数实体对齐方法只使用图神经网络获取图结构的嵌入,或使用属性文本描述获取语义表示,忽略了科技专利中多种信息融合的过程。为了利用专利知识图中的图结构以及专利实体的名称、描述和属性等辅助信息,本文提出了一种基于图卷积网络的科技专利信息融合实体对齐方法。通过图卷积网络和BERT模型,对科技专利知识图的结构信息和实体属性信息进行嵌入表示,实现多信息融合,从而提高实体对齐的性能。在三个基准数据集上的实验表明,所提方法在Hit@K评价指标上优于现有方法。

Semantic Representation Learning of Scientific Literature based on Adaptive Feature and Graph Neural Network

  • paper_url: http://arxiv.org/abs/2311.00296
  • repo_url: None
  • paper_authors: Hongrui Gao, Yawen Li, Meiyu Liang, Zeli Guan, Zhe Xue
  • for: 本研究旨在提出一种基于自适应特征和图神经网络的科学文献语义表示学习方法,以增强科学文献的特征表示能力。
  • methods: 本方法首先引入自适应特征方法,同时考虑科学文献的全局与局部特征;然后使用图注意力机制对带引用关系的科学文献特征求和,并赋予不同文献不同的特征权重,以更好地表达不同科学文献特征之间的关联。此外,本方法还提出了一种无监督图神经网络语义表示学习方法:通过在潜在空间中比较科学文献正、负局部语义表示与全局图语义表示之间的互信息,使图神经网络能够同时捕捉局部和全局信息,从而提升科学文献语义表示的学习能力。
  • results: 实验结果显示,基于自适应特征和图神经网络的科学文献语义表示学习方法在科学文献分类任务上具有竞争力,并取得了良好的结果。
    Abstract Because most of the scientific literature data is unmarked, it makes semantic representation learning based on unsupervised graph become crucial. At the same time, in order to enrich the features of scientific literature, a learning method of semantic representation of scientific literature based on adaptive features and graph neural network is proposed. By introducing the adaptive feature method, the features of scientific literature are considered globally and locally. The graph attention mechanism is used to sum the features of scientific literature with citation relationship, and give each scientific literature different feature weights, so as to better express the correlation between the features of different scientific literature. In addition, an unsupervised graph neural network semantic representation learning method is proposed. By comparing the mutual information between the positive and negative local semantic representation of scientific literature and the global graph semantic representation in the potential space, the graph neural network can capture the local and global information, thus improving the learning ability of the semantic representation of scientific literature. The experimental results show that the proposed learning method of semantic representation of scientific literature based on adaptive feature and graph neural network is competitive on the basis of scientific literature classification, and has achieved good results.
    摘要 因为大多数科学文献数据未标注,使得基于无监督图的语义表示学习成为不可或缺的。同时,为了丰富科学文献的特征,一种基于适应特征和图神经网络的科学文献语义表示学习方法被提议。通过引入适应特征方法,科学文献的特征被考虑在全球和地方两个维度。使用图注意力机制将科学文献的特征相加,并给每个科学文献不同的特征重量,以更好地表示不同科学文献之间的相关性。此外,一种无监督图神经网络语义表示学习方法被提议。通过比较相互信息between科学文献的正向和负向本地语义表示和全球图semantic representation在潜在空间中,图神经网络可以捕捉本地和全球信息,从而提高语义表示学习的能力。实验结果表明,基于适应特征和图神经网络的科学文献语义表示学习方法在科学文献分类基础上具有竞争力,并取得了良好的结果。

IBADR: an Iterative Bias-Aware Dataset Refinement Framework for Debiasing NLU models

  • paper_url: http://arxiv.org/abs/2311.00292
  • repo_url: None
  • paper_authors: Xiaoyue Wang, Xin Liu, Lijie Wang, Yaoxiang Wang, Jinsong Su, Hua Wu
  • for: 本文旨在提出一种迭代偏见感知的数据集精炼框架(IBADR),帮助自然语言理解(NLU)模型去偏。
  • methods: 该方法先训练一个浅层模型来量化样本池中各样本的偏见程度,再将每个样本与表示其偏见程度的偏见指示符配对,并用这些扩展样本训练一个样本生成器,使其学习偏见指示符与样本之间的对应关系;随后利用该生成器,通过输入特定的偏见指示符生成带更少偏见特征的伪样本,并将其加入样本池。
  • results: 本文的实验结果和深入分析表明,IBADR不仅显著超越现有的数据集精炼方法,达到SOTA性能,而且与以模型为中心的方法兼容。
    Abstract As commonly-used methods for debiasing natural language understanding (NLU) models, dataset refinement approaches heavily rely on manual data analysis, and thus maybe unable to cover all the potential biased features. In this paper, we propose IBADR, an Iterative Bias-Aware Dataset Refinement framework, which debiases NLU models without predefining biased features. We maintain an iteratively expanded sample pool. Specifically, at each iteration, we first train a shallow model to quantify the bias degree of samples in the pool. Then, we pair each sample with a bias indicator representing its bias degree, and use these extended samples to train a sample generator. In this way, this generator can effectively learn the correspondence relationship between bias indicators and samples. Furthermore, we employ the generator to produce pseudo samples with fewer biased features by feeding specific bias indicators. Finally, we incorporate the generated pseudo samples into the pool. Experimental results and in-depth analyses on two NLU tasks show that IBADR not only significantly outperforms existing dataset refinement approaches, achieving SOTA, but also is compatible with model-centric methods.
    摘要 作为常用的自然语言理解(NLU)模型去偏方法,数据集精炼方法严重依赖人工数据分析,因而可能无法涵盖所有潜在的偏见特征。在这篇论文中,我们提出了IBADR,一个迭代偏见感知的数据集精炼框架,无需预先定义偏见特征即可为NLU模型去偏。我们维护一个迭代扩充的样本池。具体来说,在每轮迭代中,我们先训练一个浅层模型来量化样本池中各样本的偏见程度;然后为每个样本配上表示其偏见程度的偏见指示符,并用这些扩展样本训练一个样本生成器,使其有效地学习偏见指示符与样本之间的对应关系;进而通过输入特定的偏见指示符,利用生成器产生带更少偏见特征的伪样本;最后将生成的伪样本加入样本池。在两个NLU任务上的实验结果和深入分析表明,IBADR不仅显著优于现有的数据集精炼方法,达到SOTA,而且与以模型为中心的方法兼容。
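
The iterative loop is easy to sketch end-to-end. In the toy below, a lexical-overlap score stands in for the trained shallow bias model and a string perturbation stands in for the trained sample generator; both are placeholders for the paper's learned components.

```python
def bias_degree(premise, hypothesis):
    """Stand-in shallow model: high premise/hypothesis lexical overlap is a
    well-known NLI bias feature."""
    p, h = set(premise.split()), set(hypothesis.split())
    return len(p & h) / len(h)

def generate_less_biased(sample):
    """Stand-in generator: perturb a sample to lower its bias indicator."""
    premise, hypothesis, label = sample
    return (premise + " indeed", "someone says " + hypothesis, label)

pool = [("a man is sleeping", "a man is sleeping", "entailment")]
for _ in range(3):                    # iteratively expand the sample pool
    most_biased = max(pool, key=lambda s: bias_degree(s[0], s[1]))
    pool.append(generate_less_biased(most_biased))
print([round(bias_degree(p, h), 2) for p, h, _ in pool])
```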

SoulChat: Improving LLMs’ Empathy, Listening, and Comfort Abilities through Fine-tuning with Multi-turn Empathy Conversations

  • paper_url: http://arxiv.org/abs/2311.00273
  • repo_url: https://github.com/scutcyr/soulchat
  • paper_authors: Yirong Chen, Xiaofen Xing, Jingkai Lin, Huimin Zheng, Zhenyu Wang, Qi Liu, Xiangmin Xu
  • for: 这篇论文旨在提升语言模型在心理咨询场景中的共情能力。
  • methods: 论文使用多轮对话上下文以及更贴近心理咨询师表达方式的回应对语言模型进行微调,以提升其共情能力。
  • results: 实验表明,使用多轮对话历史和更贴近心理咨询师表达方式的回应进行微调,可以显著提升语言模型的共情能力。
    Abstract Large language models (LLMs) have been widely applied in various fields due to their excellent capability for memorizing knowledge and chain of thought (CoT). When these language models are applied in the field of psychological counseling, they often rush to provide universal advice. However, when users seek psychological support, they need to gain empathy, trust, understanding and comfort, rather than just reasonable advice. To this end, we constructed a multi-turn empathetic conversation dataset of more than 2 million samples, in which the input is the multi-turn conversation context, and the target is empathetic responses that cover expressions such as questioning, comfort, recognition, listening, trust, emotional support, etc. Experiments have shown that the empathy ability of LLMs can be significantly enhanced when finetuning by using multi-turn dialogue history and responses that are closer to the expression of a psychological consultant.
    摘要

Syntactic Inductive Bias in Transformer Language Models: Especially Helpful for Low-Resource Languages?

  • paper_url: http://arxiv.org/abs/2311.00268
  • repo_url: https://github.com/lgessler/lr-sib
  • paper_authors: Luke Gessler, Nathan Schneider
  • for: 检验在预训练阶段为基于Transformer的语言模型(如BERT)注入句法归纳偏置,能否提升其在低资源语言上的性能。
  • methods: 在预训练过程中引入针对句法结构的归纳偏置。
  • results: 在五种低资源语言(维吾尔语、沃洛夫语、马耳他语、科普特语、古希腊语)上的实验发现,这类方法在低资源环境下效果参差不齐,多数情况下收益甚微。
    Abstract A line of work on Transformer-based language models such as BERT has attempted to use syntactic inductive bias to enhance the pretraining process, on the theory that building syntactic structure into the training process should reduce the amount of data needed for training. But such methods are often tested for high-resource languages such as English. In this work, we investigate whether these methods can compensate for data sparseness in low-resource languages, hypothesizing that they ought to be more effective for low-resource languages. We experiment with five low-resource languages: Uyghur, Wolof, Maltese, Coptic, and Ancient Greek. We find that these syntactic inductive bias methods produce uneven results in low-resource settings, and provide surprisingly little benefit in most cases.
    摘要 一系列基于Transformer的语言模型(如BERT)的工作尝试利用句法归纳偏置来增强预训练过程,其理由是在训练过程中引入句法结构应当能减少训练所需的数据量。但这类方法通常只在英语等高资源语言上测试。在这项工作中,我们研究这些方法能否弥补低资源语言中的数据稀疏问题,并假设它们对低资源语言应当更有效。我们在五种低资源语言上进行实验:维吾尔语、沃洛夫语、马耳他语、科普特语和古希腊语。我们发现,这些句法归纳偏置方法在低资源环境下的效果参差不齐,而且在大多数情况下带来的收益出奇地少。

Noisy Exemplars Make Large Language Models More Robust: A Domain-Agnostic Behavioral Analysis

  • paper_url: http://arxiv.org/abs/2311.00258
  • repo_url: https://github.com/hiroki39/noisy-exemplars-make-large-language-models-more-robust
  • paper_authors: Hongyi Zheng, Abulhair Saparov
  • for: 大语言模型(LLM)在少样本提示下已能较准确地求解多步逻辑推理问题,但现有研究很少考察少样本提示技术下LLM的鲁棒性。
  • methods: 提出了一种系统性的方法,通过领域无关的扰动来测试LLM在多步推理任务中的鲁棒性:扰动涵盖多个抽象层次(如拼写错误等词汇扰动,以及在问题中加入中间推理步骤等语义扰动),并通过控制提示中被扰动示例的比例来分析其影响。
  • results: 实验发现,模型对将词替换为同义词这类扰动最为敏感;同时,提高提示中被扰动示例的比例可以增强少样本提示方法的鲁棒性。
    Abstract Recent advances in prompt engineering enable large language models (LLMs) to solve multi-hop logical reasoning problems with impressive accuracy. However, there is little existing work investigating the robustness of LLMs with few-shot prompting techniques. Therefore, we introduce a systematic approach to test the robustness of LLMs in multi-hop reasoning tasks via domain-agnostic perturbations. We include perturbations at multiple levels of abstractions (e.g. lexical perturbations such as typos, and semantic perturbations such as the inclusion of intermediate reasoning steps in the questions) to conduct behavioral analysis on the LLMs. Throughout our experiments, we find that models are more sensitive to certain perturbations such as replacing words with their synonyms. We also demonstrate that increasing the proportion of perturbed exemplars in the prompts improves the robustness of few-shot prompting methods.
    摘要 提示工程的最新进展使大语言模型(LLM)能够以可观的准确率求解多步逻辑推理问题。然而,现有工作很少考察少样本提示技术下LLM的鲁棒性。因此,我们提出了一种系统性的方法,通过领域无关的扰动来测试LLM在多步推理任务中的鲁棒性。我们在多个抽象层次上引入扰动(例如拼写错误等词汇扰动,以及在问题中加入中间推理步骤等语义扰动),对LLM进行行为分析。在实验中,我们发现模型对某些扰动(例如将词替换为同义词)更为敏感。我们还证明,提高提示中被扰动示例的比例可以增强少样本提示方法的鲁棒性。
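
The key experimental knob, the proportion of perturbed exemplars in the prompt, can be sketched as follows; the typo-deletion attack and the demo prompts are illustrative stand-ins for the paper's perturbation suite.

```python
import random

def perturb_exemplar(text, rng, typo_rate=0.05):
    """Lexical perturbation: random character deletions simulating typos."""
    return "".join(c for c in text if rng.random() > typo_rate)

def build_prompt(exemplars, query, noisy_fraction=0.5, seed=0):
    """Replace a fraction of clean exemplars with perturbed ones; the paper
    finds that higher fractions improve few-shot prompting robustness."""
    rng = random.Random(seed)
    n_noisy = int(len(exemplars) * noisy_fraction)
    shots = [perturb_exemplar(e, rng) if i < n_noisy else e
             for i, e in enumerate(exemplars)]
    return "\n\n".join(shots + [query])

demos = ["Q: 2+3? Reason: 2 plus 3 is 5. A: 5",
         "Q: 4+4? Reason: 4 plus 4 is 8. A: 8"]
print(build_prompt(demos, "Q: 7+5? A:", noisy_fraction=0.5))
```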

The Mystery and Fascination of LLMs: A Comprehensive Survey on the Interpretation and Analysis of Emergent Abilities

  • paper_url: http://arxiv.org/abs/2311.00237
  • repo_url: None
  • paper_authors: Yuxiang Zhou, Jiazheng Li, Yanzheng Xiang, Hanqi Yan, Lin Gui, Yulan He
  • for: 本文旨在对大语言模型(LLMs)涌现能力的解释与分析进行全面综述。
  • methods: 论文从宏观视角(机制可解释性研究及涌现能力背后的数学基础)和微观视角(考察与涌现能力相关因素的实证可解释性研究)两方面梳理相关工作。
  • results: 论文指出了解释LLMs涌现能力所面临的挑战,并提出了未来研究的可能方向。
    Abstract Understanding emergent abilities, such as in-context learning (ICL) and chain-of-thought (CoT) prompting in large language models (LLMs), is of utmost importance. This importance stems not only from the better utilization of these capabilities across various tasks, but also from the proactive identification and mitigation of potential risks, including concerns of truthfulness, bias, and toxicity, that may arise alongside these capabilities. In this paper, we present a thorough survey on the interpretation and analysis of emergent abilities of LLMs. First, we provide a concise introduction to the background and definition of emergent abilities. Then, we give an overview of advancements from two perspectives: 1) a macro perspective, emphasizing studies on the mechanistic interpretability and delving into the mathematical foundations behind emergent abilities; and 2) a micro-perspective, concerning studies that focus on empirical interpretability by examining factors associated with these abilities. We conclude by highlighting the challenges encountered and suggesting potential avenues for future research. We believe that our work establishes the basis for further exploration into the interpretation of emergent abilities.
    摘要

Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions

  • paper_url: http://arxiv.org/abs/2311.00233
  • repo_url: None
  • paper_authors: Taehyeon Kim, Joonkee Kim, Gihun Lee, Se-Young Yun
  • for: 这篇论文旨在增强指令微调(instruction-tuned)模型的泛化能力,使其更好地应对训练集之外的指令。
  • methods: 论文提出了一种简单而有效的方法,称为指令式解码(Instructive Decoding, ID):以对比的方式调整下一个token预测的logits,利用由受扰指令(noisy instruction)得到的预测来修正输出;受扰指令旨在诱发可能偏离原意图但仍然合理的回应。
  • results: 实验表明,该方法无需更新任何参数,即可在多种指令微调模型和任务上提升性能;其中使用与原指令偏离最大的'opposite'作为受扰指令时,性能提升最为显著。
    Abstract While instruction-tuned language models have demonstrated impressive zero-shot generalization, these models often struggle to generate accurate responses when faced with instructions that fall outside their training set. This paper presents Instructive Decoding (ID), a simple yet effective approach that augments the efficacy of instruction-tuned models. Specifically, ID adjusts the logits for next-token prediction in a contrastive manner, utilizing predictions generated from a manipulated version of the original instruction, referred to as a noisy instruction. This noisy instruction aims to elicit responses that could diverge from the intended instruction yet remain plausible. We conduct experiments across a spectrum of such noisy instructions, ranging from those that insert semantic noise via random words to others like 'opposite' that elicit the deviated responses. Our approach achieves considerable performance gains across various instruction-tuned models and tasks without necessitating any additional parameter updates. Notably, utilizing 'opposite' as the noisy instruction in ID, which exhibits the maximum divergence from the original instruction, consistently produces the most significant performance gains across multiple models and tasks.
    摘要 尽管指令微调的语言模型展现出令人印象深刻的零样本泛化能力,但在面对训练集之外的指令时,这些模型往往难以生成准确的回应。本文提出了指令式解码(Instructive Decoding, ID),一种简单而有效的方法,用于增强指令微调模型的效果。具体而言,ID以对比的方式调整下一个token预测的logits,利用由原指令的扰动版本(称为受扰指令)生成的预测;受扰指令旨在诱发可能偏离原意图但仍然合理的回应。我们在一系列此类受扰指令上进行了实验,从通过随机词注入语义噪声的指令,到诱发偏离回应的'opposite'等指令。我们的方法无需任何额外的参数更新,即可在多种指令微调模型和任务上取得可观的性能提升。值得注意的是,在ID中使用与原指令偏离最大的'opposite'作为受扰指令,在多个模型和任务上持续带来最显著的性能提升。
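
The contrastive adjustment itself is one line: subtract a scaled copy of the logits obtained under the noisy instruction from the logits obtained under the original instruction. The epsilon value below is a hedged choice for illustration, not the paper's exact constant.

```python
import numpy as np

def instructive_decoding_logits(logits_clean, logits_noisy, eps=0.3):
    """Contrast next-token logits under the original instruction against
    those under a noisy (e.g. 'opposite') instruction."""
    return logits_clean - eps * logits_noisy

logits_clean = np.array([2.0, 1.0, 0.5])   # from the intended instruction
logits_noisy = np.array([0.5, 1.5, 0.2])   # from the perturbed instruction
print(instructive_decoding_logits(logits_clean, logits_noisy).argmax())
```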

Is GPT Powerful Enough to Analyze the Emotions of Memes?

  • paper_url: http://arxiv.org/abs/2311.00223
  • repo_url: None
  • paper_authors: Jingjing Wang, Joshua Luo, Grace Yang, Allen Hong, Feng Luo
  • for: 这个研究的目的是探讨GPT-3.5在互联网趣图中的情感分析能力。
  • methods: 这个研究使用GPT-3.5模型处理互联网趣图的相关任务,包括趣图情感分类、幽默类型判定以及趣图中隐含仇恨的检测。
  • results: 研究发现GPT-3.5在处理这些任务时表现出色,但也存在一些限制,如理解社会规范和文化背景、解释暗示性意境和数据偏见等问题。
    Abstract Large Language Models (LLMs), representing a significant achievement in artificial intelligence (AI) research, have demonstrated their ability in a multitude of tasks. This project aims to explore the capabilities of GPT-3.5, a leading example of LLMs, in processing the sentiment analysis of Internet memes. Memes, which include both verbal and visual aspects, act as a powerful yet complex tool for expressing ideas and sentiments, demanding an understanding of societal norms and cultural contexts. Notably, the detection and moderation of hateful memes pose a significant challenge due to their implicit offensive nature. This project investigates GPT's proficiency in such subjective tasks, revealing its strengths and potential limitations. The tasks include the classification of meme sentiment, determination of humor type, and detection of implicit hate in memes. The performance evaluation, using datasets from SemEval-2020 Task 8 and Facebook hateful memes, offers a comparative understanding of GPT responses against human annotations. Despite GPT's remarkable progress, our findings underscore the challenges faced by these models in handling subjective tasks, which are rooted in their inherent limitations including contextual understanding, interpretation of implicit meanings, and data biases. This research contributes to the broader discourse on the applicability of AI in handling complex, context-dependent tasks, and offers valuable insights for future advancements.
    摘要 大型语言模型(LLM)代表人工智能(AI)研究的一项重要成就,已在多种任务中表现出色。本项目旨在探索LLM的代表之一GPT-3.5在互联网趣图情感分析方面的能力。趣图兼具语言和视觉两方面的特征,是表达观点和情感的强大而复杂的工具,理解它需要把握社会规范和文化背景。尤其是仇恨趣图的检测与审核是一项极具挑战的任务,因为其冒犯性往往是隐含的。本项目考察GPT在此类主观任务中的表现,揭示其优势与潜在局限。任务包括趣图情感分类、幽默类型判定和趣图中隐含仇恨的检测。我们使用SemEval-2020任务8的数据集和Facebook仇恨趣图数据集进行性能评估,将GPT的回应与人工标注进行对比。尽管GPT取得了惊人的进步,我们的发现仍表明这些模型在处理主观任务时面临挑战,这些挑战源于其内在局限,包括上下文理解、隐含含义的解读以及数据偏差。本研究为关于AI能否处理复杂的、依赖上下文的任务这一更广泛的讨论做出了贡献,并为未来的改进提供了有价值的见解。

Transformers as Recognizers of Formal Languages: A Survey on Expressivity

  • paper_url: http://arxiv.org/abs/2311.00208
  • repo_url: None
  • paper_authors: Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin
  • for: 这篇论文旨在通过将问题视为形式语言,从理论上探讨Transformer模型能解决和不能解决哪些问题,以便在各种任务上比较Transformer与其他模型、以及不同Transformer变体之间的能力。
  • methods: 论文将问题形式化为形式语言,并对其进行理论分析,以判断Transformer模型能否识别这些语言。
  • results: 论文提供了一份全面综述,记录了不同结论背后的各种假设,并提供了一个统一框架来调和看似矛盾的发现。
    Abstract As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages. Exploring questions such as this will help to compare transformers with other models, and transformer variants with one another, for various tasks. Work in this subarea has made considerable progress in recent years. Here, we undertake a comprehensive survey of this work, documenting the diverse assumptions that underlie different results and providing a unified framework for harmonizing seemingly contradictory findings.
    摘要 随着Transformer在自然语言处理中占据主导地位,一些研究人员开始从理论上研究它们能解决和不能解决哪些问题,方法是将问题视为形式语言。探讨这类问题有助于在各种任务上比较Transformer与其他模型、以及不同Transformer变体之间的能力。近年来,该子领域的工作取得了可观进展。在此,我们对这些工作进行全面综述,记录不同结论背后的各种假设,并提供一个统一框架来调和看似矛盾的发现。