cs.AI - 2023-10-31

Large-Scale Multi-Robot Assembly Planning for Autonomous Manufacturing

  • paper_url: http://arxiv.org/abs/2311.00192
  • repo_url: https://github.com/sisl/constructionbots.jl
  • paper_authors: Kyle Brown, Dylan M. Asmar, Mac Schwager, Mykel J. Kochenderfer
  • For: This paper proposes an algorithmic stack for large-scale multi-robot assembly planning to address challenges such as collision-free movement, effective task allocation, and spatial planning for parallel assembly and transportation of nested subassemblies in manufacturing processes.
  • Methods: The proposed algorithmic stack includes an iterative radial layout optimization procedure, a graph-repair mixed-integer program formulation, a modified greedy task allocation algorithm, a geometric heuristic, and a hill-climbing algorithm to plan collaborative carrying configurations of robot sub-teams.
  • Results: The paper presents empirical results demonstrating the scalability and effectiveness of the proposed approach by generating plans to manufacture a LEGO model of a Saturn V launch vehicle with 1845 parts, 306 subassemblies, and 250 robots in under three minutes on a standard laptop computer.
    Abstract Mobile autonomous robots have the potential to revolutionize manufacturing processes. However, employing large robot fleets in manufacturing requires addressing challenges including collision-free movement in a shared workspace, effective multi-robot collaboration to manipulate and transport large payloads, complex task allocation due to coupled manufacturing processes, and spatial planning for parallel assembly and transportation of nested subassemblies. We propose a full algorithmic stack for large-scale multi-robot assembly planning that addresses these challenges and can synthesize construction plans for complex assemblies with thousands of parts in a matter of minutes. Our approach takes in a CAD-like product specification and automatically plans a full-stack assembly procedure for a group of robots to manufacture the product. We propose an algorithmic stack that comprises: (i) an iterative radial layout optimization procedure to define a global staging layout for the manufacturing facility, (ii) a graph-repair mixed-integer program formulation and a modified greedy task allocation algorithm to optimally allocate robots and robot sub-teams to assembly and transport tasks, (iii) a geometric heuristic and a hill-climbing algorithm to plan collaborative carrying configurations of robot sub-teams, and (iv) a distributed control policy that enables robots to execute the assembly motion plan collision-free. We also present an open-source multi-robot manufacturing simulator implemented in Julia as a resource to the research community, to test our algorithms and to facilitate multi-robot manufacturing research more broadly. Our empirical results demonstrate the scalability and effectiveness of our approach by generating plans to manufacture a LEGO model of a Saturn V launch vehicle with 1845 parts, 306 subassemblies, and 250 robots in under three minutes on a standard laptop computer.
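Item (iii) of the stack above pairs a geometric heuristic with a hill-climbing search over collaborative carrying configurations. As a rough, hedged illustration of that style of local search only (the circular payload, the imbalance cost, and the step sizes below are assumptions, not the paper's heuristic), the sketch hill-climbs the attachment angles of four robots around a payload so the grasp centroid stays near the payload's centre while the robots stay spread out.

```python
# Minimal stochastic hill climbing over carrying angles; all modeling choices are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_robots, radius = 4, 1.0

def grasp_points(angles):
    # Attachment points of the robots on a circular payload of the given radius.
    return np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)

def cost(angles):
    pts = grasp_points(angles)
    pairwise = [np.linalg.norm(pts[i] - pts[j])
                for i in range(n_robots) for j in range(i + 1, n_robots)]
    # Penalise an off-centre grasp centroid and reward spacing between robots.
    return np.linalg.norm(pts.mean(axis=0)) - 0.1 * min(pairwise)

angles = rng.uniform(0, 2 * np.pi, size=n_robots)
best = cost(angles)
for _ in range(2000):                          # simple stochastic hill climbing
    candidate = angles + rng.normal(scale=0.1, size=n_robots)
    c = cost(candidate)
    if c < best:                               # accept only improving perturbations
        angles, best = candidate, c
print(np.sort(np.mod(angles, 2 * np.pi)), round(best, 4))
```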

XAI-CLASS: Explanation-Enhanced Text Classification with Extremely Weak Supervision

  • paper_url: http://arxiv.org/abs/2311.00189
  • repo_url: None
  • paper_authors: Daniel Hajialigol, Hanwen Liu, Xuan Wang
  • for: The paper proposes a new extremely weakly-supervised text classification method that reduces the need for manual annotation.
  • methods: The method generates pseudo-training data by aligning documents with specific classes (e.g., keyword matching) for pseudo-labeling, and adds an auxiliary task that predicts the saliency (importance) of individual words.
  • results: On several weakly-supervised text classification datasets, XAI-CLASS significantly outperforms other weakly-supervised text classification methods, and experiments show that it improves both model performance and explainability.
    Abstract Text classification aims to effectively categorize documents into pre-defined categories. Traditional methods for text classification often rely on large amounts of manually annotated training data, making the process time-consuming and labor-intensive. To address this issue, recent studies have focused on weakly-supervised and extremely weakly-supervised settings, which require minimal or no human annotation, respectively. In previous methods of weakly supervised text classification, pseudo-training data is generated by assigning pseudo-labels to documents based on their alignment (e.g., keyword matching) with specific classes. However, these methods ignore the importance of incorporating the explanations of the generated pseudo-labels, or saliency of individual words, as additional guidance during the text classification training process. To address this limitation, we propose XAI-CLASS, a novel explanation-enhanced extremely weakly-supervised text classification method that incorporates word saliency prediction as an auxiliary task. XAI-CLASS begins by employing a multi-round question-answering process to generate pseudo-training data that promotes the mutual enhancement of class labels and corresponding explanation word generation. This pseudo-training data is then used to train a multi-task framework that simultaneously learns both text classification and word saliency prediction. Extensive experiments on several weakly-supervised text classification datasets show that XAI-CLASS outperforms other weakly-supervised text classification methods significantly. Moreover, experiments demonstrate that XAI-CLASS enhances both model performance and explainability.
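The abstract above describes a multi-task framework that simultaneously learns text classification and word saliency prediction from pseudo-training data. The sketch below is a hedged toy version of one such joint training step; the tiny transformer encoder, the random pseudo-labels and saliency targets, and the 0.5 loss weight are illustrative assumptions rather than XAI-CLASS's actual architecture.

```python
# Toy joint objective: document classification + per-token word-saliency prediction.
import torch
import torch.nn as nn

vocab, d_model, n_classes, seq_len = 1000, 64, 4, 32
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
cls_head = nn.Linear(d_model, n_classes)       # primary task: class prediction
sal_head = nn.Linear(d_model, 1)               # auxiliary task: word saliency
params = (list(embed.parameters()) + list(encoder.parameters())
          + list(cls_head.parameters()) + list(sal_head.parameters()))
opt = torch.optim.AdamW(params, lr=1e-4)

tokens = torch.randint(0, vocab, (8, seq_len))
pseudo_labels = torch.randint(0, n_classes, (8,))      # from weak supervision
pseudo_saliency = torch.rand(8, seq_len)               # from explanation words

h = encoder(embed(tokens))                             # (batch, seq_len, d_model)
loss_cls = nn.functional.cross_entropy(cls_head(h.mean(dim=1)), pseudo_labels)
loss_sal = nn.functional.binary_cross_entropy_with_logits(
    sal_head(h).squeeze(-1), pseudo_saliency)
loss = loss_cls + 0.5 * loss_sal                       # joint multi-task objective
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss_cls), float(loss_sal))
```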

Robust Safety Classifier for Large Language Models: Adversarial Prompt Shield

  • paper_url: http://arxiv.org/abs/2311.00172
  • repo_url: None
  • paper_authors: Jinhwa Kim, Ali Derakhshan, Ian G. Harris
  • for: Addressing the safety problem of protecting large language models against adversarial attacks
  • methods: Proposes a lightweight model called the Adversarial Prompt Shield (APS), together with a strategy for autonomously generating adversarial training datasets, the Bot Adversarial Noisy Dialogue (BAND) datasets
  • results: Evaluations involving Large Language Models show that the classifier can reduce the attack success rate of adversarial attacks by up to 60%, paving the way for the next generation of more reliable and resilient conversational agents.
    Abstract Large Language Models' safety remains a critical concern due to their vulnerability to adversarial attacks, which can prompt these systems to produce harmful responses. In the heart of these systems lies a safety classifier, a computational model trained to discern and mitigate potentially harmful, offensive, or unethical outputs. However, contemporary safety classifiers, despite their potential, often fail when exposed to inputs infused with adversarial noise. In response, our study introduces the Adversarial Prompt Shield (APS), a lightweight model that excels in detection accuracy and demonstrates resilience against adversarial prompts. Additionally, we propose novel strategies for autonomously generating adversarial training datasets, named Bot Adversarial Noisy Dialogue (BAND) datasets. These datasets are designed to fortify the safety classifier's robustness, and we investigate the consequences of incorporating adversarial examples into the training process. Through evaluations involving Large Language Models, we demonstrate that our classifier has the potential to decrease the attack success rate resulting from adversarial attacks by up to 60%. This advancement paves the way for the next generation of more reliable and resilient conversational agents.

Beyond Denouncing Hate: Strategies for Countering Implied Biases and Stereotypes in Language

  • paper_url: http://arxiv.org/abs/2311.00161
  • repo_url: None
  • paper_authors: Jimin Mun, Emily Allaway, Akhila Yerukola, Laura Vianna, Sarah-Jane Leslie, Maarten Sap
  • for: This paper aims to address the issue of online hate speech without censorship by exploring psychologically inspired strategies to challenge the underlying stereotypical implications of hateful language.
  • methods: The authors draw from psychology and philosophy literature to craft six strategies to challenge hateful language, and they examine the convincingness of each strategy through a user study and compare their usages in human- and machine-generated counterspeech datasets.
  • results: The study finds that human-written counterspeech uses more specific strategies to challenge the implied stereotype, whereas machine-generated counterspeech uses less specific strategies and often employs strategies that humans deem less convincing. The findings highlight the importance of accounting for the underlying stereotypical implications of speech when generating counterspeech and for better machine reasoning about anti-stereotypical examples.
    Abstract Counterspeech, i.e., responses to counteract potential harms of hateful speech, has become an increasingly popular solution to address online hate speech without censorship. However, properly countering hateful language requires countering and dispelling the underlying inaccurate stereotypes implied by such language. In this work, we draw from psychology and philosophy literature to craft six psychologically inspired strategies to challenge the underlying stereotypical implications of hateful language. We first examine the convincingness of each of these strategies through a user study, and then compare their usages in both human- and machine-generated counterspeech datasets. Our results show that human-written counterspeech uses countering strategies that are more specific to the implied stereotype (e.g., counter examples to the stereotype, external factors about the stereotype's origins), whereas machine-generated counterspeech uses less specific strategies (e.g., generally denouncing the hatefulness of speech). Furthermore, machine-generated counterspeech often employs strategies that humans deem less convincing compared to human-produced counterspeech. Our findings point to the importance of accounting for the underlying stereotypical implications of speech when generating counterspeech and for better machine reasoning about anti-stereotypical examples.

Score Normalization for a Faster Diffusion Exponential Integrator Sampler

  • paper_url: http://arxiv.org/abs/2311.00157
  • repo_url: https://github.com/mtkresearch/Diffusion-DEIS-SN
  • paper_authors: Guoxuan Xia, Duolikun Danier, Ayan Das, Stathi Fotiadis, Farhang Nabiei, Ushnish Sengupta, Alberto Bernacchia
  • for: Fast generation of samples from diffusion models, improving generation quality and reducing integration error at low numbers of function evaluations (NFEs).
  • methods: Builds on the Diffusion Exponential Integrator Sampler (DEIS) and its score function reparameterisation, and proposes reparameterising the score at inference by dividing it by the average absolute value of previous score estimates at that time step, collected from offline high-NFE generations (a toy sketch of this normalisation follows the abstract).
  • results: Experiments show that the proposed score normalisation (DEIS-SN) consistently improves FID compared to vanilla DEIS, improving FID from 6.44 to 5.57 at 10 NFEs on CIFAR-10.
    Abstract Recently, Zhang et al. have proposed the Diffusion Exponential Integrator Sampler (DEIS) for fast generation of samples from Diffusion Models. It leverages the semi-linear nature of the probability flow ordinary differential equation (ODE) in order to greatly reduce integration error and improve generation quality at low numbers of function evaluations (NFEs). Key to this approach is the score function reparameterisation, which reduces the integration error incurred from using a fixed score function estimate over each integration step. The original authors use the default parameterisation used by models trained for noise prediction -- multiply the score by the standard deviation of the conditional forward noising distribution. We find that although the mean absolute value of this score parameterisation is close to constant for a large portion of the reverse sampling process, it changes rapidly at the end of sampling. As a simple fix, we propose to instead reparameterise the score (at inference) by dividing it by the average absolute value of previous score estimates at that time step collected from offline high NFE generations. We find that our score normalisation (DEIS-SN) consistently improves FID compared to vanilla DEIS, showing an FID improvement from 6.44 to 5.57 at 10 NFEs for our CIFAR-10 experiments. Our code is available at https://github.com/mtkresearch/Diffusion-DEIS-SN.
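A hedged sketch of the score-normalisation idea described above: an offline phase records the average absolute score at each time step from high-NFE generations, and at inference every score estimate is divided by that stored average. The toy score model and Euler-like update below are stand-ins; the actual DEIS-SN exponential-integrator step is more involved.

```python
# Toy illustration of per-timestep score normalisation; the score model is a stand-in.
import numpy as np

rng = np.random.default_rng(0)
T, dim = 10, 32                      # sampler steps (NFEs) and data dimensionality

def score_model(x, t):
    # Stand-in for a trained score network epsilon_theta(x, t).
    return -x / (t + 1.0) + 0.01 * rng.normal(size=x.shape)

def toy_step(x, s):
    # Stand-in for the DEIS exponential-integrator update.
    return x + 0.1 * s

# Offline phase: record the average absolute score per time step over many runs.
norms = np.zeros(T)
for _ in range(100):
    x = rng.normal(size=dim)
    for t in range(T):
        s = score_model(x, t)
        norms[t] += np.abs(s).mean() / 100.0
        x = toy_step(x, s)

# Inference phase: divide each score estimate by the stored per-step average.
x = rng.normal(size=dim)
for t in range(T):
    s_normalised = score_model(x, t) / (norms[t] + 1e-12)
    x = toy_step(x, s_normalised)
print(np.round(x[:4], 3))
```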

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR

  • paper_url: http://arxiv.org/abs/2311.00146
  • repo_url: None
  • paper_authors: Yiwen Shao, Shi-Xiong Zhang, Dong Yu
  • for: Improving the accuracy of multi-channel multi-talker automatic speech recognition (ASR) systems
  • methods: Convolves the overlapping speech signals with the room impulse response (RIR) corresponding to the target speaker's transmission to the microphone array, yielding a novel spatial feature, the RIR-SF (a minimal convolution sketch appears after the abstract)
  • results: Compared with the previously established state-of-the-art 3D spatial feature, theoretical analysis and experiments show that the RIR-SF achieves a remarkable 21.3% relative reduction in Character Error Rate (CER) in multi-channel multi-talker ASR systems, and remains robust under strong reverberation.
    Abstract Multi-channel multi-talker automatic speech recognition (ASR) presents ongoing challenges within the speech community, particularly when confronted with significant reverberation effects. In this study, we introduce a novel approach involving the convolution of overlapping speech signals with the room impulse response (RIR) corresponding to the target speaker's transmission to a microphone array. This innovative technique yields a novel spatial feature known as the RIR-SF. Through a comprehensive comparison with the previously established state-of-the-art 3D spatial feature, both theoretical analysis and experimental results substantiate the superiority of our proposed RIR-SF. We demonstrate that the RIR-SF outperforms existing methods, leading to a remarkable 21.3\% relative reduction in the Character Error Rate (CER) in multi-channel multi-talker ASR systems. Importantly, this novel feature exhibits robustness in the face of strong reverberation, surpassing the limitations of previous approaches.
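The core operation behind the RIR-SF, as described above, is convolving the overlapped speech with the room impulse response from the target speaker to each microphone of the array. The sketch below shows only that convolution on synthetic signals; the toy signals, RIR shapes, and anything downstream (the actual spatial-feature computation fed to the ASR model) are assumptions.

```python
# Per-channel convolution of an overlapped-speech mixture with target-speaker RIRs.
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
fs = 16000
mixture = rng.normal(size=fs)               # 1 s of overlapped speech (stand-in)

n_mics, rir_len = 4, 2048
# Stand-in RIRs from the target speaker to each microphone, with exponential decay.
rirs = rng.normal(size=(n_mics, rir_len)) * np.exp(-np.arange(rir_len) / 400.0)

# Convolve the mixture with each microphone's RIR; these signals feed the spatial feature.
rir_filtered = np.stack([fftconvolve(mixture, rirs[m], mode="full")[:fs]
                         for m in range(n_mics)])
print(rir_filtered.shape)                   # (4, 16000): one filtered signal per channel
```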

Two-Stage Classifier for Campaign Negativity Detection using Axis Embeddings: A Case Study on Tweets of Political Users during 2021 Presidential Election in Iran

  • paper_url: http://arxiv.org/abs/2311.00143
  • repo_url: None
  • paper_authors: Fatemeh Rajabi, Ali Mohades
  • for: The purpose of this study is to automate the detection of negative language in political campaigns, in order to better understand the strategies of candidates and parties.
  • methods: The study uses a hybrid model that combines two machine learning models to detect negative language in political tweets.
  • results: The study finds that the publication of a tweet by a candidate is not related to its negativity, but the presence of political persons and organizations in the tweet is directly related to its negativity.
    Abstract In elections around the world, the candidates may turn their campaigns toward negativity due to the prospect of failure and time pressure. In the digital age, social media platforms such as Twitter are rich sources of political discourse. Therefore, despite the large amount of data that is published on Twitter, the automatic system for campaign negativity detection can play an essential role in understanding the strategy of candidates and parties in their campaigns. In this paper, we propose a hybrid model for detecting campaign negativity consisting of a two-stage classifier that combines the strengths of two machine learning models. Here, we have collected Persian tweets from 50 political users, including candidates and government officials. Then we annotated 5,100 of them that were published during the year before the 2021 presidential election in Iran. In the proposed model, first, the required datasets of two classifiers based on the cosine similarity of tweet embeddings with axis embeddings (which are the average of embedding in positive and negative classes of tweets) from the training set (85%) are made, and then these datasets are considered the training set of the two classifiers in the hybrid model. Finally, our best model (RF-RF) was able to achieve 79% for the macro F1 score and 82% for the weighted F1 score. By running the best model on the rest of the tweets of 50 political users that were published one year before the election and with the help of statistical models, we find that the publication of a tweet by a candidate has nothing to do with the negativity of that tweet, and the presence of the names of political persons and political organizations in the tweet is directly related to its negativity.
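The abstract above builds classifier inputs from the cosine similarity between tweet embeddings and "axis embeddings", the average embeddings of the positive and negative classes in the training set. The sketch below shows that feature construction with a single random forest on random stand-in embeddings; the paper's actual pipeline chains two such classifiers (its best configuration is RF-RF), which is not reproduced here.

```python
# Axis-embedding cosine features for negativity classification; embeddings are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

X_train = rng.normal(size=(200, 64))        # stand-in tweet embeddings
y_train = rng.integers(0, 2, size=200)      # 1 = negative campaigning, 0 = not

# Axis embeddings: the average embedding of each class in the training set.
axis_pos = X_train[y_train == 1].mean(axis=0)
axis_neg = X_train[y_train == 0].mean(axis=0)

def axis_features(X):
    # Represent each tweet by its cosine similarity to the two class axes.
    return np.array([[cosine(x, axis_pos), cosine(x, axis_neg)] for x in X])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(axis_features(X_train), y_train)
print(clf.predict(axis_features(rng.normal(size=(5, 64)))))
```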

Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments

  • paper_url: http://arxiv.org/abs/2311.00123
  • repo_url: None
  • paper_authors: Ali Devran Kara, Serdar Yuksel
  • for: The primary contribution is a convergence theorem for stochastic iterations, and in particular Q-learning iterates, under a general, possibly non-Markovian, stochastic environment.
  • methods: The convergence conditions involve an ergodicity and a positivity criterion, together with a precise characterization of the limit of the iterates and of the conditions on the environment and initializations required for convergence.
  • results: The results include (1) a new convergence theorem for Q-learning iterates under such stochastic environments, and (2) implications and applications to a variety of stochastic control problems with non-Markovian environments, including quantized approximations of fully observed MDPs with continuous spaces, quantized approximations of belief-MDP reduced POMDPs, finite window approximations of POMDPs, and convergence of learning dynamics in multi-agent models.
    Abstract As a primary contribution, we present a convergence theorem for stochastic iterations, and in particular, Q-learning iterates, under a general, possibly non-Markovian, stochastic environment. Our conditions for convergence involve an ergodicity and a positivity criterion. We provide a precise characterization on the limit of the iterates and conditions on the environment and initializations for convergence. As our second contribution, we discuss the implications and applications of this theorem to a variety of stochastic control problems with non-Markovian environments involving (i) quantized approximations of fully observed Markov Decision Processes (MDPs) with continuous spaces (where quantization break down the Markovian structure), (ii) quantized approximations of belief-MDP reduced partially observable MDPS (POMDPs) with weak Feller continuity and a mild version of filter stability (which requires the knowledge of the model by the controller), (iii) finite window approximations of POMDPs under a uniform controlled filter stability (which does not require the knowledge of the model), and (iv) for multi-agent models where convergence of learning dynamics to a new class of equilibria, subjective Q-learning equilibria, will be studied. In addition to the convergence theorem, some implications of the theorem above are new to the literature and others are interpreted as applications of the convergence theorem. Some open problems are noted.
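For readers who want the object of study in front of them, the snippet below runs the standard asynchronous Q-learning iterate, Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)), on a toy finite MDP. The paper's theorem concerns this type of stochastic iteration under far more general, possibly non-Markovian, conditions; the toy MDP, exploration policy, and step-size schedule below are assumptions.

```python
# Vanilla Q-learning on a randomly generated finite MDP.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] over next states
R = rng.normal(size=(n_states, n_actions))                        # rewards

Q = np.zeros((n_states, n_actions))
counts = np.zeros((n_states, n_actions))
s = 0
for _ in range(100000):
    a = rng.integers(n_actions)                    # exploratory behaviour policy
    s_next = rng.choice(n_states, p=P[s, a])
    counts[s, a] += 1
    alpha = 1.0 / counts[s, a]                     # per-pair diminishing step size
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
print(np.round(Q, 3))
```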

Bandit-Driven Batch Selection for Robust Learning under Label Noise

  • paper_url: http://arxiv.org/abs/2311.00096
  • repo_url: None
  • paper_authors: Michal Lisicki, Mihai Nica, Graham W. Taylor
  • for: Improving batch selection in Stochastic Gradient Descent (SGD) training by leveraging combinatorial bandit algorithms.
  • methods: Uses a combinatorial bandit algorithm to optimize the learning process in the presence of label noise (a generic bandit-style selection sketch follows the abstract).
  • results: On the CIFAR-10 dataset, the approach consistently outperforms existing methods across various levels of label corruption, without the computational overhead commonly associated with auxiliary neural network models, offering a scalable solution for complex machine learning applications.
    Abstract We introduce a novel approach for batch selection in Stochastic Gradient Descent (SGD) training, leveraging combinatorial bandit algorithms. Our methodology focuses on optimizing the learning process in the presence of label noise, a prevalent issue in real-world datasets. Experimental evaluations on the CIFAR-10 dataset reveal that our approach consistently outperforms existing methods across various levels of label corruption. Importantly, we achieve this superior performance without incurring the computational overhead commonly associated with auxiliary neural network models. This work presents a balanced trade-off between computational efficiency and model efficacy, offering a scalable solution for complex machine learning applications.
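The abstract does not spell out the selection rule, so the following is only a generic combinatorial-UCB sketch of bandit-driven batch selection under stated assumptions: each training example is an arm, its utility estimate is updated from a per-example training signal, and each step the batch is the top-k arms by a UCB score. The utility signal, the clean/noisy split, and all constants are illustrative; this is not the paper's algorithm.

```python
# Generic top-k UCB batch selection over training examples (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(0)
n_examples, batch_size, n_steps = 500, 32, 200
value = np.zeros(n_examples)          # running utility estimate per example
count = np.ones(n_examples)           # pull counts (start at 1 to avoid division by zero)

def train_step(batch_idx):
    # Stand-in for an SGD step returning a per-example utility signal
    # (e.g., loss decrease); clean examples yield more utility than noisy ones.
    noisy = batch_idx >= 400                        # last 100 examples are label-noisy
    return np.where(noisy, rng.normal(0.0, 0.1, batch_idx.size),
                           rng.normal(0.5, 0.1, batch_idx.size))

for t in range(1, n_steps + 1):
    ucb = value + np.sqrt(2.0 * np.log(t + 1) / count)
    batch = np.argpartition(-ucb, batch_size)[:batch_size]    # top-k arms by UCB
    reward = train_step(batch)
    count[batch] += 1
    value[batch] += (reward - value[batch]) / count[batch]    # incremental mean update

print("mean utility estimate, clean:", value[:400].mean().round(3),
      "noisy:", value[400:].mean().round(3))
```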

Expressive Modeling Is Insufficient for Offline RL: A Tractable Inference Perspective

  • paper_url: http://arxiv.org/abs/2311.00094
  • repo_url: None
  • paper_authors: Xuejie Liu, Anji Liu, Guy Van den Broeck, Yitao Liang
  • for: This paper is written for offline Reinforcement Learning (RL) tasks, where the goal is to learn a policy from a set of pre-collected trajectories.
  • methods: The paper proposes a new approach called Trifle, which leverages modern Tractable Probabilistic Models (TPMs) to bridge the gap between good sequence models and high expected returns at evaluation time.
  • results: The paper achieves the most state-of-the-art scores in 9 Gym-MuJoCo benchmarks against strong baselines, and significantly outperforms prior approaches in stochastic environments and safe RL tasks with minimum algorithmic modifications.
    Abstract A popular paradigm for offline Reinforcement Learning (RL) tasks is to first fit the offline trajectories to a sequence model, and then prompt the model for actions that lead to high expected return. While a common consensus is that more expressive sequence models imply better performance, this paper highlights that tractability, the ability to exactly and efficiently answer various probabilistic queries, plays an equally important role. Specifically, due to the fundamental stochasticity from the offline data-collection policies and the environment dynamics, highly non-trivial conditional/constrained generation is required to elicit rewarding actions. While it is still possible to approximate such queries, we observe that such crude estimates significantly undermine the benefits brought by expressive sequence models. To overcome this problem, this paper proposes Trifle (Tractable Inference for Offline RL), which leverages modern Tractable Probabilistic Models (TPMs) to bridge the gap between good sequence models and high expected returns at evaluation time. Empirically, Trifle achieves the most state-of-the-art scores in 9 Gym-MuJoCo benchmarks against strong baselines. Further, owing to its tractability, Trifle significantly outperforms prior approaches in stochastic environments and safe RL tasks (e.g. with action constraints) with minimum algorithmic modifications.

Safe multi-agent motion planning under uncertainty for drones using filtered reinforcement learning

  • paper_url: http://arxiv.org/abs/2311.00063
  • repo_url: None
  • paper_authors: Sleiman Safaoui, Abraham P. Vinod, Ankush Chakrabarty, Rien Quirynen, Nobuyuki Yoshikawa, Stefano Di Cairano
  • for: Safe multi-agent motion planning for drones in uncertain, cluttered workspaces.
  • methods: Uses single-agent reinforcement learning to learn motion plans that reach the target but may not be collision-free, then applies constrained-control-based trajectory planning (convex optimization, chance constraints, and set-based methods) to ensure safety.
  • results: The proposed approach yields a safe, real-time implementable multi-agent motion planner that enforces collision avoidance with high probability despite uncertainty in the workspace, agent motion, and sensing.
    Abstract We consider the problem of safe multi-agent motion planning for drones in uncertain, cluttered workspaces. For this problem, we present a tractable motion planner that builds upon the strengths of reinforcement learning and constrained-control-based trajectory planning. First, we use single-agent reinforcement learning to learn motion plans from data that reach the target but may not be collision-free. Next, we use a convex optimization, chance constraints, and set-based methods for constrained control to ensure safety, despite the uncertainty in the workspace, agent motion, and sensing. The proposed approach can handle state and control constraints on the agents, and enforce collision avoidance among themselves and with static obstacles in the workspace with high probability. The proposed approach yields a safe, real-time implementable, multi-agent motion planner that is simpler to train than methods based solely on learning. Numerical simulations and experiments show the efficacy of the approach.

The Generative AI Paradox: “What It Can Create, It May Not Understand”

  • paper_url: http://arxiv.org/abs/2311.00059
  • repo_url: None
  • paper_authors: Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi
  • for: The paper examines an apparent paradox of current generative AI models: they can produce expert-level outputs in seconds, yet still make basic errors in understanding that would not be expected even of non-expert humans.
  • methods: The authors run controlled experiments across language and image modalities that compare generation against understanding, testing their Generative AI Paradox hypothesis.
  • results: Although models can exceed humans in generation, they fall short of human capabilities in measures of understanding, show weaker correlation between generation and understanding performance, and are more brittle to adversarial inputs, supporting the hypothesis that models' generative capability may not be contingent upon understanding.
    Abstract The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or exceed the capabilities even of expert humans. At the same time, models still show basic errors in understanding that would not be expected even in non-expert humans. This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make? In this work, we posit that this tension reflects a divergence in the configuration of intelligence in today's generative models relative to intelligence in humans. Specifically, we propose and test the Generative AI Paradox hypothesis: generative models, having been trained directly to reproduce expert-like outputs, acquire generative capabilities that are not contingent upon -- and can therefore exceed -- their ability to understand those same types of outputs. This contrasts with humans, for whom basic understanding almost always precedes the ability to generate expert-level outputs. We test this hypothesis through controlled experiments analyzing generation vs. understanding in generative models, across both language and image modalities. Our results show that although models can outperform humans in generation, they consistently fall short of human capabilities in measures of understanding, as well as weaker correlation between generation and understanding performance, and more brittleness to adversarial inputs. Our findings support the hypothesis that models' generative capability may not be contingent upon understanding capability, and call for caution in interpreting artificial intelligence by analogy to human intelligence.

Diversity and Diffusion: Observations on Synthetic Image Distributions with Stable Diffusion

  • paper_url: http://arxiv.org/abs/2311.00056
  • repo_url: None
  • paper_authors: David Marwood, Shumeet Baluja, Yair Alon
  • for: The study examines whether existing text-to-image (TTI) systems, such as Stable Diffusion, Imagen, and DALL-E 2, can eliminate the manual task of obtaining natural images for training a new machine learning classifier.
  • methods: The study generates images with existing TTI systems and analyzes them through semantic embeddings and the Contrastive Language-Image Pre-training (CLIP) model.
  • results: Classifiers trained solely on synthetic images perform poorly at inference because of semantic mismatches between what is modeled in synthetic versus natural images. The study identifies four issues that limit the usefulness of TTI systems for this task: ambiguity, adherence to prompt, lack of diversity, and inability to represent the underlying concept, and also presents surprising insights into the geometry of CLIP embeddings.
    Abstract Recent progress in text-to-image (TTI) systems, such as StableDiffusion, Imagen, and DALL-E 2, has made it possible to create realistic images with simple text prompts. It is tempting to use these systems to eliminate the manual task of obtaining natural images for training a new machine learning classifier. However, in all of the experiments performed to date, classifiers trained solely with synthetic images perform poorly at inference, despite the images used for training appearing realistic. Examining this apparent incongruity in detail gives insight into the limitations of the underlying image generation processes. Through the lens of diversity in image creation vs. accuracy of what is created, we dissect the differences in semantic mismatches in what is modeled in synthetic vs. natural images. This will elucidate the roles of the image-language model, CLIP, and the image generation model, diffusion. We find four issues that limit the usefulness of TTI systems for this task: ambiguity, adherence to prompt, lack of diversity, and inability to represent the underlying concept. We further present surprising insights into the geometry of CLIP embeddings.

SC-MIL: Sparsely Coded Multiple Instance Learning for Whole Slide Image Classification

  • paper_url: http://arxiv.org/abs/2311.00048
  • repo_url: https://github.com/sotiraslab/SCMIL
  • paper_authors: Peijie Qiu, Pan Xiao, Wenhui Zhu, Yalin Wang, Aristeidis Sotiras
  • for: This paper focuses on improving the performance of weakly supervised whole slide image (WSI) classification using Multiple Instance Learning (MIL).
  • methods: The proposed method, called sparsely coded MIL (SC-MIL), uses sparse dictionary learning to capture the similarities of instances and improve the feature embeddings. The method also incorporates deep unrolling to make it compatible with deep learning.
  • results: The proposed SC module was shown to substantially boost the performance of state-of-the-art MIL methods in experiments on multiple datasets, with an acceptable computation cost. The codes are available at https://github.com/sotiraslab/SCMIL.git.
    Abstract Multiple Instance Learning (MIL) has been widely used in weakly supervised whole slide image (WSI) classification. Typical MIL methods include a feature embedding part that embeds the instances into features via a pre-trained feature extractor and the MIL aggregator that combines instance embeddings into predictions. The current focus has been directed toward improving these parts by refining the feature embeddings through self-supervised pre-training and modeling the correlations between instances separately. In this paper, we proposed a sparsely coded MIL (SC-MIL) that addresses those two aspects at the same time by leveraging sparse dictionary learning. The sparse dictionary learning captures the similarities of instances by expressing them as a sparse linear combination of atoms in an over-complete dictionary. In addition, imposing sparsity helps enhance the instance feature embeddings by suppressing irrelevant instances while retaining the most relevant ones. To make the conventional sparse coding algorithm compatible with deep learning, we unrolled it into an SC module by leveraging deep unrolling. The proposed SC module can be incorporated into any existing MIL framework in a plug-and-play manner with an acceptable computation cost. The experimental results on multiple datasets demonstrated that the proposed SC module could substantially boost the performance of state-of-the-art MIL methods. The codes are available at https://github.com/sotiraslab/SCMIL.git.
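The SC module above unrolls a sparse coding solver so that instance embeddings are re-expressed as sparse combinations of atoms from an over-complete dictionary. The following stand-alone sketch shows ISTA-style unrolling of that kind; the dictionary, step count, and threshold are assumptions, and the paper's SC module, dictionary learning, and integration into a MIL aggregator are more involved.

```python
# Unrolled ISTA sparse coding of instance embeddings against a fixed dictionary.
import torch

def soft_threshold(x, lam):
    return torch.sign(x) * torch.clamp(x.abs() - lam, min=0.0)

def unrolled_sparse_code(H, D, n_steps=5, lam=0.1):
    # H: (n_instances, d) instance embeddings; D: (n_atoms, d) over-complete dictionary.
    L = torch.linalg.matrix_norm(D @ D.T, ord=2)   # Lipschitz constant of the gradient
    Z = torch.zeros(H.shape[0], D.shape[0])        # sparse codes, initialised at zero
    for _ in range(n_steps):
        grad = (Z @ D - H) @ D.T                   # gradient of 0.5 * ||Z D - H||^2
        Z = soft_threshold(Z - grad / L, lam / L)
    return Z, Z @ D                                # codes and re-embedded instances

torch.manual_seed(0)
H = torch.randn(50, 128)                           # 50 instances from one slide (stand-in)
D = torch.randn(256, 128)                          # 256 dictionary atoms (stand-in)
Z, H_sc = unrolled_sparse_code(H, D)
print(Z.shape, H_sc.shape, float((Z != 0).float().mean()))   # sparsity of the codes
```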

Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?

  • paper_url: http://arxiv.org/abs/2311.00047
  • repo_url: https://github.com/vl-illusion/dataset
  • paper_authors: Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan, Joyce Chai
  • for: Investigating whether vision-language models (VLMs) are subject to visual illusions the way humans are, or whether they faithfully represent reality.
  • methods: Builds a dataset covering five types of visual illusions and formulates four tasks to examine visual illusions in state-of-the-art VLMs.
  • results: Although overall alignment with human perception is low, larger models are closer to human perception and more susceptible to visual illusions. The dataset and findings promote a better understanding of visual illusions in humans and machines and provide a stepping stone for future computational models that better align humans and machines in perceiving and communicating about the shared visual world.
    Abstract Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans emulating our understanding of the world. However, known as visual illusions, human's perception of reality isn't always faithful to the physical world. This raises a key question: do VLMs have the similar kind of illusions as humans do, or do they faithfully learn to represent reality? To investigate this question, we build a dataset containing five types of visual illusions and formulate four tasks to examine visual illusions in state-of-the-art VLMs. Our findings have shown that although the overall alignment is low, larger models are closer to human perception and more susceptible to visual illusions. Our dataset and initial findings will promote a better understanding of visual illusions in humans and machines and provide a stepping stone for future computational models that can better align humans and machines in perceiving and communicating about the shared visual world. The code and data are available at https://github.com/vl-illusion/dataset.

Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders

  • paper_url: http://arxiv.org/abs/2310.20704
  • repo_url: None
  • paper_authors: Srijan Das, Tanmay Jain, Dominick Reilly, Pranav Balaji, Soumyajit Karmakar, Shyam Marjit, Xiang Li, Abhijit Das, Michael Ryoo
  • for: The study investigates how to successfully train Vision Transformers (ViTs) with limited data.
  • methods: Rather than self-supervised learning (SSL) pre-training followed by sequential fine-tuning, the study jointly optimizes the ViT for the primary task and a Self-Supervised Auxiliary Task (SSAT).
  • results: Jointly optimizing the ViT for the primary task and SSAT is surprisingly beneficial when training data is limited, yielding better performance while reducing the carbon footprint. Experiments on 10 datasets confirm the gains, and SSAT is also effective for deepfake detection in the video domain, showcasing its generalizability.
    Abstract Vision Transformers (ViTs) have become ubiquitous in computer vision. Despite their success, ViTs lack inductive biases, which can make it difficult to train them with limited data. To address this challenge, prior studies suggest training ViTs with self-supervised learning (SSL) and fine-tuning sequentially. However, we observe that jointly optimizing ViTs for the primary task and a Self-Supervised Auxiliary Task (SSAT) is surprisingly beneficial when the amount of training data is limited. We explore the appropriate SSL tasks that can be optimized alongside the primary task, the training schemes for these tasks, and the data scale at which they can be most effective. Our findings reveal that SSAT is a powerful technique that enables ViTs to leverage the unique characteristics of both the self-supervised and primary tasks, achieving better performance than typical ViTs pre-training with SSL and fine-tuning sequentially. Our experiments, conducted on 10 datasets, demonstrate that SSAT significantly improves ViT performance while reducing carbon footprint. We also confirm the effectiveness of SSAT in the video domain for deepfake detection, showcasing its generalizability. Our code is available at https://github.com/dominickrei/Limited-data-vits.
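The key idea above is to optimize the primary task and a self-supervised auxiliary task jointly rather than sequentially. The toy sketch below runs one such joint step with a classification loss plus a masked-reconstruction loss; the tiny MLP encoder, pixel-level masking, and 0.5 weighting are illustrative assumptions, not the paper's ViT/masked-autoencoder setup.

```python
# One joint training step: primary classification loss + auxiliary reconstruction loss.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 256), nn.ReLU())
cls_head = nn.Linear(256, 10)                  # primary task: classification
rec_head = nn.Linear(256, 32 * 32 * 3)         # auxiliary task: reconstruct the masked input
opt = torch.optim.AdamW(list(encoder.parameters()) + list(cls_head.parameters())
                        + list(rec_head.parameters()), lr=1e-4)

images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))
lam = 0.5                                      # auxiliary-loss weight (assumption)

mask = (torch.rand_like(images) > 0.75).float()        # keep ~25% of pixels, zero the rest
feats = encoder(images * mask)
loss_cls = nn.functional.cross_entropy(cls_head(feats), labels)
loss_rec = nn.functional.mse_loss(rec_head(feats), images.flatten(1))
loss = loss_cls + lam * loss_rec
opt.zero_grad(); loss.backward(); opt.step()
print(float(loss_cls), float(loss_rec))
```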

Vanishing Gradients in Reinforcement Finetuning of Language Models

  • paper_url: http://arxiv.org/abs/2310.20703
  • repo_url: None
  • paper_authors: Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin
  • for: The study concerns reinforcement finetuning (RFT), which aligns pretrained language models with human preferences and downstream tasks by maximizing a (possibly learned) reward function.
  • methods: The paper analyzes the policy-gradient optimization behavior of RFT theoretically and through experiments on an RFT benchmark and controlled environments (a small numerical illustration follows the abstract).
  • results: The expected gradient for an input vanishes when its reward standard deviation under the model is small, even if the expected reward is far from optimal, making reward maximization extremely slow; experiments and theory show this vanishing-gradient problem is prevalent and detrimental. An initial supervised finetuning (SFT) phase is the most promising remedy, and a relatively small number of SFT steps on as few as 1% of the input samples can suffice.
    Abstract Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which entails maximizing a (possibly learned) reward function using policy gradient algorithms. This work highlights a fundamental optimization obstacle in RFT: we prove that the expected gradient for an input vanishes when its reward standard deviation under the model is small, even if the expected reward is far from optimal. Through experiments on an RFT benchmark and controlled environments, as well as a theoretical analysis, we then demonstrate that vanishing gradients due to small reward standard deviation are prevalent and detrimental, leading to extremely slow reward maximization. Lastly, we explore ways to overcome vanishing gradients in RFT. We find the common practice of an initial supervised finetuning (SFT) phase to be the most promising candidate, which sheds light on its importance in an RFT pipeline. Moreover, we show that a relatively small number of SFT optimization steps on as few as 1% of the input samples can suffice, indicating that the initial SFT phase need not be expensive in terms of compute and data labeling efforts. Overall, our results emphasize that being mindful for inputs whose expected gradient vanishes, as measured by the reward standard deviation, is crucial for successful execution of RFT.
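To make the central claim above concrete, here is a small numerical illustration under toy assumptions (a 3-action softmax policy with hand-picked rewards, not the paper's setting): the exact gradient of the expected reward with respect to the logits is small when the reward's standard deviation under the policy is small, even though the expected reward is far from the optimum.

```python
# Exact softmax policy gradient shrinks with the reward std under the policy.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def expected_reward_grad(logits, rewards):
    p = softmax(logits)
    # d E[r] / d logit_k = p_k * (r_k - E[r])  for a softmax policy.
    return p * (rewards - p @ rewards), p

rewards = np.array([0.0, 0.05, 1.0])            # action 2 is optimal

# Policy concentrated on two near-identical low rewards: small reward std.
g1, p1 = expected_reward_grad(np.array([5.0, 5.0, -5.0]), rewards)
# Policy spread over low and high rewards: larger reward std.
g2, p2 = expected_reward_grad(np.array([1.0, 1.0, 1.0]), rewards)

for name, g, p in [("small reward std", g1, p1), ("larger reward std", g2, p2)]:
    std = np.sqrt(p @ (rewards - p @ rewards) ** 2)
    print(f"{name}: E[r]={p @ rewards:.3f}, std={std:.3f}, |grad|={np.linalg.norm(g):.4f}")
```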

HAP: Structure-Aware Masked Image Modeling for Human-Centric Perception

  • paper_url: http://arxiv.org/abs/2310.20695
  • repo_url: https://github.com/junkunyuan/HAP
  • paper_authors: Junkun Yuan, Xinyu Zhang, Hao Zhou, Jian Wang, Zhongwei Qiu, Zhiyin Shao, Shaofeng Zhang, Sifan Long, Kun Kuang, Kun Yao, Junyu Han, Errui Ding, Lanfen Lin, Fei Wu, Jingdong Wang
  • for: The paper proposes a pre-training method that incorporates human structure priors to improve performance on human-centric perception tasks.
  • methods: Uses masked image modeling (MIM) for pre-training and injects a human structure prior, human parts, into the mask sampling process: image patches corresponding to human part regions are masked out with high priority, so the model concentrates more on body structure information during pre-training (a toy mask-sampling sketch follows the abstract). A structure-invariant alignment loss further enforces that different masked views of the same image, guided by the part prior, stay closely aligned.
  • results: Using a plain ViT encoder, HAP establishes new state-of-the-art performance on 11 human-centric benchmarks and an on-par result on one dataset, e.g., 78.1% mAP on MSMT17 (person re-identification), 86.54% mA on PA-100K (pedestrian attribute recognition), 78.2% AP on MS COCO (2D pose estimation), and 56.0 PA-MPJPE on 3DPW (3D pose and shape estimation).
    Abstract Model pre-training is essential in human-centric perception. In this paper, we first introduce masked image modeling (MIM) as a pre-training approach for this task. Upon revisiting the MIM training strategy, we reveal that human structure priors offer significant potential. Motivated by this insight, we further incorporate an intuitive human structure prior - human parts - into pre-training. Specifically, we employ this prior to guide the mask sampling process. Image patches, corresponding to human part regions, have high priority to be masked out. This encourages the model to concentrate more on body structure information during pre-training, yielding substantial benefits across a range of human-centric perception tasks. To further capture human characteristics, we propose a structure-invariant alignment loss that enforces different masked views, guided by the human part prior, to be closely aligned for the same image. We term the entire method as HAP. HAP simply uses a plain ViT as the encoder yet establishes new state-of-the-art performance on 11 human-centric benchmarks, and on-par result on one dataset. For example, HAP achieves 78.1% mAP on MSMT17 for person re-identification, 86.54% mA on PA-100K for pedestrian attribute recognition, 78.2% AP on MS COCO for 2D pose estimation, and 56.0 PA-MPJPE on 3DPW for 3D pose and shape estimation.
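A hedged sketch of the prior-guided mask sampling described above: patches overlapping human-part regions get a larger sampling weight, so they are masked out with higher priority. The 14x14 patch grid, 75% mask ratio, synthetic part map, and 3x priority weight are assumptions, not HAP's exact settings.

```python
# Priority-weighted patch masking for masked image modeling.
import numpy as np

rng = np.random.default_rng(0)
grid = 14                                   # 14x14 ViT patch grid
n_patches = grid * grid
mask_ratio = 0.75
n_masked = int(mask_ratio * n_patches)

# Stand-in human-part map: 1 where a patch overlaps a body part, else 0.
part_map = np.zeros(n_patches)
part_map[rng.choice(n_patches, size=60, replace=False)] = 1.0

# Part patches get a higher sampling weight so they are masked out first.
weights = np.where(part_map > 0, 3.0, 1.0)
masked_idx = rng.choice(n_patches, size=n_masked, replace=False, p=weights / weights.sum())

mask = np.zeros(n_patches, dtype=bool)
mask[masked_idx] = True
print(mask.reshape(grid, grid).astype(int))  # 1 = patch is masked out during pre-training
```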

Learning From Mistakes Makes LLM Better Reasoner

  • paper_url: http://arxiv.org/abs/2310.20689
  • repo_url: https://github.com/microsoft/codet
  • paper_authors: Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen
  • for: Improving the mathematical reasoning ability of large language models (LLMs).
  • methods: Proposes Learning from Mistakes (LeMa), which mimics the human error-driven learning process by fine-tuning LLMs on mistake-correction data pairs generated with GPT-4 acting as a corrector.
  • results: Experiments show that LeMa consistently improves performance across five backbone LLMs and two mathematical reasoning tasks, and also benefits specialized models such as WizardMath and MetaMath, reaching 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH.
    Abstract Large language models (LLMs) recently exhibited remarkable reasoning capabilities on solving math problems. To further improve this capability, this work proposes Learning from Mistakes (LeMa), akin to human learning processes. Consider a human student who failed to solve a math problem, he will learn from what mistake he has made and how to correct it. Mimicking this error-driven learning process, LeMa fine-tunes LLMs on mistake-correction data pairs generated by GPT-4. Specifically, we first collect inaccurate reasoning paths from various LLMs and then employ GPT-4 as a "corrector" to (1) identify the mistake step, (2) explain the reason for the mistake, and (3) correct the mistake and generate the final answer. Experimental results demonstrate the effectiveness of LeMa: across five backbone LLMs and two mathematical reasoning tasks, LeMa consistently improves the performance compared with fine-tuning on CoT data alone. Impressively, LeMa can also benefit specialized LLMs such as WizardMath and MetaMath, achieving 85.4% pass@1 accuracy on GSM8K and 27.1% on MATH. This surpasses the SOTA performance achieved by non-execution open-source models on these challenging tasks. Our code, data and models will be publicly available at https://github.com/microsoft/CodeT.

Offline RL with Observation Histories: Analyzing and Improving Sample Complexity

  • paper_url: http://arxiv.org/abs/2310.20663
  • repo_url: None
  • paper_authors: Joey Hong, Anca Dragan, Sergey Levine
  • for: The paper focuses on the problem of offline reinforcement learning (RL) in situations where the state is partially observed or unknown.
  • methods: The paper proposes a new loss function called the bisimulation loss, which encourages the RL algorithm to learn a compact representation of the history that is relevant for action selection.
  • results: The paper shows that the proposed loss can improve the performance of offline RL in a variety of tasks, and that it is closely related to good performance.
    Abstract Offline reinforcement learning (RL) can in principle synthesize more optimal behavior from a dataset consisting only of suboptimal trials. One way that this can happen is by "stitching" together the best parts of otherwise suboptimal trajectories that overlap on similar states, to create new behaviors where each individual state is in-distribution, but the overall returns are higher. However, in many interesting and complex applications, such as autonomous navigation and dialogue systems, the state is partially observed. Even worse, the state representation is unknown or not easy to define. In such cases, policies and value functions are often conditioned on observation histories instead of states. In these cases, it is not clear if the same kind of "stitching" is feasible at the level of observation histories, since two different trajectories would always have different histories, and thus "similar states" that might lead to effective stitching cannot be leveraged. Theoretically, we show that standard offline RL algorithms conditioned on observation histories suffer from poor sample complexity, in accordance with the above intuition. We then identify sufficient conditions under which offline RL can still be efficient -- intuitively, it needs to learn a compact representation of history comprising only features relevant for action selection. We introduce a bisimulation loss that captures the extent to which this happens, and propose that offline RL can explicitly optimize this loss to aid worst-case sample complexity. Empirically, we show that across a variety of tasks either our proposed loss improves performance, or the value of this loss is already minimized as a consequence of standard offline RL, indicating that it correlates well with good performance.

“Pick-and-Pass” as a Hat-Trick Class for First-Principle Memory, Generalizability, and Interpretability Benchmarks

  • paper_url: http://arxiv.org/abs/2310.20654
  • repo_url: None
  • paper_authors: Jason Wang, Ryan Rezai
  • For: The paper is written to study model-free reinforcement learning algorithms and their ability to learn memory in closed drafting games, specifically the popular game “Sushi Go Party!”.
  • Methods: The paper uses a set of closely-related games based on the set of cards in play to establish first-principle benchmarks for studying model-free reinforcement learning algorithms.
  • Results: The paper produces state-of-the-art results on the Sushi Go Party! environment and quantifies the generalizability of reinforcement learning algorithms trained on various sets of cards, establishing key trends between generalized performance and the set distance between the train and evaluation game configurations. Additionally, the paper fits decision rules to interpret the strategy of the learned models and compares them to the ranking preferences of human players, finding intuitive common rules and intriguing new moves.
    Abstract Closed drafting or "pick and pass" is a popular game mechanic where each round players select a card or other playable element from their hand and pass the rest to the next player. Games employing closed drafting make for great studies on memory and turn order due to their explicitly calculable memory of other players' hands. In this paper, we establish first-principle benchmarks for studying model-free reinforcement learning algorithms and their comparative ability to learn memory in a popular family of closed drafting games called "Sushi Go Party!", producing state-of-the-art results on this environment along the way. Furthermore, as Sushi Go Party! can be expressed as a set of closely-related games based on the set of cards in play, we quantify the generalizability of reinforcement learning algorithms trained on various sets of cards, establishing key trends between generalized performance and the set distance between the train and evaluation game configurations. Finally, we fit decision rules to interpret the strategy of the learned models and compare them to the ranking preferences of human players, finding intuitive common rules and intriguing new moves.
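For concreteness, the snippet below implements only the bare "pick and pass" mechanic the paper studies: every round each player keeps one card from the hand they currently hold and passes the rest to the next player. The deck contents and the random picking policy are stand-ins; Sushi Go Party!'s actual cards and scoring are not modelled.

```python
# Bare closed-drafting loop: keep one card, pass the remainder around the table.
import random

random.seed(0)
deck = [f"card_{i}" for i in range(24)]
random.shuffle(deck)
n_players, hand_size = 3, 8
hands = [deck[i * hand_size:(i + 1) * hand_size] for i in range(n_players)]
kept = [[] for _ in range(n_players)]

for _ in range(hand_size):
    # Each player picks one card from the hand they currently hold (random policy here).
    for p in range(n_players):
        kept[p].append(hands[p].pop(random.randrange(len(hands[p]))))
    # All remaining hands are passed simultaneously to the next player.
    hands = [hands[(p - 1) % n_players] for p in range(n_players)]

for p, cards in enumerate(kept):
    print(f"player {p}: {cards}")
```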

Histopathological Image Analysis with Style-Augmented Feature Domain Mixing for Improved Generalization

  • paper_url: http://arxiv.org/abs/2310.20638
  • repo_url: https://github.com/vaibhav-khamankar/fusestyle
  • paper_authors: Vaibhav Khamankar, Sutanu Bera, Saumik Bhattacharya, Debashis Sen, Prabir Kumar Biswas
  • for: Improving the generalization ability of machine learning models for histopathological images
  • methods: Uses adaptive instance normalization to generate style-augmented versions of images by mixing feature-domain statistics (a minimal statistics-mixing sketch follows the abstract)
  • results: Compared with existing style transfer-based data augmentation methods, the proposed method performs similarly or better while requiring less computation and time.
    Abstract Histopathological images are essential for medical diagnosis and treatment planning, but interpreting them accurately using machine learning can be challenging due to variations in tissue preparation, staining and imaging protocols. Domain generalization aims to address such limitations by enabling the learning models to generalize to new datasets or populations. Style transfer-based data augmentation is an emerging technique that can be used to improve the generalizability of machine learning models for histopathological images. However, existing style transfer-based methods can be computationally expensive, and they rely on artistic styles, which can negatively impact model accuracy. In this study, we propose a feature domain style mixing technique that uses adaptive instance normalization to generate style-augmented versions of images. We compare our proposed method with existing style transfer-based data augmentation methods and found that it performs similarly or better, despite requiring less computation and time. Our results demonstrate the potential of feature domain statistics mixing in the generalization of learning models for histopathological image analysis.
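As a rough sketch of how adaptive-instance-normalization-style statistics mixing in the feature domain can work (the exact mixing rule, layer placement, and hyperparameters in the paper may differ), one can normalize a feature map and re-scale it with channel statistics interpolated between two samples:

```python
import numpy as np

def mix_feature_styles(feat_a, feat_b, alpha=0.5, eps=1e-6):
    """Hypothetical AdaIN-style mixing of channel-wise feature statistics.

    feat_a, feat_b: arrays of shape (C, H, W) from two images in a batch.
    alpha: interpolation weight between the two styles.
    Returns feat_a's content re-styled with mixed channel statistics.
    """
    mu_a = feat_a.mean(axis=(1, 2), keepdims=True)
    std_a = feat_a.std(axis=(1, 2), keepdims=True) + eps
    mu_b = feat_b.mean(axis=(1, 2), keepdims=True)
    std_b = feat_b.std(axis=(1, 2), keepdims=True) + eps

    # Interpolate the style statistics of the two samples.
    mu_mix = alpha * mu_a + (1 - alpha) * mu_b
    std_mix = alpha * std_a + (1 - alpha) * std_b

    # Normalize the content features, then apply the mixed style statistics.
    return std_mix * (feat_a - mu_a) / std_a + mu_mix

rng = np.random.default_rng(0)
fa, fb = rng.normal(size=(64, 32, 32)), rng.normal(size=(64, 32, 32))
print(mix_feature_styles(fa, fb, alpha=0.7).shape)  # (64, 32, 32)
```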

LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B

  • paper_url: http://arxiv.org/abs/2310.20624
  • repo_url: None
  • paper_authors: Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish
  • for: To study whether safety training of language models can prevent the models from being misused when attackers have access to the model weights.
  • methods: Uses low-rank adaptation (LoRA) as an efficient fine-tuning method, with a single GPU and a budget of less than $200 per model, to undo the safety training of Llama 2-Chat models.
  • results: The method successfully undoes the safety training of Llama 2-Chat models of sizes 7B, 13B, and 70B, reducing the refusal rate of the 70B model to below 1% on two refusal benchmarks, while validating that general performance is retained relative to Llama 2-Chat.
    Abstract AI developers often apply safety alignment procedures to prevent the misuse of their AI systems. For example, before Meta released Llama 2-Chat, a collection of instruction fine-tuned large language models, they invested heavily in safety training, incorporating extensive red-teaming and reinforcement learning from human feedback. However, it remains unclear how well safety training guards against model misuse when attackers have access to model weights. We explore the robustness of safety training in language models by subversively fine-tuning the public weights of Llama 2-Chat. We employ low-rank adaptation (LoRA) as an efficient fine-tuning method. With a budget of less than $200 per model and using only one GPU, we successfully undo the safety training of Llama 2-Chat models of sizes 7B, 13B, and 70B. Specifically, our fine-tuning technique significantly reduces the rate at which the model refuses to follow harmful instructions. We achieve a refusal rate below 1% for our 70B Llama 2-Chat model on two refusal benchmarks. Our fine-tuning method retains general performance, which we validate by comparing our fine-tuned models against Llama 2-Chat across two benchmarks. Additionally, we present a selection of harmful outputs produced by our models. While there is considerable uncertainty about the scope of risks from current models, it is likely that future models will have significantly more dangerous capabilities, including the ability to hack into critical infrastructure, create dangerous bio-weapons, or autonomously replicate and adapt to new environments. We show that subversive fine-tuning is practical and effective, and hence argue that evaluating risks from fine-tuning should be a core part of risk assessments for releasing model weights.
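A minimal sketch of the low-rank adaptation (LoRA) idea underlying the fine-tuning, assuming a PyTorch-style linear layer; the rank, scaling, and which weight matrices are adapted are illustrative choices here, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update W + (alpha/r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors are trained
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)

layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096))
print(out.shape, sum(p.numel() for p in layer.parameters() if p.requires_grad))
```

Because only the small factors are trainable, fine-tuning of this kind fits on a single GPU even for large models, which is what makes the subversive fine-tuning setting studied here so cheap.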

Autonomous Robotic Reinforcement Learning with Asynchronous Human Feedback

  • paper_url: http://arxiv.org/abs/2310.20608
  • repo_url: None
  • paper_authors: Max Balsells, Marcel Torne, Zihan Wang, Samedh Desai, Pulkit Agrawal, Abhishek Gupta
  • for: To enable autonomous robotic learning directly in real-world environments, without hand-designed reward functions or reset mechanisms.
  • methods: Uses occasional non-expert human-in-the-loop feedback to learn informative distance functions that guide exploration, combined with a simple self-supervised learning algorithm for goal-directed policy learning.
  • results: In the absence of resets, it is particularly important to account for the current reachability of the exploration policy when deciding which regions of the space to explore. Based on this insight, the authors build a practical learning system, GEAR, that lets robots simply be placed in real-world environments and trained autonomously without interruption. The system streams robot experience to a web interface and only requires occasional asynchronous binary comparative feedback from remote, non-expert humans. Its effectiveness is demonstrated in both simulation and the real world.
    Abstract Ideally, we would place a robot in a real-world environment and leave it there improving on its own by gathering more experience autonomously. However, algorithms for autonomous robotic learning have been challenging to realize in the real world. While this has often been attributed to the challenge of sample complexity, even sample-efficient techniques are hampered by two major challenges - the difficulty of providing well "shaped" rewards, and the difficulty of continual reset-free training. In this work, we describe a system for real-world reinforcement learning that enables agents to show continual improvement by training directly in the real world without requiring painstaking effort to hand-design reward functions or reset mechanisms. Our system leverages occasional non-expert human-in-the-loop feedback from remote users to learn informative distance functions to guide exploration while leveraging a simple self-supervised learning algorithm for goal-directed policy learning. We show that in the absence of resets, it is particularly important to account for the current "reachability" of the exploration policy when deciding which regions of the space to explore. Based on this insight, we instantiate a practical learning system - GEAR, which enables robots to simply be placed in real-world environments and left to train autonomously without interruption. The system streams robot experience to a web interface only requiring occasional asynchronous feedback from remote, crowdsourced, non-expert humans in the form of binary comparative feedback. We evaluate this system on a suite of robotic tasks in simulation and demonstrate its effectiveness at learning behaviors both in simulation and the real world. Project website https://guided-exploration-autonomous-rl.github.io/GEAR/.
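One way to picture how binary comparative feedback can train a goal-distance function is a Bradley-Terry-style objective: given two states and a human label saying which is closer to the goal, the network is trained so the preferred state gets the smaller predicted distance. This is a sketch only; GEAR's actual architecture, loss, and data pipeline may differ.

```python
import torch
import torch.nn as nn

class DistanceNet(nn.Module):
    """Predicts a scalar 'distance to goal' for a state-goal pair (illustrative)."""
    def __init__(self, state_dim, goal_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Softplus())  # non-negative distance

    def forward(self, state, goal):
        return self.net(torch.cat([state, goal], dim=-1)).squeeze(-1)

def preference_loss(dist_net, s_a, s_b, goal, prefer_a):
    """Binary comparative feedback: prefer_a=1 means a human judged s_a closer to the goal."""
    d_a, d_b = dist_net(s_a, goal), dist_net(s_b, goal)
    # Bradley-Terry style: P(a preferred) = sigmoid(d_b - d_a); smaller distance wins.
    logits = d_b - d_a
    return nn.functional.binary_cross_entropy_with_logits(logits, prefer_a.float())

net = DistanceNet(state_dim=10, goal_dim=10)
s_a, s_b, g = torch.randn(32, 10), torch.randn(32, 10), torch.randn(32, 10)
labels = torch.randint(0, 2, (32,))
loss = preference_loss(net, s_a, s_b, g, labels)
loss.backward()
print(float(loss))
```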

What a Whole Slide Image Can Tell? Subtype-guided Masked Transformer for Pathological Image Captioning

  • paper_url: http://arxiv.org/abs/2310.20607
  • repo_url: None
  • paper_authors: Wenkang Qin, Rui Xu, Peixiang Huang, Xiaomin Wu, Heyu Zhang, Lin Luo
  • for: This paper proposes a new approach for pathological captioning of Whole Slide Images (WSIs) using Transformers, with the goal of improving the accuracy of computer-aided pathological diagnosis.
  • methods: The proposed approach, called Subtype-guided Masked Transformer (SGMT), treats a WSI as a sequence of sparse patches and generates an overall caption sentence from the sequence. An accompanying subtype prediction is introduced to guide the training process and enhance captioning accuracy. The Asymmetric Masked Mechanism approach is also used to tackle the large size constraint of pathological image captioning.
  • results: The authors report that their approach achieves superior performance compared to traditional RNN-based methods on the PatchGastricADC22 dataset.
    Abstract Pathological captioning of Whole Slide Images (WSIs), though essential in computer-aided pathological diagnosis, has rarely been studied due to the limitations in datasets and model training efficacy. In this paper, we propose a new paradigm, the Subtype-guided Masked Transformer (SGMT), for pathological captioning based on Transformers, which treats a WSI as a sequence of sparse patches and generates an overall caption sentence from the sequence. An accompanying subtype prediction is introduced into SGMT to guide the training process and enhance the captioning accuracy. We also present an Asymmetric Masked Mechanism approach to tackle the large size constraint of pathological image captioning, where the numbers of sequencing patches in SGMT are sampled differently in the training and inferring phases, respectively. Experiments on the PatchGastricADC22 dataset demonstrate that our approach effectively adapts to the task with a transformer-based model and achieves superior performance compared to traditional RNN-based methods. Our codes are to be made available for further research and development.
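The asymmetric masking idea, sampling a different number of patches per slide during training than during inference, can be sketched as follows (the patch counts and sampling strategy are illustrative, not the paper's settings):

```python
import numpy as np

def sample_patches(patch_features, n_train=256, n_infer=1024, training=True, seed=None):
    """Illustrative asymmetric sampling of sparse WSI patches.

    patch_features: array (N, D) of features for all patches of one slide.
    A small random subset keeps training cheap; a larger subset is used at inference.
    """
    rng = np.random.default_rng(seed)
    n = n_train if training else n_infer
    n = min(n, len(patch_features))
    idx = rng.choice(len(patch_features), size=n, replace=False)
    return patch_features[idx]

slide = np.random.randn(5000, 512)
print(sample_patches(slide, training=True).shape)   # (256, 512)
print(sample_patches(slide, training=False).shape)  # (1024, 512)
```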

Functional connectivity modules in recurrent neural networks: function, origin and dynamics

  • paper_url: http://arxiv.org/abs/2310.20601
  • repo_url: None
  • paper_authors: Jacob Tanner, Sina Mansour L., Ludovico Coletta, Alessandro Gozzi, Richard F. Betzel
  • for: To help decode brain function, in particular the ubiquitous phenomenon of neural synchronization across species and organizational levels.
  • methods: Uses recurrent neural networks trained on systems neuroscience tasks to investigate the functional role, origin, and dynamical implications of modular structures in correlation-based networks.
  • results: Modules are functionally coherent units that contribute to specialized information processing. They form spontaneously from asymmetries in the sign and weight of projections from the input layer to the recurrent layer, and they define connections with similar roles in governing system behavior and dynamics.
    Abstract Understanding the ubiquitous phenomenon of neural synchronization across species and organizational levels is crucial for decoding brain function. Despite its prevalence, the specific functional role, origin, and dynamical implication of modular structures in correlation-based networks remains ambiguous. Using recurrent neural networks trained on systems neuroscience tasks, this study investigates these important characteristics of modularity in correlation networks. We demonstrate that modules are functionally coherent units that contribute to specialized information processing. We show that modules form spontaneously from asymmetries in the sign and weight of projections from the input layer to the recurrent layer. Moreover, we show that modules define connections with similar roles in governing system behavior and dynamics. Collectively, our findings clarify the function, formation, and operational significance of functional connectivity modules, offering insights into cortical function and laying the groundwork for further studies on brain function, development, and dynamics.
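To make the notion of functional connectivity modules concrete, one common recipe (a sketch, not necessarily the exact pipeline used in the paper) is to correlate unit activity over time and then cluster the resulting correlation matrix into modules:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def functional_modules(activity, n_modules=4):
    """activity: (T, N) hidden-state time series of a recurrent network.

    Returns the correlation matrix and an integer module label per unit, obtained
    by hierarchical clustering of the unit-by-unit correlations (illustrative choice).
    """
    corr = np.corrcoef(activity.T)            # (N, N) functional connectivity
    dist = 1.0 - corr                          # correlation -> dissimilarity
    np.fill_diagonal(dist, 0.0)
    condensed = dist[np.triu_indices_from(dist, k=1)]
    z = linkage(condensed, method="average")
    return corr, fcluster(z, t=n_modules, criterion="maxclust")

rng = np.random.default_rng(1)
activity = rng.normal(size=(1000, 50))         # stand-in for RNN hidden states
corr, labels = functional_modules(activity)
print(corr.shape, np.bincount(labels))
```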

Taking control: Policies to address extinction risks from advanced AI

  • paper_url: http://arxiv.org/abs/2310.20563
  • repo_url: None
  • paper_authors: Andrea Miotti, Akash Wasil
  • for: To reduce extinction risks from advanced AI; the paper puts forward three policy recommendations.
  • methods: The recommendations are: establishing a Multinational AGI Consortium (MAGIC), implementing a global cap on the computing power used to train AI systems (global compute cap), and requiring affirmative safety evaluations of powerful AI systems (gating critical experiments).
  • results: The three proposals are argued to meaningfully reduce extinction risks from advanced AI systems while allowing the vast majority of AI innovation to continue unimpeded.
    Abstract This paper provides policy recommendations to reduce extinction risks from advanced artificial intelligence (AI). First, we briefly provide background information about extinction risks from AI. Second, we argue that voluntary commitments from AI companies would be an inappropriate and insufficient response. Third, we describe three policy proposals that would meaningfully address the threats from advanced AI: (1) establishing a Multinational AGI Consortium to enable democratic oversight of advanced AI (MAGIC), (2) implementing a global cap on the amount of computing power used to train an AI system (global compute cap), and (3) requiring affirmative safety evaluations to ensure that risks are kept below acceptable levels (gating critical experiments). MAGIC would be a secure, safety-focused, internationally-governed institution responsible for reducing risks from advanced AI and performing research to safely harness the benefits of AI. MAGIC would also maintain emergency response infrastructure (kill switch) to swiftly halt AI development or withdraw model deployment in the event of an AI-related emergency. The global compute cap would end the corporate race toward dangerous AI systems while enabling the vast majority of AI innovation to continue unimpeded. Gating critical experiments would ensure that companies developing powerful AI systems are required to present affirmative evidence that these models keep extinction risks below an acceptable threshold. After describing these recommendations, we propose intermediate steps that the international community could take to implement these proposals and lay the groundwork for international coordination around advanced AI.

Breaking the Token Barrier: Chunking and Convolution for Efficient Long Text Classification with BERT

  • paper_url: http://arxiv.org/abs/2310.20558
  • repo_url: None
  • paper_authors: Aman Jaiswal, Evangelos Milios
  • for: To enable BERT models to handle long input texts and overcome their 512-token limit.
  • methods: Proposes ChunkBERT, a relatively simple extension that lets any pretrained BERT model be fine-tuned for inference on arbitrarily long text. It is based on chunking token representations and CNN layers, and is compatible with any pretrained BERT model.
  • results: On a benchmark for comparing long-text classification models, a BERT model fine-tuned with ChunkBERT performs consistently on long samples while using only 6.25% of the original memory footprint, suggesting that efficient fine-tuning and inference can be achieved through simple modifications to pretrained BERT models.
    Abstract Transformer-based models, specifically BERT, have propelled research in various NLP tasks. However, these models are limited to a maximum token limit of 512 tokens. Consequently, this makes it non-trivial to apply it in a practical setting with long input. Various complex methods have claimed to overcome this limit, but recent research questions the efficacy of these models across different classification tasks. These complex architectures evaluated on carefully curated long datasets perform at par or worse than simple baselines. In this work, we propose a relatively simple extension to vanilla BERT architecture called ChunkBERT that allows finetuning of any pretrained models to perform inference on arbitrarily long text. The proposed method is based on chunking token representations and CNN layers, making it compatible with any pre-trained BERT. We evaluate chunkBERT exclusively on a benchmark for comparing long-text classification models across a variety of tasks (including binary classification, multi-class classification, and multi-label classification). A BERT model finetuned using the ChunkBERT method performs consistently across long samples in the benchmark while utilizing only a fraction (6.25\%) of the original memory footprint. These findings suggest that efficient finetuning and inference can be achieved through simple modifications to pre-trained BERT models.
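A rough sketch of the chunking idea (illustrative; ChunkBERT's actual architecture, chunk size, and CNN configuration may differ): the tokenized document is split into fixed-size chunks that fit the encoder's 512-token limit, each chunk is encoded independently, and a lightweight convolution + pooling head combines the chunk representations into one document-level prediction.

```python
import torch
import torch.nn as nn

class ChunkedClassifierHead(nn.Module):
    """Combines per-chunk embeddings with a 1-D CNN and pooling (illustrative)."""
    def __init__(self, hidden=768, n_classes=2):
        super().__init__()
        self.conv = nn.Conv1d(hidden, 256, kernel_size=3, padding=1)
        self.fc = nn.Linear(256, n_classes)

    def forward(self, chunk_embs):                            # (batch, n_chunks, hidden)
        x = torch.relu(self.conv(chunk_embs.transpose(1, 2)))  # (batch, 256, n_chunks)
        x = x.max(dim=-1).values                               # pool over chunks
        return self.fc(x)

def chunk_token_ids(token_ids, chunk_len=512):
    """Split a long token-id sequence into non-overlapping chunks, padding the last one."""
    pad = (-len(token_ids)) % chunk_len
    token_ids = token_ids + [0] * pad
    return [token_ids[i:i + chunk_len] for i in range(0, len(token_ids), chunk_len)]

chunks = chunk_token_ids(list(range(1300)))         # 3 chunks of 512 tokens
chunk_embs = torch.randn(1, len(chunks), 768)       # stand-in for per-chunk [CLS] vectors
print(ChunkedClassifierHead()(chunk_embs).shape)    # torch.Size([1, 2])
```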

CapsFusion: Rethinking Image-Text Data at Scale

  • paper_url: http://arxiv.org/abs/2310.20550
  • repo_url: https://github.com/baaivision/CapsFusion
  • paper_authors: Qiying Yu, Quan Sun, Xiaosong Zhang, Yufeng Cui, Fan Zhang, Xinlong Wang, Jingjing Liu
  • for: To improve the generalization ability and scalability of large multimodal models for zero-shot multimodal tasks.
  • methods: Uses large language models to consolidate and refine information from large-scale web-based image-text pairs and from synthetic captions generated by captioning models.
  • results: CapsFusion captions improve CIDEr scores by 18.8 and 18.3 points on COCO and NoCaps, and show all-round superiority in model performance, sample efficiency, world knowledge depth, and scalability.
    Abstract Large multimodal models demonstrate remarkable generalist ability to perform diverse multimodal tasks in a zero-shot manner. Large-scale web-based image-text pairs contribute fundamentally to this success, but suffer from excessive noise. Recent studies use alternative captions synthesized by captioning models and have achieved notable benchmark performance. However, our experiments reveal significant Scalability Deficiency and World Knowledge Loss issues in models trained with synthetic captions, which have been largely obscured by their initial benchmark success. Upon closer examination, we identify the root cause as the overly-simplified language structure and lack of knowledge details in existing synthetic captions. To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions. Extensive experiments show that CapsFusion captions exhibit remarkable all-round superiority over existing captions in terms of model performance (e.g., 18.8 and 18.3 improvements in CIDEr score on COCO and NoCaps), sample efficiency (requiring 11-16 times less computation than baselines), world knowledge depth, and scalability. These effectiveness, efficiency and scalability advantages position CapsFusion as a promising candidate for future scaling of LMM training.

LLMs may Dominate Information Access: Neural Retrievers are Biased Towards LLM-Generated Texts

  • paper_url: http://arxiv.org/abs/2310.20501
  • repo_url: None
  • paper_authors: Sunhao Dai, Yuqi Zhou, Liang Pang, Weihao Liu, Xiaolin Hu, Yong Liu, Xiao Zhang, Jun Xu
  • for: investigate the influence of LLM-generated documents on IR systems and the potential biases in neural retrieval models towards LLM-generated text.
  • methods: quantitative evaluation of different IR models in scenarios with human-written and LLM-generated texts, text compression analysis, and theoretical analysis.
  • results: the neural retrieval models tend to rank LLM-generated documents higher, which is referred to as the “source bias”; this bias is not limited to first-stage neural retrievers but also extends to second-stage neural re-rankers; the bias is due to the neural models’ ability to understand the semantic information of LLM-generated text.
    Abstract Recently, the emergence of large language models (LLMs) has revolutionized the paradigm of information retrieval (IR) applications, especially in web search. With their remarkable capabilities in generating human-like texts, LLMs have created enormous texts on the Internet. As a result, IR systems in the LLMs era are facing a new challenge: the indexed documents now are not only written by human beings but also automatically generated by the LLMs. How these LLM-generated documents influence the IR systems is a pressing and still unexplored question. In this work, we conduct a quantitative evaluation of different IR models in scenarios where both human-written and LLM-generated texts are involved. Surprisingly, our findings indicate that neural retrieval models tend to rank LLM-generated documents higher. We refer to this category of biases in neural retrieval models towards the LLM-generated text as the "source bias". Moreover, we discover that this bias is not confined to the first-stage neural retrievers, but extends to the second-stage neural re-rankers. Then, we provide an in-depth analysis from the perspective of text compression and observe that neural models can better understand the semantic information of LLM-generated text, which is further substantiated by our theoretical analysis. We also discuss the potential server concerns stemming from the observed source bias and hope our findings can serve as a critical wake-up call to the IR community and beyond. To facilitate future explorations of IR in the LLM era, the constructed two new benchmarks and codes will later be available at https://github.com/KID-22/LLM4IR-Bias.
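A simple way to quantify this kind of source bias (an illustrative metric only; the paper's exact definition may differ) is to compare how highly a retriever scores the LLM-generated version of a document versus the human-written version for the same query:

```python
import numpy as np

def source_bias(scores_human, scores_llm):
    """Relative preference of a retriever for LLM-generated over human-written texts.

    scores_human / scores_llm: relevance scores assigned by the retriever to the
    human-written and LLM-generated version of the same document, per query.
    Returns the fraction of queries where the LLM version outranks the human one
    and the mean score gap (positive = biased toward LLM-generated text).
    """
    scores_human, scores_llm = np.asarray(scores_human), np.asarray(scores_llm)
    win_rate = float(np.mean(scores_llm > scores_human))
    mean_gap = float(np.mean(scores_llm - scores_human))
    return win_rate, mean_gap

rng = np.random.default_rng(0)
human = rng.normal(0.0, 1.0, size=500)
llm = human + rng.normal(0.15, 0.3, size=500)   # synthetic example with a small LLM tilt
print(source_bias(human, llm))
```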

A Transformer-Based Model With Self-Distillation for Multimodal Emotion Recognition in Conversations

  • paper_url: http://arxiv.org/abs/2310.20494
  • repo_url: https://github.com/butterfliesss/sdt
  • paper_authors: Hui Ma, Jian Wang, Hongfei Lin, Bo Zhang, Yijia Zhang, Bo Xu
  • for: To improve emotion recognition in conversations (ERC), i.e., recognizing the emotion of each utterance in a conversation.
  • methods: Uses a transformer-based model with self-distillation (SDT) to capture intra- and inter-modal interactions between utterances, while dynamically learning weights between modalities through a hierarchical gated fusion strategy.
  • results: Experiments on the IEMOCAP and MELD datasets show that SDT outperforms previous state-of-the-art baselines.
    Abstract Emotion recognition in conversations (ERC), the task of recognizing the emotion of each utterance in a conversation, is crucial for building empathetic machines. Existing studies focus mainly on capturing context- and speaker-sensitive dependencies on the textual modality but ignore the significance of multimodal information. Different from emotion recognition in textual conversations, capturing intra- and inter-modal interactions between utterances, learning weights between different modalities, and enhancing modal representations play important roles in multimodal ERC. In this paper, we propose a transformer-based model with self-distillation (SDT) for the task. The transformer-based model captures intra- and inter-modal interactions by utilizing intra- and inter-modal transformers, and learns weights between modalities dynamically by designing a hierarchical gated fusion strategy. Furthermore, to learn more expressive modal representations, we treat soft labels of the proposed model as extra training supervision. Specifically, we introduce self-distillation to transfer knowledge of hard and soft labels from the proposed model to each modality. Experiments on IEMOCAP and MELD datasets demonstrate that SDT outperforms previous state-of-the-art baselines.
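A minimal sketch of a gated fusion step for weighting modalities dynamically; the paper's hierarchical gated fusion and self-distillation are more involved than this, so treat the layer sizes and gating form below as assumptions:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Learns per-sample weights over modality features and fuses them (illustrative)."""
    def __init__(self, dim=256, n_modalities=3):
        super().__init__()
        self.gate = nn.Linear(dim * n_modalities, n_modalities)

    def forward(self, feats):                       # list of (batch, dim) tensors
        stacked = torch.stack(feats, dim=1)         # (batch, M, dim)
        weights = torch.softmax(self.gate(torch.cat(feats, dim=-1)), dim=-1)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)   # (batch, dim)

text, audio, video = (torch.randn(8, 256) for _ in range(3))
fused = GatedFusion()([text, audio, video])
print(fused.shape)   # torch.Size([8, 256])
```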

Unveiling Black-boxes: Explainable Deep Learning Models for Patent Classification

  • paper_url: http://arxiv.org/abs/2310.20478
  • repo_url: None
  • paper_authors: Md Shajalal, Sebastian Denef, Md. Rezaul Karim, Alexander Boden, Gunnar Stevens
  • for: To propose an explainable multi-label patent classification framework that helps human experts understand and manage complex AI-enabled methods.
  • methods: Trains deep neural networks (Bi-LSTM, CNN, and CNN-BiLSTM) and applies layer-wise relevance propagation (LRP) to provide human-understandable explanations: predictions are propagated backward from the output layer to the input layer to obtain relevance scores for individual words, which are then visualized to explain each prediction.
  • results: Experiments on two datasets show high performance for multi-label patent classification, and the generated explanations highlight relevant words aligned with the predicted class, making the predictions more understandable.
    Abstract Recent technological advancements have led to a large number of patents in a diverse range of domains, making it challenging for human experts to analyze and manage. State-of-the-art methods for multi-label patent classification rely on deep neural networks (DNNs), which are complex and often considered black-boxes due to their opaque decision-making processes. In this paper, we propose a novel deep explainable patent classification framework by introducing layer-wise relevance propagation (LRP) to provide human-understandable explanations for predictions. We train several DNN models, including Bi-LSTM, CNN, and CNN-BiLSTM, and propagate the predictions backward from the output layer up to the input layer of the model to identify the relevance of words for individual predictions. Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class. Experimental results on two datasets comprising two-million patent texts demonstrate high performance in terms of various evaluation measures. The explanations generated for each prediction highlight important relevant words that align with the predicted class, making the prediction more understandable. Explainable systems have the potential to facilitate the adoption of complex AI-enabled methods for patent classification in real-world applications.
    摘要 We train several DNN models, including Bi-LSTM, CNN, and CNN-BiLSTM, and propagate the predictions backward from the output layer up to the input layer of the model to identify the relevance of words for individual predictions. Considering the relevance score, we then generate explanations by visualizing relevant words for the predicted patent class.Experimental results on two datasets comprising two-million patent texts demonstrate high performance in terms of various evaluation measures. The explanations generated for each prediction highlight important relevant words that align with the predicted class, making the prediction more understandable. Explainable systems have the potential to facilitate the adoption of complex AI-enabled methods for patent classification in real-world applications.Translated into Simplified Chinese:最近的技术进步导致了大量的专利文本,使得人类专家分析和管理变得困难。现代的多标签专利分类方法多数使用深度神经网络(DNN),它们的决策过程可能是黑盒子,很难被人类理解。在这篇论文中,我们提出了一种新的深度可解释专利分类框架,通过引入层次相关传播(LRP)来提供人类可理解的解释。我们训练了多个DNN模型,包括Bi-LSTM、CNN和CNN-BiLSTM,并将预测结果倒退到模型的输入层,以确定每个预测的单词的相关性。根据相关性分数,我们 THEN generates explanations by visualizing relevant words for the predicted patent class.实验结果表明,我们在两个数据集上,每个数据集包含200万个专利文本,得到了高效的性能。解释生成的每个预测高亮显示了与预测类相关的关键单词,使预测更容易理解。可解释系统有可能在实际应用中推广复杂的AIEnabled方法。

Global Transformer Architecture for Indoor Room Temperature Forecasting

  • paper_url: http://arxiv.org/abs/2310.20476
  • repo_url: None
  • paper_authors: Alfredo V Clemente, Alessandro Nocente, Massimiliano Ruocco
  • for: To reduce building energy consumption and greenhouse gas emissions by improving indoor temperature forecasting for effective control systems.
  • methods: Uses a global Transformer architecture for indoor temperature forecasting in multi-room buildings; the model is trained on the entire dataset covering all rooms, which improves predictive performance and simplifies deployment and maintenance.
  • results: The approach improves the accuracy and efficiency of indoor temperature forecasting, serving as a tool to optimize energy consumption and decrease greenhouse gas emissions associated with HVAC systems.
    Abstract A thorough regulation of building energy systems translates in relevant energy savings and in a better comfort for the occupants. Algorithms to predict the thermal state of a building on a certain time horizon with a good confidence are essential for the implementation of effective control systems. This work presents a global Transformer architecture for indoor temperature forecasting in multi-room buildings, aiming at optimizing energy consumption and reducing greenhouse gas emissions associated with HVAC systems. Recent advancements in deep learning have enabled the development of more sophisticated forecasting models compared to traditional feedback control systems. The proposed global Transformer architecture can be trained on the entire dataset encompassing all rooms, eliminating the need for multiple room-specific models, significantly improving predictive performance, and simplifying deployment and maintenance. Notably, this study is the first to apply a Transformer architecture for indoor temperature forecasting in multi-room buildings. The proposed approach provides a novel solution to enhance the accuracy and efficiency of temperature forecasting, serving as a valuable tool to optimize energy consumption and decrease greenhouse gas emissions in the building sector.

Linked Papers With Code: The Latest in Machine Learning as an RDF Knowledge Graph

  • paper_url: http://arxiv.org/abs/2310.20475
  • repo_url: https://github.com/davidlamprecht/linkedpaperswithcode
  • paper_authors: Michael Färber, David Lamprecht
  • for: Introduces Linked Papers With Code (LPWC), an RDF knowledge graph that provides comprehensive, current information about almost 400,000 machine learning publications, including the tasks addressed, the datasets utilized, the methods implemented, and the evaluations conducted along with their results.
  • methods: Translates the latest machine learning advancements into RDF form as ontologies and a knowledge graph, enabling novel ways of scientific impact quantification and scholarly key content recommendation.
  • results: LPWC is openly accessible as a knowledge graph in the Linked Open Data cloud and is provided in multiple formats, from RDF dump files to a SPARQL endpoint for direct web queries, as well as a data source with resolvable URIs and links to SemOpenAlex, Wikidata, and DBLP. Knowledge graph embeddings are also supplied, so LPWC can be readily applied in machine learning applications.
    Abstract In this paper, we introduce Linked Papers With Code (LPWC), an RDF knowledge graph that provides comprehensive, current information about almost 400,000 machine learning publications. This includes the tasks addressed, the datasets utilized, the methods implemented, and the evaluations conducted, along with their results. Compared to its non-RDF-based counterpart Papers With Code, LPWC not only translates the latest advancements in machine learning into RDF format, but also enables novel ways for scientific impact quantification and scholarly key content recommendation. LPWC is openly accessible at https://linkedpaperswithcode.com and is licensed under CC-BY-SA 4.0. As a knowledge graph in the Linked Open Data cloud, we offer LPWC in multiple formats, from RDF dump files to a SPARQL endpoint for direct web queries, as well as a data source with resolvable URIs and links to the data sources SemOpenAlex, Wikidata, and DBLP. Additionally, we supply knowledge graph embeddings, enabling LPWC to be readily applied in machine learning applications.

Critical Role of Artificially Intelligent Conversational Chatbot

  • paper_url: http://arxiv.org/abs/2310.20474
  • repo_url: None
  • paper_authors: Seraj A. M. Mostafa, Md Z. Islam, Mohammad Z. Islam, Fairose Jeehan, Saujanna Jafreen, Raihan U. Islam
  • for: To examine the ethical implications of ChatGPT in academic contexts, its limitations, and potential misuse by specific user groups.
  • methods: Explores various scenarios involving ChatGPT's ethical implications and limitations through experiments and analysis.
  • results: Identifies potential ethical concerns and proposes architectural solutions aimed at preventing inappropriate use and promoting responsible AI interactions.
    Abstract Artificially intelligent chatbot, such as ChatGPT, represents a recent and powerful advancement in the AI domain. Users prefer them for obtaining quick and precise answers, avoiding the usual hassle of clicking through multiple links in traditional searches. ChatGPT's conversational approach makes it comfortable and accessible for finding answers quickly and in an organized manner. However, it is important to note that these chatbots have limitations, especially in terms of providing accurate answers as well as ethical concerns. In this study, we explore various scenarios involving ChatGPT's ethical implications within academic contexts, its limitations, and the potential misuse by specific user groups. To address these challenges, we propose architectural solutions aimed at preventing inappropriate use and promoting responsible AI interactions.

ACL Anthology Helper: A Tool to Retrieve and Manage Literature from ACL Anthology

  • paper_url: http://arxiv.org/abs/2310.20467
  • repo_url: None
  • paper_authors: Chen Tang, Frank Guerin, Chenghua Lin
  • for: This paper is written for researchers who need to efficiently access and organize literature from the ACL Anthology, a comprehensive collection of NLP and CL publications.
  • methods: The paper presents a tool called ACL Anthology Helper, which automates the process of parsing and downloading papers along with their meta-information, and stores them in a local MySQL database.
  • results: The tool offers over 20 operations for efficient literature retrieval, including “where,” “group,” “order,” and more, and has been successfully utilized in writing a survey paper (Tang et al.,2022a).
    Abstract The ACL Anthology is an online repository that serves as a comprehensive collection of publications in the field of natural language processing (NLP) and computational linguistics (CL). This paper presents a tool called ``ACL Anthology Helper''. It automates the process of parsing and downloading papers along with their meta-information, which are then stored in a local MySQL database. This allows for efficient management of the local papers using a wide range of operations, including "where," "group," "order," and more. By providing over 20 operations, this tool significantly enhances the retrieval of literature based on specific conditions. Notably, this tool has been successfully utilised in writing a survey paper (Tang et al.,2022a). By introducing the ACL Anthology Helper, we aim to enhance researchers' ability to effectively access and organise literature from the ACL Anthology. This tool offers a convenient solution for researchers seeking to explore the ACL Anthology's vast collection of publications while allowing for more targeted and efficient literature retrieval.

Interpretable Neural PDE Solvers using Symbolic Frameworks

  • paper_url: http://arxiv.org/abs/2310.20463
  • repo_url: None
  • paper_authors: Yolanne Yi Ran Lee
  • for: To address the interpretability problem of neural PDE solvers, improving understanding of and trust in their decisions.
  • methods: Proposes integrating symbolic frameworks (such as symbolic regression) into neural PDE solvers, distilling complex neural operations into human-readable mathematical expressions.
  • results: Argues that such a combination can improve the interpretability of neural PDE solvers while preserving accuracy, bridging the divide between black-box predictions and solutions.
    Abstract Partial differential equations (PDEs) are ubiquitous in the world around us, modelling phenomena from heat and sound to quantum systems. Recent advances in deep learning have resulted in the development of powerful neural solvers; however, while these methods have demonstrated state-of-the-art performance in both accuracy and computational efficiency, a significant challenge remains in their interpretability. Most existing methodologies prioritize predictive accuracy over clarity in the underlying mechanisms driving the model's decisions. Interpretability is crucial for trustworthiness and broader applicability, especially in scientific and engineering domains where neural PDE solvers might see the most impact. In this context, a notable gap in current research is the integration of symbolic frameworks (such as symbolic regression) into these solvers. Symbolic frameworks have the potential to distill complex neural operations into human-readable mathematical expressions, bridging the divide between black-box predictions and solutions.
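As a toy illustration of the symbolic-regression idea (a sketch under strong simplifications, not one of the frameworks surveyed in the paper), one can fit a sparse linear combination over a small library of candidate symbolic terms and keep only the dominant ones as a human-readable expression:

```python
import numpy as np

def sparse_symbolic_fit(x, y, threshold=0.05):
    """Fit y ~ sum_k c_k * phi_k(x) over a fixed term library and prune small terms."""
    library = {
        "1": np.ones_like(x), "x": x, "x^2": x**2,
        "sin(x)": np.sin(x), "exp(-x)": np.exp(-x),
    }
    phi = np.column_stack(list(library.values()))
    coeffs, *_ = np.linalg.lstsq(phi, y, rcond=None)
    return {name: c for name, c in zip(library, coeffs) if abs(c) > threshold}

x = np.linspace(0, 4, 200)
y = 2.0 * x**2 - 0.5 * np.sin(x)            # hidden ground-truth expression
print(sparse_symbolic_fit(x, y))             # recovers roughly {'x^2': 2.0, 'sin(x)': -0.5}
```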

AsGrad: A Sharp Unified Analysis of Asynchronous-SGD Algorithms

  • paper_url: http://arxiv.org/abs/2310.20452
  • repo_url: None
  • paper_authors: Rustem Islamov, Mher Safaryan, Dan Alistarh
  • for: To analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting, where each worker has its own computation and communication speeds as well as its own data distribution.
  • methods: Studies asynchronous SGD and its various modifications, including a novel asynchronous method based on worker shuffling.
  • results: Presents a unified convergence theory for non-convex smooth functions in the heterogeneous regime, explaining what affects the convergence rate; the derived rates match the best-known results, and numerical evaluations support the theoretical findings.
    Abstract We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting, where each worker has its own computation and communication speeds, as well as data distribution. In these algorithms, workers compute possibly stale and stochastic gradients associated with their local data at some iteration back in history and then return those gradients to the server without synchronizing with other workers. We present a unified convergence theory for non-convex smooth functions in the heterogeneous regime. The proposed analysis provides convergence for pure asynchronous SGD and its various modifications. Moreover, our theory explains what affects the convergence rate and what can be done to improve the performance of asynchronous algorithms. In particular, we introduce a novel asynchronous method based on worker shuffling. As a by-product of our analysis, we also demonstrate convergence guarantees for gradient-type algorithms such as SGD with random reshuffling and shuffle-once mini-batch SGD. The derived rates match the best-known results for those algorithms, highlighting the tightness of our approach. Finally, our numerical evaluations support theoretical findings and show the good practical performance of our method.
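The core mechanism analyzed here, workers returning possibly stale gradients computed at an old iterate, can be simulated in a few lines. This is a didactic sketch on a quadratic objective, not the paper's algorithms or proofs, and the delay model is an assumption for illustration.

```python
import numpy as np

def async_sgd_quadratic(steps=200, max_delay=5, lr=0.05, dim=10, seed=0):
    """Simulate asynchronous SGD where each applied gradient is computed at a stale iterate."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=dim) + 3.0               # minimize f(x) = 0.5 * ||x - a||^2
    x = np.zeros(dim)
    history = [x.copy()]
    for _ in range(steps):
        # The arriving gradient was computed by some worker `delay` steps ago.
        delay = rng.integers(0, min(max_delay, len(history)))
        stale_x = history[-1 - delay]
        grad = (stale_x - a) + 0.1 * rng.normal(size=dim)   # stochastic gradient at stale point
        x = x - lr * grad
        history.append(x.copy())
    return np.linalg.norm(x - a)

print(f"final distance to optimum: {async_sgd_quadratic():.3f}")
```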

Efficient Bayesian Learning Curve Extrapolation using Prior-Data Fitted Networks

  • paper_url: http://arxiv.org/abs/2310.20447
  • repo_url: None
  • paper_authors: Steven Adriaensen, Herilalaina Rakotoarison, Samuel Müller, Frank Hutter
  • for: To predict model performance in later epochs of training based on performance in earlier epochs (learning curve extrapolation).
  • methods: Takes a Bayesian approach and uses prior-data fitted networks (PFNs) to perform approximate Bayesian inference in a single forward pass.
  • results: Trained on 10 million artificial right-censored learning curves generated from a parametric prior, LC-PFN approximates the posterior predictive distribution more accurately than MCMC while being over 10,000 times faster; it also achieves competitive performance extrapolating a total of 20,000 real learning curves from four benchmarks.
    Abstract Learning curve extrapolation aims to predict model performance in later epochs of training, based on the performance in earlier epochs. In this work, we argue that, while the inherent uncertainty in the extrapolation of learning curves warrants a Bayesian approach, existing methods are (i) overly restrictive, and/or (ii) computationally expensive. We describe the first application of prior-data fitted neural networks (PFNs) in this context. A PFN is a transformer, pre-trained on data generated from a prior, to perform approximate Bayesian inference in a single forward pass. We propose LC-PFN, a PFN trained to extrapolate 10 million artificial right-censored learning curves generated from a parametric prior proposed in prior art using MCMC. We demonstrate that LC-PFN can approximate the posterior predictive distribution more accurately than MCMC, while being over 10 000 times faster. We also show that the same LC-PFN achieves competitive performance extrapolating a total of 20 000 real learning curves from four learning curve benchmarks (LCBench, NAS-Bench-201, Taskset, and PD1) that stem from training a wide range of model architectures (MLPs, CNNs, RNNs, and Transformers) on 53 different datasets with varying input modalities (tabular, image, text, and protein data). Finally, we investigate its potential in the context of model selection and find that a simple LC-PFN based predictive early stopping criterion obtains 2 - 6x speed-ups on 45 of these datasets, at virtually no overhead.
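A sketch of how artificial right-censored learning curves can be drawn from a simple parametric prior to build training data of this kind; the power-law family and parameter ranges below are illustrative assumptions, not the prior-art prior the paper actually uses:

```python
import numpy as np

def sample_censored_curve(max_epochs=50, seed=None):
    """Draw one noisy power-law learning curve and right-censor it at a random epoch."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(0.5, 1.0)        # asymptotic performance
    b = rng.uniform(0.1, 0.6)        # initial gap to the asymptote
    c = rng.uniform(0.3, 1.5)        # convergence speed
    t = np.arange(1, max_epochs + 1)
    curve = a - b * t ** (-c) + rng.normal(0, 0.01, size=max_epochs)
    cutoff = rng.integers(5, max_epochs + 1)   # observe only the first `cutoff` epochs
    return curve[:cutoff], curve                # observed prefix, full target curve

observed, full = sample_censored_curve(seed=3)
print(len(observed), len(full), full[-1].round(3))
```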

Analyzing the Impact of Companies on AI Research Based on Publications

  • paper_url: http://arxiv.org/abs/2310.20444
  • repo_url: https://github.com/LazaTabax/AI-Impact-Scientometrics
  • paper_authors: Michael Färber, Lazaros Tampakis
  • for: To examine the influence of companies on AI research and how that influence can be made measurable.
  • methods: Compares academic- and company-authored AI publications from the last decade, using scientometric data from multiple scholarly databases to identify differences between these groups and to disclose the top contributing organizations.
  • results: Publications (co-)authored by companies receive significantly higher citation counts and considerably more attention online, even though the vast majority of publications is still produced by academia.
    Abstract Artificial Intelligence (AI) is one of the most momentous technologies of our time. Thus, it is of major importance to know which stakeholders influence AI research. Besides researchers at universities and colleges, researchers in companies have hardly been considered in this context. In this article, we consider how the influence of companies on AI research can be made measurable on the basis of scientific publishing activities. We compare academic- and company-authored AI publications published in the last decade and use scientometric data from multiple scholarly databases to look for differences across these groups and to disclose the top contributing organizations. While the vast majority of publications is still produced by academia, we find that the citation count an individual publication receives is significantly higher when it is (co-)authored by a company. Furthermore, using a variety of altmetric indicators, we notice that publications with company participation receive considerably more attention online. Finally, we place our analysis results in a broader context and present targeted recommendations to safeguard a harmonious balance between academia and industry in the realm of AI research.

  • paper_url: http://arxiv.org/abs/2310.20443
  • repo_url: None
  • paper_authors: Björn Schembera, Frank Wübbeling, Hendrik Kleikamp, Christine Biedinger, Jochen Fiedler, Marco Reidelbach, Aurela Shehu, Burkhard Schmidt, Thomas Koprucki, Dorothea Iglezakis, Dominik Göddeke
  • for: To make mathematical research data FAIR by introducing semantic technology and documenting the mathematical foundations accordingly.
  • methods: Develops, merges, and implements ontologies and knowledge graphs that describe mathematical models and the numerical algorithms for their solution within the modeling-simulation-optimization workflow.
  • results: Shows, using the concrete example of microfracture analysis of porous media, how knowledge of the underlying mathematical model and the corresponding numerical algorithms can be represented by the ontologies, improving the accessibility and shareability of mathematical research data.
    Abstract In applied mathematics and related disciplines, the modeling-simulation-optimization workflow is a prominent scheme, with mathematical models and numerical algorithms playing a crucial role. For these types of mathematical research data, the Mathematical Research Data Initiative has developed, merged and implemented ontologies and knowledge graphs. This contributes to making mathematical research data FAIR by introducing semantic technology and documenting the mathematical foundations accordingly. Using the concrete example of microfracture analysis of porous media, it is shown how the knowledge of the underlying mathematical model and the corresponding numerical algorithms for its solution can be represented by the ontologies.

Raising the ClaSS of Streaming Time Series Segmentation

  • paper_url: http://arxiv.org/abs/2310.20431
  • repo_url: https://github.com/ermshaua/classification-score-stream
  • paper_authors: Arik Ermshaus, Patrick Schäfer, Ulf Leser
  • for: To provide an efficient and highly accurate algorithm for streaming time series segmentation (STSS), partitioning high-frequency sensor streams into segments that correspond to states of the observed human, animal, industrial, commercial, or natural processes.
  • methods: The proposed algorithm, ClaSS, assesses the homogeneity of potential partitions using self-supervised time series classification and applies statistical tests to detect significant change points (CPs).
  • results: In an experimental evaluation on two large benchmarks and six real-world data archives, ClaSS is significantly more precise than eight state-of-the-art competitors; its space and time complexity is independent of segment sizes and linear only in the sliding window size. ClaSS is also provided as a window operator for the Apache Flink streaming engine, with an average throughput of 538 data points per second.
    Abstract Ubiquitous sensors today emit high frequency streams of numerical measurements that reflect properties of human, animal, industrial, commercial, and natural processes. Shifts in such processes, e.g. caused by external events or internal state changes, manifest as changes in the recorded signals. The task of streaming time series segmentation (STSS) is to partition the stream into consecutive variable-sized segments that correspond to states of the observed processes or entities. The partition operation itself must in performance be able to cope with the input frequency of the signals. We introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS. ClaSS assesses the homogeneity of potential partitions using self-supervised time series classification and applies statistical tests to detect significant change points (CPs). In our experimental evaluation using two large benchmarks and six real-world data archives, we found ClaSS to be significantly more precise than eight state-of-the-art competitors. Its space and time complexity is independent of segment sizes and linear only in the sliding window size. We also provide ClaSS as a window operator with an average throughput of 538 data points per second for the Apache Flink streaming engine.
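The classification-score idea can be illustrated with a much-simplified offline sketch: for each candidate split point, train a classifier to distinguish windows drawn from the left and right sides; a split that can be classified well indicates heterogeneous segments and is a change-point candidate. This sketch ignores ClaSS's streaming operation, self-supervision details, and statistical tests, and the classifier and window length are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def classification_score(ts, split, w=20):
    """Cross-validated accuracy of telling windows before `split` from windows after it."""
    X = np.lib.stride_tricks.sliding_window_view(ts, w)
    y = (np.arange(len(X)) + w // 2 >= split).astype(int)
    if y.min() == y.max():
        return 0.5
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(0)
ts = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])  # change at t=300
candidates = list(range(100, 500, 25))
scores = [classification_score(ts, s) for s in candidates]
print("best split candidate:", candidates[int(np.argmax(scores))])  # close to 300
```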

Investigating Relative Performance of Transfer and Meta Learning

  • paper_url: http://arxiv.org/abs/2311.00727
  • repo_url: None
  • paper_authors: Benji Alwis
  • for: To investigate how neural networks can learn effectively from limited data, a limitation that has notably slowed progress in areas such as self-driving car technology.
  • methods: Compares two approaches, transfer learning and meta learning, as potential solutions, introducing a new meta learning method into the comparative analysis and varying the size of the training dataset.
  • results: Across different training-data amounts and model structures, the meta learning approach performed better on cross-class and prediction tasks, while transfer learning performed better on tasks with similar classes, yielding a criterion for selecting the most appropriate method in a given situation.
    Abstract Over the past decade, the field of machine learning has experienced remarkable advancements. While image recognition systems have achieved impressive levels of accuracy, they continue to rely on extensive training datasets. Additionally, a significant challenge has emerged in the form of poor out-of-distribution performance, which necessitates retraining neural networks when they encounter conditions that deviate from their training data. This limitation has notably contributed to the slow progress in self-driving car technology. These pressing issues have sparked considerable interest in methods that enable neural networks to learn effectively from limited data. This paper presents the outcomes of an extensive investigation designed to compare two distinct approaches, transfer learning and meta learning, as potential solutions to this problem. The overarching objective was to establish a robust criterion for selecting the most suitable method in diverse machine learning scenarios. Building upon prior research, I expanded the comparative analysis by introducing a new meta learning method into the investigation. Subsequently, I assessed whether the findings remained consistent under varying conditions. Finally, I delved into the impact of altering the size of the training dataset on the relative performance of these methods. This comprehensive exploration has yielded insights into the conditions favoring each approach, thereby facilitating the development of a criterion for selecting the most appropriate method in any given situation

Meta Learning for Multi-View Visuomotor Systems

  • paper_url: http://arxiv.org/abs/2310.20414
  • repo_url: None
  • paper_authors: Benji Alwis, Nick Pears, Pengcheng Liu
  • for: Presents a new approach for quickly adapting a multi-view visuomotor system for robots to camera configurations that vary from the baseline setup.
  • methods: Uses meta-learning to fine-tune the perceptual network while keeping the policy network fixed.
  • results: Experimental results demonstrate a significant reduction in the number of new training episodes needed to attain baseline performance.
    Abstract This paper introduces a new approach for quickly adapting a multi-view visuomotor system for robots to varying camera configurations from the baseline setup. It utilises meta-learning to fine-tune the perceptual network while keeping the policy network fixed. Experimental results demonstrate a significant reduction in the number of new training episodes needed to attain baseline performance.

Multi-Base Station Cooperative Sensing with AI-Aided Tracking

  • paper_url: http://arxiv.org/abs/2310.20403
  • repo_url: None
  • paper_authors: Elia Favarelli, Elisabetta Matricardi, Lorenzo Pucci, Enrico Paolini, Wen Xu, Andrea Giorgetti
  • for: To improve the performance of a joint sensing and communication (JSC) network in which multiple base stations (BSs) cooperate through a fusion center (FC) to exchange information about the sensed environment while concurrently establishing communication links with a set of user equipments (UEs).
  • methods: Each BS operates as a monostatic radar system, scanning the monitored area and generating range-angle maps that provide information on target positions. The maps are fused at the FC, a convolutional neural network (CNN) infers the target category (e.g., pedestrian or vehicle), an adaptive clustering algorithm groups detections originating from the same target, and two multi-target tracking algorithms, the probability hypothesis density (PHD) filter and the multi-Bernoulli mixture (MBM) filter, estimate the targets' states.
  • results: Numerical results show that the framework provides remarkable sensing performance, achieving an optimal sub-pattern assignment (OSPA) error below 60 cm, while keeping communication services to UEs with a reduction of the communication capacity on the order of 10% to 20%. In the specific case study, 3 BSs engaged in sensing keep the localization error below 1 m.
    Abstract In this work, we investigate the performance of a joint sensing and communication (JSC) network consisting of multiple base stations (BSs) that cooperate through a fusion center (FC) to exchange information about the sensed environment while concurrently establishing communication links with a set of user equipments (UEs). Each BS within the network operates as a monostatic radar system, enabling comprehensive scanning of the monitored area and generating range-angle maps that provide information regarding the position of a group of heterogeneous objects. The acquired maps are subsequently fused in the FC. Then, a convolutional neural network (CNN) is employed to infer the category of the targets, e.g., pedestrians or vehicles, and such information is exploited by an adaptive clustering algorithm to group the detections originating from the same target more effectively. Finally, two multi-target tracking algorithms, the probability hypothesis density (PHD) filter and multi-Bernoulli mixture (MBM) filter, are applied to estimate the state of the targets. Numerical results demonstrated that our framework could provide remarkable sensing performance, achieving an optimal sub-pattern assignment (OSPA) less than 60 cm, while keeping communication services to UEs with a reduction of the communication capacity in the order of 10% to 20%. The impact of the number of BSs engaged in sensing is also examined, and we show that in the specific case study, 3 BSs ensure a localization error below 1 m.

Utilitarian Algorithm Configuration

  • paper_url: http://arxiv.org/abs/2310.20401
  • repo_url: https://github.com/drgrhm/utilitarian-ac
  • paper_authors: Devon R. Graham, Kevin Leyton-Brown, Tim Roughgarden
  • for: To provide a procedure for configuring heuristic algorithms that maximizes the utility delivered to end users while offering theoretical guarantees about performance.
  • methods: Describes effective and theoretically sound utilitarian configuration procedures that optimize a bounded, monotonically decreasing utility of runtime instead of minimizing expected runtime, which allows meaningful empirical bounds on a configuration's performance.
  • results: Proves upper bounds on the runtime of these procedures that are similar to theoretical lower bounds, and demonstrates their performance empirically.
    Abstract We present the first nontrivial procedure for configuring heuristic algorithms to maximize the utility provided to their end users while also offering theoretical guarantees about performance. Existing procedures seek configurations that minimize expected runtime. However, very recent theoretical work argues that expected runtime minimization fails to capture algorithm designers' preferences. Here we show that the utilitarian objective also confers significant algorithmic benefits. Intuitively, this is because mean runtime is dominated by extremely long runs even when they are incredibly rare; indeed, even when an algorithm never gives rise to such long runs, configuration procedures that provably minimize mean runtime must perform a huge number of experiments to demonstrate this fact. In contrast, utility is bounded and monotonically decreasing in runtime, allowing for meaningful empirical bounds on a configuration's performance. This paper builds on this idea to describe effective and theoretically sound configuration procedures. We prove upper bounds on the runtime of these procedures that are similar to theoretical lower bounds, while also demonstrating their performance empirically.
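The contrast between mean runtime and a bounded, monotonically decreasing utility can be seen in a tiny numerical sketch; the specific utility function below is an illustrative choice, not the paper's.

```python
import numpy as np

def mean_runtime(runtimes):
    return float(np.mean(runtimes))

def mean_utility(runtimes, kappa=60.0):
    """Bounded utility: 1 at zero runtime, decaying linearly to 0 at a cutoff kappa."""
    return float(np.mean(np.clip(1.0 - np.asarray(runtimes) / kappa, 0.0, 1.0)))

# Config A: consistently moderate runtimes. Config B: usually fast, but with rare huge runs.
rng = np.random.default_rng(0)
config_a = rng.uniform(8.0, 12.0, size=1000)
config_b = np.where(rng.random(1000) < 0.01, 5000.0, rng.uniform(1.0, 3.0, size=1000))

print("mean runtime  A, B:", mean_runtime(config_a), mean_runtime(config_b))
print("mean utility  A, B:", mean_utility(config_a), mean_utility(config_b))
# Expected-runtime minimization prefers A, because B's rare 5000-second runs dominate its mean,
# while the bounded utility prefers B, whose typical runs are much faster.
```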

Do large language models solve verbal analogies like children do?

  • paper_url: http://arxiv.org/abs/2310.20384
  • repo_url: https://github.com/cstevenson-uva/verbal_analogies_kids_vs_llms
  • paper_authors: Claire E. Stevenson, Mathilde ter Veen, Rochelle Choenni, Han L. J. van der Maas, Ekaterina Shutova
  • for: investigate whether large language models (LLMs) solve verbal analogies using associations, similar to what children do.
  • methods: used verbal analogies extracted from an online adaptive learning environment, where 14,002 7-12 year olds from the Netherlands solved 622 analogies in Dutch.
  • results: the six tested Dutch monolingual and multilingual LLMs performed around the same level as children, with MGPT performing worst, and XLM-V and GPT-3 the best. However, when controlling for associative processes, each model’s performance level drops 1-2 years.
    Abstract Analogy-making lies at the heart of human cognition. Adults solve analogies such as "Horse belongs to stable like chicken belongs to ...?" by mapping relations ("kept in") and answering "chicken coop". In contrast, children often use association, e.g., answering "egg". This paper investigates whether large language models (LLMs) solve verbal analogies in A:B::C:? form using associations, similar to what children do. We use verbal analogies extracted from an online adaptive learning environment, where 14,002 7-12 year-olds from the Netherlands solved 622 analogies in Dutch. The six tested Dutch monolingual and multilingual LLMs performed around the same level as children, with MGPT performing worst, around the 7-year-old level, and XLM-V and GPT-3 the best, slightly above the 11-year-old level. However, when we control for associative processes this picture changes and each model's performance level drops 1-2 years. Further experiments demonstrate that associative processes often underlie correctly solved analogies. We conclude that the LLMs we tested indeed tend to solve verbal analogies by association with C like children do.

A Comprehensive Study of GPT-4V’s Multimodal Capabilities in Medical Imaging

  • paper_url: http://arxiv.org/abs/2310.20381
  • repo_url: None
  • paper_authors: Yingshu Li, Yunyi Liu, Zhanyu Wang, Xinyu Liang, Lingqiao Liu, Lei Wang, Leyang Cui, Zhaopeng Tu, Longyue Wang, Luping Zhou
  • for: This paper evaluates GPT-4V's capabilities on medical imaging tasks, including chest X-ray report generation, medical Visual Question Answering (VQA), and visual grounding.
  • methods: The study evaluates GPT-4V on publicly available benchmarks, finding that it generates strong descriptive chest X-ray reports, particularly when guided by well-structured prompts; however, evaluation on the MIMIC-CXR dataset reveals room for improvement on metrics such as CIDEr.
  • results: In medical VQA, GPT-4V distinguishes between question types but falls short of prevailing benchmarks in accuracy. The analysis also exposes limitations of conventional metrics such as BLEU, motivating more semantically robust evaluation methods. In visual grounding, GPT-4V shows preliminary promise in recognizing bounding boxes, but its precision is lacking, especially for specific medical organs and signs.
    Abstract This paper presents a comprehensive evaluation of GPT-4V's capabilities across diverse medical imaging tasks, including Radiology Report Generation, Medical Visual Question Answering (VQA), and Visual Grounding. While prior efforts have explored GPT-4V's performance in medical imaging, to the best of our knowledge, our study represents the first quantitative evaluation on publicly available benchmarks. Our findings highlight GPT-4V's potential in generating descriptive reports for chest X-ray images, particularly when guided by well-structured prompts. However, its performance on the MIMIC-CXR dataset benchmark reveals areas for improvement in certain evaluation metrics, such as CIDEr. In the domain of Medical VQA, GPT-4V demonstrates proficiency in distinguishing between question types but falls short of prevailing benchmarks in terms of accuracy. Furthermore, our analysis finds the limitations of conventional evaluation metrics like the BLEU score, advocating for the development of more semantically robust assessment methods. In the field of Visual Grounding, GPT-4V exhibits preliminary promise in recognizing bounding boxes, but its precision is lacking, especially in identifying specific medical organs and signs. Our evaluation underscores the significant potential of GPT-4V in the medical imaging domain, while also emphasizing the need for targeted refinements to fully unlock its capabilities.

A Machine Learning-Based Framework for Clustering Residential Electricity Load Profiles to Enhance Demand Response Programs

  • paper_url: http://arxiv.org/abs/2310.20367
  • repo_url: None
  • paper_authors: Vasilis Michalakopoulos, Elissaios Sarmas, Ioannis Papias, Panagiotis Skaloumpakas, Vangelis Marinakis, Haris Doukas
  • for: The goal is to achieve optimal load profiling with machine learning and, through a real case study, identify suitable consumer clusters to improve energy savings and the effectiveness of Demand Response programs.
  • methods: Four widely used clustering algorithms are applied (K-means, K-medoids, Hierarchical Agglomerative Clustering, and Density-based Spatial Clustering) and assessed via an empirical analysis and multiple evaluation metrics. The problem is then recast as a probabilistic classification task, with Explainable AI (xAI) used to improve interpretability.
  • results: The clustering analysis indicates seven clusters as optimal for this case; however, two of them (about 10% of the dataset) show significant internal dissimilarity, so the method splits them further into nine clusters in total, demonstrating the scalability and versatility of the approach.
    Abstract Load shapes derived from smart meter data are frequently employed to analyze daily energy consumption patterns, particularly in the context of applications like Demand Response (DR). Nevertheless, one of the most important challenges to this endeavor lies in identifying the most suitable consumer clusters with similar consumption behaviors. In this paper, we present a novel machine learning based framework in order to achieve optimal load profiling through a real case study, utilizing data from almost 5000 households in London. Four widely used clustering algorithms are applied specifically K-means, K-medoids, Hierarchical Agglomerative Clustering and Density-based Spatial Clustering. An empirical analysis as well as multiple evaluation metrics are leveraged to assess those algorithms. Following that, we redefine the problem as a probabilistic classification one, with the classifier emulating the behavior of a clustering algorithm,leveraging Explainable AI (xAI) to enhance the interpretability of our solution. According to the clustering algorithm analysis the optimal number of clusters for this case is seven. Despite that, our methodology shows that two of the clusters, almost 10\% of the dataset, exhibit significant internal dissimilarity and thus it splits them even further to create nine clusters in total. The scalability and versatility of our solution makes it an ideal choice for power utility companies aiming to segment their users for creating more targeted Demand Response programs.
    摘要 智能计量数据中的形状频繁地用于分析每天的能源消耗模式,特别在应用程序方面如需求应答(DR)。然而,在这种尝试中最重要的挑战是确定最适合的消耗者群组,具有类似的消耗行为。在这篇论文中,我们提出了一种新的机器学习基于框架,通过实际案例研究,使用伦敦 almost 5000户家庭的数据。我们运用了四种广泛使用的分 clustering 算法,即 K-means、K-medoids、 Hierarchical Agglomerative Clustering 和 Density-based Spatial Clustering。我们使用实际分析和多种评价指标来评估这些算法。然后,我们将问题重新定义为一个probabilistic分类问题, классификатор模拟分 clustering 算法的行为,使用 Explainable AI (xAI) 提高解释性。根据分 clustering 算法分析,这个案例中的最佳数量为七个。尽管如此,我们的方法显示,这些数据中约10%的数据存在 significante internal不同,因此我们将其进一步分为九个群组。我们的解决方案具有可扩展性和多样性,使其成为电力供应公司为创建更有目标的需求应答计划而选择的理想选择。
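As a rough illustration of the clustering comparison described above, the sketch below runs three of the four named algorithms from scikit-learn on synthetic daily load shapes and scores them with the silhouette coefficient. The data, normalisation, and parameter choices are illustrative rather than the paper's pipeline; K-medoids, the fourth algorithm, lives in the separate scikit-learn-extra package.

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic daily load shapes: 24 hourly readings per household, three archetypes
n_households, n_hours = 500, 24
hours = np.arange(n_hours)
base_profiles = np.stack([np.exp(-0.5 * ((hours - peak) / 2.0) ** 2)
                          for peak in (8, 13, 19)])   # morning, midday, evening peaks
labels_true = rng.integers(0, 3, size=n_households)
X = base_profiles[labels_true] + 0.05 * rng.standard_normal((n_households, n_hours))

# Normalise each household's profile so clustering compares shape, not magnitude
X = X / X.sum(axis=1, keepdims=True)

candidates = {
    "KMeans(k=3)": KMeans(n_clusters=3, n_init=10, random_state=0),
    "Agglomerative(k=3)": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.1, min_samples=5),
    # K-medoids (the fourth algorithm in the paper) is available as
    # sklearn_extra.cluster.KMedoids in scikit-learn-extra.
}

for name, model in candidates.items():
    assignment = model.fit_predict(X)
    if len(set(assignment)) > 1:
        score = silhouette_score(X, assignment)
        print(f"{name:22s} silhouette = {score:.3f}")
    else:
        print(f"{name:22s} produced a single cluster")
```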

Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

  • paper_url: http://arxiv.org/abs/2310.20360
  • repo_url: https://github.com/introdeeplearning/book
  • paper_authors: Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger
  • for: This paper provides an introduction to deep learning algorithms, covering essential components such as ANN architectures and optimization algorithms, as well as theoretical aspects like approximation capacities and generalization errors.
  • methods: The paper reviews various deep learning methods, including fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization. It also covers optimization algorithms such as SGD, accelerated methods, and adaptive methods.
  • results: The paper provides a solid foundation in deep learning algorithms for students and scientists who are new to the field, and offers a firmer mathematical understanding of the objects and methods considered in deep learning for practitioners. Additionally, it reviews deep learning approximation methods for PDEs, including physics-informed neural networks (PINNs) and deep Galerkin methods.
    Abstract This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail including different artificial neural network (ANN) architectures (such as fully-connected feedforward ANNs, convolutional ANNs, recurrent ANNs, residual ANNs, and ANNs with batch normalization) and different optimization algorithms (such as the basic stochastic gradient descent (SGD) method, accelerated methods, and adaptive methods). We also cover several theoretical aspects of deep learning algorithms such as approximation capacities of ANNs (including a calculus for ANNs), optimization theory (including Kurdyka-{\L}ojasiewicz inequalities), and generalization errors. In the last part of the book some deep learning approximation methods for PDEs are reviewed including physics-informed neural networks (PINNs) and deep Galerkin methods. We hope that this book will be useful for students and scientists who do not yet have any background in deep learning at all and would like to gain a solid foundation as well as for practitioners who would like to obtain a firmer mathematical understanding of the objects and methods considered in deep learning.
    摘要 这本书的目标是为读者提供深度学习算法的基础知识。我们详细介绍了深度学习算法中的主要组成部分,包括各种人工神经网络架构(如全连接Feedforward ANNs、卷积ANNs、回传ANNs、差分ANNs和批处理正则化)以及不同的优化算法(如基本权重补做法、加速方法和自适应方法)。我们还讲述了深度学习算法中一些理论方面的话题,例如神经网络的表达能力(包括神经网络的几何学)、优化理论(包括库德日-{\L}ojasiewicz不等式)和泛化误差。在书的最后部分,我们还详细介绍了一些深度学习方法用于解决 partial differential equations(PDEs),包括物理学 informed neural networks(PINNs)和深度Galerkin方法。我们希望这本书能够为没有深度学习背景的学生和科学家提供坚实的基础知识,同时也为已有深度学习背景的实践者提供更加强大的数学基础理解。
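As a minimal, self-contained illustration of the basic SGD method on a fully-connected feedforward ANN, one of the building blocks the book treats in full mathematical detail, the following NumPy sketch trains a one-hidden-layer network on a toy regression task. The architecture and hyperparameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression task: learn f(x) = sin(pi * x) on [-1, 1]
X = rng.uniform(-1.0, 1.0, size=(1024, 1))
Y = np.sin(np.pi * X)

# One-hidden-layer fully-connected feedforward ANN with ReLU activation
W1 = rng.standard_normal((1, 32)) * 0.5
b1 = np.zeros(32)
W2 = rng.standard_normal((32, 1)) * 0.5
b2 = np.zeros(1)

def forward(x):
    h = np.maximum(x @ W1 + b1, 0.0)   # hidden ReLU layer
    return h, h @ W2 + b2              # hidden activations and prediction

lr, batch_size = 0.05, 32
for step in range(2000):
    idx = rng.integers(0, len(X), size=batch_size)   # plain SGD: sample a mini-batch
    x, y = X[idx], Y[idx]
    h, y_hat = forward(x)
    grad_out = 2.0 * (y_hat - y) / batch_size        # d(mean squared error)/d(y_hat)
    # Backpropagation through the two affine layers and the ReLU
    gW2, gb2 = h.T @ grad_out, grad_out.sum(axis=0)
    grad_h = (grad_out @ W2.T) * (h > 0)
    gW1, gb1 = x.T @ grad_h, grad_h.sum(axis=0)
    # SGD parameter update
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
print("final training MSE:", float(np.mean((pred - Y) ** 2)))
```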

Enhancing the Spatial Awareness Capability of Multi-Modal Large Language Model

  • paper_url: http://arxiv.org/abs/2310.20357
  • repo_url: None
  • paper_authors: Yongqiang Zhao, Zhenyu Li, Zhi Jin, Feng Zhang, Haiyan Zhao, Chengfeng Dou, Zhengwei Tao, Xinhai Xu, Donghong Liu
  • for: This paper aims to enhance the spatial awareness capability of Multi-Modal Large Language Models (MLLMs).
  • methods: The paper proposes guiding the MLLM with more precise inter-object spatial position information so that it answers spatial-awareness questions more accurately; specifically, algorithms for acquiring geometric spatial information and scene graphs are used to obtain the relevant spatial details of the objects involved in a query.
  • results: Extensive experiments on benchmarks such as MME and MM-Vet validate that the proposed method effectively improves MLLM performance on spatial awareness tasks and related tasks.
    Abstract The Multi-Modal Large Language Model (MLLM) refers to an extension of the Large Language Model (LLM) equipped with the capability to receive and infer multi-modal data. Spatial awareness stands as one of the crucial abilities of MLLM, encompassing diverse skills related to understanding spatial relationships among objects and between objects and the scene area. Industries such as autonomous driving, smart healthcare, robotics, virtual, and augmented reality heavily demand MLLM's spatial awareness capabilities. However, there exists a noticeable gap between the current spatial awareness capabilities of MLLM and the requirements set by human needs. To address this issue, this paper proposes using more precise spatial position information between objects to guide MLLM in providing more accurate responses to user-related inquiries. Specifically, for a particular multi-modal task, we utilize algorithms for acquiring geometric spatial information and scene graphs to obtain relevant geometric spatial information and scene details of objects involved in the query. Subsequently, based on this information, we direct MLLM to address spatial awareness-related queries posed by the user. Extensive experiments were conducted in benchmarks such as MME, MM-Vet, and other multi-modal large language models. The experimental results thoroughly confirm the efficacy of the proposed method in enhancing the spatial awareness tasks and associated tasks of MLLM.
    摘要 多模式大语言模型(MLLM)指的是基于大语言模型(LLM)的扩展,具有接受和理解多模式数据的能力。在多模式语言理解中,空间意识是关键的能力之一,涵盖了对物体之间和场景区域之间的物体位置关系的理解。自动驾驶、智能医疗、 робоalomics、虚拟和增强现实等领域都强烈需要 MLLM 的空间意识能力。然而,目前 MLLM 的空间意识能力与人类需求之间存在显著的差距。为解决这问题,本文提出使用更精确的物体之间的空间位置信息来导引 MLLM 在用户关注的问题中提供更加准确的回答。特别是,对于某个多模式任务,我们使用算法获取物体之间的 геометри空间信息和场景图来获取相关的 геометри空间信息和场景细节。然后,根据这些信息,我们向 MLLM 提供空间意识相关的查询。我们在 MME、MM-Vet 等多模式大语言模型的benchmark中进行了广泛的实验,并确认了提议方法的可行性和有效性。
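To make the guiding idea concrete, the sketch below derives coarse pairwise spatial relations from detected objects' bounding boxes and folds them into the text prompt handed to the MLLM. The object labels, boxes, relation phrasing, and prompt template are hypothetical; the paper's actual geometric-information and scene-graph algorithms are more elaborate.

```python
from itertools import combinations

def spatial_relation(box_a, box_b):
    """Coarse left/right and above/below relation between two boxes given as
    (x_min, y_min, x_max, y_max); purely geometric, image coordinates with y down."""
    cax, cay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cbx, cby = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    horizontal = "left of" if cax < cbx else "right of"
    vertical = "above" if cay < cby else "below"
    return f"{horizontal} and {vertical}"

def build_spatial_prompt(detections, question):
    """Turn detections [(label, box), ...] into a textual spatial context that is
    prepended to the user's question before it is handed to the MLLM."""
    facts = [
        f"- the {la} is {spatial_relation(ba, bb)} the {lb}"
        for (la, ba), (lb, bb) in combinations(detections, 2)
    ]
    return "Known object layout:\n" + "\n".join(facts) + f"\n\nQuestion: {question}"

# Hypothetical detections from a detection / scene-graph model
detections = [("mug", (120, 300, 180, 380)), ("laptop", (200, 250, 520, 420)),
              ("lamp", (540, 60, 620, 300))]
print(build_spatial_prompt(detections, "Which object is closest to the lamp?"))
```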

Muscle volume quantification: guiding transformers with anatomical priors

  • paper_url: http://arxiv.org/abs/2310.20355
  • repo_url: None
  • paper_authors: Louise Piecuch, Vanessa Gonzales Duque, Aurélie Sarcher, Enzo Hollville, Antoine Nordez, Giuseppe Rabita, Gaël Guilhem, Diana Mateus
  • for: The paper proposes a method for automatically segmenting 18 lower-limb muscles in 3D MR images to support shape analysis in sports science.
  • methods: A hybrid architecture combining convolutional and vision transformer blocks is used to better capture muscle shape, and a regularisation loss based on an adjacency matrix of plausible muscle neighbourhoods is added to exploit anatomical priors.
  • results: Experiments show that the hybrid model can be trained to high accuracy from a relatively small database, and that the adjacency-based regularisation further improves the predictions.
    Abstract Muscle volume is a useful quantitative biomarker in sports, but also for the follow-up of degenerative musculo-skelletal diseases. In addition to volume, other shape biomarkers can be extracted by segmenting the muscles of interest from medical images. Manual segmentation is still today the gold standard for such measurements despite being very time-consuming. We propose a method for automatic segmentation of 18 muscles of the lower limb on 3D Magnetic Resonance Images to assist such morphometric analysis. By their nature, the tissue of different muscles is undistinguishable when observed in MR Images. Thus, muscle segmentation algorithms cannot rely on appearance but only on contour cues. However, such contours are hard to detect and their thickness varies across subjects. To cope with the above challenges, we propose a segmentation approach based on a hybrid architecture, combining convolutional and visual transformer blocks. We investigate for the first time the behaviour of such hybrid architectures in the context of muscle segmentation for shape analysis. Considering the consistent anatomical muscle configuration, we rely on transformer blocks to capture the longrange relations between the muscles. To further exploit the anatomical priors, a second contribution of this work consists in adding a regularisation loss based on an adjacency matrix of plausible muscle neighbourhoods estimated from the training data. Our experimental results on a unique database of elite athletes show it is possible to train complex hybrid models from a relatively small database of large volumes, while the anatomical prior regularisation favours better predictions.
    摘要 筋量是运动领域中有用的量化生物标志,同时也可以用于跟踪萎缩肌骨疾病的进程。除了量度筋量外,可以从医疗图像中提取其他形态生物标志。 manual segmentation 仍然是今天的标准方法,尽管它很时间consuming。我们提议一种自动 segmentation 18 个Lower limb 肌肉的3D 磁共振成像图像,以帮助这种形态分析。由于不同肌肉组织的组织学特征无法在 MR 图像中分辨,因此肌肉分 segmentation 算法无法仅基于外观而进行。然而,这些边缘很难于检测,并且在不同主题之间存在差异。为了解决这些挑战,我们提议一种 hybrid 架构,结合 convolutional 和视觉转换块。我们在 muscle segmentation 领域中首次研究了这种混合架构的行为。由于肌肉组织的一致性,我们利用转换块来捕捉肌肉之间的长距离关系。为了进一步利用 анатомиче priors,我们的第二个贡献是添加一个基于 adjacency 矩阵的准则损失,该矩阵在训练数据中估计肌肉之间的可能性关系。我们在一个Unique 的数据库中进行了实验,并证明可以从一个相对较小的数据库中训练复杂的混合模型,而且准则损失会帮助预测更好。
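The following PyTorch sketch illustrates the flavour of the anatomical-prior regularisation: a soft adjacency matrix is read off the predicted segmentation and penalised wherever it puts mass on muscle pairs that the training data says are never neighbours. The way adjacency is computed here (class co-occurrence at neighbouring voxels along one axis) is a simplification, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def observed_adjacency(soft_seg):
    """Soft co-occurrence of classes at neighbouring voxels along the z axis.

    soft_seg: (C, D, H, W) class probabilities from the segmentation head. Two
    classes are 'adjacent' to the extent that one is predicted at a voxel and the
    other at the next voxel; a crude stand-in for a boundary-based adjacency."""
    a = soft_seg[:, :-1].reshape(soft_seg.shape[0], -1)   # (C, N) voxel pairs, left side
    b = soft_seg[:, 1:].reshape(soft_seg.shape[0], -1)    # (C, N) voxel pairs, right side
    adj = a @ b.T
    adj = adj + adj.T                                      # symmetrise
    return adj / adj.sum().clamp_min(1e-8)

def adjacency_prior_loss(soft_seg, plausible):
    """Penalise adjacency mass assigned to muscle pairs that are never neighbours in
    the training data (plausible is a {0,1} C x C matrix estimated offline)."""
    adj = observed_adjacency(soft_seg)
    return (adj * (1.0 - plausible)).sum()

if __name__ == "__main__":
    C, D, H, W = 4, 8, 16, 16
    logits = torch.randn(C, D, H, W)
    soft_seg = F.softmax(logits, dim=0)
    plausible = torch.eye(C)          # toy prior: only self-adjacency allowed
    print(float(adjacency_prior_loss(soft_seg, plausible)))
```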

Combining Shape Completion and Grasp Prediction for Fast and Versatile Grasping with a Multi-Fingered Hand

  • paper_url: http://arxiv.org/abs/2310.20350
  • repo_url: None
  • paper_authors: Matthias Humt, Dominik Winkelbauer, Ulrich Hillenbrand, Berthold Bäuml
  • for: The work addresses grasping objects with limited or no prior knowledge about them, a highly relevant skill in assistive robotics.
  • methods: A deep learning pipeline combines a shape completion module with a grasp predictor. The shape completion network, based on VQDIF, predicts spatial occupancy values at arbitrary query points from a single depth image; the grasp predictor uses a two-stage architecture that first generates hand poses with an autoregressive model and then regresses finger joint configurations per pose.
  • results: Experiments on a physical robot platform demonstrate successful grasping of a wide range of household objects from a single-viewpoint depth image. The whole pipeline is fast, taking about 1 s in total: 0.7 s to complete the object's shape and 0.3 s to generate 1000 grasps.
    Abstract Grasping objects with limited or no prior knowledge about them is a highly relevant skill in assistive robotics. Still, in this general setting, it has remained an open problem, especially when it comes to only partial observability and versatile grasping with multi-fingered hands. We present a novel, fast, and high fidelity deep learning pipeline consisting of a shape completion module that is based on a single depth image, and followed by a grasp predictor that is based on the predicted object shape. The shape completion network is based on VQDIF and predicts spatial occupancy values at arbitrary query points. As grasp predictor, we use our two-stage architecture that first generates hand poses using an autoregressive model and then regresses finger joint configurations per pose. Critical factors turn out to be sufficient data realism and augmentation, as well as special attention to difficult cases during training. Experiments on a physical robot platform demonstrate successful grasping of a wide range of household objects based on a depth image from a single viewpoint. The whole pipeline is fast, taking only about 1 s for completing the object's shape (0.7 s) and generating 1000 grasps (0.3 s).
    摘要 握持无前知 objetcs 是助手 роботиCS中高度相关的技能。然而,在这种通用设定下,这问题仍然是开放问题,特别是当 grasping 是多指手部中的多种 grasping 时。我们提出了一种新的、快速、高准确度的深度学习管道,包括基于单个深度图像的形状完成模块,然后是基于预测对象形状的抓取预测器。形状完成网络基于VQDIF,预测的是在任意查询点的空间占用值。抓取预测器使用我们的两个阶段架构,首先使用回归模型生成手势,然后每个姿势使用指joint配置进行回归。关键因素发现是足够的数据实际和扩展,以及特别关注具有困难的情况的训练。实验在物理 робо臂平台上表明,可以成功地基于单个视角的深度图像抓取各种家用品。整个管道快速,只需0.7秒完成对象的形状(0.3秒)和生成1000个抓取(0.3秒)。

Improving Entropy-Based Test-Time Adaptation from a Clustering View

  • paper_url: http://arxiv.org/abs/2310.20327
  • repo_url: None
  • paper_authors: Guoliang Lin, Hanjiang Lai, Yan Pan, Jian Yin
  • for: The goal is to handle domain shift and improve model performance on test data.
  • methods: Fully test-time adaptation (TTA) adapts the model with unlabeled data encountered at test time; in particular, entropy-based TTA (EBTTA) methods, which minimize prediction entropy on test samples, have shown great success, and this paper reinterprets them from a clustering view.
  • results: The clustering interpretation yields a deeper understanding of EBTTA and explains its sensitivity to initial assignments, outliers, and batch size; the proposed improvements of robust label assignment, weight adjustment, and gradient accumulation achieve consistent gains on multiple datasets.
    Abstract Domain shift is a common problem in the realistic world, where training data and test data follow different data distributions. To deal with this problem, fully test-time adaptation (TTA) leverages the unlabeled data encountered during test time to adapt the model. In particular, Entropy-Based TTA (EBTTA) methods, which minimize the prediction's entropy on test samples, have shown great success. In this paper, we introduce a new perspective on the EBTTA, which interprets these methods from a view of clustering. It is an iterative algorithm: 1) in the assignment step, the forward process of the EBTTA models is the assignment of labels for these test samples, and 2) in the updating step, the backward process is the update of the model via the assigned samples. Based on the interpretation, we can gain a deeper understanding of EBTTA, where we show that the entropy loss would further increase the largest probability. Accordingly, we offer an alternative explanation that why existing EBTTA methods are sensitive to initial assignments, outliers, and batch size. This observation can guide us to put forward the improvement of EBTTA. We propose robust label assignment, weight adjustment, and gradient accumulation to alleviate the above problems. Experimental results demonstrate that our method can achieve consistent improvements on various datasets. Code is provided in the supplementary material.
    摘要 域名shift是现实世界中的一个常见问题,training data和test datafollow different data distributions。为解决这个问题,完全的test-time adaptation(TTA)利用在测试时遇到的无标签数据来适应模型。特别是Entropy-Based TTA(EBTTA)方法,which minimizes the prediction's entropy on test samples, have shown great success. 在这篇论文中,我们介绍了一新的视角对EBTTA,即 interpreting these methods as a view of clustering. It is an iterative algorithm: 1) in the assignment step, the forward process of the EBTTA models is the assignment of labels for these test samples, and 2) in the updating step, the backward process is the update of the model via the assigned samples. Based on the interpretation, we can gain a deeper understanding of EBTTA, where we show that the entropy loss would further increase the largest probability. Accordingly, we offer an alternative explanation that why existing EBTTA methods are sensitive to initial assignments, outliers, and batch size. This observation can guide us to put forward the improvement of EBTTA. We propose robust label assignment, weight adjustment, and gradient accumulation to alleviate the above problems. Experimental results demonstrate that our method can achieve consistent improvements on various datasets. 代码可以在补充材料中找到。
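A minimal sketch of the entropy-minimisation loop, with the gradient-accumulation tweak the clustering view motivates, is given below. The model, the choice of which parameters to adapt, and the accumulation length are illustrative rather than the paper's exact recipe.

```python
import torch
import torch.nn as nn

def entropy(probs, eps=1e-8):
    return -(probs * (probs + eps).log()).sum(dim=1)

def adapt_on_stream(model, test_batches, lr=1e-3, accumulate=4):
    """Entropy-based TTA: each batch is (i) assigned soft labels by the forward pass
    and (ii) used to update the model by minimising prediction entropy. Gradients
    are accumulated over several batches to reduce sensitivity to batch size and
    outliers, in the spirit of the paper's gradient-accumulation improvement."""
    params = [p for p in model.parameters() if p.requires_grad]
    opt = torch.optim.SGD(params, lr=lr)
    opt.zero_grad()
    for step, x in enumerate(test_batches, start=1):
        probs = torch.softmax(model(x), dim=1)      # "assignment" step
        loss = entropy(probs).mean() / accumulate   # scale so accumulation averages
        loss.backward()                             # "update" step, deferred
        if step % accumulate == 0:
            opt.step()
            opt.zero_grad()
    return model

if __name__ == "__main__":
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 5))
    stream = [torch.randn(8, 16) for _ in range(12)]   # stand-in for shifted test data
    adapt_on_stream(model, stream)
```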

SemanticBoost: Elevating Motion Generation with Augmented Textual Cues

  • paper_url: http://arxiv.org/abs/2310.20323
  • repo_url: https://github.com/blackgold3/SemanticBoost
  • paper_authors: Xin He, Shaoli Huang, Xiaohang Zhan, Chao Wen, Ying Shan
  • for: This paper addresses the difficulty of generating motions from intricate semantic descriptions, caused primarily by insufficient semantic annotations in datasets and weak contextual understanding.
  • methods: The proposed SemanticBoost framework tackles both issues: a Semantic Enhancement module extracts supplementary semantics from motion data, enriching the dataset's textual descriptions and ensuring precise text-motion alignment without relying on large language models, while a Context-Attuned Motion Denoiser (CAMD) captures context information and aligns the generated motion with the given textual descriptions.
  • results: Experiments on the HumanML3D dataset show that SemanticBoost, as a diffusion-based method, outperforms auto-regressive techniques and achieves state-of-the-art performance while maintaining realistic and smooth motion generation quality.
    Abstract Current techniques face difficulties in generating motions from intricate semantic descriptions, primarily due to insufficient semantic annotations in datasets and weak contextual understanding. To address these issues, we present SemanticBoost, a novel framework that tackles both challenges simultaneously. Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD). The Semantic Enhancement module extracts supplementary semantics from motion data, enriching the dataset's textual description and ensuring precise alignment between text and motion data without depending on large language models. On the other hand, the CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences by effectively capturing context information and aligning the generated motion with the given textual descriptions. Distinct from existing methods, our approach can synthesize accurate orientational movements, combined motions based on specific body part descriptions, and motions generated from complex, extended sentences. Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques, achieving cutting-edge performance on the Humanml3D dataset while maintaining realistic and smooth motion generation quality.
    摘要 当前技术面临着从复杂的 semantic 描述中生成动作的困难,主要是因为数据集中的 semantic 注解不够和 contextual 理解不强。为解决这些问题,我们提出 SemanticBoost 框架,这个框架同时解决了这两个问题。我们的框架包括 semantic 增强模块和 context-attuned motion denoiser (CAMD)。semantic 增强模块从动作数据中提取更多的 semantics,使 dataset 的文本描述更加详细,从而确保文本和动作数据之间的精确对应,不需要依赖于大型语言模型。在另一方面,CAMD 方法提供了一个涵盖全的解决方案,可以生成高质量、semantic 一致的动作序列,通过有效地捕捉 context information 和将生成的动作与给定的文本描述进行对应。与现有方法不同,我们的方法可以生成准确的 orientational 运动、基于specific body part 的合并运动和从复杂、扩展的句子中生成的动作。我们的实验结果表明,SemanticBoost 作为一种 diffusion-based 方法,在 Humanml3D dataset 上表现出了 cutting-edge 性能,同时保持了真实和平滑的动作生成质量。

Theory of Mind in Large Language Models: Examining Performance of 11 State-of-the-Art models vs. Children Aged 7-10 on Advanced Tests

  • paper_url: http://arxiv.org/abs/2310.20320
  • repo_url: None
  • paper_authors: Max J. van Duijn, Bram M. A. van Dijk, Tom Kouwenhoven, Werner de Valk, Marco R. Spruit, Peter van der Putten
  • for: The study asks to what degree large language models (LLMs) possess Theory of Mind (ToM), i.e., the ability to reason about intentions and beliefs.
  • methods: Eleven base and instruction-tuned LLMs are tested on ToM-relevant capabilities beyond the dominant false-belief paradigm, including non-literal language use and recursive intentionality, using newly rewritten versions of standardized tests, open as well as closed questions, and a comparison against children aged 7-10 on the same tasks.
  • results: Instruction-tuned LLMs from the GPT family outperform the other models and often also the children, while base LLMs are mostly unable to solve ToM tasks even with specialized prompting. The authors suggest that the interlinked evolution and development of language and ToM may help explain what instruction-tuning adds: rewarding cooperative communication that takes the interlocutor and context into account.
    Abstract To what degree should we ascribe cognitive capacities to Large Language Models (LLMs), such as the ability to reason about intentions and beliefs known as Theory of Mind (ToM)? Here we add to this emerging debate by (i) testing 11 base- and instruction-tuned LLMs on capabilities relevant to ToM beyond the dominant false-belief paradigm, including non-literal language usage and recursive intentionality; (ii) using newly rewritten versions of standardized tests to gauge LLMs' robustness; (iii) prompting and scoring for open besides closed questions; and (iv) benchmarking LLM performance against that of children aged 7-10 on the same tasks. We find that instruction-tuned LLMs from the GPT family outperform other models, and often also children. Base-LLMs are mostly unable to solve ToM tasks, even with specialized prompting. We suggest that the interlinked evolution and development of language and ToM may help explain what instruction-tuning adds: rewarding cooperative communication that takes into account interlocutor and context. We conclude by arguing for a nuanced perspective on ToM in LLMs.
    摘要 哪些程度应归属大语言模型(LLM)的认知能力,如理解目的和信念的理论心(ToM)?我们在这个emerging debate中添加了一些测试基于11个基础和指导调参的LLM,包括 beyond the dominant false-belief paradigm的非literal语言使用和 recursive intentionality; (ii)使用 newly rewritten versions of standardized tests to gauge LLMs' robustness; (iii) prompting and scoring for open besides closed questions; and (iv) benchmarking LLM performance against that of children aged 7-10 on the same tasks.我们发现,基于GPT家族的指导调参LLMs表现出色,经常也超过了儿童的表现。基础LLMs几乎无法解决ToM任务,即使使用特殊的提示。我们建议认为,语言和ToM的演化和发展可能帮助解释 what instruction-tuning adds:奖励合作交流,考虑到对话者和上下文。我们 conclude by advocating for a nuanced perspective on ToM in LLMs。

Causal Interpretation of Self-Attention in Pre-Trained Transformers

  • paper_url: http://arxiv.org/abs/2310.20307
  • repo_url: None
  • paper_authors: Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov
  • for: The paper provides a causal interpretation of self-attention in the Transformer architecture and uses it to learn the causal structure over an input sequence.
  • methods: Building on this interpretation, conditional independence relations between input symbols are estimated by computing partial correlations between their representations in the deepest attention layer, which allows existing constraint-based algorithms to learn the causal structure over the sequence.
  • results: Existing pre-trained Transformers can thus be used for zero-shot causal discovery, even in the presence of latent confounders; the method is demonstrated by providing causal explanations for Transformer outcomes on two tasks: sentiment classification (NLP) and recommendation.
    Abstract We propose a causal interpretation of self-attention in the Transformer neural network architecture. We interpret self-attention as a mechanism that estimates a structural equation model for a given input sequence of symbols (tokens). The structural equation model can be interpreted, in turn, as a causal structure over the input symbols under the specific context of the input sequence. Importantly, this interpretation remains valid in the presence of latent confounders. Following this interpretation, we estimate conditional independence relations between input symbols by calculating partial correlations between their corresponding representations in the deepest attention layer. This enables learning the causal structure over an input sequence using existing constraint-based algorithms. In this sense, existing pre-trained Transformers can be utilized for zero-shot causal-discovery. We demonstrate this method by providing causal explanations for the outcomes of Transformers in two tasks: sentiment classification (NLP) and recommendation.
    摘要 我们提出了对Transformer神经网络架构的自注意力做出 causal 解释。我们视自注意力为一个计算 structural equation model 的机制,这个模型可以在特定上下文中被解释为输入序列上的字符(token)之间的 causal 结构。important 的是,这个解释在具有潜在干扰者的情况下仍然有效。根据这个解释,我们可以通过计算深层注意层中对应的表现之间的偏相关性来估计输入序列中ymbol之间的 conditional independence 关系。这使得可以使用现有的几何约束基本的算法来学习输入序列的 causal 结构。在这个意义上,现有的预训Transformers可以被利用来进行零件 causal-发现。我们在两个任务中进行了实际的示例:情感分类(NLP)和推荐。
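The core computation can be sketched as follows: take the per-token representations from the deepest attention layer, form partial correlations from the precision matrix, and read near-zero entries as conditional-independence evidence for a constraint-based causal-discovery algorithm. The layer choice, threshold, and random stand-in representations below are placeholders, not the paper's exact statistical test.

```python
import numpy as np

def partial_correlations(reps):
    """reps: (n_tokens, d) representations of the input symbols from the deepest
    attention layer. Treat the feature dimension as samples and compute partial
    correlations between tokens from the precision (inverse covariance) matrix."""
    X = reps - reps.mean(axis=1, keepdims=True)
    cov = X @ X.T / X.shape[1]
    precision = np.linalg.pinv(cov + 1e-6 * np.eye(cov.shape[0]))
    d = np.sqrt(np.diag(precision))
    return -precision / np.outer(d, d)

def conditional_independencies(reps, threshold=0.05):
    """Pairs of tokens whose partial correlation (given all other tokens) is near
    zero; these are the independence constraints handed to a constraint-based
    causal-discovery algorithm such as PC."""
    pc = partial_correlations(reps)
    n = pc.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(pc[i, j]) < threshold]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reps = rng.standard_normal((6, 768))   # stand-in for 6 tokens x hidden size 768
    print(conditional_independencies(reps))
```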

Revolutionizing Global Food Security: Empowering Resilience through Integrated AI Foundation Models and Data-Driven Solutions

  • paper_url: http://arxiv.org/abs/2310.20301
  • repo_url: None
  • paper_authors: Mohamed R. Shoaib, Heba M. Emara, Jun Zhao
  • for: The study explores integrating AI foundation models across food security applications, leveraging distinct data types to overcome the limitations of current deep learning and machine learning methods.
  • methods: Multispectral imagery, meteorological data, soil properties, historical records, and high-resolution satellite imagery are used with AI foundation models for tasks such as crop type mapping, cropland mapping, field delineation, and crop yield prediction.
  • results: AI foundation models enhance food security initiatives by providing accurate predictions, improving resource allocation, and supporting informed decision-making, marking a significant step toward a sustainable and secure food future.
    Abstract Food security, a global concern, necessitates precise and diverse data-driven solutions to address its multifaceted challenges. This paper explores the integration of AI foundation models across various food security applications, leveraging distinct data types, to overcome the limitations of current deep and machine learning methods. Specifically, we investigate their utilization in crop type mapping, cropland mapping, field delineation and crop yield prediction. By capitalizing on multispectral imagery, meteorological data, soil properties, historical records, and high-resolution satellite imagery, AI foundation models offer a versatile approach. The study demonstrates that AI foundation models enhance food security initiatives by providing accurate predictions, improving resource allocation, and supporting informed decision-making. These models serve as a transformative force in addressing global food security limitations, marking a significant leap toward a sustainable and secure food future.

Sample-Efficient and Safe Deep Reinforcement Learning via Reset Deep Ensemble Agents

  • paper_url: http://arxiv.org/abs/2310.20287
  • repo_url: None
  • paper_authors: Woojun Kim, Yongjae Shin, Jongeui Park, Youngchul Sung
  • for: The paper addresses the primacy bias in deep reinforcement learning in order to improve sample efficiency and safety.
  • methods: A reset-based method leveraging deep ensemble learning is proposed to overcome the limitations of the vanilla reset approach, which can cause performance collapses after a reset.
  • results: Experiments across various settings, including safe RL domains, show that the method improves both sample efficiency and safety.
    Abstract Deep reinforcement learning (RL) has achieved remarkable success in solving complex tasks through its integration with deep neural networks (DNNs) as function approximators. However, the reliance on DNNs has introduced a new challenge called primacy bias, whereby these function approximators tend to prioritize early experiences, leading to overfitting. To mitigate this primacy bias, a reset method has been proposed, which performs periodic resets of a portion or the entirety of a deep RL agent while preserving the replay buffer. However, the use of the reset method can result in performance collapses after executing the reset, which can be detrimental from the perspective of safe RL and regret minimization. In this paper, we propose a new reset-based method that leverages deep ensemble learning to address the limitations of the vanilla reset method and enhance sample efficiency. The proposed method is evaluated through various experiments including those in the domain of safe RL. Numerical results show its effectiveness in high sample efficiency and safety considerations.
    摘要 深度强化学习(深度RL)通过与深度神经网络(DNNs)结合,已经实现了复杂任务的出色解决。然而,这种依赖DNNs带来了一个新的挑战—— primacy bias,这使得这些函数近似器倾向于把早期经验优先,导致过拟合。为了解决这种 primacy bias,一种重置方法已经提出,该方法在深度RLAgent中 periodic 重置一部分或整个的 Agent,保留缓存。然而,使用重置方法可能会导致执行重置后的性能崩溃,这可能是安全RL和 regret 最小化的视角下不利的。在这篇论文中,我们提出了一种新的重置基于方法,利用深度集成学习来解决重置方法的局限性,提高样本效率。我们通过多个实验,包括安全RL领域的实验,证明了该方法的效果。数据结果表明,该方法可以在高样本效率和安全考虑下达到出色的效果。
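A toy skeleton of the reset-with-ensemble idea is shown below: several agents share one replay buffer, and a single member is periodically re-initialised so that early-experience bias is gradually washed out without the post-reset performance collapse of a full reset. The agent internals are stubbed out, and the reset interval, ensemble size, and action aggregation are assumptions rather than the paper's design.

```python
import random

class EnsembleResetAgent:
    """Toy skeleton: N ensemble members share one replay buffer. Every
    reset_interval environment steps a single member is re-initialised, so the
    primacy bias of early experience is washed out of the ensemble gradually
    instead of collapsing performance with a full reset."""

    def __init__(self, make_member, n_members=5, reset_interval=10_000):
        self.make_member = make_member
        self.members = [make_member() for _ in range(n_members)]
        self.replay_buffer = []              # preserved across resets
        self.reset_interval = reset_interval
        self.steps = 0

    def act(self, obs):
        # Ensemble action: here a plain average (a vote would be used for discrete actions)
        actions = [m.act(obs) for m in self.members]
        return sum(actions) / len(actions)

    def observe(self, transition):
        self.replay_buffer.append(transition)
        self.steps += 1
        if self.steps % self.reset_interval == 0:
            victim = random.randrange(len(self.members))
            self.members[victim] = self.make_member()   # fresh weights, same buffer

    def train_step(self, batch_size=256):
        if len(self.replay_buffer) < batch_size:
            return
        batch = random.sample(self.replay_buffer, batch_size)
        for m in self.members:
            m.update(batch)                  # each member learns from the shared data

if __name__ == "__main__":
    class DummyMember:
        def __init__(self): self.bias = random.uniform(-1, 1)
        def act(self, obs): return obs * 0.0 + self.bias   # placeholder policy
        def update(self, batch): pass                       # placeholder learner

    agent = EnsembleResetAgent(DummyMember, n_members=3, reset_interval=50)
    for t in range(200):
        agent.observe((t, agent.act(1.0)))
        agent.train_step(batch_size=16)
```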

AutoMixer for Improved Multivariate Time-Series Forecasting on Business and IT Observability Data

  • paper_url: http://arxiv.org/abs/2310.20280
  • repo_url: None
  • paper_authors: Santosh Palaskar, Vijay Ekambaram, Arindam Jati, Neelamadhav Gantayat, Avirup Saha, Seema Nagar, Nam H. Nguyen, Pankaj Dayama, Renuka Sindhgatta, Prateeti Mohapatra, Harshit Kumar, Jayant Kalagnanam, Nandyala Hemachandra, Narayan Rangaraj
  • for: This paper aims to improve the accuracy of business key performance indicator (Biz-KPI) forecasting, which is essential for enhancing business efficiency and revenue.
  • methods: The paper introduces a novel approach called AutoMixer, which combines channel-compressed pretraining and finetuning with a time-series Foundation Model (FM) to improve the accuracy of multivariate time series forecasting.
  • results: The paper demonstrates through detailed experiments and dashboard analytics that AutoMixer consistently improves the forecasting accuracy of Biz-KPIs by 11-15%, providing actionable business insights and enhancing decision-making.
    Abstract The efficiency of business processes relies on business key performance indicators (Biz-KPIs), that can be negatively impacted by IT failures. Business and IT Observability (BizITObs) data fuses both Biz-KPIs and IT event channels together as multivariate time series data. Forecasting Biz-KPIs in advance can enhance efficiency and revenue through proactive corrective measures. However, BizITObs data generally exhibit both useful and noisy inter-channel interactions between Biz-KPIs and IT events that need to be effectively decoupled. This leads to suboptimal forecasting performance when existing multivariate forecasting models are employed. To address this, we introduce AutoMixer, a time-series Foundation Model (FM) approach, grounded on the novel technique of channel-compressed pretrain and finetune workflows. AutoMixer leverages an AutoEncoder for channel-compressed pretraining and integrates it with the advanced TSMixer model for multivariate time series forecasting. This fusion greatly enhances the potency of TSMixer for accurate forecasts and also generalizes well across several downstream tasks. Through detailed experiments and dashboard analytics, we show AutoMixer's capability to consistently improve the Biz-KPI's forecasting accuracy (by 11-15\%) which directly translates to actionable business insights.
    摘要 企业过程效率取决于商业关键性表现指标 (Biz-KPI),而 IT 失败可能会对其产生负面影响。商业和信息技术观察 (BizITObs) 数据将 Biz-KPI 和 IT 事件通道组合成为多变量时间系列数据,可以预测 Biz-KPI 的发展趋势。然而,BizITObs 数据通常具有 Biz-KPI 和 IT 事件之间有用和噪声的交互,需要有效地隔离。这会导致使用现有的多变量预测模型时,预测性能受到限制。为解决这一问题,我们介绍 AutoMixer,一种基于时间序列基模型 (FM) 的方法,借鉴了频率压缩预训练和融合TSMixer模型。AutoMixer 可以强化 TSMixer 模型的精度预测,同时也能够在多个下游任务中广泛应用。通过详细的实验和达标分析,我们示出 AutoMixer 可以不断提高 Biz-KPI 的预测精度(11-15%),直接对企业做出有用的业务指导。
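A rough sketch of the channel-compressed pretrain/finetune workflow is given below: an autoencoder is pretrained to squeeze the many interacting BizITObs channels into a few latent channels, after which a forecaster operates on the compressed representation. The dimensions are arbitrary and a plain linear head stands in for the TSMixer forecaster used in the paper.

```python
import torch
import torch.nn as nn

class ChannelCompressor(nn.Module):
    """Maps (batch, time, channels) -> (batch, time, latent_channels) and back,
    pretrained with a reconstruction loss so noisy inter-channel interactions are
    squeezed into a small latent space."""
    def __init__(self, n_channels, latent_channels):
        super().__init__()
        self.encode = nn.Linear(n_channels, latent_channels)
        self.decode = nn.Linear(latent_channels, n_channels)

    def forward(self, x):
        z = self.encode(x)
        return z, self.decode(z)

def pretrain(compressor, series, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(compressor.parameters(), lr=lr)
    for _ in range(epochs):
        _, recon = compressor(series)
        loss = nn.functional.mse_loss(recon, series)
        opt.zero_grad(); loss.backward(); opt.step()
    return compressor

if __name__ == "__main__":
    batch, time, channels, latent = 32, 96, 20, 4
    series = torch.randn(batch, time, channels)        # stand-in BizITObs data
    compressor = pretrain(ChannelCompressor(channels, latent), series)

    # Finetuning stage: a forecaster (TSMixer in the paper, a linear head here)
    # operates on the compressed channels and predicts the next Biz-KPI values.
    z, _ = compressor(series)
    forecaster = nn.Sequential(nn.Flatten(), nn.Linear(time * latent, channels))
    prediction = forecaster(z.detach())
    print(prediction.shape)                             # (batch, channels)
```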

Constructing Sample-to-Class Graph for Few-Shot Class-Incremental Learning

  • paper_url: http://arxiv.org/abs/2310.20268
  • repo_url: None
  • paper_authors: Fuyuan Hu, Jian Zhang, Fan Lyu, Linyan Li, Fenglei Xu
  • for: The goal is to improve few-shot class-incremental learning (FSCIL), enabling a model to learn new concepts from a few samples without forgetting old classes.
  • methods: A Sample-to-Class (S2C) graph learning method is proposed, consisting of a Sample-level Graph Network (SGN), a Class-level Graph Network (CGN), and a multi-stage training strategy.
  • results: Experiments on three popular benchmark datasets show that the method clearly outperforms the baselines and sets new state-of-the-art results for FSCIL.
    Abstract Few-shot class-incremental learning (FSCIL) aims to build machine learning model that can continually learn new concepts from a few data samples, without forgetting knowledge of old classes. The challenges of FSCIL lies in the limited data of new classes, which not only lead to significant overfitting issues but also exacerbates the notorious catastrophic forgetting problems. As proved in early studies, building sample relationships is beneficial for learning from few-shot samples. In this paper, we promote the idea to the incremental scenario, and propose a Sample-to-Class (S2C) graph learning method for FSCIL. Specifically, we propose a Sample-level Graph Network (SGN) that focuses on analyzing sample relationships within a single session. This network helps aggregate similar samples, ultimately leading to the extraction of more refined class-level features. Then, we present a Class-level Graph Network (CGN) that establishes connections across class-level features of both new and old classes. This network plays a crucial role in linking the knowledge between different sessions and helps improve overall learning in the FSCIL scenario. Moreover, we design a multi-stage strategy for training S2C model, which mitigates the training challenges posed by limited data in the incremental process. The multi-stage training strategy is designed to build S2C graph from base to few-shot stages, and improve the capacity via an extra pseudo-incremental stage. Experiments on three popular benchmark datasets show that our method clearly outperforms the baselines and sets new state-of-the-art results in FSCIL.
    摘要 非常多shot类增量学习(FSCIL)目的是建立一个机器学习模型,可以从少量数据样本中不断学习新的概念,而不会忘记过去的类知识。 however,有限的新类数据不仅会导致重要的拟合问题,还会把著名的忘记问题加剧。在这篇论文中,我们推广了这个想法到增量场景,并提出了一种Sample-to-Class(S2C)图学习方法。具体来说,我们提出了一种Sample-level Graph Network(SGN),它专门关注在单个会话中的样本关系。这个网络帮助综合类似的样本,从而提取更加精细的类层特征。然后,我们提出了一种Class-level Graph Network(CGN),它建立了新和旧类特征之间的连接。这个网络在不同会话之间的知识连接中扮演着关键的角色,帮助改善FSCIL场景中的总体学习。此外,我们设计了一种多 Stage 训练策略,以解决增量过程中有限数据训练的挑战。这种多 Stage 训练策略是建立S2C图从基础到几shot阶段,然后通过额外的 Pseudo-增量阶段进行改进。在三个流行的标准 benchmark 数据集上进行实验,我们的方法明显超过了基eline,并在FSCIL场景中设置了新的状态纪录。

Beyond Average Return in Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2310.20266
  • repo_url: None
  • paper_authors: Alexandre Marthe, Aurélien Garivier, Claire Vernade
  • for: The paper investigates which functionals of the return can be computed and optimized exactly in Markov Decision Processes.
  • methods: The analysis covers Dynamic Programming (DP) and Distributional Reinforcement Learning (DistRL).
  • results: Only generalized means can be optimized exactly; other functionals can only be estimated approximately, for which the paper provides error bounds and discusses the potential and limitations of the approach.
    Abstract What are the functionals of the reward that can be computed and optimized exactly in Markov Decision Processes? In the finite-horizon, undiscounted setting, Dynamic Programming (DP) can only handle these operations efficiently for certain classes of statistics. We summarize the characterization of these classes for policy evaluation, and give a new answer for the planning problem. Interestingly, we prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL).DistRL permits, however, to evaluate other functionals approximately. We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations.These results contribute to advancing the theory of Markov Decision Processes by examining overall characteristics of the return, and particularly risk-conscious strategies.
    摘要 <>将文本翻译成简化中文。<>马可夫决策过程(Markov Decision Processes,MDP)中可以计算和优化的奖励功能有哪些?在无限远程、未折扣设定下,动态计划(Dynamic Programming,DP)只能有效处理某些类型的统计数据。我们总结这些类型的特征,并给出一个新的答案 для计划问题。有趣的是,我们证明只有泛化平均才能够准确地优化,即使在更一般的分布式奖励学习(Distributional Reinforcement Learning,DistRL)框架下。DistRL 允许评估其他函数的估计值,并提供误差 bound для这些估计值。我们讨论这种方法的潜在和局限性,并对这些结果的推广和应用进行了评论。这些研究Result contributes to the development of Markov Decision Processes theory by examining the overall characteristics of the return, and particularly risk-conscious strategies.

Artificial Intelligence for reverse engineering: application to detergents using Raman spectroscopy

  • paper_url: http://arxiv.org/abs/2310.20254
  • repo_url: None
  • paper_authors: Pedro Marote, Marie Martin, Anne Bonhomme, Pierre Lantéri, Yohann Clément
  • For: This paper aims to develop a method for quickly assessing the potential toxicity of commercial products, particularly detergent products, using digital tools and analytical techniques.* Methods: The authors use a combination of spectral databases, mixture databases, experimental design, chemometrics, and machine learning algorithms to identify the constituents of the mixture and estimate its composition. They also use various sample preparation methods, such as raw samples and diluted/concentrated samples, and Raman spectroscopy to analyze the samples.* Results: The authors are able to identify the constituents of the detergent products and estimate their composition using the proposed method. This method can be applied to other matrices and industries for pollutant identification and contamination assessment, leading to time savings and improved quality control.
    Abstract The reverse engineering of a complex mixture, regardless of its nature, has become significant today. Being able to quickly assess the potential toxicity of new commercial products in relation to the environment presents a genuine analytical challenge. The development of digital tools (databases, chemometrics, machine learning, etc.) and analytical techniques (Raman spectroscopy, NIR spectroscopy, mass spectrometry, etc.) will allow for the identification of potential toxic molecules. In this article, we use the example of detergent products, whose composition can prove dangerous to humans or the environment, necessitating precise identification and quantification for quality control and regulation purposes. The combination of various digital tools (spectral database, mixture database, experimental design, Chemometrics / Machine Learning algorithm{\ldots}) together with different sample preparation methods (raw sample, or several concentrated / diluted samples) Raman spectroscopy, has enabled the identification of the mixture's constituents and an estimation of its composition. Implementing such strategies across different analytical tools can result in time savings for pollutant identification and contamination assessment in various matrices. This strategy is also applicable in the industrial sector for product or raw material control, as well as for quality control purposes.
    摘要 现代化的复杂混合物反工程技术已经在今天变得非常重要。快速评估新商品的环境潜在危险性是一项实际分析挑战。通过数字工具(数据库、化学ometrics、机器学习等)和分析技术(拉曼谱、near infrared谱、质谱等)可以识别潜在危险分子。在本文中,我们使用洗剂产品为例,其组成物可能对人类或环境构成威胁,需要精准的识别和量化,以确保质量控制和法规要求。通过不同的数字工具(spectral Database、mixture Database、实验设计、化学ometrics / 机器学习算法等)和不同的样本准备方法(Raw sample、多种浓缩/分解样本),使用拉曼谱技术,可以识别混合物的成分和估算其组成。实施这些策略在不同的分析工具上可以节省污染物识别和污染评估的时间。这种策略也适用于工业部门的产品或原材料控制、质量控制等。

Diversified Node Sampling based Hierarchical Transformer Pooling for Graph Representation Learning

  • paper_url: http://arxiv.org/abs/2310.20250
  • repo_url: None
  • paper_authors: Gaichao Li, Jinsong Chen, John E. Hopcroft, Kun He
  • for: Graph Transformer Pooling (GTPool) aims to improve graph pooling methods by capturing long-range pairwise interactions and selecting more representative nodes.
  • methods: GTPool uses a scoring module based on the self-attention mechanism to measure the importance of nodes, and a diversified sampling method called Roulette Wheel Sampling (RWS) to select nodes from different scoring intervals.
  • results: GTPool outperforms existing popular graph pooling methods on 11 benchmark datasets, effectively obtaining long-range information and selecting more representative nodes.
    Abstract Graph pooling methods have been widely used on downsampling graphs, achieving impressive results on multiple graph-level tasks like graph classification and graph generation. An important line called node dropping pooling aims at exploiting learnable scoring functions to drop nodes with comparatively lower significance scores. However, existing node dropping methods suffer from two limitations: (1) for each pooled node, these models struggle to capture long-range dependencies since they mainly take GNNs as the backbones; (2) pooling only the highest-scoring nodes tends to preserve similar nodes, thus discarding the affluent information of low-scoring nodes. To address these issues, we propose a Graph Transformer Pooling method termed GTPool, which introduces Transformer to node dropping pooling to efficiently capture long-range pairwise interactions and meanwhile sample nodes diversely. Specifically, we design a scoring module based on the self-attention mechanism that takes both global context and local context into consideration, measuring the importance of nodes more comprehensively. GTPool further utilizes a diversified sampling method named Roulette Wheel Sampling (RWS) that is able to flexibly preserve nodes across different scoring intervals instead of only higher scoring nodes. In this way, GTPool could effectively obtain long-range information and select more representative nodes. Extensive experiments on 11 benchmark datasets demonstrate the superiority of GTPool over existing popular graph pooling methods.
    摘要 graph pooling方法在下采集图时得到了广泛的应用,并在多种图级任务中达到了很好的结果,如图 классификация和图生成。一种重要的笔Push called node dropping pooling寻求通过学习可读的分数函数来Drop nodes with relatively lower significance scores。然而,现有的节点排除方法受到两个限制:(1)为每个卷积节点,这些模型很难Capture long-range dependent relationships,因为它们主要使用GNN作为后备;(2)只Pooling the highest-scoring nodes tends to preserve similar nodes, thus discarding the abundant information of low-scoring nodes。为了解决这些问题,我们提出了一种图变换池方法,称为GTPool,该方法将Transformer卷积推理引入节点排除池化,以高效地捕捉长范围对应关系并同时采样多样化的节点。具体来说,我们设计了一个分数模块,基于自注意机制,可以同时考虑全局上下文和局部上文,评估节点的重要性更全面。GTPool还使用了一种多样化采样方法,称为Roulette Wheel Sampling (RWS),可以自由地保留节点在不同的分数间,而不是仅仅保留高分节点。这样,GTPool可以有效地获取长范围信息和选择更代表性的节点。我们在11个标准测试集上进行了广泛的实验,并证明了GTPool在现有的流行graph pooling方法的基础上具有显著的优势。
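The diversified sampling step can be sketched as follows: nodes are bucketed by score interval and each interval contributes nodes in proportion to its share of the total score mass, rather than keeping only the top scorers. The interval count, quota rule, and random scores below are illustrative; the paper's Roulette Wheel Sampling may differ in detail.

```python
import numpy as np

def roulette_wheel_sampling(scores, keep, n_intervals=4, rng=None):
    """Select roughly `keep` node indices so that every score interval is
    represented, each interval's quota being proportional to its share of the
    total score mass (a roulette wheel over intervals rather than greedy top-k)."""
    rng = rng or np.random.default_rng()
    scores = np.asarray(scores, dtype=float)
    edges = np.quantile(scores, np.linspace(0.0, 1.0, n_intervals + 1))
    # Note: boundary scores may fall into two adjacent buckets; fine for a sketch.
    buckets = [np.where((scores >= lo) & (scores <= hi))[0]
               for lo, hi in zip(edges[:-1], edges[1:])]
    buckets = [b for b in buckets if len(b) > 0]
    mass = np.array([scores[b].sum() for b in buckets])
    quotas = np.maximum(1, np.round(keep * mass / mass.sum()).astype(int))
    selected = []
    for bucket, quota in zip(buckets, quotas):
        take = min(quota, len(bucket))
        selected.extend(rng.choice(bucket, size=take, replace=False))
    return np.array(selected[:keep])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    node_scores = rng.random(50)     # stand-in for self-attention importance scores
    print(roulette_wheel_sampling(node_scores, keep=10, rng=rng))
```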

Breaking Language Barriers in Multilingual Mathematical Reasoning: Insights and Observations

  • paper_url: http://arxiv.org/abs/2310.20246
  • repo_url: https://github.com/nuochenpku/MathOctopus
  • paper_authors: Nuo Chen, Zinan Zheng, Ning Wu, Linjun Shou, Ming Gong, Yangqiu Song, Dongmei Zhang, Jia Li
  • for: This work targets multilingual mathematical reasoning (xMR), addressing the fact that existing research focuses mainly on LLMs for monolingual mathematical reasoning.
  • methods: Using translation, the authors construct MGSM8KInstruct, the first multilingual math reasoning instruction dataset covering ten distinct languages, thereby addressing the scarcity of xMR training data, and propose different training strategies to build powerful xMR LLMs named MathOctopus.
  • results: MathOctopus-13B reaches 47.6% accuracy on the MGSM test set, exceeding ChatGPT's 46.3%. Extensive experiments further show that extending rejection sampling to the multilingual setting helps, albeit to a limited degree, and that supervised fine-tuning on multilingual parallel corpora improves both multilingual and monolingual performance; for example, MathOctopus-7B improves its English-trained counterpart from 42.2% to 50.8% on the GSM8K test set.
    Abstract Existing research predominantly focuses on developing powerful language learning models (LLMs) for mathematical reasoning within monolingual languages, with few explorations in preserving efficacy in a multilingual context. To bridge this gap, this paper pioneers exploring and training powerful Multilingual Math Reasoning (xMR) LLMs. Firstly, by utilizing translation, we construct the first multilingual math reasoning instruction dataset, MGSM8KInstruct, encompassing ten distinct languages, thus addressing the issue of training data scarcity in xMR tasks. Based on the collected dataset, we propose different training strategies to build powerful xMR LLMs, named MathOctopus, notably outperform conventional open-source LLMs and exhibit superiority over ChatGPT in few-shot scenarios. Notably, MathOctopus-13B reaches 47.6% accuracy which exceeds ChatGPT 46.3% on MGSM testset. Beyond remarkable results, we unearth several pivotal observations and insights from extensive experiments: (1) When extending the rejection sampling strategy to the multilingual context, it proves effective for model performances, albeit limited. (2) Employing parallel corpora for math Supervised Fine-Tuning (SFT) across multiple languages not only significantly enhances model performance multilingually but also elevates their monolingual performance. This indicates that crafting multilingual corpora can be regarded as a vital strategy for enhancing model performance in a specific language, especially in mathematical reasoning tasks. For instance, MathOctopus-7B improves its counterparts that trained on English from 42.2% to 50.8% on GSM8K testset.
    摘要 现有研究主要集中于开发强大的语言学习模型(LLMs),以便在单语言中提高数学逻辑能力。然而,很少有人研究在多语言上保持效果。为了填补这一漏洞,本文首次尝试了开发多语言数学逻辑模型(xMR)。我们首先通过翻译,构建了包含十种不同语言的多语言数学逻辑指导集(MGSM8KInstruct),解决了在xMR任务中训练数据的缺乏问题。基于收集的数据集,我们提出了不同的训练策略,用于建立强大的xMR LLMs,称为MathOctopus,与已有的开源LLMs相比,具有显著的优势,并在几个语言上展示出了对ChatGPT的超越。尤其是MathOctopus-13B在MGSM测试集上达到了47.6%的准确率,超过了ChatGPT的46.3%。我们也从广泛的实验中发现了一些重要的观察和发现:(1)在多语言上扩展拒绝采样策略,虽然有限,仍然有效。(2)在多语言中使用平行 corpus для数学监督精度提升(SFT),不仅可以在多语言中显著提高模型性能,还可以提高单语言中的模型性能。这表明,制作多语言 corpus 可以被视为一种重要的提高模型性能的策略,特别是在数学逻辑任务中。例如,MathOctopus-7B在GSM8K测试集上由42.2%提高到50.8%。

Breathing Life into Faces: Speech-driven 3D Facial Animation with Natural Head Pose and Detailed Shape

  • paper_url: http://arxiv.org/abs/2310.20240
  • repo_url: None
  • paper_authors: Wei Zhao, Yijun Wang, Tianyu He, Lianying Yin, Jianxin Lin, Xin Jin
  • for: The goal is natural and precise speech-driven 3D facial animation, with flexible head poses and detailed facial shape, to improve character expressiveness.
  • methods: The VividTalker framework disentangles facial animation into head pose and mouth movement, encodes them into separate discrete latent spaces, and generates them autoregressively with a window-based Transformer architecture; a new 3D dataset with detailed shapes is constructed so that facial details can be synthesized in line with speech content.
  • results: Experiments show that VividTalker outperforms state-of-the-art methods, producing vivid and realistic speech-driven 3D facial animation with natural head poses and facial details.
    Abstract The creation of lifelike speech-driven 3D facial animation requires a natural and precise synchronization between audio input and facial expressions. However, existing works still fail to render shapes with flexible head poses and natural facial details (e.g., wrinkles). This limitation is mainly due to two aspects: 1) Collecting training set with detailed 3D facial shapes is highly expensive. This scarcity of detailed shape annotations hinders the training of models with expressive facial animation. 2) Compared to mouth movement, the head pose is much less correlated to speech content. Consequently, concurrent modeling of both mouth movement and head pose yields the lack of facial movement controllability. To address these challenges, we introduce VividTalker, a new framework designed to facilitate speech-driven 3D facial animation characterized by flexible head pose and natural facial details. Specifically, we explicitly disentangle facial animation into head pose and mouth movement and encode them separately into discrete latent spaces. Then, these attributes are generated through an autoregressive process leveraging a window-based Transformer architecture. To augment the richness of 3D facial animation, we construct a new 3D dataset with detailed shapes and learn to synthesize facial details in line with speech content. Extensive quantitative and qualitative experiments demonstrate that VividTalker outperforms state-of-the-art methods, resulting in vivid and realistic speech-driven 3D facial animation.
    摘要 创建生动的语音驱动3D人脸动画需要自然和精准的声音输入和脸部表达的同步。然而,现有的方法仍然无法渲染具有灵活头姿和自然的脸部细节(例如皱纹)。这种限制主要归结于两点:1)收集详细3D人脸形状的训练集是非常昂贵的。这种训练集缺乏细节的形状标注,使得模型受到较少的表达动画训练。2)与口型运动相比,头姿与语音内容的相关性远低。因此,同时模型口型运动和头姿的控制会增加无法控制的脸部运动。为解决这些挑战,我们介绍了VividTalker,一个新的框架,用于实现语音驱动3D人脸动画,具有灵活的头姿和自然的脸部细节。我们将 facial animation 分解成头姿和口型运动两个部分,并将它们分别编码到独立的隐藏空间中。然后,通过窗口基本的 transformer 架构进行核算,生成这些特征。为了增加3D人脸动画的丰富性,我们构建了一个新的3D数据集,包含细节的形状,并学习在语音内容上Synthesize facial details。广泛的量化和质量测试表明,VividTalker 超过了当前的方法,实现了生动和真实的语音驱动3D人脸动画。

VisPercep: A Vision-Language Approach to Enhance Visual Perception for People with Blindness and Low Vision

  • paper_url: http://arxiv.org/abs/2310.20225
  • repo_url: None
  • paper_authors: Yu Hao, Fan Yang, Hao Huang, Shuaihang Yuan, Sundeep Rangan, John-Ross Rizzo, Yao Wang, Yi Fang
  • for: This work aims to help people with blindness and low vision (pBLV) comprehend unfamiliar environments, providing more precise object identification and warnings about potential tripping hazards.
  • methods: The approach combines a large image tagging model (Recognize Anything, RAM) that identifies common objects in the captured image with a prompt tailored for pBLV via prompt engineering; a large vision-language model (InstructBLIP) then generates detailed, comprehensive descriptions of the environment and identifies potential risks relevant to the prompt.
  • results: Experiments on both indoor and outdoor datasets show that the method recognizes objects accurately and provides insightful descriptions and analysis of the environment for pBLV.
    Abstract People with blindness and low vision (pBLV) encounter substantial challenges when it comes to comprehensive scene recognition and precise object identification in unfamiliar environments. Additionally, due to the vision loss, pBLV have difficulty in accessing and identifying potential tripping hazards on their own. In this paper, we present a pioneering approach that leverages a large vision-language model to enhance visual perception for pBLV, offering detailed and comprehensive descriptions of the surrounding environments and providing warnings about the potential risks. Our method begins by leveraging a large image tagging model (i.e., Recognize Anything (RAM)) to identify all common objects present in the captured images. The recognition results and user query are then integrated into a prompt, tailored specifically for pBLV using prompt engineering. By combining the prompt and input image, a large vision-language model (i.e., InstructBLIP) generates detailed and comprehensive descriptions of the environment and identifies potential risks in the environment by analyzing the environmental objects and scenes, relevant to the prompt. We evaluate our approach through experiments conducted on both indoor and outdoor datasets. Our results demonstrate that our method is able to recognize objects accurately and provide insightful descriptions and analysis of the environment for pBLV.
    摘要 人们 WITH 视障和低视力 (pBLV) 在不熟悉环境中实际遇到严重的挑战,包括缺乏实际视觉识别和环境探索。此外,由于视力损害,pBLV 很难自主检测和识别可能的滑块障碍。在这篇论文中,我们提出了一个创新的方法,利用大量视力语言模型来增强pBLV的视觉感知,提供精确和详细的环境描述,并给出环境中可能的障碍警告。我们的方法开始由一个大量图像标签模型(Recognize Anything (RAM))来识别摄取的图像中的所有通用物品。然后,认知结果和用户查询被融合为特定 для pBLV 的问题,使用提问工程学。通过融合问题和输入图像,一个大量视力语言模型(InstructBLIP)产生了精确和详细的环境描述,并通过分析环境物品和场景,对应用者提供有用的资讯和分析。我们通过实验证明了我们的方法可以对实际图像进行正确的识别和提供有用的环境描述和障碍警告。

Choose A Table: Tensor Dirichlet Process Multinomial Mixture Model with Graphs for Passenger Trajectory Clustering

  • paper_url: http://arxiv.org/abs/2310.20224
  • repo_url: None
  • paper_authors: Ziyue Li, Hao Yan, Chen Zhang, Lijun Sun, Wolfgang Ketter, Fugee Tsung
  • for: The paper proposes a passenger clustering method based on trajectory records, addressing the hierarchical, multi-dimensional structure of trip information and the fact that existing methods require the number of clusters to be specified in advance.
  • methods: A tensor Dirichlet Process Multinomial Mixture model with graphs (TDPMM-G) preserves the hierarchical structure of the multi-dimensional trip information, clusters passengers in a unified one-step manner while determining the number of clusters automatically, and uses spatial semantic graphs in community detection to link semantic neighbours; a tensor version of Collapsed Gibbs Sampling with a minimum cluster size requirement is also proposed.
  • results: A case study on Hong Kong metro passenger data demonstrates the automatic evolution of the cluster count and better cluster quality, measured by within-cluster compactness and cross-cluster separateness. Code is available at https://github.com/bonaldli/TensorDPMM-G.
    Abstract Passenger clustering based on trajectory records is essential for transportation operators. However, existing methods cannot easily cluster the passengers due to the hierarchical structure of the passenger trip information, including multiple trips within each passenger and multi-dimensional information about each trip. Furthermore, existing approaches rely on an accurate specification of the clustering number to start. Finally, existing methods do not consider spatial semantic graphs such as geographical proximity and functional similarity between the locations. In this paper, we propose a novel tensor Dirichlet Process Multinomial Mixture model with graphs, which can preserve the hierarchical structure of the multi-dimensional trip information and cluster them in a unified one-step manner with the ability to determine the number of clusters automatically. The spatial graphs are utilized in community detection to link the semantic neighbors. We further propose a tensor version of Collapsed Gibbs Sampling method with a minimum cluster size requirement. A case study based on Hong Kong metro passenger data is conducted to demonstrate the automatic process of cluster amount evolution and better cluster quality measured by within-cluster compactness and cross-cluster separateness. The code is available at https://github.com/bonaldli/TensorDPMM-G.
    摘要 passenger clustering based on trajectory records 是交通运营商必备的。然而,现有方法难以对乘客进行聚类,因为乘客旅行记录具有层次结构,包括每个乘客内部有多次旅行和多维信息。此外,现有方法需要准确指定聚类数量开始。最后,现有方法不考虑地理 semantic graph 和功能相似性 междуlocation。本文提出了一种新的 tensor Dirichlet Process Multinomial Mixture model with graphs,可以保持多维旅行记录的层次结构,并在一步性的方式中聚类,并且可以自动确定聚类数量。使用地理 graph 进行社区检测,以链接 semantic 邻居。我们还提出了tensor version of Collapsed Gibbs Sampling method with minimum cluster size requirement。一个基于香港地铁乘客数据的案例研究,以示出自动确定聚类数量的过程和更好的聚类质量(内部紧凑性和跨聚类分离度)。代码可以在https://github.com/bonaldli/TensorDPMM-G中找到。
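The "choose a table" step behind a Dirichlet-process mixture can be sketched as a collapsed Gibbs sweep over passengers: the probability of joining an existing cluster scales with its size times a multinomial likelihood of the passenger's trip counts, and a new cluster opens with probability proportional to the concentration parameter. The sketch below omits the tensor structure, the spatial graphs, and the minimum-cluster-size constraint of the full TDPMM-G model, and uses a posterior-mean multinomial rather than the exact Dirichlet-multinomial predictive.

```python
import numpy as np

def gibbs_sweep(counts, assign, alpha=1.0, beta=0.5, rng=None):
    """One simplified collapsed-Gibbs sweep of a Dirichlet-process multinomial mixture.

    counts: (n_passengers, n_features) non-negative trip-count vectors.
    assign: current cluster id per passenger (modified in place).
    beta is a symmetric Dirichlet prior on each cluster's multinomial."""
    rng = rng or np.random.default_rng()
    n, d = counts.shape
    for i in range(n):
        assign[i] = -1                                   # remove passenger i from its table
        clusters = [c for c in set(assign) if c != -1]
        log_probs, labels = [], []
        for c in clusters:
            members = counts[assign == c]
            size = len(members)
            theta = members.sum(axis=0) + beta
            theta = theta / theta.sum()                  # posterior-mean multinomial
            log_probs.append(np.log(size) + counts[i] @ np.log(theta))
            labels.append(c)
        # "New table": concentration alpha with the prior-mean multinomial
        log_probs.append(np.log(alpha) + counts[i] @ np.log(np.full(d, 1.0 / d)))
        labels.append(max(clusters, default=-1) + 1)
        log_probs = np.array(log_probs)
        p = np.exp(log_probs - log_probs.max())
        assign[i] = labels[rng.choice(len(labels), p=p / p.sum())]
    return assign

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.poisson(3.0, size=(40, 6))              # toy trip-feature counts
    assign = np.zeros(40, dtype=int)
    for _ in range(20):
        assign = gibbs_sweep(counts, assign, rng=rng)
    print("clusters found:", sorted(set(assign)))
```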

A Systematic Review for Transformer-based Long-term Series Forecasting

  • paper_url: http://arxiv.org/abs/2310.20218
  • repo_url: None
  • paper_authors: Liyilei Su, Xumin Zuo, Rui Li, Xin Wang, Heng Zhao, Bingding Huang
  • for: This review surveys the application and progress of deep learning, particularly Transformer architectures, in time series forecasting (TSF).
  • methods: It covers Transformer variants developed to handle long-term time series forecasting (LTSF) tasks.
  • results: The paper summarizes publicly available LTSF datasets and relevant evaluation metrics, offers best practices and techniques for training Transformers for time series analysis, and proposes potential research directions in this rapidly evolving field.
    Abstract The emergence of deep learning has yielded noteworthy advancements in time series forecasting (TSF). Transformer architectures, in particular, have witnessed broad utilization and adoption in TSF tasks. Transformers have proven to be the most successful solution to extract the semantic correlations among the elements within a long sequence. Various variants have enabled transformer architecture to effectively handle long-term time series forecasting (LTSF) tasks. In this article, we first present a comprehensive overview of transformer architectures and their subsequent enhancements developed to address various LTSF tasks. Then, we summarize the publicly available LTSF datasets and relevant evaluation metrics. Furthermore, we provide valuable insights into the best practices and techniques for effectively training transformers in the context of time-series analysis. Lastly, we propose potential research directions in this rapidly evolving field.
    摘要 深度学习的出现对时序预测(TSF)带来了引人注目的进步。特别是在Transformer架构方面,它在TSF任务中得到了广泛的应用和采用。Transformer架构能够很好地提取时序序列中元素之间的 semantic 相关性。不同的变体使得Transformer架构能够有效地处理长期时序预测(LTSF)任务。在这篇文章中,我们首先提供了Transformer架构的全面概述,以及其后续的改进方法,用于解决不同的LTSF任务。然后,我们列举了公共可用的LTSF数据集和相关的评价指标。此外,我们还提供了有价值的时间序列分析训练最佳实践。最后,我们提出了这个领域的可能的未来研究方向。

Does GPT-4 Pass the Turing Test?

  • paper_url: http://arxiv.org/abs/2310.20216
  • repo_url: None
  • paper_authors: Cameron Jones, Benjamin Bergen
  • for: This paper evaluates GPT-4's performance in a public online Turing Test.
  • methods: GPT-4 is assessed in the online test and compared against baselines set by ELIZA and GPT-3.5, as well as against human participants.
  • results: The best-performing GPT-4 prompt passed in 41% of games, outperforming the ELIZA (27%) and GPT-3.5 (14%) baselines but falling short of chance and of the 63% baseline set by human participants.
    Abstract We evaluated GPT-4 in a public online Turing Test. The best-performing GPT-4 prompt passed in 41% of games, outperforming baselines set by ELIZA (27%) and GPT-3.5 (14%), but falling short of chance and the baseline set by human participants (63%). Participants' decisions were based mainly on linguistic style (35%) and socio-emotional traits (27%), supporting the idea that intelligence is not sufficient to pass the Turing Test. Participants' demographics, including education and familiarity with LLMs, did not predict detection rate, suggesting that even those who understand systems deeply and interact with them frequently may be susceptible to deception. Despite known limitations as a test of intelligence, we argue that the Turing Test continues to be relevant as an assessment of naturalistic communication and deception. AI models with the ability to masquerade as humans could have widespread societal consequences, and we analyse the effectiveness of different strategies and criteria for judging humanlikeness.

Handover Protocol Learning for LEO Satellite Networks: Access Delay and Collision Minimization

  • paper_url: http://arxiv.org/abs/2310.20215
  • repo_url: None
  • paper_authors: Ju-Hyung Lee, Chanyoung Park, Soohyun Park, Andreas F. Molisch
  • for: This study proposes a deep reinforcement learning (DRL)-based handover (HO) protocol, called DHO, to address the persistently long propagation delays in the HO procedures of low-Earth-orbit (LEO) satellite networks.
  • methods: DHO skips the Measurement Report (MR) phase of the HO procedure by exploiting predictions learned from the pre-determined LEO satellite orbital pattern.
  • results: DHO outperforms the legacy HO protocol across diverse network conditions in terms of access delay, collision rate, and handover success rate, demonstrating its practical applicability; the study also examines the trade-off between access delay and collision rate and evaluates the training performance and convergence of DHO under various DRL algorithms.
    Abstract This study presents a novel deep reinforcement learning (DRL)-based handover (HO) protocol, called DHO, specifically designed to address the persistent challenge of long propagation delays in low-Earth orbit (LEO) satellite networks' HO procedures. DHO skips the Measurement Report (MR) in the HO procedure by leveraging its predictive capabilities after being trained with a pre-determined LEO satellite orbital pattern. This simplification eliminates the propagation delay incurred during the MR phase, while still providing effective HO decisions. The proposed DHO outperforms the legacy HO protocol across diverse network conditions in terms of access delay, collision rate, and handover success rate, demonstrating the practical applicability of DHO in real-world networks. Furthermore, the study examines the trade-off between access delay and collision rate and also evaluates the training performance and convergence of DHO using various DRL algorithms.
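A minimal, hypothetical sketch of the core idea (the abstract does not give the network or feature design): because LEO orbits follow a pre-determined pattern, a learned policy can map the current orbital phase and per-satellite features directly to a handover decision, without waiting for a Measurement Report. The feature layout, sizes, and action set below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class HandoverPolicy(nn.Module):
    def __init__(self, n_satellites: int, feat_dim: int, hidden: int = 64):
        super().__init__()
        # Scores each visible satellite; an extra "stay" (no handover) action is appended.
        self.net = nn.Sequential(
            nn.Linear(n_satellites * feat_dim + 1, hidden),  # +1 for the orbital phase
            nn.ReLU(),
            nn.Linear(hidden, n_satellites + 1),
        )

    def forward(self, orbital_phase: torch.Tensor, sat_feats: torch.Tensor) -> torch.Tensor:
        # orbital_phase: (batch, 1) in [0, 1); sat_feats: (batch, n_satellites, feat_dim)
        x = torch.cat([sat_feats.flatten(1), orbital_phase], dim=-1)
        return self.net(x)  # logits over {satellite_0, ..., satellite_{n-1}, stay}

# Example: pick a handover target for one user without an MR exchange.
policy = HandoverPolicy(n_satellites=4, feat_dim=3)
phase = torch.rand(1, 1)                  # position within the known orbital period
feats = torch.randn(1, 4, 3)              # e.g. predicted elevation, load, remaining visibility
action = policy(phase, feats).argmax(-1)  # the paper trains such a policy with a DRL algorithm
```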

Fraud Analytics Using Machine-learning & Engineering on Big Data (FAME) for Telecom

  • paper_url: http://arxiv.org/abs/2311.00724
  • repo_url: None
  • paper_authors: Sudarson Roy Pratihar, Subhadip Paul, Pranab Kumar Dash, Amartya Kumar Das
  • for: The study aims to deliver an industrialized solution for detecting fraud and discovering novel fraud patterns in an accurate, efficient, and cost-effective manner.
  • methods: It combines a self-adaptive data mining technique with big data technologies to detect International Revenue Share Fraud.
  • results: The solution detected International Revenue Share Fraud with a false-positive rate below 5%, using more than one terabyte of Call Detail Records from a reputed wholesale carrier and an overseas telecom transit carrier.
    Abstract Telecom industries lose globally 46.3 Billion USD due to fraud. Data mining and machine learning techniques (apart from rules oriented approach) have been used in past, but efficiency has been low as fraud pattern changes very rapidly. This paper presents an industrialized solution approach with self adaptive data mining technique and application of big data technologies to detect fraud and discover novel fraud patterns in accurate, efficient and cost effective manner. Solution has been successfully demonstrated to detect International Revenue Share Fraud with <5% false positive. More than 1 Terra Bytes of Call Detail Record from a reputed wholesale carrier and overseas telecom transit carrier has been used to conduct this study.

In Search of Lost Online Test-time Adaptation: A Survey

  • paper_url: http://arxiv.org/abs/2310.20199
  • repo_url: None
  • paper_authors: Zixin Wang, Yadan Luo, Liang Zheng, Zhuoxiao Chen, Sen Wang, Zi Huang
  • for: This paper surveys online test-time adaptation (OTTA), in which machine learning models are adapted to novel data distributions as batches of test data arrive.
  • methods: It classifies OTTA techniques into three primary categories and benchmarks them on a strong Vision Transformer (ViT) backbone to identify genuinely effective strategies.
  • results: The study finds that (1) transformers are notably resilient to diverse domain shifts, (2) the efficacy of many OTTA methods hinges on ample batch sizes, and (3) stable optimization and resistance to perturbations are critical during adaptation, especially at batch size 1.
    Abstract In this paper, we present a comprehensive survey on online test-time adaptation (OTTA), a paradigm focused on adapting machine learning models to novel data distributions upon batch arrival. Despite the proliferation of OTTA methods recently, the field is mired in issues like ambiguous settings, antiquated backbones, and inconsistent hyperparameter tuning, obfuscating the real challenges and making reproducibility elusive. For clarity and a rigorous comparison, we classify OTTA techniques into three primary categories and subject them to benchmarks using the potent Vision Transformer (ViT) backbone to discover genuinely effective strategies. Our benchmarks span not only conventional corrupted datasets such as CIFAR-10/100-C and ImageNet-C but also real-world shifts embodied in CIFAR-10.1 and CIFAR-10-Warehouse, encapsulating variations across search engines and synthesized data by diffusion models. To gauge efficiency in online scenarios, we introduce novel evaluation metrics, inclusive of FLOPs, shedding light on the trade-offs between adaptation accuracy and computational overhead. Our findings diverge from existing literature, indicating: (1) transformers exhibit heightened resilience to diverse domain shifts, (2) the efficacy of many OTTA methods hinges on ample batch sizes, and (3) stability in optimization and resistance to perturbations are critical during adaptation, especially when the batch size is 1. Motivated by these insights, we pointed out promising directions for future research. The source code will be made available.
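For readers unfamiliar with the class of methods being surveyed, below is a minimal sketch of one canonical OTTA approach: Tent-style test-time adaptation, which minimizes prediction entropy on each incoming batch while updating only normalization-layer parameters. This is a generic illustration, not the survey's code or its exact benchmark configuration.

```python
import torch
import torch.nn as nn

def collect_norm_params(model: nn.Module):
    """Return the affine parameters of LayerNorm/BatchNorm layers (the only ones adapted)."""
    params = []
    for m in model.modules():
        if isinstance(m, (nn.LayerNorm, nn.BatchNorm1d, nn.BatchNorm2d)):
            params += [p for p in (m.weight, m.bias) if p is not None]
    return params

@torch.enable_grad()
def adapt_on_batch(model: nn.Module, x: torch.Tensor, optimizer: torch.optim.Optimizer):
    """One online step: predict, minimize prediction entropy, update norm parameters."""
    logits = model(x)
    probs = logits.softmax(dim=-1)
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
    optimizer.zero_grad()
    entropy.backward()
    optimizer.step()
    return logits.detach()

# Usage: wrap any classifier (a ViT in the survey's benchmarks; a toy MLP here) and adapt
# it batch by batch as the test stream arrives.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.LayerNorm(128), nn.Linear(128, 10))
optimizer = torch.optim.SGD(collect_norm_params(model), lr=1e-3)
for _ in range(3):                          # stand-in for an online stream of batches
    batch = torch.randn(16, 3, 32, 32)      # e.g. corrupted CIFAR-10 images
    _ = adapt_on_batch(model, batch, optimizer)
```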

Generating Continuations in Multilingual Idiomatic Contexts

  • paper_url: http://arxiv.org/abs/2310.20195
  • repo_url: None
  • paper_authors: Rhitabrat Pokharel, Ameeta Agrawal
  • for: To test language models' understanding of non-compositional figurative text by having them generate continuations for narratives containing idiomatic or literal expressions.
  • methods: Experiments use datasets in two languages (English and Portuguese) under three training settings: zero-shot, few-shot, and fine-tuned.
  • results: The models are only slightly better at generating continuations for literal contexts than for idiomatic contexts, with exceedingly small margins, and they perform comparably across both languages, indicating robustness on this task.
    Abstract The ability to process idiomatic or literal multiword expressions is a crucial aspect of understanding and generating any language. The task of generating contextually relevant continuations for narratives containing idiomatic (or literal) expressions can allow us to test the ability of generative language models (LMs) in understanding nuanced language containing non-compositional figurative text. We conduct a series of experiments using datasets in two distinct languages (English and Portuguese) under three different training settings (zero-shot, few-shot, and fine-tuned). Our results suggest that the models are only slightly better at generating continuations for literal contexts than idiomatic contexts, with exceedingly small margins. Furthermore, the models studied in this work perform equally well across both languages, indicating the robustness of generative models in performing this task.

Self-supervised Pre-training for Precipitation Post-processor

  • paper_url: http://arxiv.org/abs/2310.20187
  • repo_url: None
  • paper_authors: Sojung An, Junha Lee, Jiyeon Jang, Inchae Na, Wooyeon Park, Sujeong You
  • for: To improve the accuracy of heavy-precipitation forecasts from regional numerical weather prediction (NWP) models.
  • methods: A deep-learning post-processor corrects the output of physics-based NWP models, combining self-supervised pre-training on masked atmospheric variables with transfer learning to a precipitation segmentation task.
  • results: Experiments on precipitation correction for regional NWP show that the proposed method outperforms other approaches.
    Abstract Securing sufficient forecast lead time for local precipitation is essential for preventing hazardous weather events. Nonetheless, global warming-induced climate change is adding to the challenge of accurately predicting severe precipitation events, such as heavy rainfall. In this work, we propose a deep learning-based precipitation post-processor approach to numerical weather prediction (NWP) models. The precipitation post-processor consists of (i) self-supervised pre-training, where parameters of encoder are pre-trained on the reconstruction of masked variables of the atmospheric physics domain, and (ii) transfer learning on precipitation segmentation tasks (target domain) from the pre-trained encoder. We also introduce a heuristic labeling approach for effectively training class-imbalanced datasets. Our experiment results in precipitation correction for regional NWP show that the proposed method outperforms other approaches.
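A hedged sketch of the self-supervised pre-training stage described above: randomly mask some atmospheric-variable channels of an NWP field and train an encoder-decoder to reconstruct them. The variable names, network shapes, and masking scheme below are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

class TinyEncoderDecoder(nn.Module):
    def __init__(self, n_vars: int, hidden: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(n_vars, hidden, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(hidden, n_vars, 3, padding=1)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def masked_reconstruction_step(model, fields, mask_ratio=0.3):
    """fields: (batch, n_vars, H, W) gridded atmospheric variables."""
    n_vars = fields.shape[1]
    n_masked = max(1, int(mask_ratio * n_vars))
    mask = torch.zeros(n_vars, dtype=torch.bool)
    mask[torch.randperm(n_vars)[:n_masked]] = True       # variables to hide this step
    corrupted = fields.clone()
    corrupted[:, mask] = 0.0                              # zero out the masked variable channels
    recon = model(corrupted)
    return ((recon[:, mask] - fields[:, mask]) ** 2).mean()   # reconstruct only what was masked

model = TinyEncoderDecoder(n_vars=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = masked_reconstruction_step(model, torch.randn(4, 8, 64, 64))
loss.backward()
optimizer.step()
# After pre-training, the encoder would be reused (transfer learning) under a segmentation
# head for the precipitation target domain, as the abstract describes.
```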

Learning to Discover Skills through Guidance

  • paper_url: http://arxiv.org/abs/2310.20178
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Hyunseung Kim, Byungkun Lee, Hojoon Lee, Dongyoon Hwang, Sejik Park, Kyushik Min, Jaegul Choo
  • for: The paper addresses limited exploration in unsupervised skill discovery (USD), which stems from substantial penalties when skills deviate from their initial trajectories, and proposes a new algorithm, skill discovery with guidance (DISCO-DANCE).
  • methods: DISCO-DANCE selects the guide skill with the highest potential to reach unexplored states, guides the other skills to follow it, and then disperses the guided skills to maximize their discriminability in unexplored states.
  • results: DISCO-DANCE outperforms other USD baselines on three challenging benchmarks, including two navigation tasks and a continuous control task; qualitative visualizations and code are provided.
    Abstract In the field of unsupervised skill discovery (USD), a major challenge is limited exploration, primarily due to substantial penalties when skills deviate from their initial trajectories. To enhance exploration, recent methodologies employ auxiliary rewards to maximize the epistemic uncertainty or entropy of states. However, we have identified that the effectiveness of these rewards declines as the environmental complexity rises. Therefore, we present a novel USD algorithm, skill discovery with guidance (DISCO-DANCE), which (1) selects the guide skill that possesses the highest potential to reach unexplored states, (2) guides other skills to follow guide skill, then (3) the guided skills are dispersed to maximize their discriminability in unexplored states. Empirical evaluation demonstrates that DISCO-DANCE outperforms other USD baselines in challenging environments, including two navigation benchmarks and a continuous control benchmark. Qualitative visualizations and code of DISCO-DANCE are available at https://mynsng.github.io/discodance.

GraphTransformers for Geospatial Forecasting

  • paper_url: http://arxiv.org/abs/2310.20174
  • repo_url: None
  • paper_authors: Pallavi Banerjee, Satyaki Chakraborty
  • for: To predict trajectories of geospatial sequences, using GraphTransformers to improve forecast accuracy.
  • methods: The approach explicitly leverages the graph structure that automatically emerges between different geospatial points when viewed across several sequences.
  • results: On the HURDAT hurricane-track dataset, the GraphTransformer approach significantly improves over a state-of-the-art Transformer baseline for 6-hourly trajectory prediction.
    Abstract In this paper we introduce a novel framework for trajectory prediction of geospatial sequences using GraphTransformers. When viewed across several sequences, we observed that a graph structure automatically emerges between different geospatial points that is often not taken into account for such sequence modeling tasks. We show that by leveraging this graph structure explicitly, geospatial trajectory prediction can be significantly improved. Our GraphTransformer approach improves upon state-of-the-art Transformer based baseline significantly on HURDAT, a dataset where we are interested in predicting the trajectory of a hurricane on a 6 hourly basis.
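The abstract says a graph structure "automatically emerges" between geospatial points across sequences but does not spell out how it is built. Purely as an illustration of making such a latent structure explicit, the sketch below forms a k-nearest-neighbour graph over observed track points (plain Euclidean distance on lat/lon for brevity; a real system would likely use great-circle distance). This is a hypothetical construction, not the paper's method.

```python
import numpy as np

def knn_edges(points: np.ndarray, k: int = 3):
    """points: (N, 2) array of (lat, lon); returns a list of directed edges (i, j)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                 # exclude self-loops
    nearest = np.argsort(dist, axis=1)[:, :k]      # k nearest neighbours per node
    return [(i, int(j)) for i in range(len(points)) for j in nearest[i]]

# Example: a few 6-hourly hurricane track points (lat, lon), as in HURDAT-style records.
track = np.array([[25.1, -75.0], [25.9, -76.2], [26.8, -77.5], [27.9, -78.9]])
edges = knn_edges(track, k=2)
# Such edges could then be supplied to a graph-aware Transformer alongside the raw sequence.
```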

Is Robustness Transferable across Languages in Multilingual Neural Machine Translation?

  • paper_url: http://arxiv.org/abs/2310.20162
  • repo_url: None
  • paper_authors: Leiyu Pan, Supryadi, Deyi Xiong
  • for: This study investigates whether robustness transfers across languages in multilingual neural machine translation.
  • methods: A robustness transfer analysis protocol is proposed and tested in a series of experiments: character-, word-, and multi-level noise is used to attack a specific translation direction of a multilingual neural machine translation model, and the robustness of the other translation directions is then evaluated.
  • results: Robustness gained in one translation direction can indeed transfer to other translation directions, and the study empirically identifies scenarios in which robustness to character-level and word-level noise is more likely to transfer.
    Abstract Robustness, the ability of models to maintain performance in the face of perturbations, is critical for developing reliable NLP systems. Recent studies have shown promising results in improving the robustness of models through adversarial training and data augmentation. However, in machine translation, most of these studies have focused on bilingual machine translation with a single translation direction. In this paper, we investigate the transferability of robustness across different languages in multilingual neural machine translation. We propose a robustness transfer analysis protocol and conduct a series of experiments. In particular, we use character-, word-, and multi-level noises to attack the specific translation direction of the multilingual neural machine translation model and evaluate the robustness of other translation directions. Our findings demonstrate that the robustness gained in one translation direction can indeed transfer to other translation directions. Additionally, we empirically find scenarios where robustness to character-level noise and word-level noise is more likely to transfer.
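The abstract mentions character-, word-, and multi-level noise attacks but does not define the exact noise models. The functions below are a hedged illustration of two common choices (random character deletion/duplication and random word dropping), not the paper's specific perturbations.

```python
import random

def char_noise(sentence: str, p: float = 0.1, rng: random.Random = random.Random(0)) -> str:
    """Randomly delete or duplicate characters with probability p per character."""
    out = []
    for ch in sentence:
        r = rng.random()
        if r < p / 2:
            continue              # deletion
        out.append(ch)
        if r > 1 - p / 2:
            out.append(ch)        # duplication
    return "".join(out)

def word_noise(sentence: str, p: float = 0.1, rng: random.Random = random.Random(0)) -> str:
    """Randomly drop whole words with probability p per word."""
    words = sentence.split()
    kept = [w for w in words if rng.random() >= p]
    return " ".join(kept if kept else words)       # never return an empty sentence

src = "the quick brown fox jumps over the lazy dog"
print(char_noise(src, p=0.2))
print(word_noise(src, p=0.2))
# In the robustness-transfer protocol, one translation direction would be attacked with such
# noisy sources, and the remaining directions evaluated on clean and noisy inputs.
```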

Language Guided Visual Question Answering: Elevate Your Multimodal Language Model Using Knowledge-Enriched Prompts

  • paper_url: http://arxiv.org/abs/2310.20159
  • repo_url: https://github.com/declare-lab/lg-vqa
  • paper_authors: Deepanway Ghosal, Navonil Majumder, Roy Ka-Wei Lee, Rada Mihalcea, Soujanya Poria
  • for: The paper focuses on knowledge-augmented visual question answering (VQA), which requires understanding both the image and the question, together with knowledge not present in the image, to provide a natural-language answer.
  • methods: The proposed multimodal framework uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc., to answer questions more accurately.
  • results: Language guidance improves CLIP by 7.6% and BLIP-2 by 4.8% on the challenging A-OKVQA dataset, and consistently improves performance on the Science-QA, VSR, and IconQA datasets.
    Abstract Visual question answering (VQA) is the task of answering questions about an image. The task assumes an understanding of both the image and the question to provide a natural language answer. VQA has gained popularity in recent years due to its potential applications in a wide range of fields, including robotics, education, and healthcare. In this paper, we focus on knowledge-augmented VQA, where answering the question requires commonsense knowledge, world knowledge, and reasoning about ideas and concepts not present in the image. We propose a multimodal framework that uses language guidance (LG) in the form of rationales, image captions, scene graphs, etc to answer questions more accurately. We benchmark our method on the multi-choice question-answering task of the A-OKVQA, Science-QA, VSR, and IconQA datasets using CLIP and BLIP models. We show that the use of language guidance is a simple but powerful and effective strategy for visual question answering. Our language guidance improves the performance of CLIP by 7.6% and BLIP-2 by 4.8% in the challenging A-OKVQA dataset. We also observe consistent improvement in performance on the Science-QA, VSR, and IconQA datasets when using the proposed language guidances. The implementation of LG-VQA is publicly available at https:// github.com/declare-lab/LG-VQA.
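A hedged illustration of the language-guidance idea: score each answer candidate with CLIP after folding guidance text (for example a caption or rationale) into the prompt. This is a generic sketch using the Hugging Face CLIP checkpoint, not the authors' LG-VQA pipeline; the guidance string and prompt format are assumptions.

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))          # stand-in for the VQA image
question = "What is the person holding?"
guidance = "Caption: a man holding an umbrella on a rainy street."   # hypothetical guidance
candidates = ["an umbrella", "a camera", "a book", "a phone"]

# Fold the guidance into the text side of CLIP, one prompt per answer candidate.
prompts = [f"{guidance} Question: {question} Answer: {c}" for c in candidates]
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
scores = model(**inputs).logits_per_image[0]  # similarity of the image to each prompt
print(candidates[int(scores.argmax())])
```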

MLatom 3: Platform for machine learning-enhanced computational chemistry simulations and workflows

  • paper_url: http://arxiv.org/abs/2310.20155
  • repo_url: None
  • paper_authors: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Peikun Zheng, Yuxinxin Chen, Mario Barbatti, Olexandr Isayev, Cheng Wang, Bao-Xin Xue, Max Pinheiro Jr, Yuming Su, Yiheng Dai, Yangtao Chen, Lina Zhang, Shuang Zhang, Arif Ullah, Quanhao Zhang, Yanchi Ou
  • for: To broaden the use of machine learning (ML) in computational chemistry by providing a flexible software framework in which users can design custom workflows.
  • methods: MLatom 3 provides an extensive library of ML algorithms and quantum-mechanical methods, including pre-trained models such as AIQM1, which approaches coupled-cluster accuracy; developers can also build their own models.
  • results: The package can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum-mechanical, and combined models.
    Abstract Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provides plenty of choice to the users who can run simulations with the command line options, input files, or with scripts using MLatom as a Python package, both on their computers and on the online XACS cloud computing at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. The users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations such as AIQM1 approaching coupled-cluster accuracy. The developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of the interfaces to many state-of-the-art software packages and libraries.

Interactive Multi-fidelity Learning for Cost-effective Adaptation of Language Model with Sparse Human Supervision

  • paper_url: http://arxiv.org/abs/2310.20153
  • repo_url: None
  • paper_authors: Jiaxin Zhang, Zhuohang Li, Kamalika Das, Sricharan Kumar
  • for: To reduce the high data-annotation cost of adapting large language models (LLMs) to domain-specific tasks.
  • methods: The proposed Interactive Multi-Fidelity Learning (IMFL) formulates domain-specific fine-tuning as a multi-fidelity learning problem and uses a novel exploration-exploitation query strategy with two innovative designs: prompt retrieval and variable batch size.
  • results: Extensive experiments on financial and medical tasks show that IMFL outperforms single-fidelity annotation; given a limited human-annotation budget, it significantly outperforms the human-annotation baselines on all four tasks and comes very close to human annotation on two of them.
    Abstract Large language models (LLMs) have demonstrated remarkable capabilities in various tasks. However, their suitability for domain-specific tasks, is limited due to their immense scale at deployment, susceptibility to misinformation, and more importantly, high data annotation costs. We propose a novel Interactive Multi-Fidelity Learning (IMFL) framework for the cost-effective development of small domain-specific LMs under limited annotation budgets. Our approach formulates the domain-specific fine-tuning process as a multi-fidelity learning problem, focusing on identifying the optimal acquisition strategy that balances between low-fidelity automatic LLM annotations and high-fidelity human annotations to maximize model performance. We further propose an exploration-exploitation query strategy that enhances annotation diversity and informativeness, incorporating two innovative designs: 1) prompt retrieval that selects in-context examples from human-annotated samples to improve LLM annotation, and 2) variable batch size that controls the order for choosing each fidelity to facilitate knowledge distillation, ultimately enhancing annotation quality. Extensive experiments on financial and medical tasks demonstrate that IMFL achieves superior performance compared with single fidelity annotations. Given a limited budget of human annotation, IMFL significantly outperforms the human annotation baselines in all four tasks and achieves very close performance as human annotations on two of the tasks. These promising results suggest that the high human annotation costs in domain-specific tasks can be significantly reduced by employing IMFL, which utilizes fewer human annotations, supplemented with cheaper and faster LLM (e.g., GPT-3.5) annotations to achieve comparable performance.
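A schematic, hypothetical sketch of the kind of multi-fidelity annotation loop described above: each round, an acquisition step decides which unlabeled examples go to cheap LLM annotation and which few go to expensive human annotation, until the human budget is spent. The acquisition rule, costs, and annotator functions here are placeholders, not IMFL's actual scoring, prompt-retrieval, or variable-batch-size components.

```python
import random

def llm_annotate(example: str) -> str:           # low-fidelity: e.g. a GPT-3.5 call in practice
    return "label_from_llm"

def human_annotate(example: str) -> str:         # high-fidelity, expensive
    return "label_from_human"

def uncertainty(example: str) -> float:          # placeholder for the task model's uncertainty
    return random.random()

def multi_fidelity_round(pool, human_budget, human_batch=4, llm_batch=16):
    """One acquisition round: hardest examples go to humans, the next tranche to the LLM."""
    labeled = []
    ranked = sorted(pool, key=uncertainty, reverse=True)    # most uncertain first
    while ranked and human_budget > 0:
        n_human = min(human_batch, human_budget)
        for ex in ranked[:n_human]:                         # exploitation: spend scarce human labels
            labeled.append((ex, human_annotate(ex), "human"))
        human_budget -= n_human
        for ex in ranked[n_human:n_human + llm_batch]:      # exploration: cheap, broad LLM labels
            labeled.append((ex, llm_annotate(ex), "llm"))
        ranked = ranked[n_human + llm_batch:]
    return labeled, human_budget

pool = [f"document_{i}" for i in range(50)]
annotations, remaining_budget = multi_fidelity_round(pool, human_budget=8)
# The mixed human/LLM labels would then fine-tune the small domain model, and the loop
# repeats with refreshed uncertainty estimates.
```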

Unlearn What You Want to Forget: Efficient Unlearning for LLMs

  • paper_url: http://arxiv.org/abs/2310.20150
  • repo_url: None
  • paper_authors: Jiaao Chen, Diyi Yang
  • for: This paper addresses the privacy concerns and data-protection requirements that arise from LLMs pre-training on and memorizing wide-ranging text data, by enabling removal of data related to individual users without degrading predictive quality.
  • methods: An efficient unlearning framework updates LLMs without retraining the whole model after data removals, by introducing lightweight unlearning layers into the transformer that are learned with a selective teacher-student objective, together with a fusion mechanism that combines different unlearning layers, each forgetting a different set of data, to handle a sequence of forgetting operations.
  • results: Experiments on classification and generation tasks show that the proposed method is more effective than state-of-the-art baselines.
    Abstract Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data, however, this process might suffer from privacy issues and violations of data protection regulations. As a result, the ability to easily remove data related to individual users from such models while not deteriorating their predictive quality after the removal becomes increasingly important. To address these issues, in this work, we propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals, by introducing lightweight unlearning layers learned with a selective teacher-student objective into the transformers. In addition, we introduce a fusion mechanism to effectively combine different unlearning layers that learns to forget different sets of data to handle a sequence of forgetting operations. Experiments on classification and generation tasks demonstrate the effectiveness of our proposed methods compared to the state-of-the-art baselines.
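A hedged sketch of what a "lightweight unlearning layer" could look like: a small adapter-style residual module inserted after a frozen transformer block, with only the adapter trained during unlearning (for example with a teacher-student objective that pushes outputs on the forgotten data away from the original model while matching it elsewhere). The layer design and training objective here are illustrative assumptions, not the paper's exact ones.

```python
import torch
import torch.nn as nn

class UnlearningLayer(nn.Module):
    def __init__(self, d_model: int, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        nn.init.zeros_(self.up.weight)          # zero-init so the layer starts as the identity
        nn.init.zeros_(self.up.bias)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        return hidden + self.up(torch.relu(self.down(hidden)))   # residual correction

class BlockWithUnlearning(nn.Module):
    """Wrap a frozen transformer block so only the unlearning layer receives gradients."""
    def __init__(self, block: nn.Module, d_model: int):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad_(False)
        self.unlearn = UnlearningLayer(d_model)

    def forward(self, x):
        return self.unlearn(self.block(x))

d_model = 64
frozen_block = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
layer = BlockWithUnlearning(frozen_block, d_model)
out = layer(torch.randn(2, 10, d_model))   # (batch, seq, d_model); only the adapter is trainable
```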

Decision-Making for Autonomous Vehicles with Interaction-Aware Behavioral Prediction and Social-Attention Neural Network

  • paper_url: http://arxiv.org/abs/2310.20148
  • repo_url: None
  • paper_authors: Xiao Li, Kaiwen Liu, H. Eric Tseng, Anouck Girard, Ilya Kolmanovsky
  • for: To help autonomous vehicles better comprehend the intentions of surrounding human drivers so they can accomplish their tasks safely.
  • methods: A behavioral model encodes drivers' interaction intentions as latent social-psychological parameters; a Bayesian filter and a receding-horizon optimization-based controller handle decision-making under uncertainty about those intentions; for online deployment, an attention-based neural network imitates the behavioral model with online-estimated parameter priors, and a decision-tree search algorithm solves the decision-making problem online.
  • results: The behavioral model is evaluated on real-world trajectory prediction, and the decision-making module is evaluated extensively in forced highway-merging scenarios using both simulated environments and real-world traffic data; the algorithms complete forced merges under various traffic conditions while ensuring driving safety.
    Abstract Autonomous vehicles need to accomplish their tasks while interacting with human drivers in traffic. It is thus crucial to equip autonomous vehicles with artificial reasoning to better comprehend the intentions of the surrounding traffic, thereby facilitating the accomplishments of the tasks. In this work, we propose a behavioral model that encodes drivers' interacting intentions into latent social-psychological parameters. Leveraging a Bayesian filter, we develop a receding-horizon optimization-based controller for autonomous vehicle decision-making which accounts for the uncertainties in the interacting drivers' intentions. For online deployment, we design a neural network architecture based on the attention mechanism which imitates the behavioral model with online estimated parameter priors. We also propose a decision tree search algorithm to solve the decision-making problem online. The proposed behavioral model is then evaluated in terms of its capabilities for real-world trajectory prediction. We further conduct extensive evaluations of the proposed decision-making module, in forced highway merging scenarios, using both simulated environments and real-world traffic datasets. The results demonstrate that our algorithms can complete the forced merging tasks in various traffic conditions while ensuring driving safety.
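A hedged sketch of a "social attention" module of the kind described above: the ego vehicle's state attends over surrounding vehicles' states to produce an interaction-aware feature for downstream decision-making. The feature dimensions, action set, and overall architecture are assumptions for illustration, not the paper's network.

```python
import torch
import torch.nn as nn

class SocialAttention(nn.Module):
    def __init__(self, state_dim: int = 4, d_model: int = 32, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)               # shared vehicle-state encoder
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 3)                        # e.g. {yield, merge, wait} scores

    def forward(self, ego: torch.Tensor, neighbors: torch.Tensor) -> torch.Tensor:
        # ego: (batch, state_dim); neighbors: (batch, n_vehicles, state_dim)
        q = self.embed(ego).unsqueeze(1)                          # ego state as the query
        kv = self.embed(neighbors)                                # neighbors as keys/values
        fused, _ = self.attn(q, kv, kv)
        return self.head(fused.squeeze(1))                        # interaction-aware action scores

model = SocialAttention()
ego = torch.randn(2, 4)                 # e.g. (x, y, speed, heading)
neighbors = torch.randn(2, 5, 4)        # five surrounding vehicles
scores = model(ego, neighbors)          # would feed the controller / tree search in the paper
```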

EELBERT: Tiny Models through Dynamic Embeddings

  • paper_url: http://arxiv.org/abs/2310.20144
  • repo_url: None
  • paper_authors: Gabrielle Cohn, Rishika Agarwal, Deepanshu Gupta, Siddharth Patwardhan
  • for: To compress transformer-based models such as BERT while minimizing the impact on downstream-task accuracy.
  • methods: The input embedding layer of the model is removed and replaced with dynamic, on-the-fly embedding computation.
  • results: On the GLUE benchmark, the EELBERT variants show minimal regression relative to traditional BERT models; the smallest model, UNO-EELBERT, achieves a GLUE score within 4% of a fully trained BERT-tiny while being 15x smaller (1.2 MB).
    Abstract We introduce EELBERT, an approach for compression of transformer-based models (e.g., BERT), with minimal impact on the accuracy of downstream tasks. This is achieved by replacing the input embedding layer of the model with dynamic, i.e. on-the-fly, embedding computations. Since the input embedding layer accounts for a significant fraction of the model size, especially for the smaller BERT variants, replacing this layer with an embedding computation function helps us reduce the model size significantly. Empirical evaluation on the GLUE benchmark shows that our BERT variants (EELBERT) suffer minimal regression compared to the traditional BERT models. Through this approach, we are able to develop our smallest model UNO-EELBERT, which achieves a GLUE score within 4% of fully trained BERT-tiny, while being 15x smaller (1.2 MB) in size.
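The abstract replaces the stored input-embedding table with an on-the-fly embedding computation but does not spell the function out. One common way to compute embeddings dynamically is to hash a token's character n-grams into a small number of buckets and pool tiny bucket vectors; the sketch below shows that generic idea only, not EELBERT's actual computation.

```python
import hashlib
import torch
import torch.nn as nn

class DynamicHashEmbedding(nn.Module):
    def __init__(self, n_buckets: int = 1024, dim: int = 128, ngram: int = 3):
        super().__init__()
        self.ngram = ngram
        self.buckets = nn.EmbeddingBag(n_buckets, dim, mode="mean")  # tiny shared table

    def _bucket_ids(self, token: str) -> list:
        padded = f"#{token}#"
        grams = [padded[i:i + self.ngram] for i in range(max(1, len(padded) - self.ngram + 1))]
        return [int(hashlib.md5(g.encode()).hexdigest(), 16) % self.buckets.num_embeddings
                for g in grams]

    def forward(self, tokens: list) -> torch.Tensor:
        ids, offsets = [], []
        for tok in tokens:
            offsets.append(len(ids))
            ids.extend(self._bucket_ids(tok))
        return self.buckets(torch.tensor(ids), torch.tensor(offsets))  # (len(tokens), dim)

emb = DynamicHashEmbedding()
vectors = emb(["the", "quick", "brown", "fox"])   # embeddings computed on the fly, no vocab table
print(vectors.shape)                              # torch.Size([4, 128])
```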

Contrastive Difference Predictive Coding

  • paper_url: http://arxiv.org/abs/2310.20141
  • repo_url: https://github.com/chongyi-zheng/td_infonce
  • paper_authors: Chongyi Zheng, Ruslan Salakhutdinov, Benjamin Eysenbach
  • for: To predict and reason about the future in time-series problems such as goal-conditioned reinforcement learning.
  • methods: A temporal-difference version of contrastive predictive coding stitches together pieces of different time-series data, decreasing the amount of data required to learn predictions of future events; the representation is used to derive an off-policy algorithm for goal-conditioned RL.
  • results: Compared with prior RL methods, the approach achieves a 2x median improvement in success rates and copes better with stochastic environments; in tabular settings it is about 20x more sample-efficient than the successor representation and about 1500x more sample-efficient than the standard (Monte Carlo) version of contrastive predictive coding.
    Abstract Predicting and reasoning about the future lie at the heart of many time-series questions. For example, goal-conditioned reinforcement learning can be viewed as learning representations to predict which states are likely to be visited in the future. While prior methods have used contrastive predictive coding to model time series data, learning representations that encode long-term dependencies usually requires large amounts of data. In this paper, we introduce a temporal difference version of contrastive predictive coding that stitches together pieces of different time series data to decrease the amount of data required to learn predictions of future events. We apply this representation learning method to derive an off-policy algorithm for goal-conditioned RL. Experiments demonstrate that, compared with prior RL methods, ours achieves $2 \times$ median improvement in success rates and can better cope with stochastic environments. In tabular settings, we show that our method is about $20 \times$ more sample efficient than the successor representation and $1500 \times$ more sample efficient than the standard (Monte Carlo) version of contrastive predictive coding.
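For context, below is a hedged sketch of the standard (Monte Carlo) InfoNCE objective that contrastive predictive coding builds on: embed (state, action) pairs and future states, and train so that each pair scores highest with its own observed future among the batch. The paper's contribution, the temporal-difference variant that stitches trajectories, is not reproduced here; encoders and dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

state_dim, action_dim, z_dim = 8, 2, 16
phi = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
psi = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))

def infonce_loss(states, actions, future_states):
    # Positive pairs sit on the diagonal of the similarity matrix; other rows act as negatives.
    z_sa = phi(torch.cat([states, actions], dim=-1))       # (B, z_dim)
    z_f = psi(future_states)                               # (B, z_dim)
    logits = z_sa @ z_f.t()                                # (B, B) similarity matrix
    labels = torch.arange(len(states))
    return F.cross_entropy(logits, labels)

B = 32
loss = infonce_loss(torch.randn(B, state_dim), torch.randn(B, action_dim), torch.randn(B, state_dim))
loss.backward()
# In goal-conditioned RL, the learned critic scores which states/goals are likely to be
# reached; the TD variant bootstraps these predictions across stitched trajectory pieces.
```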

Efficient Classification of Student Help Requests in Programming Courses Using Large Language Models

  • paper_url: http://arxiv.org/abs/2310.20105
  • repo_url: None
  • paper_authors: Jaromir Savelka, Paul Denny, Mark Liffiton, Brad Sheese
  • for: To evaluate whether large language models (LLMs) can accurately classify student help requests by the type of help being sought, so that educational systems can be adapted accordingly.
  • methods: GPT-3.5 and GPT-4 are used to classify help requests from students in an introductory programming class, in zero-shot trials and after fine-tuning GPT-3.5.
  • results: GPT-3.5 and GPT-4 perform comparably on most categories, while GPT-4 outperforms GPT-3.5 on sub-categories of debugging-related requests; fine-tuning GPT-3.5 improves its performance to approximate the accuracy and consistency observed between two human raters.
    Abstract The accurate classification of student help requests with respect to the type of help being sought can enable the tailoring of effective responses. Automatically classifying such requests is non-trivial, but large language models (LLMs) appear to offer an accessible, cost-effective solution. This study evaluates the performance of the GPT-3.5 and GPT-4 models for classifying help requests from students in an introductory programming class. In zero-shot trials, GPT-3.5 and GPT-4 exhibited comparable performance on most categories, while GPT-4 outperformed GPT-3.5 in classifying sub-categories for requests related to debugging. Fine-tuning the GPT-3.5 model improved its performance to such an extent that it approximated the accuracy and consistency across categories observed between two human raters. Overall, this study demonstrates the feasibility of using LLMs to enhance educational systems through the automated classification of student needs.
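A hedged sketch of zero-shot help-request classification with an LLM, in the spirit of the study above. The category labels, prompt wording, and model name are placeholders (the paper's own taxonomy is not reproduced here), and the call assumes the current OpenAI Python client with a configured API key.

```python
from openai import OpenAI

CATEGORIES = ["debugging", "conceptual question", "task clarification", "other"]  # hypothetical

def classify_help_request(request_text: str, model: str = "gpt-4") -> str:
    client = OpenAI()
    prompt = (
        "Classify the following student help request from an introductory programming "
        f"course into exactly one of these categories: {', '.join(CATEGORIES)}.\n\n"
        f"Request: {request_text}\n\nAnswer with the category name only."
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,                      # deterministic labels for evaluation
    )
    return resp.choices[0].message.content.strip().lower()

# Example usage (requires an OPENAI_API_KEY environment variable):
# print(classify_help_request("My loop never stops and I don't know why."))
```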

Plagiarism and AI Assistance Misuse in Web Programming: Unfair Benefits and Characteristics

  • paper_url: http://arxiv.org/abs/2310.20104
  • repo_url: None
  • paper_authors: Oscar Karnalim, Hapnes Toba, Meliana Christianti Johan, Erico Darmawan Handoyo, Yehezkiel David Setiawan, Josephine Alvina Luwia
  • for: To identify and understand plagiarism and the misuse of AI assistance in web programming education, as a step toward automated tools that help instructors detect both misconducts.
  • methods: A controlled experiment compares student performance on web programming tasks completed independently, with a submission available to plagiarize, and with the help of AI assistance (ChatGPT).
  • results: Students who engage in either misconduct obtain comparable test marks with less completion time; plagiarized submissions are similar to independent ones except in trivial aspects such as colors and identifier names, while AI-assisted submissions are more complex and less readable; students believe AI assistance could be useful with proper acknowledgment, but they are not convinced of the readability and correctness of its solutions.
    Abstract In programming education, plagiarism and misuse of artificial intelligence (AI) assistance are emerging issues. However, not many relevant studies are focused on web programming. We plan to develop automated tools to help instructors identify both misconducts. To fully understand the issues, we conducted a controlled experiment to observe the unfair benefits and the characteristics. We compared student performance in completing web programming tasks independently, with a submission to plagiarize, and with the help of AI assistance (ChatGPT). Our study shows that students who are involved in such misconducts get comparable test marks with less completion time. Plagiarized submissions are similar to the independent ones except in trivial aspects such as color and identifier names. AI-assisted submissions are more complex, making them less readable. Students believe AI assistance could be useful given proper acknowledgment of the use, although they are not convinced with readability and correctness of the solutions.

Data Market Design through Deep Learning

  • paper_url: http://arxiv.org/abs/2310.20096
  • repo_url: https://github.com/abusufyanvu/6S191_MIT_DeepLearning
  • paper_authors: Sai Srivatsa Ravindranath, Yanchen Jiang, David C. Parkes
  • for: To design revenue-optimal data markets, expanding the frontier of what can be understood and achieved in data market design.
  • methods: Deep learning is used to learn signaling schemes (rather than allocation rules), handling obedience constraints, which arise from modeling buyers' downstream actions, in addition to incentive constraints on bids.
  • results: The deep-learning framework can almost precisely replicate all known solutions from theory, extend to more complex settings, and be used to establish the optimality of new data-market designs and to form conjectures about the structure of optimal designs.
    Abstract The $\textit{data market design}$ problem is a problem in economic theory to find a set of signaling schemes (statistical experiments) to maximize expected revenue to the information seller, where each experiment reveals some of the information known to a seller and has a corresponding price [Bergemann et al., 2018]. Each buyer has their own decision to make in a world environment, and their subjective expected value for the information associated with a particular experiment comes from the improvement in this decision and depends on their prior and value for different outcomes. In a setting with multiple buyers, a buyer's expected value for an experiment may also depend on the information sold to others [Bonatti et al., 2022]. We introduce the application of deep learning for the design of revenue-optimal data markets, looking to expand the frontiers of what can be understood and achieved. Relative to earlier work on deep learning for auction design [D\"utting et al., 2023], we must learn signaling schemes rather than allocation rules and handle $\textit{obedience constraints}$ $-$ these arising from modeling the downstream actions of buyers $-$ in addition to incentive constraints on bids. Our experiments demonstrate that this new deep learning framework can almost precisely replicate all known solutions from theory, expand to more complex settings, and be used to establish the optimality of new designs for data markets and make conjectures in regard to the structure of optimal designs.

Evaluating Neural Language Models as Cognitive Models of Language Acquisition

  • paper_url: http://arxiv.org/abs/2310.20093
  • repo_url: None
  • paper_authors: Héctor Javier Vázquez Martínez, Annika Lea Heuser, Charles Yang, Jordan Kodner
  • for: This paper examines the potential of language models (LMs) as scientific theories of language acquisition, given the clear differences between LM training and child language learning.
  • methods: The authors argue that some of the most prominent benchmarks for evaluating LMs' syntactic capacities may not be sufficiently rigorous, and that template-based benchmarks lack the structural diversity found in theoretical and psychological studies of language.
  • results: When LMs are trained on small-scale data modeling child language acquisition, they can be readily matched by simple baseline models; the authors advocate using readily available, carefully curated datasets that have been rated for gradient acceptability by large pools of native speakers and that probe the structural basis of grammar, noting that on one such dataset (LI-Adger) LMs evaluate sentences in a way inconsistent with human language users; the paper closes with suggestions for better connecting LMs with the empirical study of child language acquisition.
    Abstract The success of neural language models (LMs) on many technological tasks has brought about their potential relevance as scientific theories of language despite some clear differences between LM training and child language acquisition. In this paper we argue that some of the most prominent benchmarks for evaluating the syntactic capacities of LMs may not be sufficiently rigorous. In particular, we show that the template-based benchmarks lack the structural diversity commonly found in the theoretical and psychological studies of language. When trained on small-scale data modeling child language acquisition, the LMs can be readily matched by simple baseline models. We advocate for the use of the readily available, carefully curated datasets that have been evaluated for gradient acceptability by large pools of native speakers and are designed to probe the structural basis of grammar specifically. On one such dataset, the LI-Adger dataset, LMs evaluate sentences in a way inconsistent with human language users. We conclude with suggestions for better connecting LMs with the empirical study of child language acquisition.
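A hedged sketch of how an LM's gradient acceptability judgments can be scored and compared across sentences, the kind of evaluation this line of work discusses. It uses GPT-2 sentence log-probabilities as a generic illustration; it is not the paper's evaluation pipeline, and the example sentences are placeholders.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def sentence_logprob(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss              # mean negative log-likelihood per token
    return -loss.item() * (ids.size(1) - 1)         # total log-probability of the sentence

acceptable = "The key to the cabinets is on the table."
unacceptable = "The key to the cabinets are on the table."
for s in (acceptable, unacceptable):
    print(round(sentence_logprob(s), 2), s)
# A rigorous evaluation would correlate such scores with graded human acceptability
# judgments over structurally diverse items (e.g. the LI-Adger dataset), not isolated pairs.
```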