cs.AI - 2023-09-08

Few-Shot Learning of Force-Based Motions From Demonstration Through Pre-training of Haptic Representation

  • paper_url: http://arxiv.org/abs/2309.04640
  • repo_url: None
  • paper_authors: Marina Y. Aoyama, João Moura, Namiko Saito, Sethu Vijayakumar
  • for: Enabling robots to quickly adapt to the physical properties of different objects, improving their ability to manipulate them.
  • methods: A semi-supervised Learning from Demonstration (LfD) approach that decouples the learnt model into a haptic representation encoder and a motion generation decoder. The encoder is first pre-trained on a large amount of unsupervised data; the decoder is then trained with few-shot LfD so that the robot can quickly adapt to the physical properties of unseen objects.
  • results: Validated on a wiping task using sponges of different stiffness and surface friction. Pre-training substantially improves the downstream LfD model's ability to recognise physical properties and generate the desired wiping motions, outperforming LfD without pre-training. The generated motions were also validated on physical robot hardware, and the haptic representation encoder, pre-trained in simulation, was shown to capture the properties of real objects, explaining its contribution to the downstream task.
    Abstract In many contact-rich tasks, force sensing plays an essential role in adapting the motion to the physical properties of the manipulated object. To enable robots to capture the underlying distribution of object properties necessary for generalising learnt manipulation tasks to unseen objects, existing Learning from Demonstration (LfD) approaches require a large number of costly human demonstrations. Our proposed semi-supervised LfD approach decouples the learnt model into an haptic representation encoder and a motion generation decoder. This enables us to pre-train the first using large amount of unsupervised data, easily accessible, while using few-shot LfD to train the second, leveraging the benefits of learning skills from humans. We validate the approach on the wiping task using sponges with different stiffness and surface friction. Our results demonstrate that pre-training significantly improves the ability of the LfD model to recognise physical properties and generate desired wiping motions for unseen sponges, outperforming the LfD method without pre-training. We validate the motion generated by our semi-supervised LfD model on the physical robot hardware using the KUKA iiwa robot arm. We also validate that the haptic representation encoder, pre-trained in simulation, captures the properties of real objects, explaining its contribution to improving the generalisation of the downstream task.
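    A minimal PyTorch sketch of the decoupled training scheme described above: the haptic encoder is pre-trained with a self-supervised reconstruction objective on abundant unlabeled data, then frozen while the motion decoder is trained from a handful of demonstrations. All module sizes, the reconstruction objective, and the stand-in data are assumptions for illustration, not the authors' implementation.

        import torch
        import torch.nn as nn

        encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 16))  # haptic -> latent
        decoder = nn.Sequential(nn.Linear(16, 128), nn.ReLU(), nn.Linear(128, 7))   # latent -> motion
        recon = nn.Linear(16, 64)                  # assumed self-supervised pre-training head

        unlabeled = torch.randn(512, 64)           # stand-in for cheap unlabeled haptic data
        demo_haptic, demo_motion = torch.randn(8, 64), torch.randn(8, 7)  # few demonstrations

        # Stage 1: pre-train the haptic representation encoder (no demonstrations needed).
        opt = torch.optim.Adam(list(encoder.parameters()) + list(recon.parameters()), lr=1e-3)
        for _ in range(100):
            loss = nn.functional.mse_loss(recon(encoder(unlabeled)), unlabeled)
            opt.zero_grad(); loss.backward(); opt.step()

        # Stage 2: freeze the encoder; few-shot LfD trains only the motion decoder.
        for p in encoder.parameters():
            p.requires_grad_(False)
        opt = torch.optim.Adam(decoder.parameters(), lr=1e-3)
        for _ in range(200):
            loss = nn.functional.mse_loss(decoder(encoder(demo_haptic)), demo_motion)
            opt.zero_grad(); loss.backward(); opt.step()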

Perceptual adjustment queries and an inverted measurement paradigm for low-rank metric learning

  • paper_url: http://arxiv.org/abs/2309.04626
  • repo_url: https://github.com/austinxu87/paq
  • paper_authors: Austin Xu, Andrew D. McRae, Jingyan Wang, Mark A. Davenport, Ashwin Pananjady
  • for: Proposes a new query mechanism for collecting human feedback, the perceptual adjustment query (PAQ).
  • methods: The PAQ adopts an inverted measurement scheme and combines the advantages of cardinal and ordinal queries.
  • results: Applies PAQ measurements to metric learning, which yields a high-dimensional, low-rank matrix estimation problem; a two-stage estimator solves it, with sample complexity guarantees.
    Abstract We introduce a new type of query mechanism for collecting human feedback, called the perceptual adjustment query (PAQ). Being both informative and cognitively lightweight, the PAQ adopts an inverted measurement scheme, and combines advantages from both cardinal and ordinal queries. We showcase the PAQ in the metric learning problem, where we collect PAQ measurements to learn an unknown Mahalanobis distance. This gives rise to a high-dimensional, low-rank matrix estimation problem to which standard matrix estimators cannot be applied. Consequently, we develop a two-stage estimator for metric learning from PAQs, and provide sample complexity guarantees for this estimator. We present numerical simulations demonstrating the performance of the estimator and its notable properties.
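    For context, the unknown object being estimated above is a Mahalanobis distance; in the low-rank setting it can be written as (standard notation, assumed here rather than taken from the paper)
    $$d_M(x, y) = \sqrt{(x - y)^\top M (x - y)}, \qquad M = LL^\top \succeq 0, \quad L \in \mathbb{R}^{d \times r}, \; r \ll d,$$
    so recovering $M$ from PAQ measurements is a high-dimensional matrix estimation problem whose rank-$r$ structure the two-stage estimator exploits.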

Leveraging World Model Disentanglement in Value-Based Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.04615
  • repo_url: None
  • paper_authors: Zhizun Wang, David Meger
  • for: addresses the challenge of achieving a common goal of multiple agents interacting in the same environment with reduced sample complexity.
  • methods: uses a modularized world model, composed of action-conditioned, action-free, and static branches, to unravel the environment dynamics and produce imagined outcomes based on past experience, without sampling directly from the real environment.
  • results: achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines in Easy, Hard, and Super-Hard StarCraft II micro-management challenges.
    Abstract In this paper, we propose a novel model-based multi-agent reinforcement learning approach named Value Decomposition Framework with Disentangled World Model to address the challenge of achieving a common goal of multiple agents interacting in the same environment with reduced sample complexity. Due to scalability and non-stationarity problems posed by multi-agent systems, model-free methods rely on a considerable number of samples for training. In contrast, we use a modularized world model, composed of action-conditioned, action-free, and static branches, to unravel the environment dynamics and produce imagined outcomes based on past experience, without sampling directly from the real environment. We employ variational auto-encoders and variational graph auto-encoders to learn the latent representations for the world model, which is merged with a value-based framework to predict the joint action-value function and optimize the overall training objective. We present experimental results in Easy, Hard, and Super-Hard StarCraft II micro-management challenges to demonstrate that our method achieves high sample efficiency and exhibits superior performance in defeating the enemy armies compared to other baselines.

Linking Symptom Inventories using Semantic Textual Similarity

  • paper_url: http://arxiv.org/abs/2309.04607
  • repo_url: https://github.com/shashankv98/symptom-inventories
  • paper_authors: Eamonn Kennedy, Shashank Vadlamani, Hannah M Lindsey, Kelly S Peterson, Kristen Dams OConnor, Kenton Murray, Ronak Agarwal, Houshang H Amiri, Raeda K Andersen, Talin Babikian, David A Baron, Erin D Bigler, Karen Caeyenberghs, Lisa Delano-Wood, Seth G Disner, Ekaterina Dobryakova, Blessen C Eapen, Rachel M Edelstein, Carrie Esopenko, Helen M Genova, Elbert Geuze, Naomi J Goodrich-Hunsaker, Jordan Grafman, Asta K Haberg, Cooper B Hodges, Kristen R Hoskinson, Elizabeth S Hovenden, Andrei Irimia, Neda Jahanshad, Ruchira M Jha, Finian Keleher, Kimbra Kenney, Inga K Koerte, Spencer W Liebel, Abigail Livny, Marianne Lovstad, Sarah L Martindale, Jeffrey E Max, Andrew R Mayer, Timothy B Meier, Deleene S Menefee, Abdalla Z Mohamed, Stefania Mondello, Martin M Monti, Rajendra A Morey, Virginia Newcombe, Mary R Newsome, Alexander Olsen, Nicholas J Pastorek, Mary Jo Pugh, Adeel Razi, Jacob E Resch, Jared A Rowland, Kelly Russell, Nicholas P Ryan, Randall S Scheibel, Adam T Schmidt, Gershon Spitz, Jaclyn A Stephens, Assaf Tal, Leah D Talbert, Maria Carmela Tartaglia, Brian A Taylor, Sophia I Thomopoulos, Maya Troyanskaya, Eve M Valera, Harm Jan van der Horn, John D Van Horn, Ragini Verma, Benjamin SC Wade, Willian SC Walker, Ashley L Ware, J Kent Werner Jr, Keith Owen Yeates, Ross D Zafonte, Michael M Zeineh, Brandon Zielinski, Paul M Thompson, Frank G Hillary, David F Tate, Elisabeth A Wilde, Emily L Dennis
  • for: Uses artificial intelligence (AI), specifically semantic textual similarity (STS), to link symptoms and scores across different settings and studies.
  • methods: Tests four pre-trained STS models on predicting symptom severity across four inventories for 6,607 participants drawn from 16 international data sources.
  • results: The STS approach achieved 74.8% accuracy across five tasks, outperforming the other models tested, suggesting that incorporating contextual, semantic information can assist expert decision-making and improve both general and disease-specific clinical assessment.
    Abstract An extensive library of symptom inventories has been developed over time to measure clinical symptoms, but this variety has led to several long standing issues. Most notably, results drawn from different settings and studies are not comparable, which limits reproducibility. Here, we present an artificial intelligence (AI) approach using semantic textual similarity (STS) to link symptoms and scores across previously incongruous symptom inventories. We tested the ability of four pre-trained STS models to screen thousands of symptom description pairs for related content - a challenging task typically requiring expert panels. Models were tasked to predict symptom severity across four different inventories for 6,607 participants drawn from 16 international data sources. The STS approach achieved 74.8% accuracy across five tasks, outperforming other models tested. This work suggests that incorporating contextual, semantic information can assist expert decision-making processes, yielding gains for both general and disease-specific clinical assessment.
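    A short sketch of the screening step described above using an off-the-shelf pre-trained STS model; the model name, example symptom pairs, and decision threshold are assumptions, not the paper's configuration.

        from sentence_transformers import SentenceTransformer, util

        model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed STS model, not the paper's

        pairs = [
            ("difficulty falling asleep", "trouble sleeping at night"),
            ("frequent headaches", "loss of appetite"),
        ]
        for a, b in pairs:
            score = util.cos_sim(model.encode(a), model.encode(b)).item()  # similarity in [-1, 1]
            verdict = "related" if score > 0.5 else "unrelated"            # assumed threshold
            print(f"{a!r} vs {b!r}: {score:.2f} -> {verdict}")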

EGOFALLS: A visual-audio dataset and benchmark for fall detection using egocentric cameras

  • paper_url: http://arxiv.org/abs/2309.04579
  • repo_url: https://github.com/Xueyi-Wang/EGOFALLS
  • paper_authors: Xueyi Wang
  • for: A tool for fall prevention and mitigation.
  • methods: Extracts multimodal descriptors from videos captured by egocentric cameras and builds a late decision fusion layer on top of the extracted features.
  • results: Fusing audio and visual information through late decision fusion improves detection performance, making the approach a promising tool for the care of older adults.
    Abstract Falls are significant and often fatal for vulnerable populations such as the elderly. Previous works have addressed the detection of falls by relying on data capture by a single sensor, images or accelerometers. In this work, we rely on multimodal descriptors extracted from videos captured by egocentric cameras. Our proposed method includes a late decision fusion layer that builds on top of the extracted descriptors. Furthermore, we collect a new dataset on which we assess our proposed approach. We believe this is the first public dataset of its kind. The dataset comprises 10,948 video samples by 14 subjects. We conducted ablation experiments to assess the performance of individual feature extractors, fusion of visual information, and fusion of both visual and audio information. Moreover, we experimented with internal and external cross-validation. Our results demonstrate that the fusion of audio and visual information through late decision fusion improves detection performance, making it a promising tool for fall prevention and mitigation.
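    A minimal sketch of late decision fusion as described above: each modality's classifier produces its own class probabilities, and only these decisions are fused. The fusion weights are assumptions, not the paper's values.

        import numpy as np

        def late_fusion(p_visual, p_audio, w_visual=0.6, w_audio=0.4):
            """Weighted average of per-modality class probabilities: (P(fall), P(no fall))."""
            return w_visual * np.asarray(p_visual) + w_audio * np.asarray(p_audio)

        p_visual = [0.55, 0.45]   # visual classifier's decision on a clip
        p_audio = [0.80, 0.20]    # audio classifier's decision on the same clip
        fused = late_fusion(p_visual, p_audio)
        print("fall" if fused.argmax() == 0 else "no fall", fused)   # -> fall [0.65 0.35]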

Unleashing the Power of Graph Learning through LLM-based Autonomous Agents

  • paper_url: http://arxiv.org/abs/2309.04565
  • repo_url: None
  • paper_authors: Lanning Wei, Zhiqiang He, Huan Zhao, Quanming Yao
  • for: Simplifying the learning process on diverse real-world graphs by using Large Language Models (LLMs) as autonomous agents.
  • methods: The proposed method, called Auto$^2$Graph, uses LLMs to decompose the complex graph learning task into three components: detecting the learning intent, configuring solutions based on AutoGraph, and generating a response. The AutoGraph agents manage crucial procedures in automated graph learning, including data-processing, AutoML configuration, searching architectures, and hyper-parameter fine-tuning.
  • results: The proposed method demonstrates comparable performance on different datasets and learning tasks, and the human-like decisions made by the agents.
    Abstract Graph structured data are widely existed and applied in the real-world applications, while it is a challenge to handling these diverse data and learning tasks on graph in an efficient manner. When facing the complicated graph learning tasks, experts have designed diverse Graph Neural Networks (GNNs) in recent years. They have also implemented AutoML in Graph, also known as AutoGraph, to automatically generate data-specific solutions. Despite their success, they encounter limitations in (1) managing diverse learning tasks at various levels, (2) dealing with different procedures in graph learning beyond architecture design, and (3) the huge requirements on the prior knowledge when using AutoGraph. In this paper, we propose to use Large Language Models (LLMs) as autonomous agents to simplify the learning process on diverse real-world graphs. Specifically, in response to a user request which may contain varying data and learning targets at the node, edge, or graph levels, the complex graph learning task is decomposed into three components following the agent planning, namely, detecting the learning intent, configuring solutions based on AutoGraph, and generating a response. The AutoGraph agents manage crucial procedures in automated graph learning, including data-processing, AutoML configuration, searching architectures, and hyper-parameter fine-tuning. With these agents, those components are processed by decomposing and completing step by step, thereby generating a solution for the given data automatically, regardless of the learning task on node or graph. The proposed method is dubbed Auto$^2$Graph, and its effectiveness is demonstrated by its comparable performance on different datasets and learning tasks, as well as the human-like decisions made by the agents.
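    A hedged sketch of the three-component decomposition described above (detect the learning intent, configure an AutoGraph solution, generate a response). The `llm` and `run_autograph` callables are stubs standing in for an LLM API and an AutoML runner; none of these names come from the paper.

        def llm(prompt: str) -> str:                      # stub for an LLM completion call
            return f"<reply to: {prompt[:40]}...>"

        def run_autograph(config: str, dataset: dict):    # stub for the AutoML pipeline:
            return {"accuracy": 0.0}                      # data-processing, search, tuning

        def handle_request(user_request: str, dataset: dict) -> str:
            # 1. Detect the learning intent (node-, edge-, or graph-level task).
            intent = llm(f"Classify this graph learning request (node/edge/graph): {user_request}")
            # 2. Configure a solution based on AutoGraph (architecture search, hyper-parameters).
            config = llm(f"Propose an AutoGraph configuration for intent {intent}")
            metrics = run_autograph(config, dataset)
            # 3. Generate a natural-language response for the user.
            return llm(f"Summarize: config={config}, metrics={metrics}")

        print(handle_request("Classify the papers in my citation network", {"num_nodes": 1000}))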

Connecting NTK and NNGP: A Unified Theoretical Framework for Neural Network Learning Dynamics in the Kernel Regime

  • paper_url: http://arxiv.org/abs/2309.04522
  • repo_url: None
  • paper_authors: Yehonatan Avidan, Qianyi Li, Haim Sompolinsky
  • for: A complete theoretical framework for the learning process of deep neural networks in the infinite-width limit.
  • methods: A Markov proximal learning model for the learning dynamics, together with a new time-dependent Neural Dynamical Kernel (NDK) from which both the NTK and NNGP kernels can be derived, unifying the two frameworks.
  • results: Identifies two learning phases, gradient-driven and diffusive learning, and, combined with numerical evaluations on synthetic and benchmark datasets, provides new insights into the learning process of deep neural networks.
    Abstract Artificial neural networks have revolutionized machine learning in recent years, but a complete theoretical framework for their learning process is still lacking. Substantial progress has been made for infinitely wide networks. In this regime, two disparate theoretical frameworks have been used, in which the network's output is described using kernels: one framework is based on the Neural Tangent Kernel (NTK) which assumes linearized gradient descent dynamics, while the Neural Network Gaussian Process (NNGP) kernel assumes a Bayesian framework. However, the relation between these two frameworks has remained elusive. This work unifies these two distinct theories using a Markov proximal learning model for learning dynamics in an ensemble of randomly initialized infinitely wide deep networks. We derive an exact analytical expression for the network input-output function during and after learning, and introduce a new time-dependent Neural Dynamical Kernel (NDK) from which both NTK and NNGP kernels can be derived. We identify two learning phases characterized by different time scales: gradient-driven and diffusive learning. In the initial gradient-driven learning phase, the dynamics is dominated by deterministic gradient descent, and is described by the NTK theory. This phase is followed by the diffusive learning stage, during which the network parameters sample the solution space, ultimately approaching the equilibrium distribution corresponding to NNGP. Combined with numerical evaluations on synthetic and benchmark datasets, we provide novel insights into the different roles of initialization, regularization, and network depth, as well as phenomena such as early stopping and representational drift. This work closes the gap between the NTK and NNGP theories, providing a comprehensive framework for understanding the learning process of deep neural networks in the infinite width limit.
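    As background for the two kernel regimes being unified, the standard definitions (textbook forms, not the paper's time-dependent NDK) are
    $$\Theta(x, x') = \nabla_\theta f(x;\theta)^\top \nabla_\theta f(x';\theta) \quad \text{(NTK)}, \qquad K(x, x') = \mathbb{E}_{\theta \sim p(\theta)}\left[ f(x;\theta)\, f(x';\theta) \right] \quad \text{(NNGP)},$$
    where the NTK governs the early gradient-driven phase and the NNGP kernel describes the equilibrium distribution approached in the diffusive phase.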

On the Actionability of Outcome Prediction

  • paper_url: http://arxiv.org/abs/2309.04470
  • repo_url: https://github.com/andrewmogbolu2/blockchain-technology
  • paper_authors: Lydia T. Liu, Solon Barocas, Jon Kleinberg, Karen Levy
  • for: Examines the use of outcome prediction in social-impact domains such as education and healthcare.
  • methods: A simple model encompassing actions, latent states, and measurements.
  • results: Accurate outcome prediction is rarely the most effective policy for taking action, even when combined with other measurements: except in cases where there is a single decisive action for improving the outcome, outcome prediction never maximizes "action value". In most cases, measuring actionable latent states considerably enhances action value.
    Abstract Predicting future outcomes is a prevalent application of machine learning in social impact domains. Examples range from predicting student success in education to predicting disease risk in healthcare. Practitioners recognize that the ultimate goal is not just to predict but to act effectively. Increasing evidence suggests that relying on outcome predictions for downstream interventions may not have desired results. In most domains there exists a multitude of possible interventions for each individual, making the challenge of taking effective action more acute. Even when causal mechanisms connecting the individual's latent states to outcomes is well understood, in any given instance (a specific student or patient), practitioners still need to infer -- from budgeted measurements of latent states -- which of many possible interventions will be most effective for this individual. With this in mind, we ask: when are accurate predictors of outcomes helpful for identifying the most suitable intervention? Through a simple model encompassing actions, latent states, and measurements, we demonstrate that pure outcome prediction rarely results in the most effective policy for taking actions, even when combined with other measurements. We find that except in cases where there is a single decisive action for improving the outcome, outcome prediction never maximizes "action value", the utility of taking actions. Making measurements of actionable latent states, where specific actions lead to desired outcomes, considerably enhances the action value compared to outcome prediction, and the degree of improvement depends on action costs and the outcome model. This analysis emphasizes the need to go beyond generic outcome prediction in interventional settings by incorporating knowledge of plausible actions and latent states.
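    A toy numerical illustration (all numbers invented, not from the paper) of the central point above: ranking individuals by predicted outcome and ranking them by action value can disagree.

        import numpy as np

        # Expected outcome improvement of each action, per student (depends on latent state).
        effect = {"tutoring": np.array([0.05, 0.30]),    # student A barely helped, B helped a lot
                  "mentoring": np.array([0.10, 0.02])}
        predicted_outcome = np.array([0.35, 0.60])       # A looks more "at risk" than B

        # Outcome prediction would prioritize student A (lowest predicted outcome)...
        print("lowest predicted outcome:", predicted_outcome.argmin())              # -> 0 (A)
        # ...but action value, the best achievable improvement, is larger for student B.
        action_value = np.maximum(effect["tutoring"], effect["mentoring"])
        print("action values:", action_value, "-> prioritize:", action_value.argmax())  # -> 1 (B)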

tSPM+; a high-performance algorithm for mining transitive sequential patterns from clinical data

  • paper_url: http://arxiv.org/abs/2309.05671
  • repo_url: None
  • paper_authors: Jonas Hügel, Ulrich Sax, Shawn N. Murphy, Hossein Estiri
  • for: Develops a high-performance temporal sequence pattern mining algorithm (tSPM+) for mining temporal patterns from large clinical datasets within machine learning workflows.
  • methods: A high-performance implementation of the tSPM algorithm that adds the duration of patterns as a new dimension, shipped as a Docker container and an R package with vignettes for easy integration into existing machine learning workflows.
  • results: tSPM+ achieves a speed-up of up to a factor of 980 and up to a 48-fold reduction in memory consumption; the mined temporal sequences are used to identify Post COVID-19 patients and their symptoms according to the WHO definition.
    Abstract The increasing availability of large clinical datasets collected from patients can enable new avenues for computational characterization of complex diseases using different analytic algorithms. One of the promising new methods for extracting knowledge from large clinical datasets involves temporal pattern mining integrated with machine learning workflows. However, mining these temporal patterns is a computational intensive task and has memory repercussions. Current algorithms, such as the temporal sequence pattern mining (tSPM) algorithm, are already providing promising outcomes, but still leave room for optimization. In this paper, we present the tSPM+ algorithm, a high-performance implementation of the tSPM algorithm, which adds a new dimension by adding the duration to the temporal patterns. We show that the tSPM+ algorithm provides a speed up to factor 980 and a up to 48 fold improvement in memory consumption. Moreover, we present a docker container with an R-package, We also provide vignettes for an easy integration into already existing machine learning workflows and use the mined temporal sequences to identify Post COVID-19 patients and their symptoms according to the WHO definition.
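    A hedged sketch of the core idea: enumerate transitive "A then B" concept pairs from each patient's timestamped records, extended with the duration between the two events (the dimension tSPM+ adds). The record layout and duration bucketing are illustrative assumptions, not the tSPM+ code.

        from itertools import combinations
        from collections import Counter

        records = {  # patient_id -> [(day, concept), ...]
            "p1": [(0, "fever"), (3, "cough"), (40, "fatigue")],
            "p2": [(0, "fever"), (10, "fatigue")],
        }

        patterns = Counter()
        for events in records.values():
            for (t1, a), (t2, b) in combinations(sorted(events), 2):
                bucket = "short" if t2 - t1 <= 14 else "long"   # assumed duration bucketing
                patterns[(a, b, bucket)] += 1

        for (a, b, bucket), n in patterns.most_common():
            print(f"{a} -> {b} ({bucket} gap): {n} patient(s)")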

Subwords as Skills: Tokenization for Sparse-Reward Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.04459
  • repo_url: None
  • paper_authors: David Yunis, Justin Jung, Falcon Dai, Matthew Walter
  • for: Exploration in sparse-reward environments is difficult, particularly in continuous action spaces, because long, coordinated action sequences are required to obtain any reward.
  • methods: A new two-part approach to skill generation from interaction data: the action space is first discretized through clustering, and a tokenization technique borrowed from natural language processing then forms temporally extended actions; a policy is optimized on top of this new action space, avoiding the need to cover the full range of the original action space.
  • results: The method outperforms skill-generation baselines in several challenging sparse-reward domains while requiring orders of magnitude less computation for skill generation and online rollouts.
    Abstract Exploration in sparse-reward reinforcement learning is difficult due to the requirement of long, coordinated sequences of actions in order to achieve any reward. Moreover, in continuous action spaces there are an infinite number of possible actions, which only increases the difficulty of exploration. One class of methods designed to address these issues forms temporally extended actions, often called skills, from interaction data collected in the same domain, and optimizes a policy on top of this new action space. Typically such methods require a lengthy pretraining phase, especially in continuous action spaces, in order to form the skills before reinforcement learning can begin. Given prior evidence that the full range of the continuous action space is not required in such tasks, we propose a novel approach to skill-generation with two components. First we discretize the action space through clustering, and second we leverage a tokenization technique borrowed from natural language processing to generate temporally extended actions. Such a method outperforms baselines for skill-generation in several challenging sparse-reward domains, and requires orders-of-magnitude less computation in skill-generation and online rollouts.
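    A hedged sketch of the two components described above: (1) discretize continuous actions by clustering, and (2) apply BPE-style merges to the resulting token sequences to form temporally extended actions ("skills"). Cluster counts, merge counts, and stand-in data are assumptions, not the paper's settings.

        import numpy as np
        from collections import Counter
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(0)
        actions = rng.normal(size=(1000, 4))              # demonstration actions (continuous)

        # 1. Discretize: each cluster centroid becomes a primitive action token.
        km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(actions)
        tokens = km.labels_.tolist()                      # action sequence as token ids

        # 2. Tokenize: repeatedly merge the most frequent adjacent pair (byte-pair encoding).
        def bpe_merges(seq, num_merges=4):
            vocab = {i: (i,) for i in set(seq)}
            for new_id in range(max(seq) + 1, max(seq) + 1 + num_merges):
                (a, b), _ = Counter(zip(seq, seq[1:])).most_common(1)[0]
                vocab[new_id] = vocab[a] + vocab[b]       # skill = concatenated primitives
                merged, i = [], 0
                while i < len(seq):
                    if i + 1 < len(seq) and (seq[i], seq[i + 1]) == (a, b):
                        merged.append(new_id); i += 2
                    else:
                        merged.append(seq[i]); i += 1
                seq = merged
            return seq, vocab

        seq, skills = bpe_merges(tokens)
        print({k: v for k, v in skills.items() if len(v) > 1})   # discovered multi-step skills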

Physics-Informed Neural Networks for an optimal counterdiabatic quantum computation

  • paper_url: http://arxiv.org/abs/2309.04434
  • repo_url: None
  • paper_authors: Antonio Ferrer-Sánchez, Carlos Flores-Garrigos, Carlos Hernani-Morales, José J. Orquín-Marqués, Narendra N. Hegade, Alejandro Gomez Cadavid, Iraitz Montalban, Enrique Solano, Yolanda Vives-Gilabert, José D. Martín-Guerrero
  • for: Solving the counterdiabatic (CD) protocol in the optimization of quantum circuits, using Physics-Informed Neural Networks (PINNs) to accurately solve the time evolution of the different physical observables of the quantum system.
  • methods: Inspired by physics, the necessary physical information is embedded into an underlying neural network; the hermiticity condition is imposed on all physical observables, and the principle of least action guarantees acquisition of the most appropriate counterdiabatic terms.
  • results: A reliable approach to the CD driving problem, free from the constraints of previous methods based on classical numerical approximations. It yields optimal results for the relevant physical observables, including the scheduling function, the gauge potential or operator involving the non-adiabatic terms, and the temporal evolution of the system's energy levels. Applied to the $\mathrm{H_{2}}$ and $\mathrm{LiH}$ molecules in the STO-3G basis, it successfully derives a desirable decomposition of the non-adiabatic terms as a linear combination of Pauli operators, which confers significant advantages for practical implementation in quantum computing algorithms.
    Abstract We introduce a novel methodology that leverages the strength of Physics-Informed Neural Networks (PINNs) to address the counterdiabatic (CD) protocol in the optimization of quantum circuits comprised of systems with $N_{Q}$ qubits. The primary objective is to utilize physics-inspired deep learning techniques to accurately solve the time evolution of the different physical observables within the quantum system. To accomplish this objective, we embed the necessary physical information into an underlying neural network to effectively tackle the problem. In particular, we impose the hermiticity condition on all physical observables and make use of the principle of least action, guaranteeing the acquisition of the most appropriate counterdiabatic terms based on the underlying physics. The proposed approach offers a dependable alternative to address the CD driving problem, free from the constraints typically encountered in previous methodologies relying on classical numerical approximations. Our method provides a general framework to obtain optimal results from the physical observables relevant to the problem, including the external parameterization in time known as scheduling function, the gauge potential or operator involving the non-adiabatic terms, as well as the temporal evolution of the energy levels of the system, among others. The main applications of this methodology have been the $\mathrm{H_{2}}$ and $\mathrm{LiH}$ molecules, represented by a 2-qubit and 4-qubit systems employing the STO-3G basis. The presented results demonstrate the successful derivation of a desirable decomposition for the non-adiabatic terms, achieved through a linear combination utilizing Pauli operators. This attribute confers significant advantages to its practical implementation within quantum computing algorithms.

Variations and Relaxations of Normalizing Flows

  • paper_url: http://arxiv.org/abs/2309.04433
  • repo_url: None
  • paper_authors: Keegan Kelly, Lorena Piedras, Sukrit Rao, David Roth
  • for: Surveys extensions and relaxations of Normalizing Flows (NFs) aimed at improving their expressivity and sampling efficiency.
  • methods: Reviews recent works that combine NFs with aspects of other generative model classes, such as VAEs and score-based diffusion, loosening the strict bijectivity constraints of NFs.
  • results: These variations and relaxations achieve a balance of expressivity, training speed, sample efficiency, and likelihood tractability.
    Abstract Normalizing Flows (NFs) describe a class of models that express a complex target distribution as the composition of a series of bijective transformations over a simpler base distribution. By limiting the space of candidate transformations to diffeomorphisms, NFs enjoy efficient, exact sampling and density evaluation, enabling NFs to flexibly behave as both discriminative and generative models. Their restriction to diffeomorphisms, however, enforces that input, output and all intermediary spaces share the same dimension, limiting their ability to effectively represent target distributions with complex topologies. Additionally, in cases where the prior and target distributions are not homeomorphic, Normalizing Flows can leak mass outside of the support of the target. This survey covers a selection of recent works that combine aspects of other generative model classes, such as VAEs and score-based diffusion, and in doing so loosen the strict bijectivity constraints of NFs to achieve a balance of expressivity, training speed, sample efficiency and likelihood tractability.
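    The exact density evaluation mentioned above follows from the change-of-variables formula: for a diffeomorphism $f$ mapping data $x$ to base samples $z = f(x)$ with base density $p_Z$,
    $$\log p_X(x) = \log p_Z\big(f(x)\big) + \log \left| \det \frac{\partial f(x)}{\partial x} \right|,$$
    which also explains the dimensionality constraint discussed in the survey: the Jacobian must be square for the determinant to exist, forcing input, output, and all intermediary spaces to share the same dimension.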

Create Your World: Lifelong Text-to-Image Diffusion

  • paper_url: http://arxiv.org/abs/2309.04430
  • repo_url: None
  • paper_authors: Gan Sun, Wenqi Liang, Jiahua Dong, Jun Li, Zhengming Ding, Yang Cong
  • for: Creating a user's own concept world, i.e., generating images of the user's personal concepts from text prompts, with new concepts learned quickly from a few examples in a never-ending manner.
  • methods: A Lifelong text-to-image Diffusion Model (L2DM) that addresses knowledge "catastrophic forgetting" with a task-aware memory enhancement module and an elastic-concept distillation module, and addresses semantic "catastrophic neglecting" with a concept attention artist module and an orthogonal attention module.
  • results: Compared with related state-of-the-art models, the model generates more faithful images across a range of continual text prompts, in terms of both qualitative and quantitative metrics.
    Abstract Text-to-image generative models can produce diverse high-quality images of concepts with a text prompt, which have demonstrated excellent ability in image generation, image translation, etc. We in this work study the problem of synthesizing instantiations of a use's own concepts in a never-ending manner, i.e., create your world, where the new concepts from user are quickly learned with a few examples. To achieve this goal, we propose a Lifelong text-to-image Diffusion Model (L2DM), which intends to overcome knowledge "catastrophic forgetting" for the past encountered concepts, and semantic "catastrophic neglecting" for one or more concepts in the text prompt. In respect of knowledge "catastrophic forgetting", our L2DM framework devises a task-aware memory enhancement module and a elastic-concept distillation module, which could respectively safeguard the knowledge of both prior concepts and each past personalized concept. When generating images with a user text prompt, the solution to semantic "catastrophic neglecting" is that a concept attention artist module can alleviate the semantic neglecting from concept aspect, and an orthogonal attention module can reduce the semantic binding from attribute aspect. To the end, our model can generate more faithful image across a range of continual text prompts in terms of both qualitative and quantitative metrics, when comparing with the related state-of-the-art models. The code will be released at https://wenqiliang.github.io/.

  • paper_url: http://arxiv.org/abs/2309.04426
  • repo_url: None
  • paper_authors: Lyuyang Sima, Joseph Bucukovski, Erwan Carlson, Nicole L. Yien
  • for: Provides researchers new to the field with systematic learning concepts and research directions, covering the strengths, weaknesses, and applicability of spiking neuron models as well as the state of spiking neural network algorithms.
  • methods: Summarizes the strengths, weaknesses, and applicability of five neuronal models; analyzes the characteristics of five network topologies; and reviews unsupervised learning algorithms based on synaptic plasticity rules together with four types of supervised learning algorithms.
  • results: Reports and analyzes progress on spiking neural network algorithms and reviews brain-like neuromorphic chips under research at home and abroad.
    Abstract In the rapid evolution of next-generation brain-inspired artificial intelligence and increasingly sophisticated electromagnetic environment, the most bionic characteristics and anti-interference performance of spiking neural networks show great potential in terms of computational speed, real-time information processing, and spatio-temporal information processing. Spiking neural network is one of the cores of brain-like artificial intelligence, which realizes brain-like computing by simulating the structure and information transfer mode of biological neural networks. This paper summarizes the strengths, weaknesses and applicability of five neuronal models and analyzes the characteristics of five network topologies; then reviews the spiking neural network algorithms and summarizes the unsupervised learning algorithms based on synaptic plasticity rules and four types of supervised learning algorithms from the perspectives of unsupervised learning and supervised learning; finally focuses on the review of brain-like neuromorphic chips under research at home and abroad. This paper is intended to provide learning concepts and research orientations for the peers who are new to the research field of spiking neural networks through systematic summaries.
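    As an example of the neuronal models such a survey compares, the leaky integrate-and-fire (LIF) neuron, the most widely used, evolves its membrane potential $V$ as (a standard formulation, not taken from this paper)
    $$\tau_m \frac{dV}{dt} = -\left(V - V_{\text{rest}}\right) + R\, I(t),$$
    emitting a spike and resetting $V \to V_{\text{reset}}$ whenever $V$ reaches the threshold $V_{\text{th}}$.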

SynthoGestures: A Novel Framework for Synthetic Dynamic Hand Gesture Generation for Driving Scenarios

  • paper_url: http://arxiv.org/abs/2309.04421
  • repo_url: https://github.com/amrgomaaelhady/synthogestures
  • paper_authors: Amr Gomaa, Robin Zitt, Guillermo Reyes, Antonio Krüger
  • for: Provides a novel framework for generating synthetic dynamic hand gesture datasets for human-machine interfaces in the automotive domain.
  • methods: Uses virtual 3D models in Unreal Engine to synthesize realistic hand gestures, offering customization options (gesture speed, performance, and hand shape) and reducing the risk of overfitting; it also simulates different camera locations and types, including RGB, infrared, and depth cameras, without the additional time and cost of acquiring them.
  • results: Experiments show that the framework improves gesture recognition accuracy and can replace or augment real-hand datasets, saving the time and effort of dataset creation and accelerating the development of gesture recognition systems for automotive applications.
    Abstract Creating a diverse and comprehensive dataset of hand gestures for dynamic human-machine interfaces in the automotive domain can be challenging and time-consuming. To overcome this challenge, we propose using synthetic gesture datasets generated by virtual 3D models. Our framework utilizes Unreal Engine to synthesize realistic hand gestures, offering customization options and reducing the risk of overfitting. Multiple variants, including gesture speed, performance, and hand shape, are generated to improve generalizability. In addition, we simulate different camera locations and types, such as RGB, infrared, and depth cameras, without incurring additional time and cost to obtain these cameras. Experimental results demonstrate that our proposed framework, SynthoGestures\footnote{\url{https://github.com/amrgomaaelhady/SynthoGestures}, improves gesture recognition accuracy and can replace or augment real-hand datasets. By saving time and effort in the creation of the data set, our tool accelerates the development of gesture recognition systems for automotive applications.

Privacy Preserving Federated Learning with Convolutional Variational Bottlenecks

  • paper_url: http://arxiv.org/abs/2309.04515
  • repo_url: None
  • paper_authors: Daniel Scheliga, Patrick Mäder, Marco Seeland
  • for: Defending against gradient inversion attacks to protect the privacy of training data in federated learning.
  • methods: Analyzes how variational modeling (PRECODE) protects against gradient inversion attacks, and formulates an attack that disables its privacy-preserving effect by purposefully omitting stochastic gradients.
  • results: Proposes a novel privacy module, the Convolutional Variational Bottleneck (CVB), that can be placed early in a neural network and preserves privacy even under the stronger attack. An extensive empirical study on three model architectures and six image classification datasets demonstrates its effectiveness, with fewer trainable parameters, and thus lower computational and communication costs, than PRECODE.
    Abstract Gradient inversion attacks are an ubiquitous threat in federated learning as they exploit gradient leakage to reconstruct supposedly private training data. Recent work has proposed to prevent gradient leakage without loss of model utility by incorporating a PRivacy EnhanCing mODulE (PRECODE) based on variational modeling. Without further analysis, it was shown that PRECODE successfully protects against gradient inversion attacks. In this paper, we make multiple contributions. First, we investigate the effect of PRECODE on gradient inversion attacks to reveal its underlying working principle. We show that variational modeling introduces stochasticity into the gradients of PRECODE and the subsequent layers in a neural network. The stochastic gradients of these layers prevent iterative gradient inversion attacks from converging. Second, we formulate an attack that disables the privacy preserving effect of PRECODE by purposefully omitting stochastic gradients during attack optimization. To preserve the privacy preserving effect of PRECODE, our analysis reveals that variational modeling must be placed early in the network. However, early placement of PRECODE is typically not feasible due to reduced model utility and the exploding number of additional model parameters. Therefore, as a third contribution, we propose a novel privacy module -- the Convolutional Variational Bottleneck (CVB) -- that can be placed early in a neural network without suffering from these drawbacks. We conduct an extensive empirical study on three seminal model architectures and six image classification datasets. We find that all architectures are susceptible to gradient leakage attacks, which can be prevented by our proposed CVB. Compared to PRECODE, we show that our novel privacy module requires fewer trainable parameters, and thus computational and communication costs, to effectively preserve privacy.
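    A hedged PyTorch sketch of a convolutional variational bottleneck: a conv layer parameterizes a mean and log-variance per activation, and the reparameterization trick injects the stochasticity that, per the analysis above, prevents iterative gradient inversion from converging. Layer sizes and placement are assumptions, not the paper's architecture.

        import torch
        import torch.nn as nn

        class ConvVariationalBottleneck(nn.Module):
            def __init__(self, in_ch: int, out_ch: int):
                super().__init__()
                self.mu = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
                self.logvar = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)

            def forward(self, x: torch.Tensor) -> torch.Tensor:
                mu, logvar = self.mu(x), self.logvar(x)
                eps = torch.randn_like(mu)                 # fresh noise -> stochastic gradients
                return mu + torch.exp(0.5 * logvar) * eps  # reparameterized sample

        # Placed early in the network, e.g. right after the first convolution:
        net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            ConvVariationalBottleneck(16, 16),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
        )
        print(net(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])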

Generalization Bounds: Perspectives from Information Theory and PAC-Bayes

  • paper_url: http://arxiv.org/abs/2309.04381
  • repo_url: None
  • paper_authors: Fredrik Hellström, Giuseppe Durisi, Benjamin Guedj, Maxim Raginsky
  • for: Surveys generalization in theoretical machine learning, with particular attention to the PAC-Bayesian approach and its applications and extensions.
  • methods: Takes an information-theoretic view of generalization and connects it to the PAC-Bayesian approach.
  • results: Provides a unified treatment, showing that many generalization proofs in both strands share a modular structure. Special attention is paid to the conditional mutual information (CMI) framework, analytical studies of the information complexity of learning algorithms, and applications to areas such as deep learning.
    Abstract A fundamental question in theoretical machine learning is generalization. Over the past decades, the PAC-Bayesian approach has been established as a flexible framework to address the generalization capabilities of machine learning algorithms, and design new ones. Recently, it has garnered increased interest due to its potential applicability for a variety of learning algorithms, including deep neural networks. In parallel, an information-theoretic view of generalization has developed, wherein the relation between generalization and various information measures has been established. This framework is intimately connected to the PAC-Bayesian approach, and a number of results have been independently discovered in both strands. In this monograph, we highlight this strong connection and present a unified treatment of generalization. We present techniques and results that the two perspectives have in common, and discuss the approaches and interpretations that differ. In particular, we demonstrate how many proofs in the area share a modular structure, through which the underlying ideas can be intuited. We pay special attention to the conditional mutual information (CMI) framework; analytical studies of the information complexity of learning algorithms; and the application of the proposed methods to deep learning. This monograph is intended to provide a comprehensive introduction to information-theoretic generalization bounds and their connection to PAC-Bayes, serving as a foundation from which the most recent developments are accessible. It is aimed broadly towards researchers with an interest in generalization and theoretical machine learning.
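    A representative result in this line is the mutual-information bound of Xu and Raginsky (2017): for a learning algorithm producing hypothesis $W$ from a sample $S$ of $n$ i.i.d. points, with a $\sigma$-sub-Gaussian loss,
    $$\left| \mathbb{E}\left[ L_\mu(W) - L_S(W) \right] \right| \le \sqrt{\frac{2\sigma^2}{n}\, I(W; S)},$$
    where $L_\mu$ and $L_S$ are the population and empirical risks and $I(W;S)$ is the mutual information between hypothesis and training data; the CMI framework highlighted in the monograph tightens this by conditioning on a supersample.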

Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation

  • paper_url: http://arxiv.org/abs/2309.04369
  • repo_url: None
  • paper_authors: Jiatong Li, Rui Li, Qi Liu
  • for: Evaluating the abilities of Large Language Models (LLMs) on a variety of real-world tasks, improving on existing LLM evaluation methods.
  • methods: A deep-interaction-based LLM evaluation framework in which LLMs' real-world performance is assessed through their deep interaction with other LLMs in elaborately designed evaluation tasks.
  • results: Extensive experiments on four elaborately designed evaluation tasks demonstrate the effectiveness of the approach.
    Abstract Large Language Models (LLMs) have made progress in various real-world tasks, which stimulates requirements for the evaluation of LLMs. Existing LLM evaluation methods are mainly supervised signal-based which depends on static datasets and cannot evaluate the ability of LLMs in dynamic real-world scenarios where deep interaction widely exists. Other LLM evaluation methods are human-based which are costly and time-consuming and are incapable of large-scale evaluation of LLMs. To address the issues above, we propose a novel Deep Interaction-based LLM-evaluation framework. In our proposed framework, LLMs' performances in real-world domains can be evaluated from their deep interaction with other LLMs in elaborately designed evaluation tasks. Furthermore, our proposed framework is a general evaluation method that can be applied to a host of real-world tasks such as machine translation and code generation. We demonstrate the effectiveness of our proposed method through extensive experiments on four elaborately designed evaluation tasks.

Active Learning for Classifying 2D Grid-Based Level Completability

  • paper_url: http://arxiv.org/abs/2309.04367
  • repo_url: https://github.com/mahsabazzaz/level-completabilty-x-active-learning
  • paper_authors: Mahsa Bazzaz, Seth Cooper
  • for: Assessing the completability of levels produced by procedural generators using active learning.
  • methods: Deep-learning models are trained to classify level completability, with active learning used to select which levels to label.
  • results: Labeling levels via active learning yields better classifier performance than random queries given the same amount of labeled data.
    Abstract Determining the completability of levels generated by procedural generators such as machine learning models can be challenging, as it can involve the use of solver agents that often require a significant amount of time to analyze and solve levels. Active learning is not yet widely adopted in game evaluations, although it has been used successfully in natural language processing, image and speech recognition, and computer vision, where the availability of labeled data is limited or expensive. In this paper, we propose the use of active learning for learning level completability classification. Through an active learning approach, we train deep-learning models to classify the completability of generated levels for Super Mario Bros., Kid Icarus, and a Zelda-like game. We compare active learning for querying levels to label with completability against random queries. Our results show using an active learning approach to label levels results in better classifier performance with the same amount of labeled data.
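    A hedged sketch of the active learning loop described above: repeatedly query the unlabeled level whose completability the current classifier is least certain about. The paper trains deep models on game levels; a linear model on random stand-in features illustrates the loop here.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 32))                   # stand-in level features
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # stand-in completability labels

        # Seed set with both classes; everything else goes to the unlabeled pool.
        labeled = list(np.where(y == 0)[0][:5]) + list(np.where(y == 1)[0][:5])
        pool = [i for i in range(len(X)) if i not in set(labeled)]

        for _ in range(40):                              # labeling budget
            clf = LogisticRegression().fit(X[labeled], y[labeled])
            probs = clf.predict_proba(X[pool])[:, 1]
            query = pool[int(np.argmin(np.abs(probs - 0.5)))]  # least certain level
            labeled.append(query)                        # a solver agent would label it here
            pool.remove(query)

        print("held-out accuracy:", clf.score(X[pool], y[pool]))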

Systematic Review of Techniques in Brain Image Synthesis using Deep Learning

  • paper_url: http://arxiv.org/abs/2309.04511
  • repo_url: None
  • paper_authors: Shubham Singh, Ammar Ranapurwala, Mrunal Bewoor, Sheetal Patil, Satyam Rai
  • for: Reviews the current state of medical imaging, with a specific focus on deep learning techniques for brain image synthesis.
  • methods: Examines various methods and techniques for brain image synthesis, including 2D-to-3D constructions, MRI synthesis, and the use of transformers.
  • results: Summarizes the limitations and challenges of these methods, such as obtaining well-curated training data, and explores the future potential of the field and the impact of deep learning techniques on medical imaging.
    Abstract This review paper delves into the present state of medical imaging, with a specific focus on the use of deep learning techniques for brain image synthesis. The need for medical image synthesis to improve diagnostic accuracy and decrease invasiveness in medical procedures is emphasized, along with the role of deep learning in enabling these advancements. The paper examines various methods and techniques for brain image synthesis, including 2D to 3D constructions, MRI synthesis, and the use of transformers. It also addresses limitations and challenges faced in these methods, such as obtaining well-curated training data and addressing brain ultrasound issues. The review concludes by exploring the future potential of this field and the opportunities for further advancements in medical imaging using deep learning techniques. The significance of transformers and their potential to revolutionize the medical imaging field is highlighted. Additionally, the paper discusses the potential solutions to the shortcomings and limitations faced in this field. The review provides researchers with an updated reference on the present state of the field and aims to inspire further research and bridge the gap between the present state of medical imaging and the future possibilities offered by deep learning techniques.

Zero-Shot Robustification of Zero-Shot Models With Foundation Models

  • paper_url: http://arxiv.org/abs/2309.04344
  • repo_url: https://github.com/sprocketlab/roboshot
  • paper_authors: Dyah Adila, Changho Shin, Linrong Cai, Frederic Sala
  • for: Improving the robustness of pretrained model embeddings for zero-shot inference.
  • methods: Uses zero-shot language models to obtain useful insights from task descriptions; these insights are embedded and used to remove harmful components and boost useful components of the embeddings, without any supervision.
  • results: Evaluated on nine image and NLP classification tasks, RoboShot achieves an average improvement of 15.98% over several zero-shot baselines and is compatible with a variety of pretrained and language models.
    Abstract Zero-shot inference is a powerful paradigm that enables the use of large pretrained models for downstream classification tasks without further training. However, these models are vulnerable to inherited biases that can impact their performance. The traditional solution is fine-tuning, but this undermines the key advantage of pretrained models, which is their ability to be used out-of-the-box. We propose RoboShot, a method that improves the robustness of pretrained model embeddings in a fully zero-shot fashion. First, we use zero-shot language models (LMs) to obtain useful insights from task descriptions. These insights are embedded and used to remove harmful and boost useful components in embeddings -- without any supervision. Theoretically, we provide a simple and tractable model for biases in zero-shot embeddings and give a result characterizing under what conditions our approach can boost performance. Empirically, we evaluate RoboShot on nine image and NLP classification tasks and show an average improvement of 15.98% over several zero-shot baselines. Additionally, we demonstrate that RoboShot is compatible with a variety of pretrained and language models.
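    A hedged sketch of the embedding correction described above: directions obtained by embedding LLM-generated "harmful" insights (e.g., spurious attributes) are projected out of the zero-shot embedding; boosting useful components works analogously by re-adding them. The random vectors stand in for real text/image embeddings, and this shows the linear-algebra step, not the authors' exact update.

        import numpy as np

        rng = np.random.default_rng(0)
        image_emb = rng.normal(size=512)                       # zero-shot image embedding
        harmful = np.stack([rng.normal(size=512) for _ in range(2)], axis=1)  # insight vectors

        # Project the embedding onto the orthogonal complement of the harmful subspace.
        U = np.linalg.qr(harmful)[0]                           # orthonormal basis (512 x 2)
        robust_emb = image_emb - U @ (U.T @ image_emb)

        print(np.abs(U.T @ robust_emb).max() < 1e-10)          # ~0 along every harmful direction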

Online Submodular Maximization via Online Convex Optimization

  • paper_url: http://arxiv.org/abs/2309.04339
  • repo_url: None
  • paper_authors: Tareq Si-Salem, Gözde Özcan, Iasonas Nikolaou, Evimaria Terzi, Stratis Ioannidis
  • for: Studies online monotone submodular maximization under general matroid constraints.
  • methods: Shows that online optimization of a large class of submodular functions, namely weighted threshold potential functions, reduces to online convex optimization (OCO), because functions in this class admit a concave relaxation.
  • results: OCO policies coupled with an appropriate rounding scheme achieve sublinear regret in the combinatorial setting, and the reduction extends to the dynamic regret, bandit, and optimistic-learning settings.
    Abstract We study monotone submodular maximization under general matroid constraints in the online setting. We prove that online optimization of a large class of submodular functions, namely, weighted threshold potential functions, reduces to online convex optimization (OCO). This is precisely because functions in this class admit a concave relaxation; as a result, OCO policies, coupled with an appropriate rounding scheme, can be used to achieve sublinear regret in the combinatorial setting. We show that our reduction extends to many different versions of the online learning problem, including the dynamic regret, bandit, and optimistic-learning settings.
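    A weighted threshold potential and its concave relaxation take the form (notation assumed here, following the standard definition of these functions rather than the paper's)
    $$f(S) = \min\Big\{ b, \sum_{i \in S} w_i \Big\}, \qquad \hat{f}(x) = \min\Big\{ b, \sum_{i=1}^{n} w_i x_i \Big\}, \quad x \in [0,1]^n,$$
    with $b \ge 0$ and $w \ge 0$; since $\hat{f}$ is concave, OCO can be run on the relaxation and the fractional iterates rounded back to feasible sets, which is what yields the sublinear regret above.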

Graph Neural Networks Use Graphs When They Shouldn’t

  • paper_url: http://arxiv.org/abs/2309.04332
  • repo_url: https://github.com/mayabechlerspeicher/Graph_Neural_Networks_Overfit_Graphs
  • paper_authors: Maya Bechler-Speicher, Ido Amos, Ran Gilad-Bachrach, Amir Globerson
  • for: Investigates whether Graph Neural Networks (GNNs) learn to rely on the graph structure even when it is uninformative, across different graph distributions.
  • methods: Trains GNNs on graph data and proposes a graph-editing method to mitigate their tendency to overfit graph structure that should be ignored.
  • results: GNNs tend to overfit the graph structure, using it even when a better solution can be obtained by ignoring it, while regular graphs are more robust to this overfitting. A theoretical explanation is given via the implicit bias of gradient-descent-based learning, and the proposed graph-editing method improves the accuracy of GNNs across multiple benchmarks.
    Abstract Predictions over graphs play a crucial role in various domains, including social networks, molecular biology, medicine, and more. Graph Neural Networks (GNNs) have emerged as the dominant approach for learning on graph data. Instances of graph labeling problems consist of the graph-structure (i.e., the adjacency matrix), along with node-specific feature vectors. In some cases, this graph-structure is non-informative for the predictive task. For instance, molecular properties such as molar mass depend solely on the constituent atoms (node features), and not on the molecular structure. While GNNs have the ability to ignore the graph-structure in such cases, it is not clear that they will. In this work, we show that GNNs actually tend to overfit the graph-structure in the sense that they use it even when a better solution can be obtained by ignoring it. We examine this phenomenon with respect to different graph distributions and find that regular graphs are more robust to this overfitting. We then provide a theoretical explanation for this phenomenon, via analyzing the implicit bias of gradient-descent-based learning of GNNs in this setting. Finally, based on our empirical and theoretical findings, we propose a graph-editing method to mitigate the tendency of GNNs to overfit graph-structures that should be ignored. We show that this method indeed improves the accuracy of GNNs across multiple benchmarks.

Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models

  • paper_url: http://arxiv.org/abs/2309.04316
  • repo_url: None
  • paper_authors: Leonard Bärmann, Rainer Kartmann, Fabian Peller-Konrad, Alex Waibel, Tamim Asfour
  • for: To endow robots with the ability to learn from natural-language dialog, enabling intuitive human-robot collaboration.
  • methods: Large Language Models (LLMs) orchestrate the robot's high-level behavior by generating Python statements in an interactive console; human instructions, environment observations, and execution results are fed back to the LLM to inform the next statement.
  • results: The robot learns incrementally from interactive experience; evaluations in simulation and in the real world demonstrate generalized, incrementally learned knowledge across a variety of tasks.
    Abstract Natural-language dialog is key for intuitive human-robot interaction. It can be used not only to express humans' intents, but also to communicate instructions for improvement if a robot does not understand a command correctly. Of great importance is to endow robots with the ability to learn from such interaction experience in an incremental way to allow them to improve their behaviors or avoid mistakes in the future. In this paper, we propose a system to achieve incremental learning of complex behavior from natural interaction, and demonstrate its implementation on a humanoid robot. Building on recent advances, we present a system that deploys Large Language Models (LLMs) for high-level orchestration of the robot's behavior, based on the idea of enabling the LLM to generate Python statements in an interactive console to invoke both robot perception and action. The interaction loop is closed by feeding back human instructions, environment observations, and execution results to the LLM, thus informing the generation of the next statement. Specifically, we introduce incremental prompt learning, which enables the system to interactively learn from its mistakes. For that purpose, the LLM can call another LLM responsible for code-level improvements of the current interaction based on human feedback. The improved interaction is then saved in the robot's memory, and thus retrieved on similar requests. We integrate the system in the robot cognitive architecture of the humanoid robot ARMAR-6 and evaluate our methods both quantitatively (in simulation) and qualitatively (in simulation and real-world) by demonstrating generalized incrementally-learned knowledge.
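A minimal sketch of the closed interaction loop described above is given below. All names (`llm_generate`, `perceive`, `act`, the in-memory `memory` list) are hypothetical stand-ins for components the abstract only names, not the authors' implementation; the point is only how generated Python statements, execution results, and stored interactions close the loop.

```python
# Minimal sketch of an LLM-in-the-loop robot console (hypothetical API names).
# The LLM emits Python statements; execution results and human feedback are
# appended to the prompt so the next statement is generated in context.

def llm_generate(prompt: str) -> str:
    """Stand-in for a call to an LLM that returns one Python statement."""
    return "result = perceive('table')"  # placeholder completion

def perceive(obj: str) -> str:
    return f"{obj} detected at (1.0, 0.5)"  # placeholder robot perception

def act(command: str) -> str:
    return f"executed: {command}"           # placeholder robot action

memory = []  # improved interactions, retrieved on similar future requests

def interaction_loop(human_instruction: str, max_steps: int = 5) -> None:
    prompt = f"# Task: {human_instruction}\n"
    env = {"perceive": perceive, "act": act}
    for _ in range(max_steps):
        statement = llm_generate(prompt)
        try:
            exec(statement, env)                     # invoke perception/action
            observation = env.get("result", "ok")
        except Exception as err:                     # feed errors back, too
            observation = f"error: {err}"
        # Close the loop: observation informs the next generated statement.
        prompt += f"{statement}\n# observation: {observation}\n"
    memory.append(prompt)  # store the (possibly improved) interaction

interaction_loop("bring the cup from the table")
```

In the paper's full system, a second LLM would additionally rewrite faulty interactions based on human feedback before they are saved; that step is omitted here.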

Federated Learning for Early Dropout Prediction on Healthy Ageing Applications

  • paper_url: http://arxiv.org/abs/2309.04311
  • repo_url: None
  • paper_authors: Christos Chrysanthos Nikolaidis, Vasileios Perifanis, Nikolaos Pavlidis, Pavlos S. Efraimidis
  • for: This paper concerns social-care applications for healthy ageing, which improve elderly people's quality of life and help operators provide early interventions.
  • methods: Machine learning (ML) algorithms deliver highly accurate dropout predictions, outperforming traditional statistical methods that struggle with individual patterns.
  • results: A federated machine learning (FML) approach mitigates privacy concerns and enables distributed training without transferring personal data. Evaluated on a real-world dataset, the proposed data selection and class-imbalance handling techniques improve predictive accuracy under non-independent and identically distributed (non-iid) data.
    Abstract The provision of social care applications is crucial for elderly people to improve their quality of life and enables operators to provide early interventions. Accurate predictions of user dropouts in healthy ageing applications are essential since they are directly related to individual health statuses. Machine Learning (ML) algorithms have enabled highly accurate predictions, outperforming traditional statistical methods that struggle to cope with individual patterns. However, ML requires a substantial amount of data for training, which is challenging due to the presence of personal identifiable information (PII) and the fragmentation posed by regulations. In this paper, we present a federated machine learning (FML) approach that minimizes privacy concerns and enables distributed training, without transferring individual data. We employ collaborative training by considering individuals and organizations under FML, which models both cross-device and cross-silo learning scenarios. Our approach is evaluated on a real-world dataset with non-independent and identically distributed (non-iid) data among clients, class imbalance and label ambiguity. Our results show that data selection and class imbalance handling techniques significantly improve the predictive accuracy of models trained under FML, demonstrating comparable or superior predictive performance than traditional ML models.
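The abstract does not detail the training procedure, so the sketch below shows only the generic federated-averaging step that cross-device and cross-silo setups of this kind typically build on; the logistic-regression clients and the weighting by client sample size are standard illustrative choices, not details confirmed by the paper.

```python
# Minimal federated averaging (FedAvg) sketch with NumPy: each client trains
# locally and only model weights (never raw personal data) leave the device.
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    w = weights.copy()
    for _ in range(epochs):                   # plain logistic-regression SGD
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def fedavg_round(global_w, clients):
    updates, sizes = [], []
    for X, y in clients:                      # each tuple = one client's data
        updates.append(local_update(global_w, X, y))
        sizes.append(len(y))
    sizes = np.array(sizes, dtype=float)
    return np.average(updates, axis=0, weights=sizes / sizes.sum())

rng = np.random.default_rng(0)
# Three clients with non-iid, imbalanced local datasets (dropout label y).
clients = [(rng.normal(size=(n, 4)), rng.binomial(1, p, size=n))
           for n, p in [(50, 0.1), (200, 0.5), (30, 0.8)]]
w = np.zeros(4)
for _ in range(10):
    w = fedavg_round(w, clients)
print("global weights after 10 rounds:", w)
```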

  • paper_url: http://arxiv.org/abs/2309.04296
  • repo_url: None
  • paper_authors: Arian Prabowo, Kaixuan Chen, Hao Xue, Subbu Sethuvenkatraman, Flora D. Salim
  • for: This paper addresses energy load forecasting during the COVID-19 lockdowns, using continual learning to keep models accurate under Out-of-Distribution conditions.
  • methods: The continual learning algorithm FSNet is updated with new data and supported by privacy-preserving human mobility data from pedestrian counters; the methods are evaluated on real-world deployments.
  • results: Continual learning yields accurate energy load forecasts during Out-of-Distribution periods; models with at least online learning adapt to lockdown-induced shifts far better than conventional approaches.
    Abstract In traditional deep learning algorithms, one of the key assumptions is that the data distribution remains constant during both training and deployment. However, this assumption becomes problematic when faced with Out-of-Distribution periods, such as the COVID-19 lockdowns, where the data distribution significantly deviates from what the model has seen during training. This paper employs a two-fold strategy: utilizing continual learning techniques to update models with new data and harnessing human mobility data collected from privacy-preserving pedestrian counters located outside buildings. In contrast to online learning, which suffers from 'catastrophic forgetting' as newly acquired knowledge often erases prior information, continual learning offers a holistic approach by preserving past insights while integrating new data. This research applies FSNet, a powerful continual learning algorithm, to real-world data from 13 building complexes in Melbourne, Australia, a city which had the second longest total lockdown duration globally during the pandemic. Results underscore the crucial role of continual learning in accurate energy forecasting, particularly during Out-of-Distribution periods. Secondary data such as mobility and temperature provided ancillary support to the primary forecasting model. More importantly, while traditional methods struggled to adapt during lockdowns, models featuring at least online learning demonstrated resilience, with lockdown periods posing fewer challenges once armed with adaptive learning techniques. This study contributes valuable methodologies and insights to the ongoing effort to improve energy load forecasting during future Out-of-Distribution periods.
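FSNet itself is a dedicated fast-and-slow forecasting architecture; as a hedged stand-in, the sketch below shows the simpler replay-based continual update that the abstract contrasts with naive online learning: a small buffer of past samples is mixed into every update so new data does not erase prior knowledge. All names and dimensions are illustrative.

```python
# Continual-learning sketch: online gradient steps on each new batch, mixed
# with replayed past samples so new data does not erase prior knowledge.
import random
import numpy as np

class ReplayForecaster:
    def __init__(self, dim: int, buffer_size: int = 256, lr: float = 0.01):
        self.w = np.zeros(dim)
        self.buffer: list[tuple[np.ndarray, float]] = []
        self.buffer_size, self.lr = buffer_size, lr

    def _step(self, x: np.ndarray, y: float) -> None:
        self.w -= self.lr * (self.w @ x - y) * x      # squared-error gradient

    def update(self, x: np.ndarray, y: float, replay: int = 8) -> None:
        self._step(x, y)                              # learn the new sample
        for xb, yb in random.sample(self.buffer,
                                    min(replay, len(self.buffer))):
            self._step(xb, yb)                        # rehearse old samples
        self.buffer.append((x, y))
        if len(self.buffer) > self.buffer_size:
            self.buffer.pop(random.randrange(self.buffer_size + 1))

    def predict(self, x: np.ndarray) -> float:
        return float(self.w @ x)

# Usage: features could concatenate past load, mobility counts, temperature.
model = ReplayForecaster(dim=3)
rng = np.random.default_rng(1)
for t in range(500):
    x = rng.normal(size=3)
    regime = 1.0 if t < 300 else -1.0        # distribution shift (lockdown)
    y = regime * x[0] + 0.5 * x[1]
    model.update(x, y)
print("weights after shift:", model.w)
```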

FIMO: A Challenge Formal Dataset for Automated Theorem Proving

  • paper_url: http://arxiv.org/abs/2309.04295
  • repo_url: None
  • paper_authors: Chengwu Liu, Jianhao Shen, Huajian Xin, Zhengying Liu, Ye Yuan, Haiming Wang, Wei Ju, Chuanyang Zheng, Yichun Yin, Lin Li, Ming Zhang, Qun Liu
  • for: To push existing automated theorem proving methods toward the level of the International Mathematical Olympiad (IMO).
  • methods: Initial experiments with GPT-4 probe the limitations of current approaches.
  • results: The findings reveal substantial limitations in existing methods, indicating a long road ahead before satisfactory IMO-level automated theorem proving is achieved.
    Abstract We present FIMO, an innovative dataset comprising formal mathematical problem statements sourced from the International Mathematical Olympiad (IMO) Shortlisted Problems. Designed to facilitate advanced automated theorem proving at the IMO level, FIMO is currently tailored for the Lean formal language. It comprises 149 formal problem statements, accompanied by both informal problem descriptions and their corresponding LaTeX-based informal proofs. Through initial experiments involving GPT-4, our findings underscore the existing limitations in current methodologies, indicating a substantial journey ahead before achieving satisfactory IMO-level automated theorem proving outcomes.

Fuzzy Fingerprinting Transformer Language-Models for Emotion Recognition in Conversations

  • paper_url: http://arxiv.org/abs/2309.04292
  • repo_url: None
  • paper_authors: Patrícia Pereira, Rui Ribeiro, Helena Moniz, Luisa Coheur, Joao Paulo Carvalho
  • for: To combine large language models with fuzzy fingerprint techniques for Emotion Recognition in Conversations (ERC).
  • methods: A pre-trained RoBERTa model produces contextual utterance embeddings, which are fed to an adapted fuzzy fingerprint classification module.
  • results: State-of-the-art level results on the DailyDialog ERC benchmark dataset, using a much lighter model.
    Abstract Fuzzy Fingerprints have been successfully used as an interpretable text classification technique, but, like most other techniques, have been largely surpassed in performance by Large Pre-trained Language Models, such as BERT or RoBERTa. These models deliver state-of-the-art results in several Natural Language Processing tasks, namely Emotion Recognition in Conversations (ERC), but suffer from the lack of interpretability and explainability. In this paper, we propose to combine the two approaches to perform ERC, as a means to obtain simpler and more interpretable Large Language Models-based classifiers. We propose to feed the utterances and their previous conversational turns to a pre-trained RoBERTa, obtaining contextual embedding utterance representations, that are then supplied to an adapted Fuzzy Fingerprint classification module. We validate our approach on the widely used DailyDialog ERC benchmark dataset, in which we obtain state-of-the-art level results using a much lighter model.
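As a rough illustration of the fingerprinting idea, the sketch below builds, for each emotion class, a ranked list of its strongest embedding dimensions with linearly decaying fuzzy membership weights, and classifies an utterance by fingerprint overlap. This is a simplified variant assuming fingerprints over RoBERTa embedding dimensions; the paper's adapted module may differ.

```python
# Simplified fuzzy-fingerprint classifier over utterance embeddings
# (illustrative variant; the paper adapts fingerprints to RoBERTa embeddings).
import numpy as np

def build_fingerprint(class_embeddings: np.ndarray, k: int = 20) -> dict:
    """Rank the k strongest embedding dimensions for a class and assign
    fuzzy membership weights that decay linearly with rank."""
    mean_activation = class_embeddings.mean(axis=0)
    top = np.argsort(-mean_activation)[:k]
    return {int(dim): (k - rank) / k for rank, dim in enumerate(top)}

def fingerprint_similarity(fingerprint: dict, embedding: np.ndarray,
                           k: int = 20) -> float:
    """Overlap between the class fingerprint and the utterance's own top-k."""
    top = np.argsort(-embedding)[:k]
    return sum(fingerprint.get(int(dim), 0.0) for dim in top)

def classify(embedding, fingerprints):
    return max(fingerprints, key=lambda c: fingerprint_similarity(
        fingerprints[c], embedding))

# Usage with random stand-ins for contextual RoBERTa embeddings (dim 768):
rng = np.random.default_rng(0)
train = {"joy": rng.normal(0.2, 1, (30, 768)),
         "anger": rng.normal(-0.2, 1, (30, 768))}
fingerprints = {label: build_fingerprint(e) for label, e in train.items()}
print(classify(rng.normal(0.2, 1, 768), fingerprints))
```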

Sequential Semantic Generative Communication for Progressive Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2309.04287
  • repo_url: None
  • paper_authors: Hyelin Nam, Jihong Park, Jinho Choi, Seong-Lyun Kim
  • for: This paper proposes a new communication-system framework that leverages the generation capabilities of multi-modal generative models for smart applications.
  • methods: The transmitter converts the source image into text and the receiver reconstructs the image through the reverse process; each word in the text sentence has a syntactic role responsible for a particular piece of the image's information.
  • results: Experiments indicate that converting images to text and back reduces the communication load while preserving perceptual meaning, enabling more efficient communication for intelligent applications.
    Abstract This paper proposes a new communication-system framework that leverages the promising generation capabilities of multi-modal generative models. For today's smart applications, successful communication can be achieved by conveying the perceptual meaning, which we encode as a text prompt. Text serves as a suitable semantic representation of image data, since multi-modal techniques can instruct or generate an image from text in a manner similar to human cognition. Utilizing text also reduces the communication load compared to transmitting the raw data itself. The transmitter converts the source image to text through a multi-modal generation process, and the receiver reconstructs the image using the reverse process. Each word in the text sentence has a syntactic role and is responsible for a particular piece of the information the text contains. For further efficiency in communication load, the transmitter sends words sequentially, in priority order of how much information they carry, until communication succeeds. Our primary focus is therefore the design of a communication system based on image-to-text transformation and the proposed schemes for sequentially transmitting word tokens. We expect this work to pave a new road for applying state-of-the-art generative models to real communication systems.
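The sequential word-transmission scheme is easy to sketch. Below, informativeness is approximated by inverse corpus frequency and the receiver's success test is a stub; both are stand-ins for the generative ranking and text-to-image fidelity check the abstract implies.

```python
# Sketch of priority-ordered word transmission: the transmitter sends caption
# words most-informative-first and stops once the receiver signals success.
from collections import Counter

CORPUS = "a the a of dog red ball a the on grass the a".split()
FREQ = Counter(CORPUS)

def informativeness(word: str) -> float:
    return 1.0 / (1 + FREQ[word])       # rarer words carry more information

def receiver_satisfied(received: list[str]) -> bool:
    # Stand-in for: run text-to-image generation and check semantic fidelity.
    return len(received) >= 3

def transmit(caption: str) -> list[str]:
    words = sorted(set(caption.split()), key=informativeness, reverse=True)
    received: list[str] = []
    for word in words:                  # one token per channel use
        received.append(word)
        if receiver_satisfied(received):
            break
    return received

print(transmit("a red ball on the grass"))  # e.g. ['ball', 'grass', 'red']
```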

Spatial-Temporal Graph Attention Fuser for Calibration in IoT Air Pollution Monitoring Systems

  • paper_url: http://arxiv.org/abs/2309.04508
  • repo_url: None
  • paper_authors: Keivan Faghih Niresi, Mengjie Zhao, Hugo Bissig, Henri Baumann, Olga Fink
  • for: To improve the accuracy of Internet of Things (IoT) air-pollution sensors, particularly their calibration in uncontrolled environments.
  • methods: A graph neural network, specifically a graph attention network module, fuses data from sensor arrays to enhance calibration accuracy.
  • results: Experiments show that the approach significantly improves sensor calibration accuracy on an IoT air-pollution monitoring platform.
    Abstract The use of Internet of Things (IoT) sensors for air pollution monitoring has significantly increased, resulting in the deployment of low-cost sensors. Despite this advancement, accurately calibrating these sensors in uncontrolled environmental conditions remains a challenge. To address this, we propose a novel approach that leverages graph neural networks, specifically the graph attention network module, to enhance the calibration process by fusing data from sensor arrays. Through our experiments, we demonstrate the effectiveness of our approach in significantly improving the calibration accuracy of sensors in IoT air pollution monitoring platforms.
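A minimal version of the graph-attention fusion step might look as follows, assuming PyTorch Geometric is available; the toy graph, feature set, and two-layer network are illustrative, and the paper's spatial-temporal fuser is richer than this.

```python
# Minimal graph-attention calibration sketch with PyTorch Geometric:
# nodes are co-located low-cost sensors, edges connect nearby sensors, and
# the network regresses the reference pollutant concentration per node.
import torch
from torch_geometric.nn import GATConv

class GATCalibrator(torch.nn.Module):
    def __init__(self, in_dim: int, hidden: int = 16, heads: int = 4):
        super().__init__()
        self.gat1 = GATConv(in_dim, hidden, heads=heads)
        self.gat2 = GATConv(hidden * heads, 1, heads=1)  # calibrated reading

    def forward(self, x, edge_index):
        h = torch.relu(self.gat1(x, edge_index))
        return self.gat2(h, edge_index).squeeze(-1)

# Toy graph: 4 sensors, each with [raw_reading, temperature, humidity].
x = torch.randn(4, 3)
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3],
                           [1, 0, 2, 1, 3, 2]])      # undirected chain
y = torch.randn(4)                                   # reference station values

model = GATCalibrator(in_dim=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x, edge_index), y)
    loss.backward()
    opt.step()
print("final MSE:", float(loss))
```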

LLMCad: Fast and Scalable On-device Large Language Model Inference

  • paper_url: http://arxiv.org/abs/2309.04255
  • repo_url: None
  • paper_authors: Daliang Xu, Wangsong Yin, Xin Jin, Ying Zhang, Shiyun Wei, Mengwei Xu, Xuanzhe Liu
  • for: This paper aims to improve the efficiency of generative Natural Language Processing (NLP) tasks on mobile devices.
  • methods: The proposed method, LLMCad, uses a compact LLM to generate candidate tokens and a high-precision LLM to validate them, with three novel techniques: token tree construction, a self-adjusting fallback strategy, and speculative token generation.
  • results: LLMCad achieves impressive token generation speeds, up to 9.3x faster than existing inference engines, making it a promising solution for on-device NLP tasks.
    Abstract Generative tasks, such as text generation and question answering, hold a crucial position in the realm of mobile applications. Due to their sensitivity to privacy concerns, there is a growing demand for their execution directly on mobile devices. Currently, the execution of these generative tasks heavily depends on Large Language Models (LLMs). Nevertheless, the limited memory capacity of these devices presents a formidable challenge to the scalability of such models. In our research, we introduce LLMCad, an innovative on-device inference engine specifically designed for efficient generative Natural Language Processing (NLP) tasks. The core idea behind LLMCad revolves around model collaboration: a compact LLM, residing in memory, takes charge of generating the most straightforward tokens, while a high-precision LLM steps in to validate these tokens and rectify any identified errors. LLMCad incorporates three novel techniques: (1) Instead of generating candidate tokens in a sequential manner, LLMCad employs the smaller LLM to construct a token tree, encompassing a wider range of plausible token pathways. Subsequently, the larger LLM can efficiently validate all of these pathways simultaneously. (2) It employs a self-adjusting fallback strategy, swiftly initiating the verification process whenever the smaller LLM generates an erroneous token. (3) To ensure a continuous flow of token generation, LLMCad speculatively generates tokens during the verification process by implementing a compute-IO pipeline. Through an extensive series of experiments, LLMCad showcases an impressive token generation speed, achieving rates up to 9.3x faster than existing inference engines.
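The generate-then-verify core of such a system can be sketched as below. For clarity this uses a linear draft chain rather than LLMCad's token tree, and `small_next`/`large_next` are placeholder functions standing in for real LLM calls; the compute-IO pipelining is omitted.

```python
# Generate-then-verify sketch (linear chain for clarity; LLMCad builds a
# token *tree* so the large model can validate many branches in one pass).

def small_next(context: list[str]) -> str:
    vocab = ["the", "cat", "sat", "on", "mat"]
    return vocab[len(context) % len(vocab)]     # placeholder draft model

def large_next(context: list[str]) -> str:
    vocab = ["the", "cat", "sat", "on", "a"]
    return vocab[len(context) % len(vocab)]     # placeholder verifier model

def speculative_generate(prompt: list[str], draft_len: int = 4,
                         total: int = 8) -> list[str]:
    out = list(prompt)
    while len(out) - len(prompt) < total:
        # 1) The compact in-memory LLM drafts several cheap candidate tokens.
        draft = []
        for _ in range(draft_len):
            draft.append(small_next(out + draft))
        # 2) The high-precision LLM validates the drafted tokens; on the first
        #    disagreement it rectifies the token and drafting restarts there.
        accepted = []
        for tok in draft:
            correct = large_next(out + accepted)
            if tok == correct:
                accepted.append(tok)
            else:
                accepted.append(correct)        # fallback: keep verified token
                break
        out += accepted
    return out

print(speculative_generate(["start"]))
```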

Towards Reliable and Fluent Large Language Models: Incorporating Feedback Learning Loops in QA Systems

  • paper_url: http://arxiv.org/abs/2309.06384
  • repo_url: None
  • paper_authors: Dongyub Lee, Taesun Whang, Chanhee Lee, Heuiseok Lim
  • for: The paper aims to improve the utility and trustworthiness of large language models (LLMs) in daily applications by addressing erroneous references, hallucinated information, and inadequate detail.
  • methods: The study builds a dataset to train a critic model that evaluates the citation, correctness, and fluency of LLM-generated responses in QA systems; it proposes an automated feedback mechanism that leverages the critic model for real-time feedback on heterogeneous aspects of generated text, and a feedback learning loop that iteratively improves the response-generating LLM.
  • results: Experiments show substantial improvements in citation and fluency metrics for ChatGPT, including a 4% precision increase in citation and an approximately 8% gain on the MAUVE fluency metric, while maintaining high correctness.
    Abstract Large language models (LLMs) have emerged as versatile tools in various daily applications. However, they are fraught with issues that undermine their utility and trustworthiness. These include the incorporation of erroneous references (citation), the generation of hallucinated information (correctness), and the inclusion of superfluous or omission of crucial details (fluency). To ameliorate these concerns, this study makes several key contributions. First, we build a dataset to train a critic model capable of evaluating the citation, correctness, and fluency of responses generated by LLMs in QA systems. Second, we propose an automated feedback mechanism that leverages the critic model to offer real-time feedback on heterogeneous aspects of generated text. Third, we introduce a feedback learning loop that uses this critic model to iteratively improve the performance of the LLM responsible for response generation. Experimental results demonstrate the efficacy of our approach, showing substantial improvements in citation and fluency metrics for ChatGPT, including a 4% precision increase in citation and an approximately 8% enhancement in the MAUVE metric for fluency, while maintaining high levels of correctness.
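A schematic version of the critic-guided loop might look as follows; `generate` and `critic` are placeholder functions, and the per-aspect thresholding is an assumed mechanism for turning critic scores into feedback.

```python
# Sketch of a critic-guided generation loop: the critic scores citation,
# correctness and fluency, and the generator retries with the critique
# appended until all scores clear a threshold. All functions are stand-ins.

def generate(question: str, feedback: str = "") -> str:
    return f"Answer to '{question}' [1]. {feedback}"   # placeholder LLM

def critic(question: str, answer: str) -> dict:
    # Stand-in for the trained critic; returns per-aspect scores in [0, 1].
    return {"citation": 0.9 if "[1]" in answer else 0.2,
            "correctness": 0.8,
            "fluency": 0.7 + 0.05 * min(len(answer), 100) / 100}

def answer_with_feedback(question: str, threshold: float = 0.7,
                         max_rounds: int = 3) -> str:
    feedback = ""
    for _ in range(max_rounds):
        answer = generate(question, feedback)
        scores = critic(question, answer)
        weak = [k for k, v in scores.items() if v < threshold]
        if not weak:
            return answer
        feedback = f"Improve the following aspects: {', '.join(weak)}."
    return answer

print(answer_with_feedback("What causes tides?"))
```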

Decoding visual brain representations from electroencephalography through Knowledge Distillation and latent diffusion models

  • paper_url: http://arxiv.org/abs/2309.07149
  • repo_url: None
  • paper_authors: Matteo Ferrante, Tommaso Boccato, Stefano Bargione, Nicola Toschi
  • for: This study aims to connect neural signals with visual cognition.
  • methods: EEG data recorded while subjects viewed images are used to classify and reconstruct those images; a convolutional network trained on EEG spectrograms is distilled from a Contrastive Language-Image Pre-Training (CLIP)-based image classification teacher network.
  • results: The model attains 80% top-5 accuracy, significantly outperforming a standard CNN and several RNN-based baselines, and can generate image estimates from EEG activity via pre-trained latent diffusion models.
    Abstract Decoding visual representations from human brain activity has emerged as a thriving research domain, particularly in the context of brain-computer interfaces. Our study presents an innovative method that classifies and reconstructs images from the ImageNet dataset using electroencephalography (EEG) data from subjects that had viewed the images themselves (i.e. "brain decoding"). We analyzed EEG recordings from 6 participants, each exposed to 50 images spanning 40 unique semantic categories. These EEG readings were converted into spectrograms, which were then used to train a convolutional neural network (CNN), integrated with a knowledge distillation procedure based on a pre-trained Contrastive Language-Image Pre-Training (CLIP)-based image classification teacher network. This strategy allowed our model to attain a top-5 accuracy of 80%, significantly outperforming a standard CNN and various RNN-based benchmarks. Additionally, we incorporated an image reconstruction mechanism based on pre-trained latent diffusion models, which allowed us to generate an estimate of the images which had elicited EEG activity. Therefore, our architecture not only decodes images from neural activity but also offers a credible image reconstruction from EEG only, paving the way for e.g. swift, individualized feedback experiments. Our research represents a significant step forward in connecting neural signals with visual cognition.
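The distillation objective described above is standard enough to sketch: a small CNN over spectrograms is trained against softened teacher logits plus the usual hard labels. The frozen random teacher logits below stand in for the CLIP-based teacher, and the temperature/mixing values are conventional defaults, not the paper's.

```python
# Knowledge-distillation sketch in PyTorch: a small CNN over EEG spectrograms
# is trained against softened logits from a frozen teacher, plus hard labels.
import torch
import torch.nn.functional as F

class SpectrogramCNN(torch.nn.Module):
    def __init__(self, n_classes: int = 40):
        super().__init__()
        self.conv = torch.nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.head = torch.nn.Linear(8, n_classes)

    def forward(self, x):
        h = F.relu(self.conv(x)).mean(dim=(2, 3))   # global average pool
        return self.head(h)

def distill_loss(student_logits, teacher_logits, labels,
                 T: float = 2.0, alpha: float = 0.5):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = SpectrogramCNN()
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
x = torch.randn(16, 1, 64, 64)                   # batch of EEG spectrograms
labels = torch.randint(0, 40, (16,))
teacher_logits = torch.randn(16, 40)             # frozen teacher outputs
for _ in range(50):
    opt.zero_grad()
    loss = distill_loss(student(x), teacher_logits, labels)
    loss.backward()
    opt.step()
```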

UQ at #SMM4H 2023: ALEX for Public Health Analysis with Social Media

  • paper_url: http://arxiv.org/abs/2309.04213
  • repo_url: https://github.com/yanjiangjerry/alex
  • paper_authors: Yan Jiang, Ruihong Qiu, Yi Zhang, Zi Huang
  • for: This paper aims to improve the performance of public health analysis on social media by addressing the data imbalance issue and utilizing the ability of large language models (LLMs) effectively.
  • methods: The proposed ALEX framework uses a combination of data augmentation, balanced training, and proper prompting to improve the performance of LLMs in public health analysis on social media.
  • results: The ALEX model achieved the best performance among all submissions in both Task 2 and Task 4, together with a high score in Task 1, of the Social Media Mining for Health 2023 (SMM4H) challenge.
    Abstract As social media becomes increasingly popular, more and more activities related to public health emerge. Current techniques for public health analysis involve popular models such as BERT and large language models (LLMs). However, the costs of training in-domain LLMs for public health are especially expensive. Furthermore, such kinds of in-domain datasets from social media are generally imbalanced. To tackle these challenges, the data imbalance issue can be overcome by data augmentation and balanced training. Moreover, the ability of the LLMs can be effectively utilized by prompting the model properly. In this paper, a novel ALEX framework is proposed to improve the performance of public health analysis on social media by adopting an LLMs explanation mechanism. Results show that our ALEX model got the best performance among all submissions in both Task 2 and Task 4 with a high score in Task 1 in Social Media Mining for Health 2023 (SMM4H)[1]. Our code has been released at https://github.com/YanJiangJerry/ALEX.

Towards Mitigating Architecture Overfitting in Dataset Distillation

  • paper_url: http://arxiv.org/abs/2309.04195
  • repo_url: None
  • paper_authors: Xuyang Zhong, Chen Liu
  • for: To improve neural network performance when training data is extremely limited.
  • methods: A series of architecture designs and training schemes that, adopted together, improve generalization across different network architectures trained on distilled data.
  • results: Extensive experiments demonstrate the effectiveness and generality of the methods; across various distilled-data scales, they achieve comparable or superior performance to existing methods when training networks with larger capacities.
    Abstract Dataset distillation methods have demonstrated remarkable performance for neural networks trained with very limited training data. However, a significant challenge arises in the form of architecture overfitting: the distilled training data synthesized by a specific network architecture (i.e., training network) generates poor performance when trained by other network architectures (i.e., test networks). This paper addresses this issue and proposes a series of approaches in both architecture designs and training schemes which can be adopted together to boost the generalization performance across different network architectures on the distilled training data. We conduct extensive experiments to demonstrate the effectiveness and generality of our methods. Particularly, across various scenarios involving different sizes of distilled data, our approaches achieve comparable or superior performance to existing methods when training on the distilled data using networks with larger capacities.

Knowledge-tuning Large Language Models with Structured Medical Knowledge Bases for Reliable Response Generation in Chinese

  • paper_url: http://arxiv.org/abs/2309.04175
  • repo_url: None
  • paper_authors: Haochun Wang, Sendong Zhao, Zewen Qiang, Zijian Li, Nuwa Xi, Yanrui Du, MuZhen Cai, Haoqiang Guo, Yuhan Chen, Haoming Xu, Bing Qin, Ting Liu
  • for: To improve the reliability of large language models in the medical domain, where limited domain knowledge can lead to hallucinated medical facts.
  • methods: Structured medical knowledge bases are leveraged so that LLMs grasp domain knowledge efficiently and generate reliable responses; the Chinese medical knowledge question-answering dataset cMedKnowQA is released for evaluation.
  • results: Knowledge-tuned LLMs achieve higher response accuracy than vanilla instruction-tuning and offer a new, reliable path for the domain adaptation of LLMs.
    Abstract Large Language Models (LLMs) have demonstrated remarkable success in diverse natural language processing (NLP) tasks in general domains. However, LLMs sometimes generate responses with the hallucination about medical facts due to limited domain knowledge. Such shortcomings pose potential risks in the utilization of LLMs within medical contexts. To address this challenge, we propose knowledge-tuning, which leverages structured medical knowledge bases for the LLMs to grasp domain knowledge efficiently and facilitate reliable response generation. We also release cMedKnowQA, a Chinese medical knowledge question-answering dataset constructed from medical knowledge bases to assess the medical knowledge proficiency of LLMs. Experimental results show that the LLMs which are knowledge-tuned with cMedKnowQA, can exhibit higher levels of accuracy in response generation compared with vanilla instruction-tuning and offer a new reliable way for the domain adaptation of LLMs.

Manifold-based Verbalizer Space Re-embedding for Tuning-free Prompt-based Classification

  • paper_url: http://arxiv.org/abs/2309.04174
  • repo_url: None
  • paper_authors: Haochun Wang, Sendong Zhao, Chi Liu, Nuwa Xi, Muzhen Cai, Bing Qin, Ting Liu
  • for: To propose a tuning-free, prompt-based classification method that works with high-dimensional verbalizer embeddings.
  • methods: Locally Linear Embedding with Intra-class Neighborhood Constraint (LLE-INC) re-embeds verbalizer embeddings on a manifold, preserving local properties within each class as guidance for classification.
  • results: Without tuning any parameters, LLE-INC is on par with automated verbalizers that require parameter tuning; with parameter updating it further improves prompt-based tuning by up to 3.2%, and experiments with LLaMA-7B and 13B show it is an efficient tuning-free approach for hyper-scale language models.
    Abstract Prompt-based classification adapts tasks to a cloze question format utilizing the [MASK] token and the filled tokens are then mapped to labels through pre-defined verbalizers. Recent studies have explored the use of verbalizer embeddings to reduce labor in this process. However, all existing studies require a tuning process for either the pre-trained models or additional trainable embeddings. Meanwhile, the distance between high-dimensional verbalizer embeddings should not be measured by Euclidean distance due to the potential for non-linear manifolds in the representation space. In this study, we propose a tuning-free manifold-based space re-embedding method called Locally Linear Embedding with Intra-class Neighborhood Constraint (LLE-INC) for verbalizer embeddings, which preserves local properties within the same class as guidance for classification. Experimental results indicate that even without tuning any parameters, our LLE-INC is on par with automated verbalizers with parameter tuning. And with the parameter updating, our approach further enhances prompt-based tuning by up to 3.2%. Furthermore, experiments with the LLaMA-7B&13B indicate that LLE-INC is an efficient tuning-free classification approach for the hyper-scale language models.
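A compact reading of LLE-INC can be sketched with plain NumPy: standard locally linear embedding, except that each point's reconstruction neighborhood is restricted to points of the same class. The neighborhood size, regularization, and random stand-in embeddings below are illustrative.

```python
# Sketch of Locally Linear Embedding restricted to intra-class neighbors
# (a simplified reading of LLE-INC). No parameters are tuned.
import numpy as np

def lle_inc(X: np.ndarray, labels: np.ndarray, k: int = 5,
            out_dim: int = 2, reg: float = 1e-3) -> np.ndarray:
    n = len(X)
    W = np.zeros((n, n))
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]
        d = np.linalg.norm(X[same] - X[i], axis=1)
        nbrs = same[np.argsort(d)[:k]]              # intra-class neighbors only
        Z = X[nbrs] - X[i]
        C = Z @ Z.T
        C += np.eye(len(nbrs)) * reg * np.trace(C)  # regularize Gram matrix
        w = np.linalg.solve(C, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                    # reconstruction weights
    M = (np.eye(n) - W).T @ (np.eye(n) - W)
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, 1:out_dim + 1]                   # skip the constant eigvec

# Usage with random stand-ins for verbalizer embeddings of 3 classes:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 1.0, (20, 50)) for c in (-2, 0, 2)])
labels = np.repeat([0, 1, 2], 20)
print(lle_inc(X, labels).shape)   # (60, 2)
```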

Compositional Learning of Visually-Grounded Concepts Using Reinforcement

  • paper_url: http://arxiv.org/abs/2309.04504
  • repo_url: https://github.com/haidiazaman/rl-concept-learning-project
  • paper_authors: Zijun Lin, Haidi Azaman, M Ganesh Kumar, Cheston Tan
  • for: investigating how deep reinforcement learning agents learn and compose color-shape based combinatorial instructions to solve novel combinations in a spatial navigation task.
  • methods: using 3D environments and exploring compositional learning, frozen text encoders (e.g. CLIP, BERT), and pretraining on shape or color concepts separately.
  • results: agents pretrained on concept and compositional learning achieve significantly higher reward when evaluated zero-shot on novel color-shape1-shape2 visual object combinations, and a 20 times decrease in training episodes needed to solve unseen combinations of instructions.
    Abstract Deep reinforcement learning agents need to be trained over millions of episodes to decently solve navigation tasks grounded to instructions. Furthermore, their ability to generalize to novel combinations of instructions is unclear. Interestingly however, children can decompose language-based instructions and navigate to the referred object, even if they have not seen the combination of queries prior. Hence, we created three 3D environments to investigate how deep RL agents learn and compose color-shape based combinatorial instructions to solve novel combinations in a spatial navigation task. First, we explore if agents can perform compositional learning, and whether they can leverage on frozen text encoders (e.g. CLIP, BERT) to learn word combinations in fewer episodes. Next, we demonstrate that when agents are pretrained on the shape or color concepts separately, they show a 20 times decrease in training episodes needed to solve unseen combinations of instructions. Lastly, we show that agents pretrained on concept and compositional learning achieve significantly higher reward when evaluated zero-shot on novel color-shape1-shape2 visual object combinations. Overall, our results highlight the foundations needed to increase an agent's proficiency in composing word groups through reinforcement learning and its ability for zero-shot generalization to new combinations.

Leveraging Prototype Patient Representations with Feature-Missing-Aware Calibration to Mitigate EHR Data Sparsity

  • paper_url: http://arxiv.org/abs/2309.04160
  • repo_url: None
  • paper_authors: Yinghao Zhu, Zixiang Wang, Long He, Shiyun Xie, Zixi Chen, Jingkun An, Liantao Ma, Chengwei Pan
  • for: To mitigate the sparsity of Electronic Health Record (EHR) data and improve the performance of predictive models.
  • methods: An indirect imputation approach leverages prototype representations from similar patients to obtain denser embeddings, together with a purpose-built feature confidence learner module that assesses the reliability of each feature under missingness.
  • results: The model achieves statistically significant improvements over prior EHR-focused models on in-hospital mortality prediction with the MIMIC-III and MIMIC-IV datasets; code is available at https://github.com/yhzhu99/SparseEHR for reproducibility.
    Abstract Electronic Health Record (EHR) data frequently exhibits sparse characteristics, posing challenges for predictive modeling. Current direct imputation approaches, such as matrix imputation, hinge on referencing analogous rows or columns to complete raw missing data and do not differentiate between imputed and actual values. As a result, models may inadvertently incorporate irrelevant or deceptive information with respect to the prediction objective, thereby compromising the efficacy of downstream performance. While some methods strive to recalibrate or augment EHR embeddings after direct imputation, they often mistakenly prioritize imputed features. This misprioritization can introduce biases or inaccuracies into the model. To tackle these issues, our work resorts to indirect imputation, where we leverage prototype representations from similar patients to obtain a denser embedding. Recognizing the limitation that missing features are typically treated the same as present ones when measuring similar patients, our approach designs a feature confidence learner module. This module is sensitive to the missing feature status, enabling the model to better judge the reliability of each feature. Moreover, we propose a novel patient similarity metric that takes feature confidence into account, ensuring that evaluations are not based merely on potentially inaccurate imputed values. Consequently, our work captures dense prototype patient representations with a feature-missing-aware calibration process. Comprehensive experiments demonstrate that the designed model surpasses established EHR-focused models with a statistically significant improvement on the MIMIC-III and MIMIC-IV datasets in the in-hospital mortality prediction task. The code is publicly available at https://github.com/yhzhu99/SparseEHR to assure reproducibility.
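One way to picture the confidence-weighted similarity and prototype densification is sketched below; the fixed confidence values, similarity kernel, and blending rule are illustrative assumptions, since the paper learns the feature confidence module end-to-end.

```python
# Sketch of confidence-weighted patient similarity and prototype densification
# (an illustrative reading of the paper's idea, not its exact model): missing
# features get low confidence, similarity downweights them, and each sparse
# record is blended with a prototype built from its nearest patients.
import numpy as np

def confidence(mask: np.ndarray, missing_conf: float = 0.2) -> np.ndarray:
    # Stand-in for the learned feature-confidence module.
    return np.where(mask, 1.0, missing_conf)

def similarity(x_a, m_a, x_b, m_b) -> float:
    c = confidence(m_a) * confidence(m_b)          # joint feature confidence
    diff = np.nan_to_num(x_a - x_b)
    return float(np.exp(-np.sum(c * diff ** 2) / c.sum()))

def densify(X, M, i, top_k: int = 3, blend: float = 0.5):
    sims = np.array([similarity(X[i], M[i], X[j], M[j]) if j != i else -1.0
                     for j in range(len(X))])
    nbrs = np.argsort(-sims)[:top_k]               # most similar patients
    prototype = np.nan_to_num(np.nanmean(X[nbrs], axis=0))
    filled = np.where(M[i], np.nan_to_num(X[i]), prototype)
    return blend * filled + (1 - blend) * prototype

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 6))
M = rng.random((10, 6)) > 0.3                      # True = observed
X[~M] = np.nan
print(densify(X, M, i=0))
```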

  • paper_url: http://arxiv.org/abs/2309.04146
  • repo_url: None
  • paper_authors: Kyoungyeon Cho, Seungkum Han, Wonseok Hwang
  • for: Large-scale statistical analysis of legal corpora can provide valuable legal insights.
  • methods: NESTLE, a no-code tool for large-scale statistical analysis of legal text, combines a search engine, an end-to-end information extraction (IE) system, and a Large Language Model.
  • results: NESTLE achieves GPT-4-comparable performance by training the internal IE module with 4 human-labeled and 192 LLM-labeled examples, enabling customizable statistical analysis of large corpora without writing code.
    Abstract The statistical analysis of large scale legal corpus can provide valuable legal insights. For such analysis one needs to (1) select a subset of the corpus using document retrieval tools, (2) structuralize text using information extraction (IE) systems, and (3) visualize the data for the statistical analysis. Each process demands either specialized tools or programming skills whereas no comprehensive unified "no-code" tools have been available. Especially for IE, if the target information is not predefined in the ontology of the IE system, one needs to build their own system. Here we provide NESTLE, a no code tool for large-scale statistical analysis of legal corpus. With NESTLE, users can search target documents, extract information, and visualize the structured data all via the chat interface with accompanying auxiliary GUI for the fine-level control. NESTLE consists of three main components: a search engine, an end-to-end IE system, and a Large Language Model (LLM) that glues the whole components together and provides the chat interface. Powered by LLM and the end-to-end IE system, NESTLE can extract any type of information that has not been predefined in the IE system opening up the possibility of unlimited customizable statistical analysis of the corpus without writing a single line of code. The use of the custom end-to-end IE system also enables faster and low-cost IE on large scale corpus. We validate our system on 15 Korean precedent IE tasks and 3 legal text classification tasks from LEXGLUE. The comprehensive experiments reveal NESTLE can achieve GPT-4 comparable performance by training the internal IE module with 4 human-labeled, and 192 LLM-labeled examples. The detailed analysis provides the insight on the trade-off between accuracy, time, and cost in building such system.

Trustworthy and Synergistic Artificial Intelligence for Software Engineering: Vision and Roadmaps

  • paper_url: http://arxiv.org/abs/2309.04142
  • repo_url: None
  • paper_authors: David Lo
  • for: This paper aims to provide a comprehensive overview of the current state and future directions of Artificial Intelligence for Software Engineering (AI4SE), with a focus on realizing trustworthy and synergistic AI4SE.
  • methods: The paper uses a combination of literature review, analysis, and visioning to explore the current challenges and potential solutions for AI4SE, and to paint a vision for the future of software engineering.
  • results: The paper highlights the potential leaps that can be achieved if the key challenges of AI4SE are surmounted, including the transition towards Software Engineering 2.0, and provides two strategic roadmaps for realizing trustworthy and synergistic AI4SE.
    Abstract For decades, much software engineering research has been dedicated to devising automated solutions aimed at enhancing developer productivity and elevating software quality. The past two decades have witnessed an unparalleled surge in the development of intelligent solutions tailored for software engineering tasks. This momentum established the Artificial Intelligence for Software Engineering (AI4SE) area, which has swiftly become one of the most active and popular areas within the software engineering field. This Future of Software Engineering (FoSE) paper navigates through several focal points. It commences with a succinct introduction and history of AI4SE. Thereafter, it underscores the core challenges inherent to AI4SE, particularly highlighting the need to realize trustworthy and synergistic AI4SE. Progressing, the paper paints a vision for the potential leaps achievable if AI4SE's key challenges are surmounted, suggesting a transition towards Software Engineering 2.0. Two strategic roadmaps are then laid out: one centered on realizing trustworthy AI4SE, and the other on fostering synergistic AI4SE. While this paper may not serve as a conclusive guide, its intent is to catalyze further progress. The ultimate aspiration is to position AI4SE as a linchpin in redefining the horizons of software engineering, propelling us toward Software Engineering 2.0.

Proprioceptive External Torque Learning for Floating Base Robot and its Applications to Humanoid Locomotion

  • paper_url: http://arxiv.org/abs/2309.04138
  • repo_url: None
  • paper_authors: Daegyu Lim, Myeong-Ju Kim, Junhyeok Cha, Donghyeon Kim, Jaeheung Park
  • for: To achieve stable locomotion and safe operation of humanoid robots while avoiding the cost, inertia, complexity, and failure risk that force-torque sensors add to the system.
  • methods: External joint torque is learned solely from proprioceptive sensors (encoders and IMUs) using a GRU network trained on random-walking data, without any force-torque sensor.
  • results: The trained network estimates external joint torque and contact wrench with significantly smaller errors than a model-based momentum observer (MOB) with friction modeling, supports zero-moment-point (ZMP) feedback control for stable walking, and remains consistent when the robot's feet and upper-body inertia are changed.
    Abstract The estimation of external joint torque and contact wrench is essential for achieving stable locomotion of humanoids and safety-oriented robots. Although the contact wrench on the foot of humanoids can be measured using a force-torque sensor (FTS), FTS increases the cost, inertia, complexity, and failure possibility of the system. This paper introduces a method for learning external joint torque solely using proprioceptive sensors (encoders and IMUs) for a floating base robot. For learning, the GRU network is used and random walking data is collected. Real robot experiments demonstrate that the network can estimate the external torque and contact wrench with significantly smaller errors compared to the model-based method, momentum observer (MOB) with friction modeling. The study also validates that the estimated contact wrench can be utilized for zero moment point (ZMP) feedback control, enabling stable walking. Moreover, even when the robot's feet and the inertia of the upper body are changed, the trained network shows consistent performance with a model-based calibration. This result demonstrates the possibility of removing FTS on the robot, which reduces the disadvantages of hardware sensors. The summary video is available at https://youtu.be/gT1D4tOiKpo.
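The proprioception-to-torque mapping reduces to a sequence-regression problem, sketched below in PyTorch; the input/joint dimensions, window length, and single linear head are illustrative choices rather than the paper's architecture.

```python
# PyTorch sketch of the proprioception-to-torque mapping: a GRU consumes a
# window of encoder/IMU readings and regresses per-joint external torque.
import torch

class ExternalTorqueGRU(torch.nn.Module):
    def __init__(self, n_inputs: int = 24, n_joints: int = 12,
                 hidden: int = 64):
        super().__init__()
        self.gru = torch.nn.GRU(n_inputs, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, n_joints)

    def forward(self, x):                 # x: (batch, time, n_inputs)
        h, _ = self.gru(x)
        return self.head(h[:, -1])        # torque estimate at the last step

model = ExternalTorqueGRU()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# Random stand-ins for logged random-walking data: proprioceptive windows
# and external torques computed offline for supervision.
x = torch.randn(32, 50, 24)
tau_ext = torch.randn(32, 12)
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), tau_ext)
    loss.backward()
    opt.step()
print("training loss:", float(loss))
```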

Weakly Supervised Point Clouds Transformer for 3D Object Detection

  • paper_url: http://arxiv.org/abs/2309.04105
  • repo_url: None
  • paper_authors: Zuojin Tang, Bo Sun, Tongwei Ma, Daosheng Li, Zhenhui Xu
  • for: This paper presents a weakly supervised framework for a point-cloud transformer used in 3D object detection, aiming to reduce the annotation cost of 3D datasets.
  • methods: An Unsupervised Voting Proposal Module learns randomly preset anchor points and uses a voting network to select high-quality anchors, then distills information into teacher and student networks; the student combines ResNet layers for local features with transformer self-attention for global context.
  • results: On the KITTI datasets, the method achieves the highest average precision among the most recent weakly supervised 3D object detectors.
    Abstract The annotation of 3D datasets is required for semantic segmentation and object detection in scene understanding. In this paper we present a framework for the weak supervision of a point-cloud transformer used for 3D object detection. The aim is to decrease the amount of supervision required for training, given the high cost of annotating 3D datasets. We propose an Unsupervised Voting Proposal Module, which learns randomly preset anchor points and uses a voting network to select prepared anchor points of high quality. It then distills information into student and teacher networks. For the student network, we apply a ResNet network to efficiently extract local characteristics, which can, however, lose much global information. To provide an input that incorporates both global and local information to the student network, we adopt the self-attention mechanism of the transformer to extract global features and the ResNet layers to extract region proposals. The teacher network supervises the classification and regression of the student network using a model pre-trained on ImageNet. On the challenging KITTI datasets, the experimental results achieve the highest level of average precision compared with the most recent weakly supervised 3D object detectors.

Modeling Recommender Ecosystems: Research Challenges at the Intersection of Mechanism Design, Reinforcement Learning and Generative Models

  • paper_url: http://arxiv.org/abs/2309.06375
  • repo_url: None
  • paper_authors: Craig Boutilier, Martin Mladenov, Guy Tennenholtz
  • for: This paper proposes a conceptual framework for increasing the value modern recommender systems bring to users and to the other actors in their ecosystems.
  • methods: It draws on reinforcement learning to optimize long-term objectives, social choice theory to trade off the utilities of different actors, mechanism design to reduce information asymmetry under strategic behavior, and behavioral economics and psychology to better model user and item-provider behavior.
  • results: The analysis argues that combining these methods can improve overall ecosystem health and the long-term utility generated for users and item providers alike, and it articulates the research challenges arising at their intersection.
    Abstract Modern recommender systems lie at the heart of complex ecosystems that couple the behavior of users, content providers, advertisers, and other actors. Despite this, the focus of the majority of recommender research -- and most practical recommenders of any import -- is on the local, myopic optimization of the recommendations made to individual users. This comes at a significant cost to the long-term utility that recommenders could generate for its users. We argue that explicitly modeling the incentives and behaviors of all actors in the system -- and the interactions among them induced by the recommender's policy -- is strictly necessary if one is to maximize the value the system brings to these actors and improve overall ecosystem "health". Doing so requires: optimization over long horizons using techniques such as reinforcement learning; making inevitable tradeoffs in the utility that can be generated for different actors using the methods of social choice; reducing information asymmetry, while accounting for incentives and strategic behavior, using the tools of mechanism design; better modeling of both user and item-provider behaviors by incorporating notions from behavioral economics and psychology; and exploiting recent advances in generative and foundation models to make these mechanisms interpretable and actionable. We propose a conceptual framework that encompasses these elements, and articulate a number of research challenges that emerge at the intersection of these different disciplines.

Data-driven classification of low-power communication signals by an unauthenticated user using a software-defined radio

  • paper_url: http://arxiv.org/abs/2309.04088
  • repo_url: https://github.com/minds-code/jammingsdr
  • paper_authors: Tarun Rao Keshabhoina, Marcos M. Vasconcelos
  • for: This paper concerns large-scale distributed multi-agent systems, particularly robotic network applications, that exchange state and control signals over low-power communication networks with limited power on unlicensed spectrum.
  • methods: A structural pattern in the instantaneous frequency representation of LoRa signals reduces the problem of jointly inferring a signal's bandwidth and spreading factor to a classification problem that can be implemented efficiently with neural networks.
  • results: The analysis shows that the LoRa protocol is vulnerable to denial-of-service attacks by an unauthenticated attacker who can successfully identify a target signal's bandwidth and spreading factor.
    Abstract Many large-scale distributed multi-agent systems exchange information over low-power communication networks. In particular, agents intermittently communicate state and control signals in robotic network applications, often with limited power over an unlicensed spectrum, prone to eavesdropping and denial-of-service attacks. In this paper, we argue that a widely popular low-power communication protocol known as LoRa is vulnerable to denial-of-service attacks by an unauthenticated attacker if it can successfully identify a target signal's bandwidth and spreading factor. Leveraging a structural pattern in the LoRa signal's instantaneous frequency representation, we relate the problem of jointly inferring the two unknown parameters to a classification problem, which can be efficiently implemented using neural networks.
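The attacker's inference step amounts to classifying spectrograms of received chirps into one of the discrete (bandwidth, spreading factor) configurations, as sketched below; the network size, input shape, and random stand-in spectrograms are illustrative.

```python
# Sketch of the attacker's classifier: a small CNN over received-signal
# spectrograms predicts the (bandwidth, spreading factor) pair as one of
# the discrete LoRa configurations.
import torch
import torch.nn.functional as F

BANDWIDTHS = [125, 250, 500]             # kHz
SPREADING_FACTORS = list(range(7, 13))   # SF7..SF12
N_CLASSES = len(BANDWIDTHS) * len(SPREADING_FACTORS)

class LoRaParamCNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 8, 3, padding=1)
        self.conv2 = torch.nn.Conv2d(8, 16, 3, padding=1)
        self.head = torch.nn.Linear(16, N_CLASSES)

    def forward(self, x):                # x: (batch, 1, freq, time)
        h = F.max_pool2d(F.relu(self.conv1(x)), 2)
        h = F.relu(self.conv2(h)).mean(dim=(2, 3))
        return self.head(h)

def decode(class_idx: int) -> tuple[int, int]:
    bw, sf = divmod(class_idx, len(SPREADING_FACTORS))
    return BANDWIDTHS[bw], SPREADING_FACTORS[sf]

model = LoRaParamCNN()
x = torch.randn(4, 1, 64, 64)            # stand-in chirp spectrograms
pred = model(x).argmax(dim=1)
print([decode(int(i)) for i in pred])
```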

Curve Your Attention: Mixed-Curvature Transformers for Graph Representation Learning

  • paper_url: http://arxiv.org/abs/2309.04082
  • repo_url: None
  • paper_authors: Sungjun Cho, Seunghyuk Cho, Sungwoo Park, Hankook Lee, Honglak Lee, Moontae Lee
  • for: To learn the hierarchical or cyclical structures of real-world graphs, which the usual Euclidean space cannot represent accurately.
  • methods: The Fully Product-Stereographic Transformer generalizes Transformers to operate entirely on products of constant-curvature spaces, learning the curvature appropriate for the input graph end-to-end; a kernelized non-Euclidean attention runs in time and memory linear in the number of nodes and edges.
  • results: Experiments on graph reconstruction and node classification demonstrate the benefits of generalizing Transformers to the non-Euclidean domain.
    Abstract Real-world graphs naturally exhibit hierarchical or cyclical structures that are unfit for the typical Euclidean space. While there exist graph neural networks that leverage hyperbolic or spherical spaces to learn representations that embed such structures more accurately, these methods are confined under the message-passing paradigm, making the models vulnerable against side-effects such as oversmoothing and oversquashing. More recent work have proposed global attention-based graph Transformers that can easily model long-range interactions, but their extensions towards non-Euclidean geometry are yet unexplored. To bridge this gap, we propose Fully Product-Stereographic Transformer, a generalization of Transformers towards operating entirely on the product of constant curvature spaces. When combined with tokenized graph Transformers, our model can learn the curvature appropriate for the input graph in an end-to-end fashion, without the need of additional tuning on different curvature initializations. We also provide a kernelized approach to non-Euclidean attention, which enables our model to run in time and memory cost linear to the number of nodes and edges while respecting the underlying geometry. Experiments on graph reconstruction and node classification demonstrate the benefits of generalizing Transformers to the non-Euclidean domain.
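To give a flavor of non-Euclidean attention, the sketch below scores attention by geodesic distance in the Poincaré ball (constant curvature -1), one special case of the stereographic spaces the paper works with. Mixing the values Euclideanly is a simplification; a faithful implementation would aggregate with a gyromidpoint, and the paper additionally kernelizes the attention.

```python
# Sketch of distance-based attention in a non-Euclidean space: attention
# scores decay with geodesic distance instead of a dot product.
import numpy as np

def poincare_distance(x: np.ndarray, y: np.ndarray) -> float:
    sq = np.sum((x - y) ** 2)
    denom = (1 - np.sum(x ** 2)) * (1 - np.sum(y ** 2))
    return float(np.arccosh(1 + 2 * sq / denom))

def hyperbolic_attention(queries, keys, values, tau: float = 1.0):
    out = []
    for q in queries:
        d = np.array([poincare_distance(q, k) for k in keys])
        w = np.exp(-d / tau)              # closer points attend more
        w /= w.sum()
        out.append(w @ values)            # Euclidean mix of values (simplified)
    return np.array(out)

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 4)) * 0.1       # small norm => inside the unit ball
print(hyperbolic_attention(pts, pts, rng.normal(size=(5, 8))).shape)  # (5, 8)
```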

SayNav: Grounding Large Language Models for Dynamic Planning to Navigation in New Environments

  • paper_url: http://arxiv.org/abs/2309.04077
  • repo_url: None
  • paper_authors: Abhinav Rajvanshi, Karan Sikka, Xiao Lin, Bhoram Lee, Han-Pang Chiu, Alvaro Velasquez
  • for: This paper proposes SayNav, a new approach that enables autonomous agents to perform complex navigation tasks in unknown environments.
  • methods: SayNav uses a novel grounding mechanism that incrementally builds a 3D scene graph of the explored environment and feeds it to Large Language Models (LLMs) to generate feasible, contextually appropriate high-level navigation plans.
  • results: On a new multi-object navigation task, SayNav achieves a 95.35% success rate versus 56.06% for the baseline, highlighting its ability to generate dynamic plans and successfully locate objects in large-scale new environments; it also generalizes efficiently from simulation to real environments.
    Abstract Semantic reasoning and dynamic planning capabilities are crucial for an autonomous agent to perform complex navigation tasks in unknown environments. It requires a large amount of common-sense knowledge, that humans possess, to succeed in these tasks. We present SayNav, a new approach that leverages human knowledge from Large Language Models (LLMs) for efficient generalization to complex navigation tasks in unknown large-scale environments. SayNav uses a novel grounding mechanism, that incrementally builds a 3D scene graph of the explored environment as inputs to LLMs, for generating feasible and contextually appropriate high-level plans for navigation. The LLM-generated plan is then executed by a pre-trained low-level planner, that treats each planned step as a short-distance point-goal navigation sub-task. SayNav dynamically generates step-by-step instructions during navigation and continuously refines future steps based on newly perceived information. We evaluate SayNav on a new multi-object navigation task, that requires the agent to utilize a massive amount of human knowledge to efficiently search multiple different objects in an unknown environment. SayNav outperforms an oracle based Point-nav baseline, achieving a success rate of 95.35% (vs 56.06% for the baseline), under the ideal settings on this task, highlighting its ability to generate dynamic plans for successfully locating objects in large-scale new environments. In addition, SayNav also enables efficient generalization from simulation to real environments.
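The grounding mechanism can be pictured as below: observations grow a scene graph, the graph is serialized into the planning prompt, and the LLM's plan decomposes into point-goal sub-tasks for the low-level planner. The graph schema, prompt format, and `llm_plan` stub are all illustrative stand-ins.

```python
# Sketch of the grounding idea: an incrementally grown scene graph is
# serialized into the LLM prompt so the high-level plan stays consistent
# with what the agent has actually observed.

class SceneGraph:
    def __init__(self):
        self.nodes: dict[str, str] = {}              # object -> room
        self.edges: list[tuple[str, str, str]] = []  # (obj, relation, obj)

    def observe(self, obj: str, room: str, near: str | None = None):
        self.nodes[obj] = room                       # incremental update
        if near:
            self.edges.append((obj, "near", near))

    def serialize(self) -> str:
        lines = [f"{o} is in {r}" for o, r in self.nodes.items()]
        lines += [f"{a} is {rel} {b}" for a, rel, b in self.edges]
        return "; ".join(lines)

def llm_plan(goal: str, scene: str) -> list[str]:
    # Stand-in for the LLM call; a real system would prompt with `scene`
    # and return short-distance point-goal steps.
    return ["go to kitchen", f"search near counter for {goal}"]

graph = SceneGraph()
graph.observe("counter", "kitchen")
graph.observe("mug", "kitchen", near="counter")
for step in llm_plan("mug", graph.serialize()):
    print("sub-task:", step)                         # run by low-level planner
```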
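To make the plan/act structure concrete, here is a heavily simplified Python sketch of a SayNav-style loop: grow a scene graph from observations, serialize it into an LLM prompt to obtain a short high-level plan, and execute each planned step as a point-goal sub-task. The `agent`, `llm`, and observation interfaces are placeholders assumed for illustration, not the paper's actual APIs.

```python
import json

def update_scene_graph(graph, obs):
    # Stub: merge newly perceived objects; SayNav actually builds a 3D scene
    # graph with rooms, objects, and spatial relations, grown incrementally.
    graph["objects"] = sorted(set(graph["objects"]) | set(obs.visible_objects))

def saynav_loop(agent, llm, goal_objects, max_rounds=50):
    """SayNav-style plan/act loop (sketch; interfaces are hypothetical)."""
    graph = {"objects": []}
    remaining = set(goal_objects)
    for _ in range(max_rounds):
        obs = agent.observe()                    # perception, abstracted away
        update_scene_graph(graph, obs)
        remaining -= set(obs.visible_objects)
        if not remaining:
            break
        # Grounding: serialize the explored subgraph into the prompt and ask
        # the LLM for a short, feasible, machine-readable high-level plan.
        prompt = (
            "Scene graph of the explored environment:\n"
            + json.dumps(graph)
            + "\nStill to find: " + ", ".join(sorted(remaining))
            + '\nReply with JSON: [{"action": "goto", "landmark": "<name>"}]'
        )
        plan = json.loads(llm.complete(prompt))
        # Each planned step is executed as a short-distance point-goal
        # sub-task; the outer loop replans as new information arrives.
        for step in plan:
            goal = obs.positions.get(step["landmark"], (0.0, 0.0, 0.0))
            agent.pointgoal_navigate(goal)       # pre-trained low-level planner
    return set(goal_objects) - remaining
```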

Computationally Efficient Data-Driven Discovery and Linear Representation of Nonlinear Systems For Control

  • paper_url: http://arxiv.org/abs/2309.04074
  • repo_url: https://github.com/tiwari-research-group/koopman-control-no-decoder
  • paper_authors: Madhur Tiwari, George Nehma, Bethany Lusch
  • for: Develops a data-driven framework based on Koopman operator theory for system identification and linearization of nonlinear systems for control.
  • methods: The proposed method is a deep learning framework with recursive learning; the resulting linear system is controlled with a linear quadratic regulator.
  • results: The approach is demonstrated on a pendulum system with simulations on noisy data, and it trains more efficiently and predicts more accurately than an autoencoder-based baseline.
    Abstract This work focuses on developing a data-driven framework using Koopman operator theory for system identification and linearization of nonlinear systems for control. Our proposed method presents a deep learning framework with recursive learning. The resulting linear system is controlled using a linear quadratic regulator. An illustrative example using a pendulum system is presented with simulations on noisy data. We show that our proposed method trains more efficiently and is more accurate than an autoencoder baseline.
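The idea is to lift the state into observables in which the dynamics are approximately linear, then apply standard linear control. Below is a minimal sketch under assumed sizes and losses: a learned lifting that keeps the raw state among the observables (consistent with the repo name, no decoder is needed to recover the state), a one-step prediction loss (the paper trains recursively over multi-step rollouts), and a discrete-time LQR gain computed from the identified matrices.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.linalg import solve_discrete_are

class KoopmanNet(nn.Module):
    """Lift x into z = [x, psi(x)] and fit linear dynamics
    z_{t+1} = A z_t + B u_t (illustrative sizes, not the paper's exact net)."""
    def __init__(self, n_x=2, n_u=1, n_lift=8):
        super().__init__()
        self.psi = nn.Sequential(nn.Linear(n_x, 64), nn.Tanh(),
                                 nn.Linear(64, n_lift))
        n_z = n_x + n_lift
        self.A = nn.Linear(n_z, n_z, bias=False)
        self.B = nn.Linear(n_u, n_z, bias=False)

    def lift(self, x):
        # keeping x itself among the observables makes the state linearly
        # recoverable from z, so no decoder network is required
        return torch.cat([x, self.psi(x)], dim=-1)

    def forward(self, x, u):
        return self.A(self.lift(x)) + self.B(u)

def train_step(model, opt, x, u, x_next):
    # one-step prediction loss for brevity; recursive multi-step training,
    # as in the paper, improves long-horizon accuracy
    loss = nn.functional.mse_loss(model(x, u), model.lift(x_next))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def lqr_gain(model, q_diag, r_diag):
    """Discrete-time LQR on the identified lifted linear system."""
    A = model.A.weight.detach().numpy()
    B = model.B.weight.detach().numpy()
    Q, R = np.diag(q_diag), np.diag(r_diag)
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # u_t = -K z_t
```

At runtime the controller is then linear feedback in the lifted coordinates, u_t = -K @ model.lift(x_t), even though the original dynamics are nonlinear.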

Inferring physical laws by artificial intelligence based causal models

  • paper_url: http://arxiv.org/abs/2309.04069
  • repo_url: None
  • paper_authors: Jorawar Singh, Kishor Bharti, Arvind
  • for: Explores the use of AI and ML in scientific research, and how a causal learning model can capture the cause-and-effect structure of physical phenomena.
  • methods: Applies the principles of causal inference and interventions to study cause-and-effect relationships in physical phenomena, validating the model on several well-known physical examples.
  • results: The causal learning model not only captures correlations in the data but also correctly ascertains the causal relations among variables, thereby strengthening (or weakening) confidence in the proposed model of the underlying physical process.
    Abstract The advances in Artificial Intelligence (AI) and Machine Learning (ML) have opened up many avenues for scientific research, and are adding new dimensions to the process of knowledge creation. However, even the most powerful and versatile ML applications to date are primarily in the domain of analysis of associations and boil down to complex data fitting. Judea Pearl has pointed out that Artificial General Intelligence must involve interventions: the acts of doing and imagining. Any machine-assisted scientific discovery must therefore include causal analysis and interventions. In this context, we propose a causal learning model of physical principles, which not only recognizes correlations but also brings out causal relationships. We use the principles of causal inference and interventions to study the cause-and-effect relationships in the context of some well-known physical phenomena. We show that this technique can not only figure out associations among data, but is also able to correctly ascertain the cause-and-effect relations amongst the variables, thereby strengthening (or weakening) our confidence in the proposed model of the underlying physical process.
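The distinction the paper leans on is between observing a correlation and intervening with the do-operator. The toy example below, a hypothetical Ohm's-law process chosen for illustration (not taken from the paper), shows why: current and voltage are equally correlated in observational data, but only intervening on the cause shifts the effect, which is how the causal direction is ascertained.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(n=10000, do_current=None):
    """Toy physical process V = I * R, with an optional intervention do(I = value)."""
    R = 5.0 + 0.1 * rng.standard_normal(n)          # resistance
    if do_current is None:
        I = rng.uniform(0.1, 2.0, n)                 # natural variation
    else:
        I = np.full(n, do_current)                   # forced by intervention
    V = I * R + 0.05 * rng.standard_normal(n)        # voltage: effect of I and R
    return I, V

# Observation alone: I and V correlate, but correlation is symmetric and
# cannot tell us which variable drives the other.
I_obs, V_obs = simulate()
print("corr(I, V) =", np.corrcoef(I_obs, V_obs)[0, 1])

# Intervening on the putative cause shifts the effect's distribution ...
_, V_lo = simulate(do_current=0.5)
_, V_hi = simulate(do_current=1.5)
print("E[V | do(I=0.5)] =", V_lo.mean(), " E[V | do(I=1.5)] =", V_hi.mean())
# ... whereas an intervention clamping V would leave I unchanged; comparing
# the two outcomes establishes the causal direction I -> V.
```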

3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

  • paper_url: http://arxiv.org/abs/2309.04062
  • repo_url: None
  • paper_authors: Sungjun Cho, Dae-Woong Jeong, Sung Moon Ko, Jinwoo Kim, Sehui Han, Seunghoon Hong, Honglak Lee, Moontae Lee
  • for: Develops a molecular pretraining technique that improves the accuracy and label-efficiency of molecular property prediction.
  • methods: Uses D&D, a self-supervised molecular representation learning framework that pretrains a 2D graph encoder by distilling representations from a 3D denoiser, i.e., denoising followed by cross-modal knowledge distillation.
  • results: Experiments show that graph representations learned with D&D can infer 3D information from the 2D graph alone and outperform other baselines on real-world molecular property prediction tasks.
    Abstract Pretraining molecular representations from large unlabeled data is essential for molecular property prediction due to the high cost of obtaining ground-truth labels. While there exist various 2D graph-based molecular pretraining approaches, these methods struggle to show statistically significant gains in predictive performance. Recent work has thus instead proposed 3D conformer-based pretraining under the task of denoising, which led to promising results. During downstream finetuning, however, models trained with 3D conformers require accurate atom coordinates of previously unseen molecules, which are computationally expensive to acquire at scale. In light of this limitation, we propose D&D, a self-supervised molecular representation learning framework that pretrains a 2D graph encoder by distilling representations from a 3D denoiser. With denoising followed by cross-modal knowledge distillation, our approach enjoys use of knowledge obtained from denoising as well as painless application to downstream tasks with no access to accurate conformers. Experiments on real-world molecular property prediction datasets show that the graph encoder trained via D&D can infer 3D information based on the 2D graph and shows superior performance and label-efficiency compared to other baselines.
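The distillation signal itself is simple to express: a frozen 3D denoiser acts as the teacher, and the 2D graph encoder is regressed onto its representations, so that at finetuning time only the 2D graph is needed. Below is a minimal sketch of one pretraining step; the encoder architectures, batch formats, and loss choice are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

def distill_step(student_2d, teacher_3d, opt, graph_batch, conformer_batch):
    """One D&D-style pretraining step (sketch). `teacher_3d` is assumed to be
    an encoder already pretrained by denoising 3D conformers and kept frozen;
    `student_2d` sees only the 2D molecular graph."""
    with torch.no_grad():
        target = teacher_3d(conformer_batch)      # (batch, d) 3D representation
    pred = student_2d(graph_batch)                # (batch, d) from 2D graph only
    loss = nn.functional.mse_loss(pred, target)   # cross-modal distillation
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

After pretraining, `student_2d` is finetuned on property labels directly, with no conformer generation required for previously unseen molecules.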