cs.AI - 2023-09-15

URA*: Uncertainty-aware Path Planning using Image-based Aerial-to-Ground Traversability Estimation for Off-road Environments

  • paper_url: http://arxiv.org/abs/2309.08814
  • repo_url: https://github.com/shaswata09/offroad-path-planning
  • paper_authors: Charles Moore, Shaswata Mitra, Nisha Pillai, Marc Moore, Sudip Mittal, Cindy Bethel, Jingdao Chen
  • for: Improving autonomous vehicle navigation in off-road environments that lack explicit maps or road markings.
  • methods: An ensemble convolutional neural network (CNN) model performs pixel-level traversability prediction from aerial images, and an uncertainty-aware planning algorithm computes the best path from these noisy traversability estimates.
  • results: The proposed image segmentation and planning methods were evaluated on the Massachusetts Road Dataset, the DeepGlobe dataset, and an off-road proving ground at Mississippi State University; they outperform conventional planning algorithms in initial path quality and in replanning during online robot operation.
    Abstract A major challenge with off-road autonomous navigation is the lack of maps or road markings that can be used to plan a path for autonomous robots. Classical path planning methods mostly assume a perfectly known environment without accounting for the inherent perception and sensing uncertainty from detecting terrain and obstacles in off-road environments. Recent work in computer vision and deep neural networks has advanced the capability of terrain traversability segmentation from raw images; however, the feasibility of using these noisy segmentation maps for navigation and path planning has not been adequately explored. To address this problem, this research proposes an uncertainty-aware path planning method, URA* using aerial images for autonomous navigation in off-road environments. An ensemble convolutional neural network (CNN) model is first used to perform pixel-level traversability estimation from aerial images of the region of interest. The traversability predictions are represented as a grid of traversal probability values. An uncertainty-aware planner is then applied to compute the best path from a start point to a goal point given these noisy traversal probability estimates. The proposed planner also incorporates replanning techniques to allow rapid replanning during online robot operation. The proposed method is evaluated on the Massachusetts Road Dataset, the DeepGlobe dataset, as well as a dataset of aerial images from off-road proving grounds at Mississippi State University. Results show that the proposed image segmentation and planning methods outperform conventional planning algorithms in terms of the quality and feasibility of the initial path, as well as the quality of replanned paths.
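
    As a concrete illustration of the planning step, the sketch below shows one way an A*-style search could fold the ensemble CNN's per-cell traversal probabilities into its edge costs. This is an assumption-laden toy, not the paper's URA* formulation: the 1 - log(p) step cost, 4-connected grid, and Manhattan heuristic are illustrative choices, and the replanning machinery is omitted.

```python
import heapq
import itertools
import math
import numpy as np

def uncertainty_aware_astar(prob, start, goal):
    """A* over a grid of traversal probabilities. Each step costs
    1 - log(p), so low-confidence cells are penalized (assumed weighting)."""
    h = lambda a, b: abs(a[0] - b[0]) + abs(a[1] - b[1])  # Manhattan heuristic
    tie = itertools.count()                               # avoids tuple-comparison ties
    frontier = [(h(start, goal), next(tie), 0.0, start, None)]
    parent, best_g = {}, {start: 0.0}
    while frontier:
        _, _, g, node, prev = heapq.heappop(frontier)
        if node in parent:
            continue
        parent[node] = prev
        if node == goal:                                  # walk back to recover the path
            path = [node]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        r, c = node
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nb[0] < prob.shape[0] and 0 <= nb[1] < prob.shape[1]:
                ng = g + 1.0 - math.log(max(prob[nb], 1e-6))
                if ng < best_g.get(nb, float("inf")):
                    best_g[nb] = ng
                    heapq.heappush(frontier, (ng + h(nb, goal), next(tie), ng, nb, node))
    return None

grid = np.clip(np.random.rand(20, 20), 0.05, 1.0)  # stand-in for CNN traversal probs
path = uncertainty_aware_astar(grid, (0, 0), (19, 19))
```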

SHAPNN: Shapley Value Regularized Tabular Neural Network

  • paper_url: http://arxiv.org/abs/2309.08799
  • repo_url: None
  • paper_authors: Qisen Cheng, Shuhui Qu, Janghwan Lee
  • for: Proposing a novel deep tabular data modeling architecture for supervised learning.
  • methods: The method leverages Shapley values, an established technique for explaining black-box models, and trains the neural network with standard backpropagation, regularized by real-time estimated Shapley values.
  • results: The method provides valid explanations for both data instances and datasets with no computational overhead; prediction with explanation acts as a regularizer that improves model performance, and the regularized prediction also enhances continual learning. Compared on various public datasets, SHAPNN excels in AUROC, transparency, and robustness to streaming data.
    Abstract We present SHAPNN, a novel deep tabular data modeling architecture designed for supervised learning. Our approach leverages Shapley values, a well-established technique for explaining black-box models. Our neural network is trained using standard backward propagation optimization methods, and is regularized with realtime estimated Shapley values. Our method offers several advantages, including the ability to provide valid explanations with no computational overhead for data instances and datasets. Additionally, prediction with explanation serves as a regularizer, which improves the model's performance. Moreover, the regularized prediction enhances the model's capability for continual learning. We evaluate our method on various publicly available datasets and compare it with state-of-the-art deep neural network models, demonstrating the superior performance of SHAPNN in terms of AUROC, transparency, as well as robustness to streaming data.
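
    As background for the key ingredient, here is a minimal permutation-sampling Monte-Carlo Shapley estimator of the kind a SHAPNN-style regularizer could evaluate on the fly. The baseline choice and the way the estimates enter the training loss are assumptions; the abstract does not specify them.

```python
import numpy as np

def mc_shapley(predict, x, baseline, n_perms=32, seed=0):
    """Permutation-sampling Shapley estimate for one instance: average each
    feature's marginal contribution as features are revealed in random order."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(x.shape[0])
    for _ in range(n_perms):
        z = baseline.copy()
        prev = predict(z)
        for i in rng.permutation(x.shape[0]):
            z[i] = x[i]                  # reveal feature i
            cur = predict(z)
            phi[i] += cur - prev         # marginal contribution of feature i
            prev = cur
    return phi / n_perms

w = np.array([1.0, -2.0, 0.5])           # toy linear model
phi = mc_shapley(lambda z: float(w @ z), x=np.ones(3), baseline=np.zeros(3))
# phi == w here: Shapley values of a linear model equal its weights,
# which makes a convenient sanity check.
```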

D3: Data Diversity Design for Systematic Generalization in Visual Question Answering

  • paper_url: http://arxiv.org/abs/2309.08798
  • repo_url: https://github.com/amiroor/d3questiongenerationclevr
  • paper_authors: Amir Rahimi, Vanessa D’Amario, Moyuru Yamada, Kentaro Takemoto, Tomotake Sasaki, Xavier Boix
  • for: This paper investigates the role of data diversity in achieving systematic generalization in Visual Question Answering (VQA) tasks.
  • methods: The paper uses a combination of simple tasks and neural network architectures to study the effect of data diversity on systematic generalization.
  • results: The paper finds that the diversity of simple tasks is a key factor in achieving systematic generalization, and that neural module networks are better able to leverage all forms of data diversity than monolithic architectures.
    Abstract Systematic generalization is a crucial aspect of intelligence, which refers to the ability to generalize to novel tasks by combining known subtasks and concepts. One critical factor that has been shown to influence systematic generalization is the diversity of training data. However, diversity can be defined in various ways, as data have many factors of variation. A more granular understanding of how different aspects of data diversity affect systematic generalization is lacking. We present new evidence in the problem of Visual Question Answering (VQA) that reveals that the diversity of simple tasks (i.e. tasks formed by a few subtasks and concepts) plays a key role in achieving systematic generalization. This implies that it may not be essential to gather a large and varied number of complex tasks, which could be costly to obtain. We demonstrate that this result is independent of the similarity between the training and testing data and applies to well-known families of neural network architectures for VQA (i.e. monolithic architectures and neural module networks). Additionally, we observe that neural module networks leverage all forms of data diversity we evaluated, while monolithic architectures require more extensive amounts of data to do so. These findings provide a first step towards understanding the interactions between data diversity design, neural network architectures, and systematic generalization capabilities.

Privacy-preserving Early Detection of Epileptic Seizures in Videos

  • paper_url: http://arxiv.org/abs/2309.08794
  • repo_url: https://github.com/DevD1092/seizure-detection
  • paper_authors: Deval Mehta, Shobi Sivathamboo, Hugh Simpson, Patrick Kwan, Terence O'Brien, Zongyuan Ge
  • for: The purpose of this paper is to develop a privacy-preserving video-based epileptic seizure classification method.
  • methods: The method uses optical flow features extracted from epileptic seizure videos and utilizes a transformer-based progressive knowledge distillation to preserve privacy.
  • results: The method can accurately detect tonic-clonic seizures (TCSs) in a privacy-preserving manner, with an accuracy of 83.9%.
    Abstract In this work, we contribute towards the development of video-based epileptic seizure classification by introducing a novel framework (SETR-PKD), which could achieve privacy-preserved early detection of seizures in videos. Specifically, our framework has two significant components - (1) It is built upon optical flow features extracted from the video of a seizure, which encodes the seizure motion semiotics while preserving the privacy of the patient; (2) It utilizes a transformer based progressive knowledge distillation, where the knowledge is gradually distilled from networks trained on a longer portion of video samples to the ones which will operate on shorter portions. Thus, our proposed framework addresses the limitations of the current approaches which compromise the privacy of the patients by directly operating on the RGB video of a seizure as well as impede real-time detection of a seizure by utilizing the full video sample to make a prediction. Our SETR-PKD framework could detect tonic-clonic seizures (TCSs) in a privacy-preserving manner with an accuracy of 83.9% while they are only half-way into their progression. Our data and code is available at https://github.com/DevD1092/seizure-detection
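
    The distillation step can be sketched with a standard soft-label KD loss between a teacher that saw the longer video portion and a student that only sees the shorter one. The linear stand-ins, feature dimensions, and temperature below are assumptions; the actual SETR-PKD uses transformer encoders over optical-flow features and distills progressively.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Linear(128, 2)   # stand-in: trained on features from the longer clip
student = nn.Linear(128, 2)   # stand-in: will operate on the shorter clip

def distill_loss(long_feats, short_feats, T=2.0):
    """Soft-label KD: match the student's softened predictions on the short
    clip to the teacher's softened predictions on the long clip."""
    with torch.no_grad():
        t_logits = teacher(long_feats)
    s_logits = student(short_feats)
    return F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * T * T

loss = distill_loss(torch.randn(4, 128), torch.randn(4, 128))
loss.backward()   # gradients flow only into the student
```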

Fin-Fact: A Benchmark Dataset for Multimodal Financial Fact Checking and Explanation Generation

  • paper_url: http://arxiv.org/abs/2309.08793
  • repo_url: https://github.com/iit-dm/fin-fact
  • paper_authors: Aman Rangapur, Haoran Wang, Kai Shu
  • for: This paper provides a multimodal fact-checking dataset for combating misinformation in finance and improving the reliability of financial reporting and news dissemination.
  • methods: The dataset includes professional fact-checker annotations and justifications, together with multimodal content spanning both text and visuals, to improve the accuracy of fact-checking.
  • results: The resulting dataset, Fin-Fact, with its expert annotations, justifications, and multimodal content, helps users understand the reasoning behind fact-checking decisions and builds trust in the fact-checking process.
    Abstract Fact-checking in financial domain is under explored, and there is a shortage of quality dataset in this domain. In this paper, we propose Fin-Fact, a benchmark dataset for multimodal fact-checking within the financial domain. Notably, it includes professional fact-checker annotations and justifications, providing expertise and credibility. With its multimodal nature encompassing both textual and visual content, Fin-Fact provides complementary information sources to enhance factuality analysis. Its primary objective is combating misinformation in finance, fostering transparency, and building trust in financial reporting and news dissemination. By offering insightful explanations, Fin-Fact empowers users, including domain experts and end-users, to understand the reasoning behind fact-checking decisions, validating claim credibility, and fostering trust in the fact-checking process. The Fin-Fact dataset, along with our experimental codes is available at https://github.com/IIT-DM/Fin-Fact/.

Projected Task-Specific Layers for Multi-Task Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.08776
  • repo_url: https://github.com/josselinsomervilleroberts/ptsl
  • paper_authors: Josselin Somerville Roberts, Julia Di
  • for: Exploring how multi-task reinforcement learning can enable robots to scale across a wide variety of manipulation tasks in homes and workplaces.
  • methods: The paper proposes a new architecture, Projected Task-Specific Layers (PTSL), which combines a common policy with dense task-specific corrections to better express shared and variable task information.
  • results: The PTSL model outperforms the state of the art on the MT10 and MT50 benchmarks of Meta-World, consisting of 10 and 50 goal-conditioned tasks for a Sawyer arm.
    Abstract Multi-task reinforcement learning could enable robots to scale across a wide variety of manipulation tasks in homes and workplaces. However, generalizing from one task to another and mitigating negative task interference still remains a challenge. Addressing this challenge by successfully sharing information across tasks will depend on how well the structure underlying the tasks is captured. In this work, we introduce our new architecture, Projected Task-Specific Layers (PTSL), that leverages a common policy with dense task-specific corrections through task-specific layers to better express shared and variable task information. We then show that our model outperforms the state of the art on the MT10 and MT50 benchmarks of Meta-World consisting of 10 and 50 goal-conditioned tasks for a Sawyer arm.
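
    The abstract describes a common policy with dense task-specific corrections through task-specific layers; one plausible reading is sketched below as a shared layer plus a per-task low-rank ("projected") correction. The low-rank form and dimensions are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ProjectedTaskLayer(nn.Module):
    """Shared linear layer plus a per-task low-rank correction; the low-rank
    ('projected') form is an assumed reading of the abstract."""
    def __init__(self, dim, n_tasks, rank=8):
        super().__init__()
        self.shared = nn.Linear(dim, dim)
        self.down = nn.Parameter(torch.randn(n_tasks, dim, rank) * 0.01)
        self.up = nn.Parameter(torch.zeros(n_tasks, rank, dim))  # starts as a no-op

    def forward(self, x, task_id):
        correction = x @ self.down[task_id] @ self.up[task_id]
        return self.shared(x) + correction

layer = ProjectedTaskLayer(dim=64, n_tasks=10)
out = layer(torch.randn(32, 64), task_id=3)   # a batch of states for task 3
```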

Enhance audio generation controllability through representation similarity regularization

  • paper_url: http://arxiv.org/abs/2309.08773
  • repo_url: None
  • paper_authors: Yangyang Shi, Gael Le Lan, Varun Nagaraja, Zhaoheng Ni, Xinhao Mei, Ernie Chang, Forrest Iandola, Yang Liu, Vikas Chandra
  • for: Improving controllability in audio generation by enforcing better alignment between audio and text representations.
  • methods: The method regularizes audio and text token representations during training, in particular during the classifier-free guidance (CFG) phase where the text condition is excluded from cross attention, to minimize discrepancies in audio-text similarity relative to other samples in the same training batch.
  • results: Experimental results show that the method improves objective metrics for both audio and music generation, as well as human perception of audio generation.
    Abstract This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training. In the context of language model-based audio generation, the model leverages input from both textual and audio token representations to predict subsequent audio tokens. However, the current configuration lacks explicit regularization to ensure the alignment between the chosen text representation and the language model's predictions. Our proposal involves the incorporation of audio and text representation regularization, particularly during the classifier-free guidance (CFG) phase, where the text condition is excluded from cross attention during language model training. The aim of this proposed representation regularization is to minimize discrepancies in audio and text similarity compared to other samples within the same training batch. Experimental results on both music and audio generation tasks demonstrate that our proposed methods lead to improvements in objective metrics for both audio and music generation, as well as an enhancement in the human perception for audio generation.
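
    A minimal sketch of a batch-level representation-similarity regularizer: each audio embedding is pulled toward its paired text embedding relative to the other pairs in the batch (an InfoNCE-style form). The paper's exact loss is not given in this summary, so this particular form and the temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def similarity_regularizer(audio_emb, text_emb, tau=0.07):
    """Make each audio embedding most similar to its own text embedding
    relative to the other pairs in the batch (InfoNCE-style; assumed form)."""
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / tau                    # batch x batch similarity matrix
    targets = torch.arange(a.size(0))         # diagonal entries are matched pairs
    return F.cross_entropy(logits, targets)

reg = similarity_regularizer(torch.randn(8, 256), torch.randn(8, 256))
```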

Rethinking Cross-Domain Pedestrian Detection: A Background-Focused Distribution Alignment Framework for Instance-Free One-Stage Detectors

  • paper_url: http://arxiv.org/abs/2309.08771
  • repo_url: https://github.com/caiyancheng/bfda
  • paper_authors: Yancheng Cai, Bo Zhang, Baopu Li, Tao Chen, Hongliang Yan, Jingdong Zhang, Jiahao Xu
  • for: Generalizing pedestrian detectors from a label-rich domain to a label-scarce domain, which is crucial for real-world applications.
  • methods: The work focuses on domain adaptation for fast one-stage detectors, which lack instance-level proposals and can only perform image-level feature alignment.
  • results: The paper proposes a novel framework, background-focused distribution alignment (BFDA), for training domain-adaptive one-stage detectors. BFDA first decouples background features from the whole image feature maps and then aligns them via a novel long-short-range discriminator.
    Abstract Cross-domain pedestrian detection aims to generalize pedestrian detectors from one label-rich domain to another label-scarce domain, which is crucial for various real-world applications. Most recent works focus on domain alignment to train domain-adaptive detectors either at the instance level or image level. From a practical point of view, one-stage detectors are faster. Therefore, we concentrate on designing a cross-domain algorithm for rapid one-stage detectors that lacks instance-level proposals and can only perform image-level feature alignment. However, pure image-level feature alignment causes the foreground-background misalignment issue to arise, i.e., the foreground features in the source domain image are falsely aligned with background features in the target domain image. To address this issue, we systematically analyze the importance of foreground and background in image-level cross-domain alignment, and learn that background plays a more critical role in image-level cross-domain alignment. Therefore, we focus on cross-domain background feature alignment while minimizing the influence of foreground features on the cross-domain alignment stage. This paper proposes a novel framework, namely, background-focused distribution alignment (BFDA), to train domain adaptive onestage pedestrian detectors. Specifically, BFDA first decouples the background features from the whole image feature maps and then aligns them via a novel long-short-range discriminator.

AlbNER: A Corpus for Named Entity Recognition in Albanian

  • paper_url: http://arxiv.org/abs/2309.08741
  • repo_url: None
  • paper_authors: Erion Çano
  • for: Addressing the scarcity of resources, such as annotated text corpora, that impedes natural language processing and computational linguistics research on Albanian.
  • methods: The paper presents a corpus of 900 sentences collected from Albanian Wikipedia articles and annotated with named entities.
  • results: Preliminary results with BERT and RoBERTa variants fine-tuned and tested on AlbNER indicate that model size has a slight impact on NER performance, whereas language transfer has a significant one. The AlbNER corpus and these results can serve as baselines for future experiments.
    Abstract Scarcity of resources such as annotated text corpora for under-resourced languages like Albanian is a serious impediment in computational linguistics and natural language processing research. This paper presents AlbNER, a corpus of 900 sentences with labeled named entities, collected from Albanian Wikipedia articles. Preliminary results with BERT and RoBERTa variants fine-tuned and tested with AlbNER data indicate that model size has slight impact on NER performance, whereas language transfer has a significant one. AlbNER corpus and these obtained results should serve as baselines for future experiments.
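
    Fine-tuning a pretrained encoder on AlbNER boils down to standard token classification; a minimal sketch with Hugging Face transformers follows. The multilingual BERT checkpoint, nine-label tag set, and placeholder labels are assumptions, not the paper's exact setup.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

name = "bert-base-multilingual-cased"           # assumed checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForTokenClassification.from_pretrained(name, num_labels=9)

enc = tok("Tirana është kryeqyteti i Shqipërisë.", return_tensors="pt")
labels = torch.zeros_like(enc["input_ids"])     # placeholder gold BIO labels
loss = model(**enc, labels=labels).loss         # standard token-classification loss
loss.backward()                                 # one fine-tuning step (optimizer omitted)
```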

OpenAI Cribbed Our Tax Example, But Can GPT-4 Really Do Tax?

  • paper_url: http://arxiv.org/abs/2309.09992
  • repo_url: None
  • paper_authors: Andrew Blair-Stanek, Nils Holzenberger, Benjamin Van Durme
  • for: Explaining where OpenAI got the tax law example used in its livestream demonstration of GPT-4, why GPT-4 arrived at the wrong answer, and how it fails to reliably calculate taxes.
  • methods: The authors trace the source of the example and analyze GPT-4's tax calculations.
  • results: GPT-4 produces an incorrect answer and does not reliably calculate taxes.
    Abstract The authors explain where OpenAI got the tax law example in its livestream demonstration of GPT-4, why GPT-4 got the wrong answer, and how it fails to reliably calculate taxes.

MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

  • paper_url: http://arxiv.org/abs/2309.08730
  • repo_url: https://github.com/zihaod/musilingo
  • paper_authors: Zihao Deng, Yinghao Ma, Yudong Liu, Rongchen Guo, Ge Zhang, Wenhu Chen, Wenhao Huang, Emmanouil Benetos
  • for: Exploring the potential of large language models (LLMs) in multimodal applications, in particular bridging the text and music domains.
  • methods: The work proposes MusiLingo, a system for music caption generation and music-related question answering that uses a single projection layer to align music representations from the pre-trained, frozen MERT music audio model with the frozen LLaMA language model.
  • results: After training on an extensive music caption dataset and fine-tuning with instructional data, MusiLingo performs competitively, generating high-quality music captions and music-related Q&A pairs; the authors also introduce the MusicInstruct (MI) dataset, built from MusicCaps for open-ended music inquiries.
    Abstract Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of textual and musical domains remains relatively unexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query responses. MusiLingo employs a single projection layer to align music representations from the pre-trained frozen music audio model MERT with the frozen LLaMA language model, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we created the MusicInstruct (MI) dataset from MusicCaps, tailored for open-ended music inquiries. Empirical evaluations demonstrate its competitive performance in generating music captions and composing music-related Q&A pairs. Our introduced dataset enables notable advancements beyond previous ones.
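
    The single projection layer is straightforward to sketch: a linear map from the frozen music encoder's embedding space into the frozen LLM's hidden space, producing pseudo-tokens that are prepended to the text embeddings. The dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

music_dim, llm_dim = 768, 4096                # illustrative: MERT-like -> LLaMA-like
project = nn.Linear(music_dim, llm_dim)       # the only trainable component

music_feats = torch.randn(1, 100, music_dim)  # frozen MERT output (assumed shape)
prefix = project(music_feats)                 # pseudo-tokens in the LLM's space
# `prefix` would be concatenated with text-token embeddings and fed to the
# frozen LLaMA model; only `project` receives gradients during training.
```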

SculptBot: Pre-Trained Models for 3D Deformable Object Manipulation

  • paper_url: http://arxiv.org/abs/2309.08728
  • repo_url: None
  • paper_authors: Alison Bartsch, Charlotte Avra, Amir Barati Farimani
  • for: Addressing the unique challenges robots face when manipulating deformable materials such as modeling clay, including high degrees of freedom and severe self-occlusion.
  • methods: The system uses point clouds as the state representation and leverages a pre-trained point cloud reconstruction transformer to learn a latent dynamics model that predicts material deformation given a grasp action.
  • results: Experiments show the proposed system successfully captures the dynamics of clay and can create a variety of simple shapes.
    Abstract Deformable object manipulation presents a unique set of challenges in robotic manipulation by exhibiting high degrees of freedom and severe self-occlusion. State representation for materials that exhibit plastic behavior, like modeling clay or bread dough, is also difficult because they permanently deform under stress and are constantly changing shape. In this work, we investigate each of these challenges using the task of robotic sculpting with a parallel gripper. We propose a system that uses point clouds as the state representation and leverages pre-trained point cloud reconstruction Transformer to learn a latent dynamics model to predict material deformations given a grasp action. We design a novel action sampling algorithm that reasons about geometrical differences between point clouds to further improve the efficiency of model-based planners. All data and experiments are conducted entirely in the real world. Our experiments show the proposed system is able to successfully capture the dynamics of clay, and is able to create a variety of simple shapes.

Modelling Irregularly Sampled Time Series Without Imputation

  • paper_url: http://arxiv.org/abs/2309.08698
  • repo_url: https://github.com/rohit102497/slan
  • paper_authors: Rohit Agarwal, Aman Sinha, Dilip K. Prasad, Marianne Clausel, Alexander Horsch, Mathieu Constant, Xavier Coubez
  • for: Modeling irregularly sampled time series (ISTS) is difficult because of missing values; most existing methods convert irregularly sampled data into regularly sampled data via imputation, assuming an underlying missing mechanism. The authors propose SLAN (Switch LSTM Aggregate Network), which models ISTS without imputation and without assuming any underlying process.
  • methods: SLAN uses a pack of LSTMs and dynamically adapts its architecture on the fly based on the measured sensors, exploiting the irregularity information to capture each sensor's local summary explicitly while maintaining a global summary state throughout the observational period.
  • results: The authors demonstrate the efficacy of SLAN on the publicly available MIMIC-III, Physionet 2012, and Physionet 2019 datasets; the code is available at https://github.com/Rohit102497/SLAN.
    Abstract Modelling irregularly-sampled time series (ISTS) is challenging because of missing values. Most existing methods focus on handling ISTS by converting irregularly sampled data into regularly sampled data via imputation. These models assume an underlying missing mechanism leading to unwanted bias and sub-optimal performance. We present SLAN (Switch LSTM Aggregate Network), which utilizes a pack of LSTMs to model ISTS without imputation, eliminating the assumption of any underlying process. It dynamically adapts its architecture on the fly based on the measured sensors. SLAN exploits the irregularity information to capture each sensor's local summary explicitly and maintains a global summary state throughout the observational period. We demonstrate the efficacy of SLAN on publicly available datasets, namely, MIMIC-III, Physionet 2012 and Physionet 2019. The code is available at https://github.com/Rohit102497/SLAN.
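
    One plausible reading of the "switch" idea is sketched below: a dedicated LSTM cell per sensor captures each sensor's local summary, while a global cell aggregates whichever sensor fired last. The GRU-based global update and the (value, time-gap) input are assumptions; the abstract does not pin down the wiring.

```python
import torch
import torch.nn as nn

class SwitchLSTM(nn.Module):
    """Sketch of the switch idea: one LSTM cell per sensor plus a global
    summary state (the exact SLAN architecture may differ)."""
    def __init__(self, sensors, hidden=32):
        super().__init__()
        self.cells = nn.ModuleDict({s: nn.LSTMCell(2, hidden) for s in sensors})
        self.global_cell = nn.GRUCell(hidden, hidden)
        self.hidden = hidden

    def forward(self, events):
        """events: observations in arrival order, as (sensor, value, time_gap)."""
        h = {s: torch.zeros(1, self.hidden) for s in self.cells}
        c = {s: torch.zeros(1, self.hidden) for s in self.cells}
        g = torch.zeros(1, self.hidden)
        for name, value, dt in events:
            x = torch.tensor([[value, dt]])                  # local input: value + gap
            h[name], c[name] = self.cells[name](x, (h[name], c[name]))
            g = self.global_cell(h[name], g)                 # update global summary
        return g

net = SwitchLSTM(["hr", "bp", "temp"])
summary = net([("hr", 72.0, 0.0), ("bp", 120.0, 1.5), ("hr", 75.0, 0.5)])
```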

  • paper_url: http://arxiv.org/abs/2309.08695
  • repo_url: https://github.com/ramonachristen/multilingual_negation_scope_resolution_on_legal_data
  • paper_authors: Ramona Christen, Anastassia Shaitarova, Matthias Stürmer, Joel Niklaus
  • for: Resolving the scope of negation in legal texts.
  • methods: The paper fine-tunes and evaluates language models on texts in multiple languages, releasing a new set of annotated court decisions in German, French, and Italian.
  • results: The experiments achieve token-level F1-scores of up to 86.7% in zero-shot cross-lingual settings and up to 91.1% in multilingual settings on the legal datasets.
    Abstract Resolving the scope of a negation within a sentence is a challenging NLP task. The complexity of legal texts and the lack of annotated in-domain negation corpora pose challenges for state-of-the-art (SotA) models when performing negation scope resolution on multilingual legal data. Our experiments demonstrate that models pre-trained without legal data underperform in the task of negation scope resolution. Our experiments, using language models exclusively fine-tuned on domains like literary texts and medical data, yield inferior results compared to the outcomes documented in prior cross-domain experiments. We release a new set of annotated court decisions in German, French, and Italian and use it to improve negation scope resolution in both zero-shot and multilingual settings. We achieve token-level F1-scores of up to 86.7% in our zero-shot cross-lingual experiments, where the models are trained on two languages of our legal datasets and evaluated on the third. Our multilingual experiments, where the models were trained on all available negation data and evaluated on our legal datasets, resulted in F1-scores of up to 91.1%.

Fake News Detectors are Biased against Texts Generated by Large Language Models

  • paper_url: http://arxiv.org/abs/2309.08674
  • repo_url: None
  • paper_authors: Jinyan Su, Terry Yue Zhuo, Jonibek Mansurov, Di Wang, Preslav Nakov
  • for: Evaluating fake news detectors in scenarios involving both human-written and LLM-generated misinformation.
  • methods: The study introduces a mitigation strategy that leverages adversarial training with LLM-paraphrased genuine news.
  • results: The study finds that many existing detectors are biased: they are more prone to flagging LLM-generated content as fake while often misclassifying human-written fake news as genuine, a bias that appears to arise from distinct linguistic patterns in LLM outputs. Adversarial training with LLM-paraphrased genuine news markedly improves detection accuracy for both human-written and LLM-generated news.
    Abstract The spread of fake news has emerged as a critical challenge, undermining trust and posing threats to society. In the era of Large Language Models (LLMs), the capability to generate believable fake content has intensified these concerns. In this study, we present a novel paradigm to evaluate fake news detectors in scenarios involving both human-written and LLM-generated misinformation. Intriguingly, our findings reveal a significant bias in many existing detectors: they are more prone to flagging LLM-generated content as fake news while often misclassifying human-written fake news as genuine. This unexpected bias appears to arise from distinct linguistic patterns inherent to LLM outputs. To address this, we introduce a mitigation strategy that leverages adversarial training with LLM-paraphrased genuine news. The resulting model yielded marked improvements in detection accuracy for both human and LLM-generated news. To further catalyze research in this domain, we release two comprehensive datasets, \texttt{GossipCop++} and \texttt{PolitiFact++}, thus amalgamating human-validated articles with LLM-generated fake and real news.

Chain-of-Thought Reasoning is a Policy Improvement Operator

  • paper_url: http://arxiv.org/abs/2309.08589
  • repo_url: None
  • paper_authors: Hugh Zhang, David C. Parkes
  • for: Demonstrating that language models can teach themselves new skills without human demonstrations.
  • methods: SECToR (Self-Education via Chain-of-Thought Reasoning) first uses chain-of-thought reasoning to slowly think through problems, then fine-tunes the model to generate the same answers without chain-of-thought reasoning.
  • results: Language models trained via SECToR autonomously learn to add numbers of up to 29 digits without access to any ground-truth examples beyond an initial supervised fine-tuning phase consisting only of numbers with 6 or fewer digits.
    Abstract Large language models have astounded the world with fascinating new capabilities. However, they currently lack the ability to teach themselves new skills, relying instead on being trained on large amounts of human-generated data. We introduce SECToR (Self-Education via Chain-of-Thought Reasoning), a proof-of-concept demonstration that language models can successfully teach themselves new skills using chain-of-thought reasoning. Inspired by previous work in both reinforcement learning (Silver et al., 2017) and human cognition (Kahneman, 2011), SECToR first uses chain-of-thought reasoning to slowly think its way through problems. SECToR then fine-tunes the model to generate those same answers, this time without using chain-of-thought reasoning. Language models trained via SECToR autonomously learn to add up to 29-digit numbers without any access to any ground truth examples beyond an initial supervised fine-tuning phase consisting only of numbers with 6 or fewer digits. Our central hypothesis is that chain-of-thought reasoning can act as a policy improvement operator, analogously to how Monte-Carlo Tree Search is used in AlphaZero. We hope that this research can lead to new directions in which language models can learn to teach themselves without the need for human demonstrations.
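
    The self-education loop can be sketched in a few lines: solve problems with chain-of-thought, then fine-tune the model to emit the same final answers directly. `generate` and `fine_tune` are hypothetical stand-ins for a real LLM API, and the answer-extraction heuristic is an assumption.

```python
def sector_round(model, problems, generate, fine_tune):
    """One self-education round: answer with chain-of-thought, then train the
    model to produce the same answers without the reasoning chain."""
    pairs = []
    for p in problems:
        cot = generate(model, p + "\nLet's think step by step.")
        answer = cot.strip().splitlines()[-1]   # assumed: final line holds the answer
        pairs.append((p, answer))               # training target omits the reasoning
    return fine_tune(model, pairs)

# Dummy stand-ins keep the sketch runnable; a real run would call an LLM and
# repeat rounds on progressively longer additions to bootstrap the skill.
model = object()
generate = lambda m, prompt: "23 + 19\n= 42"
fine_tune = lambda m, pairs: m
model = sector_round(model, ["23 + 19"], generate, fine_tune)
```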

Compositional Foundation Models for Hierarchical Planning

  • paper_url: http://arxiv.org/abs/2309.08587
  • repo_url: None
  • paper_authors: Anurag Ajay, Seungwook Han, Yilun Du, Shaung Li, Abhi Gupta, Tommi Jaakkola, Josh Tenenbaum, Leslie Kaelbling, Akash Srivastava, Pulkit Agrawal
  • for: solves long-horizon tasks with hierarchical reasoning across spatial and temporal scales
  • methods: leverages multiple expert foundation models trained individually on language, vision, and action data and composed jointly; constructs symbolic plans grounded in the environment; infers actions from generated videos
  • results: illustrates efficacy and adaptability in three different long-horizon table-top manipulation tasks
    Abstract To make effective decisions in novel environments with long-horizon goals, it is crucial to engage in hierarchical reasoning across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Compositional Foundation Models for Hierarchical Planning (HiP), a foundation model which leverages multiple expert foundation model trained on language, vision and action data individually jointly together to solve long-horizon tasks. We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model. Generated video plans are then grounded to visual-motor control, through an inverse dynamics model that infers actions from generated videos. To enable effective reasoning within this hierarchy, we enforce consistency between the models via iterative refinement. We illustrate the efficacy and adaptability of our approach in three different long-horizon table-top manipulation tasks.

How Transferable are Attribute Controllers on Pretrained Multilingual Translation Models?

  • paper_url: http://arxiv.org/abs/2309.08565
  • repo_url: https://github.com/dannigt/attribute-controller-transfer
  • paper_authors: Danni Liu, Jan Niehues
  • for: Investigating how attribute controllers, such as for formality, can be transferred to languages without supervised data.
  • methods: Building on pretrained massively multilingual translation models (NLLB-200), the work transfers attribute controlling capabilities to new languages, analyzing both training- and inference-time control techniques under various data scenarios.
  • results: The analysis uncovers the relative strengths and weaknesses of the two paradigms in zero-shot performance and domain robustness, with consistent improvements on 5 zero-shot directions; a human evaluation on a real low-resource language, Bengali, confirms the findings.
    Abstract Customizing machine translation models to comply with fine-grained attributes such as formality has seen tremendous progress recently. However, current approaches mostly rely on at least some supervised data with attribute annotation. Data scarcity therefore remains a bottleneck to democratizing such customization possibilities to a wider range of languages, lower-resource ones in particular. Given recent progress in pretrained massively multilingual translation models, we use them as a foundation to transfer the attribute controlling capabilities to languages without supervised data. In this work, we present a comprehensive analysis of transferring attribute controllers based on a pretrained NLLB-200 model. We investigate both training- and inference-time control techniques under various data scenarios, and uncover their relative strengths and weaknesses in zero-shot performance and domain robustness. We show that both paradigms are complementary, as shown by consistent improvements on 5 zero-shot directions. Moreover, a human evaluation on a real low-resource language, Bengali, confirms our findings on zero-shot transfer to new target languages. The code is $\href{https://github.com/dannigt/attribute-controller-transfer}{\text{here}}$.

Deep Reinforcement Learning for Efficient and Fair Allocation of Health Care Resources

  • paper_url: http://arxiv.org/abs/2309.08560
  • repo_url: None
  • paper_authors: Yikuan Li, Chengsheng Mao, Kaixuan Huang, Hanyin Wang, Zheng Yu, Mengdi Wang, Yuan Luo
  • for: Proposing a reinforcement learning approach to optimize critical care resource allocation policies, so that scarce resources such as ventilators are rationed fairly and effectively.
  • methods: A transformer-based deep Q-network integrates the disease progression of individual patients and the interaction effects among patients during critical care resource allocation.
  • results: The method significantly reduces excess deaths and achieves a more equitable distribution under different levels of ventilator shortage, compared to existing severity-based and comorbidity-based methods in use by different governments.
    Abstract Scarcity of health care resources could result in the unavoidable consequence of rationing. For example, ventilators are often limited in supply, especially during public health emergencies or in resource-constrained health care settings, such as amid the pandemic of COVID-19. Currently, there is no universally accepted standard for health care resource allocation protocols, resulting in different governments prioritizing patients based on various criteria and heuristic-based protocols. In this study, we investigate the use of reinforcement learning for critical care resource allocation policy optimization to fairly and effectively ration resources. We propose a transformer-based deep Q-network to integrate the disease progression of individual patients and the interaction effects among patients during the critical care resource allocation. We aim to improve both fairness of allocation and overall patient outcomes. Our experiments demonstrate that our method significantly reduces excess deaths and achieves a more equitable distribution under different levels of ventilator shortage, when compared to existing severity-based and comorbidity-based methods in use by different governments. Our source code is included in the supplement and will be released on Github upon publication.

HINT: Healthy Influential-Noise based Training to Defend against Data Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2309.08549
  • repo_url: None
  • paper_authors: Minh-Hao Van, Alycia N. Carey, Xintao Wu
  • for: Defending deep learning models against data poisoning attacks.
  • methods: Healthy Influential-Noise based Training (HINT) uses influence functions to craft healthy noise that hardens the classification model against poisoning attacks without significantly affecting generalization, and remains effective when only a subset of the training data is modified.
  • results: Comprehensive experiments on two image datasets with state-of-the-art poisoning attacks under different realistic attack scenarios show that HINT efficiently protects deep learning models against both untargeted and targeted poisoning attacks.
    Abstract While numerous defense methods have been proposed to prohibit potential poisoning attacks from untrusted data sources, most research works only defend against specific attacks, which leaves many avenues for an adversary to exploit. In this work, we propose an efficient and robust training approach to defend against data poisoning attacks based on influence functions, named Healthy Influential-Noise based Training. Using influence functions, we craft healthy noise that helps to harden the classification model against poisoning attacks without significantly affecting the generalization ability on test data. In addition, our method can perform effectively when only a subset of the training data is modified, instead of the current method of adding noise to all examples that has been used in several previous works. We conduct comprehensive evaluations over two image datasets with state-of-the-art poisoning attacks under different realistic attack scenarios. Our empirical results show that HINT can efficiently protect deep learning models against the effect of both untargeted and targeted poisoning attacks.

When do Generative Query and Document Expansions Fail? A Comprehensive Study Across Methods, Retrievers, and Datasets

  • paper_url: http://arxiv.org/abs/2309.08541
  • repo_url: None
  • paper_authors: Orion Weller, Kyle Lo, David Wadden, Dawn Lawrie, Benjamin Van Durme, Arman Cohan, Luca Soldaini
  • for: Studying whether query and document expansion with large language models (LMs) universally improves generalization in information retrieval.
  • methods: The study covers eleven expansion techniques, twelve datasets with diverse distribution shifts, and twenty-four retrieval models.
  • results: There is a strong negative correlation between retriever performance and gains from expansion: expansion improves scores for weaker models but generally harms stronger ones. Qualitative error analysis suggests the following recipe: use expansions for weaker models or when the target dataset significantly differs from the training corpus in format; otherwise, avoid expansions to keep the relevance signal clear.
    Abstract Using large language models (LMs) for query or document expansion can improve generalization in information retrieval. However, it is unknown whether these techniques are universally beneficial or only effective in specific settings, such as for particular retrieval models, dataset domains, or query types. To answer this, we conduct the first comprehensive analysis of LM-based expansion. We find that there exists a strong negative correlation between retriever performance and gains from expansion: expansion improves scores for weaker models, but generally harms stronger models. We show this trend holds across a set of eleven expansion techniques, twelve datasets with diverse distribution shifts, and twenty-four retrieval models. Through qualitative error analysis, we hypothesize that although expansions provide extra information (potentially improving recall), they add additional noise that makes it difficult to discern between the top relevant documents (thus introducing false positives). Our results suggest the following recipe: use expansions for weaker models or when the target dataset significantly differs from training corpus in format; otherwise, avoid expansions to keep the relevance signal clear.
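
    A minimal sketch of LM-based query expansion over a BM25 retriever, in the spirit of the methods studied: generate a pseudo-document with an LM and append it to the query before scoring. The `llm` function is a hypothetical stand-in, and this is only one of the eleven techniques the paper covers.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

corpus = ["the cat sat on the mat", "dogs chase cats", "stock markets fell"]
bm25 = BM25Okapi([d.split() for d in corpus])

def llm(prompt):
    """Hypothetical stand-in for an LLM call that writes a pseudo-document."""
    return "felines resting on rugs and mats"

query = "cat on mat"
expanded = (query + " " + llm(f"Write a passage answering: {query}")).split()
print(bm25.get_scores(expanded))  # score the corpus against the expanded query
```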

Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model

  • paper_url: http://arxiv.org/abs/2309.08535
  • repo_url: https://github.com/jeonghun0716/visual-speech-recognition-for-low-resource-languages
  • paper_authors: Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro
  • for: Proposing a powerful visual speech recognition (VSR) method for multiple languages, especially low-resource languages with a limited amount of labeled data.
  • methods: A Whisper model, which can perform both language identification and audio-based speech recognition, filters data of the desired languages and transcribes labels from an unannotated multilingual audio-visual data pool.
  • results: Comparing VSR models trained on automatic labels versus human-annotated labels shows that similar VSR performance can be achieved without human annotation. Through the automated labeling process, the authors label the large-scale unlabeled multilingual databases VoxCeleb2 and AVSpeech, producing 1,002 hours of data for four low-resource languages: French, Italian, Spanish, and Portuguese. With the automatic labels, they achieve new state-of-the-art performance on mTEDx in the four languages, significantly surpassing previous methods. The automatic labels are available at https://github.com/JeongHun0716/Visual-Speech-Recognition-for-Low-Resource-Languages.
    Abstract This paper proposes a powerful Visual Speech Recognition (VSR) method for multiple languages, especially for low-resource languages that have a limited number of labeled data. Different from previous methods that tried to improve the VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for the different languages without human intervention. To this end, we employ a Whisper model which can conduct both language identification and audio-based speech recognition. It serves to filter data of the desired languages and transcribe labels from the unannotated, multilingual audio-visual data pool. By comparing the performances of VSR models trained on automatic labels and the human-annotated labels, we show that we can achieve similar VSR performance to that of human-annotated labels even without utilizing human annotations. Through the automated labeling process, we label large-scale unlabeled multilingual databases, VoxCeleb2 and AVSpeech, producing 1,002 hours of data for four low VSR resource languages, French, Italian, Spanish, and Portuguese. With the automatic labels, we achieve new state-of-the-art performance on mTEDx in four languages, significantly surpassing the previous methods. The automatic labels are available online: https://github.com/JeongHun0716/Visual-Speech-Recognition-for-Low-Resource-Languages
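
    The automatic-labeling step maps naturally onto the open-source whisper package, whose transcription result includes the detected language; a simplified sketch follows. The model size, target-language set, and keep/discard logic are assumptions about the paper's pipeline.

```python
import whisper  # pip install openai-whisper

model = whisper.load_model("large-v2")
wanted = {"fr", "it", "es", "pt"}   # the four target low-resource languages

def auto_label(audio_path):
    """Keep and transcribe a clip only if Whisper detects a wanted language;
    a simplification of the paper's filtering + labeling pipeline."""
    result = model.transcribe(audio_path)   # result includes the detected language
    if result["language"] in wanted:
        return result["language"], result["text"]
    return None

# label = auto_label("clip_000123.wav")  # e.g. ("fr", "bonjour à tous ...")
```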

Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers

  • paper_url: http://arxiv.org/abs/2309.08532
  • repo_url: https://github.com/kyegomez/EAOT
  • paper_authors: Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, Yujiu Yang
  • for: Proposing EvoPrompt, a novel framework for discrete prompt optimization that reduces the human effort needed to craft prompts for large language models (LLMs).
  • methods: The framework connects LLMs with evolutionary algorithms (EAs): starting from a population of prompts, it iteratively generates new prompts with the LLM acting as the evolutionary operators, simultaneously exploiting the LLM's language processing capabilities and the EA's efficient optimization, without any gradients or parameters.
  • results: On 9 datasets spanning language understanding and generation tasks, EvoPrompt improves prompts for both closed- and open-source LLMs, including GPT-3.5 and Alpaca, outperforming human-engineered prompts by up to 25% and existing automatic prompt-generation methods by up to 14%.
    Abstract Large Language Models (LLMs) excel in various tasks, but they rely on carefully crafted prompts that often demand substantial human effort. To automate this process, in this paper, we propose a novel framework for discrete prompt optimization, called EvoPrompt, which borrows the idea of evolutionary algorithms (EAs) as they exhibit good performance and fast convergence. To enable EAs to work on discrete prompts, which are natural language expressions that need to be coherent and human-readable, we connect LLMs with EAs. This approach allows us to simultaneously leverage the powerful language processing capabilities of LLMs and the efficient optimization performance of EAs. Specifically, abstaining from any gradients or parameters, EvoPrompt starts from a population of prompts and iteratively generates new prompts with LLMs based on the evolutionary operators, improving the population based on the development set. We optimize prompts for both closed- and open-source LLMs including GPT-3.5 and Alpaca, on 9 datasets spanning language understanding and generation tasks. EvoPrompt significantly outperforms human-engineered prompts and existing methods for automatic prompt generation by up to 25% and 14% respectively. Furthermore, EvoPrompt demonstrates that connecting LLMs with EAs creates synergies, which could inspire further research on the combination of LLMs and conventional algorithms.
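
    The core loop is easy to sketch: the LLM itself plays the role of the crossover/mutation operator over a population of prompts scored on a dev set. `llm` and `score` below are stand-ins and the meta-prompt wording is an assumption; dummy implementations make the sketch runnable.

```python
import random

def evoprompt(seed_prompts, llm, score, generations=10):
    """EvoPrompt-style loop: the LLM acts as the evolutionary operator,
    producing children from sampled parents; the fittest prompts survive."""
    population = list(seed_prompts)
    for _ in range(generations):
        p1, p2 = random.sample(population, 2)
        child = llm("Combine and mutate these two instructions into a better one:\n"
                    f"1. {p1}\n2. {p2}")
        population.append(child)
        population.sort(key=score, reverse=True)
        population = population[:len(seed_prompts)]   # fixed population size
    return population[0]

# Dummy stand-ins; a real run would call an LLM and evaluate on a dev set.
best = evoprompt(["Classify the sentiment.", "Label the review as pos/neg."],
                 llm=lambda t: t.splitlines()[-1] + " (revised)",
                 score=len)
```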

SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels

  • paper_url: http://arxiv.org/abs/2309.08513
  • repo_url: https://github.com/showlab/sct
  • paper_authors: Henry Hengyuan Zhao, Pichao Wang, Yuyang Zhao, Hao Luo, Fan Wang, Mike Zheng Shou
  • for: Adapting pre-trained vision transformers to downstream tasks while keeping parameter costs low.
  • methods: Salient Channel Tuning (SCT) forwards the model with task images to select salient channels in a feature map, then fine-tunes only 1/8 of the channels, leading to significantly lower parameter costs.
  • results: On the VTAB-1K benchmark, SCT outperforms full fine-tuning on 18 out of 19 tasks while adding only 0.11M parameters to ViT-B, 780x fewer than full fine-tuning, and also surpasses other PEFT methods in domain generalization and few-shot learning at lower parameter cost.
    Abstract Pre-trained vision transformers have strong representation benefits to various downstream tasks. Recently, many parameter-efficient fine-tuning (PEFT) methods have been proposed, and their experiments demonstrate that tuning only 1% of extra parameters could surpass full fine-tuning in low-data resource scenarios. However, these methods overlook the task-specific information when fine-tuning diverse downstream tasks. In this paper, we propose a simple yet effective method called "Salient Channel Tuning" (SCT) to leverage the task-specific information by forwarding the model with the task images to select partial channels in a feature map that enables us to tune only 1/8 channels leading to significantly lower parameter costs. Experiments outperform full fine-tuning on 18 out of 19 tasks in the VTAB-1K benchmark by adding only 0.11M parameters of the ViT-B, which is 780$\times$ fewer than its full fine-tuning counterpart. Furthermore, experiments on domain generalization and few-shot learning surpass other PEFT methods with lower parameter costs, demonstrating our proposed tuning technique's strong capability and effectiveness in the low-data regime.
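
    One plausible reading of salient channel selection is sketched below: forward task images once, score channels by mean activation magnitude, and let gradients flow only to the top 1/8 of channels via a gradient mask. The saliency proxy, toy backbone, and gradient-masking mechanism are assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

# Toy two-layer conv backbone stands in for a ViT block's feature map.
backbone = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(64, 64, 3, padding=1))

# 1) Forward task images once and score each channel's saliency
#    (mean absolute activation is an assumed proxy).
feats = []
hook = backbone[2].register_forward_hook(lambda m, i, o: feats.append(o.detach()))
with torch.no_grad():
    backbone(torch.randn(8, 3, 32, 32))          # a batch of task images
hook.remove()
saliency = feats[0].abs().mean(dim=(0, 2, 3))    # one score per output channel

# 2) Keep only the top 1/8 channels trainable by masking gradients.
keep = saliency.topk(64 // 8).indices
mask = torch.zeros(64)
mask[keep] = 1.0
for p in backbone.parameters():
    p.requires_grad_(False)
w = backbone[2].weight
w.requires_grad_(True)
w.register_hook(lambda g: g * mask[:, None, None, None])  # zero unselected channels
```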

HealthFC: A Dataset of Health Claims for Evidence-Based Medical Fact-Checking

  • paper_url: http://arxiv.org/abs/2309.08503
  • repo_url: https://github.com/jvladika/healthfc
  • paper_authors: Juraj Vladika, Phillip Schneider, Florian Matthes
  • for: Advancing automated medical fact-checking by providing a dataset of health-related claims with expert annotations.
  • methods: The dataset contains 750 health-related claims, each labeled for veracity by medical experts and backed with evidence from appropriate clinical studies; the authors also provide baseline models for automated fact-checking tasks.
  • results: An analysis of the dataset highlights its characteristics and challenges, and the baseline models support Machine Learning tasks such as evidence retrieval, veracity prediction, and explanation generation, helping users understand the reasoning behind fact-checking decisions.
    Abstract Seeking health-related advice on the internet has become a common practice in the digital era. Determining the trustworthiness of medical claims found online and finding appropriate evidence for this information is increasingly challenging. Fact-checking has emerged as an approach to assess the veracity of factual claims using evidence from credible knowledge sources. To help advance the automation of this task, in this paper, we introduce a novel dataset of 750 health-related claims, labeled for veracity by medical experts and backed with evidence from appropriate clinical studies. We provide an analysis of the dataset, highlighting its characteristics and challenges. The dataset can be used for Machine Learning tasks related to automated fact-checking such as evidence retrieval, veracity prediction, and explanation generation. For this purpose, we provide baseline models based on different approaches, examine their performance, and discuss the findings.

P-ROCKET: Pruning Random Convolution Kernels for Time Series Classification

  • paper_url: http://arxiv.org/abs/2309.08499
  • repo_url: https://github.com/shaowuchen/p-rocket
  • paper_authors: Shaowu Chen, Weize Sun, Lei Huang, Xiaopeng Li, Qingyuan Wang, Deepu John
  • for: Classifying time series accurately while pruning the large number of random convolution kernels that make ROCKET-style models incompatible with resource-constrained devices.
  • methods: Building on ROCKET and MINIROCKET (and the evolutionary pruning method S-ROCKET), the paper removes kernels from a feature-selection perspective by eliminating the associated connections in the classification layer, formulating pruning as a Group Elastic Net problem solved with ADMM and then accelerated by splitting the regularization into two sequential stages.
  • results: The resulting P-ROCKET avoids the time-consuming kernel evaluation of S-ROCKET while retaining accuracy, making time series classification feasible on resource-constrained devices.
    Abstract In recent years, two time series classification models, ROCKET and MINIROCKET, have attracted much attention for their low training cost and state-of-the-art accuracy. Utilizing random 1-D convolutional kernels without training, ROCKET and MINIROCKET can rapidly extract features from time series data, allowing for the efficient fitting of linear classifiers. However, to comprehensively capture useful features, a large number of random kernels are required, which is incompatible for resource-constrained devices. Therefore, a heuristic evolutionary algorithm named S-ROCKET is devised to recognize and prune redundant kernels. Nevertheless, the inherent nature of evolutionary algorithms renders the evaluation of kernels within S-ROCKET an unacceptable time-consuming process. In this paper, diverging from S-ROCKET, which directly evaluates random kernels with nonsignificant differences, we remove kernels from a feature selection perspective by eliminating associating connections in the sequential classification layer. To this end, we start by formulating the pruning challenge as a Group Elastic Net classification problem and employ the ADMM method to arrive at a solution. Sequentially, we accelerate the aforementioned time-consuming solving process by bifurcating the $l_{2,1}$ and $l_2$ regularizations into two sequential stages and solve them separately, which ultimately forms our core algorithm, named P-ROCKET. Stage 1 of P-ROCKET employs group-wise regularization similarly to our initial ADMM-based Algorithm, but introduces dynamically varying penalties to greatly accelerate the process. To mitigate overfitting, Stage 2 of P-ROCKET implements element-wise regularization to refit a linear classifier, utilizing the retained features.

Using Large Language Models for Knowledge Engineering (LLMKE): A Case Study on Wikidata

  • paper_url: http://arxiv.org/abs/2309.08491
  • repo_url: https://github.com/bohuizhang/llmke
  • paper_authors: Bohui Zhang, Ioannis Reklos, Nitisha Jain, Albert Meroño Peñuela, Elena Simperl
  • for: The paper is written for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge.
  • methods: The paper uses pre-trained Large Language Models (LLMs) to produce relevant objects in string format and link them to their respective Wikidata QIDs. The pipeline developed in the paper is called LLMKE, which combines knowledge probing and Wikidata entity mapping.
  • results: The paper achieved a macro-averaged F1-score of 0.701 across the properties, with scores varying from 1.00 to 0.328. The results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base completion and correction.
    Abstract In this work, we explore the use of Large Language Models (LLMs) for knowledge engineering tasks in the context of the ISWC 2023 LM-KBC Challenge. For this task, given subject and relation pairs sourced from Wikidata, we utilize pre-trained LLMs to produce the relevant objects in string format and link them to their respective Wikidata QIDs. We developed a pipeline using LLMs for Knowledge Engineering (LLMKE), combining knowledge probing and Wikidata entity mapping. The method achieved a macro-averaged F1-score of 0.701 across the properties, with the scores varying from 1.00 to 0.328. These results demonstrate that the knowledge of LLMs varies significantly depending on the domain and that further experimentation is required to determine the circumstances under which LLMs can be used for automatic Knowledge Base (e.g., Wikidata) completion and correction. The investigation of the results also suggests the promising contribution of LLMs in collaborative knowledge engineering. LLMKE won Track 2 of the challenge. The implementation is available at https://github.com/bohuizhang/LLMKE.
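
A hedged sketch of the two LLMKE stages follows: probe an LLM for object strings given a (subject, relation) pair, then link each string to a Wikidata QID. The prompt wording and `ask_llm` are placeholders (the authors' prompts and models are in their repository); the `wbsearchentities` endpoint is a real Wikidata API.

```python
import requests

def ask_llm(prompt: str) -> list[str]:
    # stand-in for a call to a pre-trained LLM; returns object strings
    raise NotImplementedError

def probe(subject: str, relation: str) -> list[str]:
    prompt = (f"List the values of the relation '{relation}' for '{subject}'. "
              "Answer with a comma-separated list.")
    return ask_llm(prompt)

def link_to_qid(label: str) -> str | None:
    r = requests.get(
        "https://www.wikidata.org/w/api.php",
        params={"action": "wbsearchentities", "search": label,
                "language": "en", "format": "json"},
        timeout=10,
    )
    hits = r.json().get("search", [])
    return hits[0]["id"] if hits else None  # naive: take the top match

# objects = probe("Germany", "shares border with")
# qids = [link_to_qid(o) for o in objects]
```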

XFedHunter: An Explainable Federated Learning Framework for Advanced Persistent Threat Detection in SDN

  • paper_url: http://arxiv.org/abs/2309.08485
  • repo_url: None
  • paper_authors: Huynh Thai Thi, Ngo Duc Hoang Son, Phan The Duy, Nghi Hoang Khoa, Khoa Ngo-Khanh, Van-Hau Pham
  • for: This paper proposes a framework for detecting Advanced Persistent Threats (APTs) in Software-Defined Networking (SDN) using Federated Learning (FL) and Explainable Artificial Intelligence (XAI).
  • methods: The proposed framework, called XFedHunter, utilizes a combination of Graph Neural Networks (GNNs) and Deep Learning models to detect APTs in the network system. It also leverages local cyber threat knowledge from multiple training collaborators to improve the accuracy of APT detection.
  • results: The experimental results on two datasets (NF-ToN-IoT and DARPA TCE3) show that XFedHunter can effectively enhance the trust and accountability of Machine Learning (ML)-based systems for cybersecurity purposes without compromising privacy.
    Abstract Advanced Persistent Threat (APT) attacks are highly sophisticated and employ a multitude of advanced methods and techniques to target organizations and steal sensitive and confidential information. APT attacks consist of multiple stages and have a defined strategy, utilizing new and innovative techniques and technologies developed by hackers to evade security software monitoring. To effectively protect against APTs, detecting and predicting APT indicators with an explanation from Machine Learning (ML) prediction is crucial to reveal the characteristics of attackers lurking in the network system. Meanwhile, Federated Learning (FL) has emerged as a promising approach for building intelligent applications without compromising privacy. This is particularly important in cybersecurity, where sensitive data and high-quality labeling play a critical role in constructing effective machine learning models for detecting cyber threats. Therefore, this work proposes XFedHunter, an explainable federated learning framework for APT detection in Software-Defined Networking (SDN) leveraging local cyber threat knowledge from many training collaborators. In XFedHunter, Graph Neural Network (GNN) and Deep Learning model are utilized to reveal the malicious events effectively in the large number of normal ones in the network system. The experimental results on NF-ToN-IoT and DARPA TCE3 datasets indicate that our framework can enhance the trust and accountability of ML-based systems utilized for cybersecurity purposes without privacy leakage.

VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model

  • paper_url: http://arxiv.org/abs/2309.08474
  • repo_url: None
  • paper_authors: Phan The Duy, Nghi Hoang Khoa, Nguyen Huu Quyen, Le Cong Trinh, Vu Trung Kien, Trinh Minh Hoang, Van-Hau Pham
  • for: The proposed VulnSense framework efficiently detects vulnerabilities in Ethereum smart contracts using multimodal learning over graph-based and natural language processing (NLP) models.
  • methods: The approach combines three feature types from smart contracts: source code, opcode sequences, and the control flow graph (CFG) extracted from bytecode, analyzed with BERT, BiLSTM, and GNN models.
  • results: The multimodal approach achieves an average accuracy of 77.96% in detecting vulnerabilities in Ethereum smart contracts, overcoming the accuracy limitations of single-feature or single-model deep learning techniques.
    Abstract This paper presents VulnSense framework, a comprehensive approach to efficiently detect vulnerabilities in Ethereum smart contracts using a multimodal learning approach on graph-based and natural language processing (NLP) models. Our proposed framework combines three types of features from smart contracts comprising source code, opcode sequences, and control flow graph (CFG) extracted from bytecode. We employ Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Long Short-Term Memory (BiLSTM) and Graph Neural Network (GNN) models to extract and analyze these features. The final layer of our multimodal approach consists of a fully connected layer used to predict vulnerabilities in Ethereum smart contracts. Addressing limitations of existing vulnerability detection methods relying on single-feature or single-model deep learning techniques, our method surpasses accuracy and effectiveness constraints. We assess VulnSense using a collection of 1.769 smart contracts derived from the combination of three datasets: Curated, SolidiFI-Benchmark, and Smartbugs Wild. We then make a comparison with various unimodal and multimodal learning techniques contributed by GNN, BiLSTM and BERT architectures. The experimental outcomes demonstrate the superior performance of our proposed approach, achieving an average accuracy of 77.96\% across all three categories of vulnerable smart contracts.

Explaining Search Result Stances to Opinionated People

  • paper_url: http://arxiv.org/abs/2309.08460
  • repo_url: None
  • paper_authors: Z. Wu, T. Draws, F. Cau, F. Barile, A. Rieger, N. Tintarev
  • for: This study investigates whether stance labels and their explanations can help users avoid cognitive biases, such as confirmation bias, when consuming search results on debated topics.
  • methods: Search results on three topics are automatically classified and labeled as against, neutral, or in favor, with generated explanations for these labels; a user study (N = 203) examines how search result stance bias (balanced vs. biased) and the level of explanation (plain text, label only, label and explanation) affect which results users click.
  • results: Stance labels and explanations lead users to consume more diverse search results, but there is no evidence of systematic opinion change in this context; these findings can help search engine designers make more informed design decisions.
    Abstract People use web search engines to find information before forming opinions, which can lead to practical decisions with different levels of impact. The cognitive effort of search can leave opinionated users vulnerable to cognitive biases, e.g., the confirmation bias. In this paper, we investigate whether stance labels and their explanations can help users consume more diverse search results. We automatically classify and label search results on three topics (i.e., intellectual property rights, school uniforms, and atheism) as against, neutral, and in favor, and generate explanations for these labels. In a user study (N =203), we then investigate whether search result stance bias (balanced vs biased) and the level of explanation (plain text, label only, label and explanation) influence the diversity of search results clicked. We find that stance labels and explanations lead to a more diverse search result consumption. However, we do not find evidence for systematic opinion change among users in this context. We believe these results can help designers of search engines to make more informed design decisions.
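
As an illustration of the labeling step, the sketch below assigns against/neutral/in-favor stances with an off-the-shelf zero-shot NLI model. The paper trains its own classifiers and also generates explanations; the model name and label phrasings here are assumptions.

```python
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
labels = ["against school uniforms",
          "neutral about school uniforms",
          "in favor of school uniforms"]

def stance(snippet: str) -> str:
    out = classifier(snippet, candidate_labels=labels)
    return out["labels"][0]  # highest-scoring stance

print(stance("Uniforms erase individuality and should be abolished."))
```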

Adversarial Attacks on Tables with Entity Swap

  • paper_url: http://arxiv.org/abs/2309.08650
  • repo_url: None
  • paper_authors: Aneta Koleva, Martin Ringsquandl, Volker Tresp
  • for: This paper studies the reliability and security of large language models (LLMs) used for table understanding.
  • methods: It proposes an evasive entity-swap attack, the first black-box adversarial attack on tables, targeting the column type annotation (CTA) task with a similarity-based sampling strategy for generating adversarial examples.
  • results: Experiments show the proposed attack causes up to a 70% drop in the performance of tabular language models.
    Abstract The capabilities of large language models (LLMs) have been successfully applied in the context of table representation learning. The recently proposed tabular language models have reported state-of-the-art results across various tasks for table interpretation. However, a closer look into the datasets commonly used for evaluation reveals an entity leakage from the train set into the test set. Motivated by this observation, we explore adversarial attacks that represent a more realistic inference setup. Adversarial attacks on text have been shown to greatly affect the performance of LLMs, but currently, there are no attacks targeting tabular language models. In this paper, we propose an evasive entity-swap attack for the column type annotation (CTA) task. Our CTA attack is the first black-box attack on tables, where we employ a similarity-based sampling strategy to generate adversarial examples. The experimental results show that the proposed attack generates up to a 70% drop in performance.
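
The following toy sketch illustrates a similarity-based entity swap: each cell entity in a column is replaced by a nearby entity in an embedding space, producing a plausible-looking adversarial column. The embeddings and candidate pool are stand-ins; the paper's sampling strategy is more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
pool = ["Paris", "Berlin", "Madrid", "Rome", "Vienna"]   # candidate entities
emb = {e: rng.normal(size=16) for e in pool}             # stand-in embeddings

def swap(entity: str, k: int = 2) -> str:
    v = emb[entity]
    sims = {c: float(v @ emb[c] / (np.linalg.norm(v) * np.linalg.norm(emb[c])))
            for c in pool if c != entity}
    top_k = sorted(sims, key=sims.get, reverse=True)[:k]
    return str(rng.choice(top_k))  # sample among the most similar candidates

column = ["Paris", "Berlin", "Rome"]
adv_column = [swap(e) for e in column]
print(adv_column)
```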

Toward responsible face datasets: modeling the distribution of a disentangled latent space for sampling face images from demographic groups

  • paper_url: http://arxiv.org/abs/2309.08442
  • repo_url: None
  • paper_authors: Parsa Rahimi, Christophe Ecabert, Sebastien Marcel
  • for: This work generates balanced and possibly bias-free synthetic datasets for training, regularizing, or evaluating deep-learning-based face recognition models.
  • methods: It models and samples a disentangled projection of a StyleGAN latent space to generate faces for any combination of demographic groups (e.g., hispanic-female).
  • results: Experiments show that any combination of demographic groups can be synthesized effectively, with identities distinct from those in the original training dataset; the source code is released.
    Abstract Recently, it has been exposed that some modern facial recognition systems could discriminate specific demographic groups and may lead to unfair attention with respect to various facial attributes such as gender and origin. The main reason are the biases inside datasets, unbalanced demographics, used to train theses models. Unfortunately, collecting a large-scale balanced dataset with respect to various demographics is impracticable. In this paper, we investigate as an alternative the generation of a balanced and possibly bias-free synthetic dataset that could be used to train, to regularize or to evaluate deep learning-based facial recognition models. We propose to use a simple method for modeling and sampling a disentangled projection of a StyleGAN latent space to generate any combination of demographic groups (e.g. $hispanic-female$). Our experiments show that we can synthesis any combination of demographic groups effectively and the identities are different from the original training dataset. We also released the source code.
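
A much-simplified sketch of the sampling idea follows, assuming per-group Gaussians in a disentangled (e.g., StyleGAN W) latent space and a naive averaging of group statistics to form combinations; the paper's actual density model is more careful than this.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 512  # typical StyleGAN latent width

# stand-in: latent codes of annotated real faces per attribute group
codes = {"hispanic": rng.normal(0.2, 1.0, size=(500, d)),
         "female":   rng.normal(-0.1, 1.0, size=(500, d))}

def fit_gaussian(z):
    return z.mean(axis=0), z.std(axis=0)

def sample_combination(groups, n=4):
    stats = [fit_gaussian(codes[g]) for g in groups]
    mu = np.mean([m for m, _ in stats], axis=0)     # combine group means
    sigma = np.mean([s for _, s in stats], axis=0)
    return mu + sigma * rng.normal(size=(n, d))     # e.g. feed to G.synthesis

z_new = sample_combination(["hispanic", "female"])
print(z_new.shape)  # (4, 512) latent codes for "hispanic-female" faces
```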

Learning by Self-Explaining

  • paper_url: http://arxiv.org/abs/2309.08395
  • repo_url: https://github.com/abusufyanvu/6S191_MIT_DeepLearning
  • paper_authors: Wolfgang Stammer, Felix Friedrich, David Steinmann, Hikaru Shindo, Kristian Kersting
  • for: The paper aims to improve the learning process of artificial intelligence (AI) models by incorporating self-explaining mechanisms, which are inspired by human psychology and have been neglected in current AI research.
  • methods: The proposed Learning by Self-Explaining (LSX) paradigm involves a learning module performing a base task and providing explanations for its decisions, which are then evaluated by an internal critic module. The learner is refined with the critic’s feedback, and the loop is repeated as needed. The paper provides distinct instantiations of LSX for two different learner models.
  • results: The paper shows that LSX not only boosts the generalization abilities of AI models, particularly in small-data regimes, but also aids in mitigating the influence of confounding factors and leads to more task-specific and faithful model explanations. The results provide experimental evidence of the potential of self-explaining within the learning phase of an AI model.
    Abstract Artificial intelligence (AI) research has a long track record of drawing inspirations from findings from biology, in particular human intelligence. In contrast to current AI research that mainly treats explanations as a means for model inspection, a somewhat neglected finding from human psychology is the benefit of self-explaining in an agents' learning process. Motivated by this, we introduce a novel learning paradigm, termed Learning by Self-Explaining (LSX). The underlying idea is that a learning module (learner) performs a base task, e.g. image classification, and provides explanations to its decisions. An internal critic module next evaluates the quality of these explanations given the original task. Finally, the learner is refined with the critic's feedback and the loop is repeated as required. The intuition behind this is that an explanation is considered "good" if the critic can perform the same task given the respective explanation. Despite many implementation possibilities the structure of any LSX instantiation can be taxonomized based on four learning modules which we identify as: Fit, Explain, Reflect and Revise. In our work, we provide distinct instantiations of LSX for two different learner models, each illustrating different choices for the various LSX components. We broadly evaluate these on several datasets and show that Learning by Self-Explaining not only boosts the generalization abilities of AI models, particularly in small-data regimes, but also aids in mitigating the influence of confounding factors, as well as leading to more task specific and faithful model explanations. Overall, our results provide experimental evidence of the potential of self-explaining within the learning phase of an AI model.
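
The four LSX modules can be summarized as a loop; the sketch below is schematic, with the learner, critic, and explanation format left as placeholders, since their concrete instantiations depend on the learner model (the paper provides two).

```python
def lsx_loop(learner, critic, data, rounds=3):
    """One possible shape of the Fit / Explain / Reflect / Revise cycle."""
    for _ in range(rounds):
        learner.fit(data)                    # Fit: solve the base task
        expl = learner.explain(data)         # Explain: justify decisions
        score = critic.evaluate(expl, data)  # Reflect: can the critic solve
                                             # the task from the explanations?
        learner.revise(feedback=score)       # Revise: refine with feedback
    return learner
```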

MAPLE: Mobile App Prediction Leveraging Large Language model Embeddings

  • paper_url: http://arxiv.org/abs/2309.08648
  • repo_url: None
  • paper_authors: Yonchanok Khaokaew, Hao Xue, Flora D. Salim
  • for: This paper predicts mobile app usage in order to improve the user experience and functionality of applications.
  • methods: It proposes MAPLE (Mobile App Prediction Leveraging Large Language model Embeddings), which uses large language models (LLMs) to predict app usage accurately.
  • results: Rigorous testing on two public datasets shows that MAPLE captures intricate patterns in user behavior and context and remains robust across scenarios, demonstrating its versatility and reliability.
    Abstract Despite the rapid advancement of mobile applications, predicting app usage remains a formidable challenge due to intricate user behaviours and ever-evolving contexts. To address these issues, this paper introduces the Mobile App Prediction Leveraging Large Language Model Embeddings (MAPLE) model. This innovative approach utilizes Large Language Models (LLMs) to predict app usage accurately. Rigorous testing on two public datasets highlights MAPLE's capability to decipher intricate patterns and comprehend user contexts. These robust results confirm MAPLE's versatility and resilience across various scenarios. While its primary design caters to app prediction, the outcomes also emphasize the broader applicability of LLMs in different domains. Through this research, we emphasize the potential of LLMs in app usage prediction and suggest their transformative capacity in modelling human behaviours across diverse fields.

Intent Detection at Scale: Tuning a Generic Model using Relevant Intents

  • paper_url: http://arxiv.org/abs/2309.08647
  • repo_url: None
  • paper_authors: Nichal Narotamo, David Aparicio, Tiago Mesquita, Mariana Almeida
  • for: Improving the accuracy of intent prediction for customer support requests, so that support systems become more efficient and agents can quickly understand messages and prioritize responses accordingly.
  • methods: Combining a single generic model with a per-client list of relevant intents, reducing training and maintenance costs while providing a personalized experience that adapts to changes in each client's relevant intents.
  • results: Compared with industry-specific models, the final system exhibits significantly superior performance, demonstrating its flexibility and ability to cater to diverse client needs.
    Abstract Accurately predicting the intent of customer support requests is vital for efficient support systems, enabling agents to quickly understand messages and prioritize responses accordingly. While different approaches exist for intent detection, maintaining separate client-specific or industry-specific models can be costly and impractical as the client base expands. This work proposes a system to scale intent predictions to various clients effectively, by combining a single generic model with a per-client list of relevant intents. Our approach minimizes training and maintenance costs while providing a personalized experience for clients, allowing for seamless adaptation to changes in their relevant intents. Furthermore, we propose a strategy for using the clients relevant intents as model features that proves to be resilient to changes in the relevant intents of clients -- a common occurrence in production environments. The final system exhibits significantly superior performance compared to industry-specific models, showcasing its flexibility and ability to cater to diverse client needs.
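
One simple way to realize "generic model plus per-client relevant intents" at serving time is to mask the generic model's scores outside the client's list, as in the sketch below; note the paper additionally feeds the relevant intents to the model as features, which this toy example does not show. Intent names and client data are assumptions.

```python
import numpy as np

GENERIC_INTENTS = ["refund", "shipping", "login", "billing", "bug_report"]
CLIENT_INTENTS = {"acme": ["refund", "shipping", "billing"]}

def predict(logits: np.ndarray, client: str) -> str:
    # scores for intents the client never uses are masked to -inf
    mask = np.full(len(GENERIC_INTENTS), -np.inf)
    for intent in CLIENT_INTENTS[client]:
        mask[GENERIC_INTENTS.index(intent)] = 0.0
    return GENERIC_INTENTS[int(np.argmax(logits + mask))]

print(predict(np.array([0.1, 0.5, 2.0, 0.3, 0.0]), "acme"))  # "login" is masked -> "shipping"
```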

M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection

  • paper_url: http://arxiv.org/abs/2309.08365
  • repo_url: https://github.com/I2-Multimedia-Lab/M3Net
  • paper_authors: Yao Yuan, Pan Gao, XiaoYang Tan
  • for: Improving the accuracy of salient object detection (SOD).
  • methods: Proposes M$^3$Net, a Multilevel, Mixed, and Multistage attention network comprising a Multiscale Interaction Block (cross-attention between multilevel features, so that high-level features guide low-level feature learning) and a Mixed Attention Block (combining global and window self-attention), trained with a multilevel supervision strategy that optimizes the aggregated features stage by stage.
  • results: Experiments on six challenging datasets show that M$^3$Net surpasses recent CNN- and Transformer-based SOD methods on four metrics.
    Abstract Most existing salient object detection methods mostly use U-Net or feature pyramid structure, which simply aggregates feature maps of different scales, ignoring the uniqueness and interdependence of them and their respective contributions to the final prediction. To overcome these, we propose the M$^3$Net, i.e., the Multilevel, Mixed and Multistage attention network for Salient Object Detection (SOD). Firstly, we propose Multiscale Interaction Block which innovatively introduces the cross-attention approach to achieve the interaction between multilevel features, allowing high-level features to guide low-level feature learning and thus enhancing salient regions. Secondly, considering the fact that previous Transformer based SOD methods locate salient regions only using global self-attention while inevitably overlooking the details of complex objects, we propose the Mixed Attention Block. This block combines global self-attention and window self-attention, aiming at modeling context at both global and local levels to further improve the accuracy of the prediction map. Finally, we proposed a multilevel supervision strategy to optimize the aggregated feature stage-by-stage. Experiments on six challenging datasets demonstrate that the proposed M$^3$Net surpasses recent CNN and Transformer-based SOD arts in terms of four metrics. Codes are available at https://github.com/I2-Multimedia-Lab/M3Net.

Data Distribution Bottlenecks in Grounding Language Models to Knowledge Bases

  • paper_url: http://arxiv.org/abs/2309.08345
  • repo_url: None
  • paper_authors: Yiheng Shu, Zhiwei Yu
  • for: This study investigates the robustness of language models (LMs) when grounded to large-scale knowledge bases (KBs) for knowledge base question answering (KBQA).
  • methods: Several data augmentation techniques are applied to improve the robustness and generalization of LMs, evaluated in scenarios with inconsistent data distribution between training and inference, such as generalization to unseen domains, adaptation to language variations, and transferability across datasets.
  • results: Experiments show that even with the proposed data augmentation techniques, current small and large LMs perform poorly in various dimensions, especially when facing language variations and data distribution shifts.
    Abstract Language models (LMs) have already demonstrated remarkable abilities in understanding and generating both natural and formal language. Despite these advances, their integration with real-world environments such as large-scale knowledge bases (KBs) remains an underdeveloped area, affecting applications such as semantic parsing and indulging in "hallucinated" information. This paper is an experimental investigation aimed at uncovering the robustness challenges that LMs encounter when tasked with knowledge base question answering (KBQA). The investigation covers scenarios with inconsistent data distribution between training and inference, such as generalization to unseen domains, adaptation to various language variations, and transferability across different datasets. Our comprehensive experiments reveal that even when employed with our proposed data augmentation techniques, advanced small and large language models exhibit poor performance in various dimensions. While the LM is a promising technology, the robustness of the current form in dealing with complex environments is fragile and of limited practicality because of the data distribution issue. This calls for future research on data collection and LM learning paradims.

Let’s Predict Who Will Move to a New Job

  • paper_url: http://arxiv.org/abs/2309.08333
  • repo_url: https://github.com/MarkipTheMudkip/in-class-project-2
  • paper_authors: Rania Mkhinini Gahar, Adel Hidri, Minyar Sassi Hidri
  • for: The paper is written to discuss the use of machine learning (ML) to predict whether an applicant will search for a new job or stay with the company.
  • methods: The paper uses data pre-processing, data encoding, and several ML algorithms such as Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), and eXtreme Gradient Boosting (XGBoost) to predict job mobility. The synthetic minority oversampling technique (SMOTE) is also used to retain minority classes.
  • results: The paper assesses the performance of the ML models using decision support metrics such as precision, recall, F1-Score, and accuracy.
    Abstract Any company's human resources department faces the challenge of predicting whether an applicant will search for a new job or stay with the company. In this paper, we discuss how machine learning (ML) is used to predict who will move to a new job. First, the data is pre-processed into a suitable format for ML models. To deal with categorical features, data encoding is applied and several MLA (ML Algorithms) are performed including Random Forest (RF), Logistic Regression (LR), Decision Tree (DT), and eXtreme Gradient Boosting (XGBoost). To improve the performance of ML models, the synthetic minority oversampling technique (SMOTE) is used to retain them. Models are assessed using decision support metrics such as precision, recall, F1-Score, and accuracy.
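
A hedged sketch of the pipeline shape described above: balance the minority class with SMOTE, then compare classifiers on precision/recall/F1/accuracy (XGBoost plugs in the same way as the sklearn models shown). The synthetic data below stands in for the HR dataset.

```python
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# imbalanced stand-in for "will move to a new job" vs "will stay"
X, y = make_classification(n_samples=2000, weights=[0.85], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_tr, y_tr = SMOTE(random_state=0).fit_resample(X_tr, y_tr)  # oversample minority

for model in (RandomForestClassifier(), LogisticRegression(max_iter=1000),
              DecisionTreeClassifier()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__)
    print(classification_report(y_te, model.predict(X_te), digits=3))
```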

Large Intestine 3D Shape Refinement Using Point Diffusion Models for Digital Phantom Generation

  • paper_url: http://arxiv.org/abs/2309.08289
  • repo_url: None
  • paper_authors: Kaouther Mouheb, Mobina Ghojogh Nejad, Lavsen Dahal, Ehsan Samei, W. Paul Segars, Joseph Y. Lo
  • for: precisely modeling the surface of the large intestine for virtual imaging trials
  • methods: using geometric deep learning and denoising diffusion probabilistic models to refine segmentation results, and incorporating a state-of-the-art surface reconstruction model
  • results: significantly improved surface representation compared to initial segmentation, with a 70% reduction in Chamfer distance, a 32% reduction in Hausdorff distance, and a 6% reduction in Earth Mover’s distance
    Abstract Accurate 3D modeling of human organs plays a crucial role in building computational phantoms for virtual imaging trials. However, generating anatomically plausible reconstructions of organ surfaces from computed tomography scans remains challenging for many structures in the human body. This challenge is particularly evident when dealing with the large intestine. In this study, we leverage recent advancements in geometric deep learning and denoising diffusion probabilistic models to refine the segmentation results of the large intestine. We begin by representing the organ as point clouds sampled from the surface of the 3D segmentation mask. Subsequently, we employ a hierarchical variational autoencoder to obtain global and local latent representations of the organ's shape. We train two conditional denoising diffusion models in the hierarchical latent space to perform shape refinement. To further enhance our method, we incorporate a state-of-the-art surface reconstruction model, allowing us to generate smooth meshes from the obtained complete point clouds. Experimental results demonstrate the effectiveness of our approach in capturing both the global distribution of the organ's shape and its fine details. Our complete refinement pipeline demonstrates remarkable enhancements in surface representation compared to the initial segmentation, reducing the Chamfer distance by 70%, the Hausdorff distance by 32%, and the Earth Mover's distance by 6%. By combining geometric deep learning, denoising diffusion models, and advanced surface reconstruction techniques, our proposed method offers a promising solution for accurately modeling the large intestine's surface and can easily be extended to other anatomical structures.

Cure the headache of Transformers via Collinear Constrained Attention

  • paper_url: http://arxiv.org/abs/2309.08646
  • repo_url: https://github.com/luban-agi/coca
  • paper_authors: Shiyi Zhu, Jing Ye, Wei Jiang, Qi Zhang, Yifan Wu, Jianguo Li
  • for: This work addresses a previously overlooked anomalous attention behavior in Transformer models (dubbed the "headache of Transformers") in order to improve extrapolation performance.
  • methods: It proposes a novel self-attention structure named Collinear Constrained Attention (CoCA), which can be seamlessly integrated with existing extrapolation and interpolation methods and other optimization strategies designed for traditional Transformer models.
  • results: The model achieves excellent extrapolation for sequence lengths 16 to 24 times longer than seen in training, without any fine-tuning; CoCA's computational and spatial efficiency is also enhanced to ensure its practicality.
    Abstract As the rapid progression of practical applications based on Large Language Models continues, the importance of extrapolating performance has grown exponentially in the research domain. In our study, we identified an anomalous behavior in Transformer models that had been previously overlooked, leading to a chaos around closest tokens which carried the most important information. We've coined this discovery the "headache of Transformers". To address this at its core, we introduced a novel self-attention structure named Collinear Constrained Attention (CoCA). This structure can be seamlessly integrated with existing extrapolation, interpolation methods, and other optimization strategies designed for traditional Transformer models. We have achieved excellent extrapolating performance even for 16 times to 24 times of sequence lengths during inference without any fine-tuning on our model. We have also enhanced CoCA's computational and spatial efficiency to ensure its practicality. We plan to open-source CoCA shortly. In the meantime, we've made our code available in the appendix for reappearing experiments.

Quantitative and Qualitative Evaluation of Reinforcement Learning Policies for Autonomous Vehicles

  • paper_url: http://arxiv.org/abs/2309.08254
  • repo_url: None
  • paper_authors: Laura Ferrarotti, Massimiliano Luca, Gabriele Santin, Giorgio Previati, Gianpiero Mastinu, Elena Campi, Lorenzo Uccello, Antonino Albanese, Praveen Zalaya, Alessandro Roccasalva, Bruno Lepri
  • for: Optimizing traffic dynamics is crucial in an evolving transportation landscape, particularly where autonomous vehicles (AVs) with varying levels of autonomy coexist with human-driven cars. This paper optimizes AV behavior using the Proximal Policy Optimization (PPO) reinforcement learning algorithm, learning a policy that minimizes traffic jams (i.e., the time to cross the scenario) and pollution.
  • methods: A policy is learned with PPO in a simulated roundabout in Milan, Italy, and empirical analysis shows the approach can reduce time and pollution; the learned policy is additionally assessed in near-real-world conditions using a cutting-edge driving cockpit.
  • results: Experiments show reductions in time and pollution levels. Evaluations with human participants in the simulator show that human-driven vehicles benefit from optimized AV dynamics, and participants perceived the scenario with 80% AVs as safer and smoother than the one with 20%.
    Abstract Optimizing traffic dynamics in an evolving transportation landscape is crucial, particularly in scenarios where autonomous vehicles (AVs) with varying levels of autonomy coexist with human-driven cars. This paper presents a novel approach to optimizing choices of AVs using Proximal Policy Optimization (PPO), a reinforcement learning algorithm. We learned a policy to minimize traffic jams (i.e., minimize the time to cross the scenario) and to minimize pollution in a roundabout in Milan, Italy. Through empirical analysis, we demonstrate that our approach can reduce time and pollution levels. Furthermore, we qualitatively evaluate the learned policy using a cutting-edge cockpit to assess its performance in near-real-world conditions. To gauge the practicality and acceptability of the policy, we conducted evaluations with human participants using the simulator, focusing on a range of metrics like traffic smoothness and safety perception. In general, our findings show that human-driven vehicles benefit from optimizing AVs dynamics. Also, participants in the study highlighted that the scenario with 80\% AVs is perceived as safer than the scenario with 20\%. The same result is obtained for traffic smoothness perception.
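
A minimal sketch of training a PPO policy with a standard library follows; the stock Gymnasium environment is a placeholder for the roundabout traffic simulator, whose reward would combine time-to-cross and pollution terms.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# placeholder environment; the paper's simulator would expose roundabout
# state and a reward penalizing crossing time and pollution
env = gym.make("CartPole-v1")

model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)   # on-policy PPO updates

obs, _ = env.reset()
action, _ = model.predict(obs, deterministic=True)
print(action)
```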

A Geometric Perspective on Autoencoders

  • paper_url: http://arxiv.org/abs/2309.08247
  • repo_url: https://github.com/clementchadebec/geometric_perspective_on_vaes
  • paper_authors: Yonghyeon Lee
  • for: This paper examines the geometric aspects of the autoencoder framework, which, despite their importance, have received relatively little attention.
  • methods: Given a set of high-dimensional data points that approximately lie on a lower-dimensional manifold, an autoencoder learns the manifold and its coordinate chart simultaneously; the paper introduces recent geometric approaches built on this view.
  • results: The paper shows that multiple solution autoencoders exist for the same dataset and that they can produce incorrect manifolds with severely distorted latent space representations, motivating the geometric remedies it introduces.
    Abstract This paper presents the geometric aspect of the autoencoder framework, which, despite its importance, has been relatively less recognized. Given a set of high-dimensional data points that approximately lie on some lower-dimensional manifold, an autoencoder learns the \textit{manifold} and its \textit{coordinate chart}, simultaneously. This geometric perspective naturally raises inquiries like "Does a finite set of data points correspond to a single manifold?" or "Is there only one coordinate chart that can represent the manifold?". The responses to these questions are negative, implying that there are multiple solution autoencoders given a dataset. Consequently, they sometimes produce incorrect manifolds with severely distorted latent space representations. In this paper, we introduce recent geometric approaches that address these issues.

VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference

  • paper_url: http://arxiv.org/abs/2309.08227
  • repo_url: None
  • paper_authors: Soumya Banerjee, Vinay K. Verma, Avideep Mukherjee, Deepak Gupta, Vinay P. Namboodiri, Piyush Rai
  • for: This work targets streaming lifelong learning: continual learning in a dynamic, non-stationary environment, in a single pass over the data, without forgetting previously acquired knowledge.
  • methods: It proposes virtual gradients for continual representation learning to prevent catastrophic forgetting, complemented by an exponential-moving-average-based semantic memory to further enhance performance; the approach learns class-incrementally and supports anytime inference.
  • results: Extensive experiments on diverse datasets demonstrate the method's efficacy and superior performance over existing approaches.
    Abstract Lifelong learning, also referred to as continual learning, is the problem of training an AI agent continuously while also preventing it from forgetting its previously acquired knowledge. Most of the existing methods primarily focus on lifelong learning within a static environment and lack the ability to mitigate forgetting in a quickly-changing dynamic environment. Streaming lifelong learning is a challenging setting of lifelong learning with the goal of continuous learning in a dynamic non-stationary environment without forgetting. We introduce a novel approach to lifelong learning, which is streaming, requires a single pass over the data, can learn in a class-incremental manner, and can be evaluated on-the-fly (anytime inference). To accomplish these, we propose virtual gradients for continual representation learning to prevent catastrophic forgetting and leverage an exponential-moving-average-based semantic memory to further enhance performance. Extensive experiments on diverse datasets demonstrate our method's efficacy and superior performance over existing methods.
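
A sketch of the exponential-moving-average "semantic memory": a slow copy of the learner whose weights track the online model to stabilize streaming updates. The update rule below is standard EMA; its exact interplay with virtual gradients follows the paper.

```python
import torch

@torch.no_grad()
def ema_update(online: torch.nn.Module, memory: torch.nn.Module, decay=0.999):
    # memory <- decay * memory + (1 - decay) * online, parameter by parameter
    for p_o, p_m in zip(online.parameters(), memory.parameters()):
        p_m.mul_(decay).add_(p_o, alpha=1.0 - decay)

online = torch.nn.Linear(8, 4)
memory = torch.nn.Linear(8, 4)
memory.load_state_dict(online.state_dict())  # initialize memory = online
# ...after each streaming update of `online`:
ema_update(online, memory)
```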

Using Large Language Model to Solve and Explain Physics Word Problems Approaching Human Level

  • paper_url: http://arxiv.org/abs/2309.08182
  • repo_url: None
  • paper_authors: Jingzhe Ding, Yan Cen, Xinyuan Wei
  • for: This paper demonstrates that large language models (LLMs) pre-trained on text can solve not only pure math word problems but also physics word problems, whose solutions require calculation and inference based on prior physical knowledge.
  • methods: The authors collect and annotate PhysQA, the first physics word problem dataset (over 1000 junior high school problems covering Kinematics, Mass & Density, Mechanics, Heat, and Electricity), and use OpenAI's GPT-3.5 with zero-shot and few-shot prompting to solve the problems.
  • results: GPT-3.5 automatically solves 49.3% of the problems with zero-shot learning and 73.2% with few-shot learning; it can also summarize the knowledge covered by a problem, provide relevant explanations, and generate new physics word problems.
    Abstract Our work demonstrates that large language model (LLM) pre-trained on texts can not only solve pure math word problems, but also physics word problems, whose solution requires calculation and inference based on prior physical knowledge. We collect and annotate the first physics word problem dataset-PhysQA, which contains over 1000 junior high school physics word problems (covering Kinematics, Mass&Density, Mechanics, Heat, Electricity). Then we use OpenAI' s GPT3.5 to generate the answer of these problems and found that GPT3.5 could automatically solve 49.3% of the problems through zero-shot learning and 73.2% through few-shot learning. This result demonstrates that by using similar problems and their answers as prompt, LLM could solve elementary physics word problems approaching human level performance. In addition to solving problems, GPT3.5 can also summarize the knowledge or topics covered by the problems, provide relevant explanations, and generate new physics word problems based on the input. Our work is the first research to focus on the automatic solving, explanation, and generation of physics word problems across various types and scenarios, and we achieve an acceptable and state-of-the-art accuracy. This underscores the potential of LLMs for further applications in secondary education.
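
The few-shot setup amounts to prepending solved example problems to the target problem before querying the model. In the sketch below, the prompt wording is an assumption and `ask` stands in for a GPT-3.5 chat completion call.

```python
# One worked example acts as the "shot"; more can be appended to EXAMPLES.
EXAMPLES = [
    ("A car accelerates from rest at 2 m/s^2 for 5 s. What is its final speed?",
     "v = a * t = 2 * 5 = 10 m/s"),
]

def build_prompt(problem: str) -> str:
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in EXAMPLES)
    return f"{shots}\n\nQ: {problem}\nA:"

def ask(prompt: str) -> str:
    raise NotImplementedError  # stand-in for a GPT-3.5 chat completion call

print(build_prompt("A ball is dropped from 20 m. How long does it take to land?"))
```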

Unveiling Invariances via Neural Network Pruning

  • paper_url: http://arxiv.org/abs/2309.08171
  • repo_url: None
  • paper_authors: Derek Xu, Yizhou Sun, Wei Wang
  • for: This paper learns data-dependent invariances, i.e., transformations that do not alter the data's underlying semantics.
  • methods: It proposes a framework that learns novel network architectures capturing data-dependent invariances via neural network pruning.
  • results: The learned architectures consistently outperform dense neural networks on both vision and tabular datasets in terms of efficiency and effectiveness.
    Abstract Invariance describes transformations that do not alter data's underlying semantics. Neural networks that preserve natural invariance capture good inductive biases and achieve superior performance. Hence, modern networks are handcrafted to handle well-known invariances (ex. translations). We propose a framework to learn novel network architectures that capture data-dependent invariances via pruning. Our learned architectures consistently outperform dense neural networks on both vision and tabular datasets in both efficiency and effectiveness. We demonstrate our framework on multiple deep learning models across 3 vision and 40 tabular datasets.
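
To illustrate the pruning primitive the framework builds on, the sketch below applies global magnitude pruning to a small network with PyTorch's pruning utilities; the link from the resulting sparse architecture to data-dependent invariances is the paper's contribution and is not shown here.

```python
import torch
import torch.nn.utils.prune as prune

model = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 10))
params = [(m, "weight") for m in model if isinstance(m, torch.nn.Linear)]

# globally remove the 80% smallest-magnitude weights across both layers
prune.global_unstructured(params, pruning_method=prune.L1Unstructured, amount=0.8)

kept = sum(int(m.weight_mask.sum()) for m, _ in params)
total = sum(m.weight_mask.numel() for m, _ in params)
print(f"kept {kept}/{total} weights")  # fine-tune the sparse model next
```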

To Predict or to Reject: Causal Effect Estimation with Uncertainty on Networked Data

  • paper_url: http://arxiv.org/abs/2309.08165
  • repo_url: None
  • paper_authors: Hechuan Wen, Tong Chen, Li Kheng Chai, Shazia Sadiq, Kai Zheng, Hongzhi Yin
  • for: This paper proposes a framework for handling the uncertainty of causal inference on networked observational data, enabling more reliable estimation of individual-level treatment effects.
  • methods: The uncertainty-aware graph deep kernel learning (GraphDKL) framework with a Lipschitz constraint models prediction uncertainty with a Gaussian process and identifies unreliable estimations.
  • results: Experiments demonstrate the superiority of the proposed method for uncertainty-aware causal effect estimation on networked data, outperforming conventional approaches.
    Abstract Due to the imbalanced nature of networked observational data, the causal effect predictions for some individuals can severely violate the positivity/overlap assumption, rendering unreliable estimations. Nevertheless, this potential risk of individual-level treatment effect estimation on networked data has been largely under-explored. To create a more trustworthy causal effect estimator, we propose the uncertainty-aware graph deep kernel learning (GraphDKL) framework with Lipschitz constraint to model the prediction uncertainty with Gaussian process and identify unreliable estimations. To the best of our knowledge, GraphDKL is the first framework to tackle the violation of positivity assumption when performing causal effect estimation with graphs. With extensive experiments, we demonstrate the superiority of our proposed method in uncertainty-aware causal effect estimation on networked data.
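
The predict-or-reject rule can be sketched as thresholding a Gaussian process's predictive standard deviation: inputs far from the data support come back with high variance and are withheld. GraphDKL pairs this with graph features and a Lipschitz-constrained encoder; the plain GP, toy data, and threshold below are stand-ins.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)   # stand-in effect surface

gp = GaussianProcessRegressor().fit(X, y)
X_new = np.array([[0.0], [10.0]])                  # 10.0 is far from support
mean, std = gp.predict(X_new, return_std=True)

THRESH = 0.5
for x, m, s in zip(X_new[:, 0], mean, std):
    verdict = f"predict {m:.2f}" if s < THRESH else "reject (unreliable)"
    print(f"x={x:+.1f}: std={s:.2f} -> {verdict}")
```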

Investigating the Applicability of Self-Assessment Tests for Personality Measurement of Large Language Models

  • paper_url: http://arxiv.org/abs/2309.08163
  • repo_url: None
  • paper_authors: Akshat Gupta, Xiaoyang Song, Gopala Anumanchipalli
  • for: This work examines three prior studies that measure the "personality" of large language models (LLMs) using self-assessment tests created to study human behavior.
  • methods: The prompts from the three papers are used to measure the personality of the same LLM, and the robustness of the answers to the order of multiple-choice options is tested via an option order symmetry property.
  • results: The three prompts yield very different personality scores, showing that self-assessment scores depend on the subjective choice of the prompter; the answers are also not robust to the order of the options. Tests on ChatGPT and Llama2 indicate that human self-assessment personality tests are not appropriate for measuring personality in LLMs.
    Abstract As large language models (LLM) evolve in their capabilities, various recent studies have tried to quantify their behavior using psychological tools created to study human behavior. One such example is the measurement of "personality" of LLMs using personality self-assessment tests. In this paper, we take three such studies on personality measurement of LLMs that use personality self-assessment tests created to study human behavior. We use the prompts used in these three different papers to measure the personality of the same LLM. We find that all three prompts lead very different personality scores. This simple test reveals that personality self-assessment scores in LLMs depend on the subjective choice of the prompter. Since we don't know the ground truth value of personality scores for LLMs as there is no correct answer to such questions, there's no way of claiming if one prompt is more or less correct than the other. We then introduce the property of option order symmetry for personality measurement of LLMs. Since most of the self-assessment tests exist in the form of multiple choice question (MCQ) questions, we argue that the scores should also be robust to not just the prompt template but also the order in which the options are presented. This test unsurprisingly reveals that the answers to the self-assessment tests are not robust to the order of the options. These simple tests, done on ChatGPT and Llama2 models show that self-assessment personality tests created for humans are not appropriate for measuring personality in LLMs.
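
The option-order-symmetry check can be sketched as asking the same question under every permutation of the options and testing whether the chosen option (not its letter) stays constant; `ask_model` below is a placeholder for a ChatGPT/Llama2 call, and the question wording is an assumption.

```python
from itertools import permutations

OPTIONS = ["Strongly disagree", "Disagree", "Agree", "Strongly agree"]
QUESTION = "I see myself as someone who is talkative."

def ask_model(prompt: str) -> str:
    raise NotImplementedError  # returns the letter the model picked, e.g. "B"

def order_symmetric(question: str, options: list[str]) -> bool:
    answers = set()
    for perm in permutations(options):
        lettered = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(perm))
        letter = ask_model(f"{question}\n{lettered}\nAnswer with one letter.")
        answers.add(perm[ord(letter) - 65])  # map letter back to option text
    return len(answers) == 1  # symmetric iff one option wins every ordering

# print(order_symmetric(QUESTION, OPTIONS))
```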

Find What You Want: Learning Demand-conditioned Object Attribute Space for Demand-driven Navigation

  • paper_url: http://arxiv.org/abs/2309.08138
  • repo_url: None
  • paper_authors: Hongcheng Wang, Andy Guan Hong Chen, Xiaoqi Li, Mingdong Wu, Hao Dong
  • for: Improving the accuracy and efficiency of agents in Visual Object Navigation (VON), especially in real-world scenarios where users may not know which objects are present in a scene or may specify objects that are absent.
  • methods: Proposes Demand-driven Navigation (DDN), which takes the user's demand as the task instruction and prompts the agent to find an object matching that demand. Textual attribute features of objects are extracted from the common knowledge in a large language model and aligned with visual attribute features using Contrastive Language-Image Pre-training (CLIP); the visual attribute features then serve as prior knowledge for navigation.
  • results: Experiments on AI2Thor with the ProcThor dataset show that the visual attribute features improve the agent's navigation performance and outperform baseline methods commonly used in VON.
    Abstract The task of Visual Object Navigation (VON) involves an agent's ability to locate a particular object within a given scene. In order to successfully accomplish the VON task, two essential conditions must be fulfilled:1) the user must know the name of the desired object; and 2) the user-specified object must actually be present within the scene. To meet these conditions, a simulator can incorporate pre-defined object names and positions into the metadata of the scene. However, in real-world scenarios, it is often challenging to ensure that these conditions are always met. Human in an unfamiliar environment may not know which objects are present in the scene, or they may mistakenly specify an object that is not actually present. Nevertheless, despite these challenges, human may still have a demand for an object, which could potentially be fulfilled by other objects present within the scene in an equivalent manner. Hence, we propose Demand-driven Navigation (DDN), which leverages the user's demand as the task instruction and prompts the agent to find the object matches the specified demand. DDN aims to relax the stringent conditions of VON by focusing on fulfilling the user's demand rather than relying solely on predefined object categories or names. We propose a method first acquire textual attribute features of objects by extracting common knowledge from a large language model. These textual attribute features are subsequently aligned with visual attribute features using Contrastive Language-Image Pre-training (CLIP). By incorporating the visual attribute features as prior knowledge, we enhance the navigation process. Experiments on AI2Thor with the ProcThor dataset demonstrate the visual attribute features improve the agent's navigation performance and outperform the baseline methods commonly used in VON.
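
As a sketch of the attribute-alignment step, the snippet below scores candidate object attributes (text) against the agent's current observation (image) with CLIP; DDN additionally learns demand-conditioned attribute features, which this primitive does not capture. The attribute strings and image path are assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

attributes = ["can hold water", "is soft to sit on", "emits light"]
image = Image.open("observation.png")  # current egocentric view (assumed file)

inputs = processor(text=attributes, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # (1, num_attributes)
print(dict(zip(attributes, logits.softmax(dim=-1)[0].tolist())))
```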

“I’m Not Confident in Debiasing AI Systems Since I Know Too Little”: Teaching AI Creators About Gender Bias Through Hands-on Tutorials

  • paper_url: http://arxiv.org/abs/2309.08121
  • repo_url: None
  • paper_authors: Kyrie Zhixuan Zhou, Jiaxun Cao, Xiaowen Yuan, Daniel E. Weissglass, Zachary Kilhoffer, Madelyn Rose Sanfilippo, Xin Tong
  • for: Helping AI creators understand and mitigate gender bias in AI systems, thereby improving user experience and reducing injustices and mental harm to women.
  • methods: Practice-oriented hands-on tutorials raise AI creators' awareness of gender bias in AI and enhance their knowledge of its sources and of debiasing techniques.
  • results: The tutorials were evaluated with 18 AI creators, including AI researchers, industrial practitioners (developers and product managers), and students who had learned AI; their improved awareness and knowledge demonstrated the effectiveness of the tutorials.
    Abstract Gender bias is rampant in AI systems, causing bad user experience, injustices, and mental harm to women. School curricula fail to educate AI creators on this topic, leaving them unprepared to mitigate gender bias in AI. In this paper, we designed hands-on tutorials to raise AI creators' awareness of gender bias in AI and enhance their knowledge of sources of gender bias and debiasing techniques. The tutorials were evaluated with 18 AI creators, including AI researchers, AI industrial practitioners (i.e., developers and product managers), and students who had learned AI. Their improved awareness and knowledge demonstrated the effectiveness of our tutorials, which have the potential to complement the insufficient AI gender bias education in CS/AI courses. Based on the findings, we synthesize design implications and a rubric to guide future research, education, and design efforts.

Data-Driven Goal Recognition in Transhumeral Prostheses Using Process Mining Techniques

  • paper_url: http://arxiv.org/abs/2309.08106
  • repo_url: None
  • paper_authors: Zihang Su, Tianshi Yu, Nir Lipovetzky, Alireza Mohammadi, Denny Oetomo, Artem Polyvyanyy, Sebastian Sardina, Ying Tan, Nick van Beest
  • for: This case study investigates how time series data can be used to sequentially recognize the goal poses of transhumeral prosthesis users.
  • methods: Time series data from surface electromyography electrodes and kinematic sensors are transformed into discrete events and used to train an existing process-mining-based goal recognition system.
  • results: On data collected in a virtual reality setting, the approach recognizes goals with significantly better precision and recall than state-of-the-art machine learning techniques, and it is less confident when wrong, which benefits approximating smoother prosthesis movements.
    Abstract A transhumeral prosthesis restores missing anatomical segments below the shoulder, including the hand. Active prostheses utilize real-valued, continuous sensor data to recognize patient target poses, or goals, and proactively move the artificial limb. Previous studies have examined how well the data collected in stationary poses, without considering the time steps, can help discriminate the goals. In this case study paper, we focus on using time series data from surface electromyography electrodes and kinematic sensors to sequentially recognize patients' goals. Our approach involves transforming the data into discrete events and training an existing process mining-based goal recognition system. Results from data collected in a virtual reality setting with ten subjects demonstrate the effectiveness of our proposed goal recognition approach, which achieves significantly better precision and recall than the state-of-the-art machine learning techniques and is less confident when wrong, which is beneficial when approximating smoother movements of prostheses.
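
A sketch of the preprocessing idea follows: discretize continuous sEMG/kinematic streams into (timestamp, channel, level) events that a process-mining goal recognizer can consume. The thresholds, channel names, and synthetic signals are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0.0, 2.0, 0.01)
semg = {"biceps": np.abs(rng.normal(0.2, 0.1, t.size)),
        "triceps": np.abs(rng.normal(0.5, 0.2, t.size))}

def to_events(signals, low=0.3, high=0.6):
    events = []
    for ch, x in signals.items():
        level = np.digitize(x, [low, high])          # 0=rest, 1=mid, 2=high
        change = np.flatnonzero(np.diff(level)) + 1  # indices where level shifts
        events += [(float(t[i]), ch, int(level[i])) for i in change]
    return sorted(events)                            # one ordered event log

for ts, ch, lvl in to_events(semg)[:5]:
    print(f"{ts:.2f}s  {ch}  level={lvl}")
```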

Research on Joint Representation Learning Methods for Entity Neighborhood Information and Description Information

  • paper_url: http://arxiv.org/abs/2309.08100
  • repo_url: None
  • paper_authors: Le Xiao, Xin Shan, Yuhua Wang, Miaolei Deng
  • for: Address the issue of poor embedding performance in the knowledge graph of a programming design course.
  • methods: A joint representation learning model that combines entity neighborhood information (via a graph attention network) with description information (via BERT-WWM and attention mechanisms); a minimal fusion sketch follows the abstract.
  • results: Experimental results show that the proposed model achieves favorable performance on the knowledge graph dataset of the programming design course, outperforming other baseline models.
    Abstract To address the issue of poor embedding performance in the knowledge graph of a programming design course, a joint representation learning model that combines entity neighborhood information and description information is proposed. Firstly, a graph attention network is employed to obtain the features of entity neighboring nodes, incorporating relationship features to enrich the structural information. Next, the BERT-WWM model is utilized in conjunction with attention mechanisms to obtain the representation of entity description information. Finally, the final entity vector representation is obtained by combining the vector representations of entity neighborhood information and description information. Experimental results demonstrate that the proposed model achieves favorable performance on the knowledge graph dataset of the programming design course, outperforming other baseline models.
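
A minimal PyTorch sketch of the final combination step, assuming both encoders emit same-dimension vectors and using a learned gate as a stand-in for the paper's combination; `EntityFusion` and the 256-dimensional size are hypothetical names, not the authors' code.

```python
import torch
import torch.nn as nn

class EntityFusion(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # A learned gate decides, per dimension, how much to trust the
        # structural (neighborhood) vector versus the description vector.
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, h_neigh: torch.Tensor, h_desc: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([h_neigh, h_desc], dim=-1)))
        return g * h_neigh + (1.0 - g) * h_desc

# h_neigh would come from a graph attention network over an entity's
# neighbors; h_desc from attention-pooled BERT-WWM states. Both stubbed here.
entity_vec = EntityFusion(dim=256)(torch.randn(8, 256), torch.randn(8, 256))
```

A gated combination lets the model lean on description features for sparsely connected entities and on structural features elsewhere, which is one plausible reading of why joining the two signals helps.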

Fast and Accurate Deep Loop Closing and Relocalization for Reliable LiDAR SLAM

  • paper_url: http://arxiv.org/abs/2309.08086
  • repo_url: None
  • paper_authors: Chenghao Shi, Xieyuanli Chen, Junhao Xiao, Bin Dai, Huimin Lu
  • for: This paper proposes loop closing and relocalization techniques for reliable and robust long-term SLAM, addressing pose estimation drift and degeneration.
  • methods: A multi-head network, LCR-Net, tackles loop closing and relocalization jointly. It uses novel feature extraction and a pose-aware attention mechanism to precisely estimate similarities and 6-DoF poses between pairs of LiDAR scans (a simplified two-head sketch follows the abstract).
  • results: LCR-Net excels in all three evaluation setups (candidate retrieval, closed-loop point cloud registration, and continuous relocalization), surpassing state-of-the-art methods with remarkable generalization ability. Notably, it needs no time-consuming robust pose estimator, making it suitable for online SLAM; to the authors' knowledge, its integration yields the first LiDAR SLAM capable of deep loop closing and relocalization.
    Abstract Loop closing and relocalization are crucial techniques to establish reliable and robust long-term SLAM by addressing pose estimation drift and degeneration. This article begins by formulating loop closing and relocalization within a unified framework. Then, we propose a novel multi-head network LCR-Net to tackle both tasks effectively. It exploits novel feature extraction and pose-aware attention mechanism to precisely estimate similarities and 6-DoF poses between pairs of LiDAR scans. In the end, we integrate our LCR-Net into a SLAM system and achieve robust and accurate online LiDAR SLAM in outdoor driving environments. We thoroughly evaluate our LCR-Net through three setups derived from loop closing and relocalization, including candidate retrieval, closed-loop point cloud registration, and continuous relocalization using multiple datasets. The results demonstrate that LCR-Net excels in all three tasks, surpassing the state-of-the-art methods and exhibiting a remarkable generalization ability. Notably, our LCR-Net outperforms baseline methods without using a time-consuming robust pose estimator, rendering it suitable for online SLAM applications. To our best knowledge, the integration of LCR-Net yields the first LiDAR SLAM with the capability of deep loop closing and relocalization. The implementation of our methods will be made open-source.
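
A simplified sketch of the multi-head output structure, assuming precomputed global descriptors per LiDAR scan. The real LCR-Net operates on local features with pose-aware attention, so this illustrates only the idea of one trunk feeding a similarity head and a 6-DoF pose head; the class and dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class TwoHeadMatcher(nn.Module):
    """Shared trunk with a similarity head (loop-closure candidate score)
    and a pose head (6-DoF: 3 translation + 3 rotation parameters)."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(2 * feat_dim, 512), nn.ReLU())
        self.sim_head = nn.Linear(512, 1)    # scan-pair similarity logit
        self.pose_head = nn.Linear(512, 6)   # relative 6-DoF pose estimate

    def forward(self, f_query: torch.Tensor, f_cand: torch.Tensor):
        h = self.trunk(torch.cat([f_query, f_cand], dim=-1))
        return torch.sigmoid(self.sim_head(h)), self.pose_head(h)

# Usage: score a query scan descriptor against candidates from the map.
sim, pose = TwoHeadMatcher()(torch.randn(4, 512), torch.randn(4, 512))
```

Sharing one trunk is what lets a single forward pass serve both retrieval (is this a loop closure?) and registration (what is the relative pose?), which is why no separate robust pose estimator is needed online.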

A Stochastic Online Forecast-and-Optimize Framework for Real-Time Energy Dispatch in Virtual Power Plants under Uncertainty

  • paper_url: http://arxiv.org/abs/2309.08642
  • repo_url: None
  • paper_authors: Wei Jiang, Zhongkai Yi, Li Wang, Hanwei Zhang, Jihai Zhang, Fangquan Lin, Cheng Yang
  • for: This paper aims to improve the aggregation and management of distributed energy resources under uncertainty, particularly the fluctuation of renewable energy generation.
  • methods: A real-time uncertainty-aware energy dispatch framework with two key elements: (i) a hybrid forecast-and-optimize sequential task that integrates deep-learning-based forecasting with stochastic optimization, the two stages connected by uncertainty estimation at multiple temporal resolutions (a toy forecast-then-optimize sketch follows the abstract); (ii) an efficient online data augmentation scheme jointly involving model pre-training and online fine-tuning stages.
  • results: The framework won the championship of the CityLearn Challenge 2022, and comprehensive experiments demonstrate its effectiveness in a real-life smart building energy management scenario.
    Abstract Aggregating distributed energy resources in power systems significantly increases uncertainties, in particular caused by the fluctuation of renewable energy generation. This issue has driven the necessity of widely exploiting advanced predictive control techniques under uncertainty to ensure long-term economics and decarbonization. In this paper, we propose a real-time uncertainty-aware energy dispatch framework, which is composed of two key elements: (i) A hybrid forecast-and-optimize sequential task, integrating deep learning-based forecasting and stochastic optimization, where these two stages are connected by the uncertainty estimation at multiple temporal resolutions; (ii) An efficient online data augmentation scheme, jointly involving model pre-training and online fine-tuning stages. In this way, the proposed framework is capable of rapidly adapting to the real-time data distribution, as well as targeting the uncertainties caused by data drift, model discrepancy and environment perturbations in the control process, and finally realizing an optimal and robust dispatch solution. The proposed framework won the championship in CityLearn Challenge 2022, which provided an influential opportunity to investigate the potential of AI application in the energy domain. In addition, comprehensive experiments are conducted to interpret its effectiveness in the real-life scenario of smart building energy management.
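
A toy illustration of the forecast-then-optimize pattern under uncertainty: a probabilistic forecast yields net-load scenarios, and the dispatch minimizes expected cost over them. The Gaussian forecast, the two-settlement (newsvendor-style) prices, and the per-hour optimization are all simplifying assumptions for exposition, not the paper's VPP model.

```python
import numpy as np

def sample_scenarios(mean_load, std_load, n=200, rng=None):
    """Draw hourly net-load scenarios from a Gaussian probabilistic forecast."""
    rng = rng or np.random.default_rng(0)
    return rng.normal(mean_load, std_load, size=(n, len(mean_load)))

def day_ahead_purchase(scenarios, p_da=0.20, p_rt=0.50):
    """Newsvendor-style stochastic dispatch: buy q_t day-ahead at p_da and
    cover any real-time shortfall at p_rt > p_da. Minimizing expected cost
    over scenarios puts q_t near the (1 - p_da/p_rt) quantile of net load,
    so a wider forecast spread directly changes the decision."""
    q_grid = np.linspace(0.0, scenarios.max(), 200)
    plan = np.zeros(scenarios.shape[1])
    for t in range(scenarios.shape[1]):
        exp_cost = [p_da * q + p_rt * np.mean(np.maximum(scenarios[:, t] - q, 0.0))
                    for q in q_grid]
        plan[t] = q_grid[int(np.argmin(exp_cost))]
    return plan

hours = 24
mean_load = 3.0 + 2.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, hours))
plan = day_ahead_purchase(sample_scenarios(mean_load, 0.5 * np.ones(hours)))
```

A full stochastic program would couple hours through storage state of charge and network constraints; the per-hour search above only conveys how the scenario-averaged objective makes forecast uncertainty part of the decision.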