cs.AI - 2023-08-31

PointLLM: Empowering Large Language Models to Understand Point Clouds

  • paper_url: http://arxiv.org/abs/2308.16911
  • repo_url: https://github.com/openrobotlab/pointllm
  • paper_authors: Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin
  • for: This work aims to extend large language models (LLMs) to 3D understanding, going beyond their existing 2D visual processing capabilities.
  • methods: The study introduces PointLLM, a preliminary effort to couple point cloud data with an LLM so that the model can understand point clouds and generate appropriate responses. PointLLM fuses geometric, appearance, and linguistic information by combining a point cloud encoder with a powerful LLM.
  • results: Experiments show that PointLLM outperforms existing 2D baselines; in human-evaluated object captioning, it outperforms human annotators on more than 50% of the samples.
    Abstract The unprecedented advancements in Large Language Models (LLMs) have created a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding. This paper introduces PointLLM, a preliminary effort to fill this gap, thereby enabling LLMs to understand point clouds and offering a new avenue beyond 2D visual data. PointLLM processes colored object point clouds with human instructions and generates contextually appropriate responses, illustrating its grasp of point clouds and common sense. Specifically, it leverages a point cloud encoder with a powerful LLM to effectively fuse geometric, appearance, and linguistic information. We collect a novel dataset comprising 660K simple and 70K complex point-text instruction pairs to enable a two-stage training strategy: initially aligning latent spaces and subsequently instruction-tuning the unified model. To rigorously evaluate our model's perceptual abilities and its generalization capabilities, we establish two benchmarks: Generative 3D Object Classification and 3D Object Captioning, assessed through three different methods, including human evaluation, GPT-4/ChatGPT evaluation, and traditional metrics. Experiment results show that PointLLM demonstrates superior performance over existing 2D baselines. Remarkably, in human-evaluated object captioning tasks, PointLLM outperforms human annotators in over 50% of the samples. Codes, datasets, and benchmarks are available at https://github.com/OpenRobotLab/PointLLM .
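    To make the fusion idea concrete, the following is a minimal, hypothetical sketch (not the authors' code) of how features from a point cloud encoder could be projected into an LLM's token-embedding space and prepended to the instruction tokens; the module, dimensions, and grouping scheme are all illustrative assumptions.

        import torch
        import torch.nn as nn

        class PointPrefixFusion(nn.Module):
            """Toy fusion: encode a colored point cloud into a few 'point tokens'
            and prepend them to the LLM's text token embeddings (illustrative only)."""
            def __init__(self, point_dim=6, enc_dim=256, llm_dim=4096, num_point_tokens=8):
                super().__init__()
                # Per-point MLP followed by group max-pooling -> a small set of tokens.
                self.point_mlp = nn.Sequential(
                    nn.Linear(point_dim, enc_dim), nn.ReLU(),
                    nn.Linear(enc_dim, enc_dim),
                )
                self.num_point_tokens = num_point_tokens
                # Linear projector that aligns point features with the LLM embedding space.
                self.projector = nn.Linear(enc_dim, llm_dim)

            def forward(self, points, text_embeds):
                # points: (B, N, 6) = xyz + rgb, text_embeds: (B, T, llm_dim)
                feats = self.point_mlp(points)                      # (B, N, enc_dim)
                B, N, C = feats.shape
                # Split points into groups and max-pool each group into one token.
                groups = feats.view(B, self.num_point_tokens, N // self.num_point_tokens, C)
                point_tokens = groups.max(dim=2).values             # (B, K, enc_dim)
                point_tokens = self.projector(point_tokens)         # (B, K, llm_dim)
                # Prepend point tokens to the text sequence before feeding the LLM.
                return torch.cat([point_tokens, text_embeds], dim=1)

        fusion = PointPrefixFusion()
        pts = torch.randn(2, 1024, 6)          # two colored point clouds
        txt = torch.randn(2, 16, 4096)         # embedded instruction tokens
        print(fusion(pts, txt).shape)          # torch.Size([2, 24, 4096])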

StyleInV: A Temporal Style Modulated Inversion Network for Unconditional Video Generation

  • paper_url: http://arxiv.org/abs/2308.16909
  • repo_url: https://github.com/johannwyh/styleinv
  • paper_authors: Yuhan Wang, Liming Jiang, Chen Change Loy
  • for: High-quality unconditional video generation.
  • methods: A motion generator built on a learning-based GAN inversion network, temporally modulated for coherent motion and capable of style transfer with simple fine-tuning.
  • results: Generates long, high-resolution videos with decent single-frame quality and temporal consistency.
    Abstract Unconditional video generation is a challenging task that involves synthesizing high-quality videos that are both coherent and of extended duration. To address this challenge, researchers have used pretrained StyleGAN image generators for high-quality frame synthesis and focused on motion generator design. The motion generator is trained in an autoregressive manner using heavy 3D convolutional discriminators to ensure motion coherence during video generation. In this paper, we introduce a novel motion generator design that uses a learning-based inversion network for GAN. The encoder in our method captures rich and smooth priors from encoding images to latents, and given the latent of an initially generated frame as guidance, our method can generate smooth future latent by modulating the inversion encoder temporally. Our method enjoys the advantage of sparse training and naturally constrains the generation space of our motion generator with the inversion network guided by the initial frame, eliminating the need for heavy discriminators. Moreover, our method supports style transfer with simple fine-tuning when the encoder is paired with a pretrained StyleGAN generator. Extensive experiments conducted on various benchmarks demonstrate the superiority of our method in generating long and high-resolution videos with decent single-frame quality and temporal consistency.

InterDiff: Generating 3D Human-Object Interactions with Physics-Informed Diffusion

  • paper_url: http://arxiv.org/abs/2308.16905
  • repo_url: https://github.com/Sirui-Xu/InterDiff
  • paper_authors: Sirui Xu, Zhengyuan Li, Yu-Xiong Wang, Liang-Yan Gui
  • for: This paper addresses the novel task of anticipating 3D human-object interactions (HOIs). Most existing HOI synthesis work is limited to small or static objects; this task is considerably harder because it requires modeling dynamic objects, capturing whole-body motion, and ensuring physically valid interactions.
  • methods: The proposed InterDiff framework has two key steps: (i) interaction diffusion, which uses a diffusion model to encode the distribution of future human-object interactions; and (ii) interaction correction, which introduces a physics-informed predictor to correct denoised HOIs within a diffusion step. The key insight is that interactions expressed relative to contact points follow a simple, easily predictable pattern.
  • results: Experiments on multiple human-object interaction datasets show that the method produces realistic, vivid, and remarkably long-term 3D HOI predictions.
    Abstract This paper addresses a novel task of anticipating 3D human-object interactions (HOIs). Most existing research on HOI synthesis lacks comprehensive whole-body interactions with dynamic objects, e.g., often limited to manipulating small or static objects. Our task is significantly more challenging, as it requires modeling dynamic objects with various shapes, capturing whole-body motion, and ensuring physically valid interactions. To this end, we propose InterDiff, a framework comprising two key steps: (i) interaction diffusion, where we leverage a diffusion model to encode the distribution of future human-object interactions; (ii) interaction correction, where we introduce a physics-informed predictor to correct denoised HOIs in a diffusion step. Our key insight is to inject prior knowledge that the interactions under reference with respect to contact points follow a simple pattern and are easily predictable. Experiments on multiple human-object interaction datasets demonstrate the effectiveness of our method for this task, capable of producing realistic, vivid, and remarkably long-term 3D HOI predictions.
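    The two-step structure described in the methods bullet, a diffusion denoising step followed by a physics-informed correction of the denoised interaction, can be pictured as a reverse-diffusion loop with a correction hook. The sketch below is a toy stand-in (a random "denoiser" and a blend-style "physics" correction), not the InterDiff model.

        import numpy as np

        def toy_denoiser(x_t, t):
            # Stand-in for a learned diffusion model predicting the clean HOI sequence.
            return x_t * (1.0 - 0.1 * t / 50.0)

        def physics_correction(x0_pred, contact_anchor):
            # Stand-in for the physics-informed predictor: pull the predicted motion
            # toward a simple pattern expressed relative to the contact point.
            return 0.8 * x0_pred + 0.2 * contact_anchor

        def sample_interaction(T=50, dim=12, seed=0):
            rng = np.random.default_rng(seed)
            x_t = rng.normal(size=dim)                 # start from Gaussian noise
            contact_anchor = np.zeros(dim)             # simplified contact-relative prior
            for t in reversed(range(1, T + 1)):
                x0_pred = toy_denoiser(x_t, t)         # (i) interaction diffusion step
                x0_pred = physics_correction(x0_pred, contact_anchor)  # (ii) correction
                noise = rng.normal(size=dim) * np.sqrt(t / T)
                x_t = x0_pred + 0.1 * noise            # re-noise to the next step
            return x_t

        print(sample_interaction()[:4])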

Transformers as Support Vector Machines

  • paper_url: http://arxiv.org/abs/2308.16898
  • repo_url: https://github.com/mohamedehab00/A-Hybrid-Arabic-Text-Summarization-Approach-based-on-Transformers
  • paper_authors: Davoud Ataee Tarzanagh, Yingcong Li, Christos Thrampoulidis, Samet Oymak
  • for: To characterize the optimization of the self-attention layer in the transformer architecture and its implications for NLP.
  • methods: The paper establishes a formal equivalence between the optimization geometry of the self-attention layer and a hard-margin support vector machine (SVM) problem, giving a clearer picture of how self-attention is optimized in transformers.
  • results: Using this formalism, the authors characterize key properties of attention-layer optimization, including the implicit bias of gradient descent and the conditions for local/global directional convergence. They also pose open problems and research directions for further study of transformer optimization.
    Abstract Since its inception in "Attention Is All You Need", transformer architecture has led to revolutionary advancements in NLP. The attention layer within the transformer admits a sequence of input tokens $X$ and makes them interact through pairwise similarities computed as softmax$(XQK^\top X^\top)$, where $(K,Q)$ are the trainable key-query parameters. In this work, we establish a formal equivalence between the optimization geometry of self-attention and a hard-margin SVM problem that separates optimal input tokens from non-optimal tokens using linear constraints on the outer-products of token pairs. This formalism allows us to characterize the implicit bias of 1-layer transformers optimized with gradient descent: (1) Optimizing the attention layer with vanishing regularization, parameterized by $(K,Q)$, converges in direction to an SVM solution minimizing the nuclear norm of the combined parameter $W=KQ^\top$. Instead, directly parameterizing by $W$ minimizes a Frobenius norm objective. We characterize this convergence, highlighting that it can occur toward locally-optimal directions rather than global ones. (2) Complementing this, we prove the local/global directional convergence of gradient descent under suitable geometric conditions. Importantly, we show that over-parameterization catalyzes global convergence by ensuring the feasibility of the SVM problem and by guaranteeing a benign optimization landscape devoid of stationary points. (3) While our theory applies primarily to linear prediction heads, we propose a more general SVM equivalence that predicts the implicit bias with nonlinear heads. Our findings are applicable to arbitrary datasets and their validity is verified via experiments. We also introduce several open problems and research directions. We believe these findings inspire the interpretation of transformers as a hierarchy of SVMs that separates and selects optimal tokens.
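    A small numeric sketch of the objects discussed in the abstract: the attention map softmax(XQK^T X^T) for a token sequence X, the combined parameter W = KQ^T, and the nuclear vs. Frobenius norms whose minimization the two parameterizations are shown to implicitly prefer. Shapes and values are arbitrary.

        import numpy as np

        def softmax(z, axis=-1):
            z = z - z.max(axis=axis, keepdims=True)
            e = np.exp(z)
            return e / e.sum(axis=axis, keepdims=True)

        rng = np.random.default_rng(0)
        T, d = 5, 4                       # tokens, embedding dimension
        X = rng.normal(size=(T, d))       # input token sequence
        K = rng.normal(size=(d, d))       # key parameters
        Q = rng.normal(size=(d, d))       # query parameters

        # Pairwise similarities used by self-attention: softmax(X Q K^T X^T).
        A = softmax(X @ Q @ K.T @ X.T, axis=-1)
        print("attention rows sum to 1:", np.allclose(A.sum(-1), 1.0))

        # Combined parameter W = K Q^T; the (K, Q) parameterization implicitly
        # biases gradient descent toward SVM solutions with minimal nuclear norm
        # of W, while parameterizing W directly biases toward minimal Frobenius norm.
        W = K @ Q.T
        print("nuclear norm  ||W||_* =", np.linalg.norm(W, ord="nuc"))
        print("Frobenius norm ||W||_F =", np.linalg.norm(W, ord="fro"))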

PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction

  • paper_url: http://arxiv.org/abs/2308.16896
  • repo_url: https://github.com/wzzheng/pointocc
  • paper_authors: Sicheng Zuo, Wenzhao Zheng, Yuanhui Huang, Jie Zhou, Jiwen Lu
  • for: This paper proposes an efficient point-cloud-based method for 3D semantic occupancy prediction, aimed at scene understanding in autonomous driving.
  • methods: Point clouds are represented with a cylindrical tri-perspective view (TPV) and processed by the PointOcc model. The TPV is constructed in cylindrical coordinates to reflect the distance distribution of LiDAR points, spatial group pooling preserves structural detail during projection, 2D backbones efficiently process each TPV plane, and each point's features are obtained by aggregating its projected features on the processed planes.
  • results: PointOcc achieves state-of-the-art performance on 3D occupancy prediction and LiDAR segmentation benchmarks while running much faster. Using LiDAR only, it outperforms all other methods, including multi-modal ones, by a large margin on the OpenOccupancy benchmark.
    Abstract Semantic segmentation in autonomous driving has been undergoing an evolution from sparse point segmentation to dense voxel segmentation, where the objective is to predict the semantic occupancy of each voxel in the concerned 3D space. The dense nature of the prediction space has rendered existing efficient 2D-projection-based methods (e.g., bird's eye view, range view, etc.) ineffective, as they can only describe a subspace of the 3D scene. To address this, we propose a cylindrical tri-perspective view to represent point clouds effectively and comprehensively and a PointOcc model to process them efficiently. Considering the distance distribution of LiDAR point clouds, we construct the tri-perspective view in the cylindrical coordinate system for more fine-grained modeling of nearer areas. We employ spatial group pooling to maintain structural details during projection and adopt 2D backbones to efficiently process each TPV plane. Finally, we obtain the features of each point by aggregating its projected features on each of the processed TPV planes without the need for any post-processing. Extensive experiments on both 3D occupancy prediction and LiDAR segmentation benchmarks demonstrate that the proposed PointOcc achieves state-of-the-art performance with much faster speed. Specifically, despite only using LiDAR, PointOcc significantly outperforms all other methods, including multi-modal methods, with a large margin on the OpenOccupancy benchmark. Code: https://github.com/wzzheng/PointOcc.
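    An illustrative sketch (with made-up resolutions) of the cylindrical tri-perspective view idea: each LiDAR point is converted to cylindrical coordinates (rho, phi, z) and scattered onto three planes (phi-z, rho-z, rho-phi); per-cell max-pooling stands in for the spatial group pooling described above, and a point's feature is recovered by aggregating its projected cells.

        import numpy as np

        def cylindrical_tpv(points, feats, R=(64, 64, 16)):
            """Project point features onto three cylindrical TPV planes.

            points: (N, 3) xyz, feats: (N,) scalar feature per point (toy stand-in),
            R: grid resolution for (rho, phi, z)."""
            x, y, z = points.T
            rho = np.sqrt(x**2 + y**2)
            phi = np.arctan2(y, x)
            # Normalize each coordinate into grid indices.
            r_i = np.clip((rho / rho.max() * (R[0] - 1)).astype(int), 0, R[0] - 1)
            p_i = ((phi + np.pi) / (2 * np.pi) * (R[1] - 1)).astype(int)
            z_i = np.clip(((z - z.min()) / (z.max() - z.min() + 1e-9) * (R[2] - 1)).astype(int), 0, R[2] - 1)

            planes = {"phi_z": np.zeros((R[1], R[2])),
                      "rho_z": np.zeros((R[0], R[2])),
                      "rho_phi": np.zeros((R[0], R[1]))}
            # Max-pool point features that fall into the same cell (toy group pooling).
            np.maximum.at(planes["phi_z"], (p_i, z_i), feats)
            np.maximum.at(planes["rho_z"], (r_i, z_i), feats)
            np.maximum.at(planes["rho_phi"], (r_i, p_i), feats)

            # A point's feature is later recovered by aggregating (here: summing) the
            # values of its projected cells on the three processed planes.
            per_point = (planes["phi_z"][p_i, z_i] + planes["rho_z"][r_i, z_i]
                         + planes["rho_phi"][r_i, p_i])
            return planes, per_point

        pts = np.random.default_rng(0).normal(size=(2048, 3)) * [20, 20, 2]
        feat = np.random.default_rng(1).random(2048)
        planes, per_point = cylindrical_tpv(pts, feat)
        print({k: v.shape for k, v in planes.items()}, per_point.shape)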

Language-Conditioned Path Planning

  • paper_url: http://arxiv.org/abs/2308.16893
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Amber Xie, Youngwoon Lee, Pieter Abbeel, Stephen James
  • for: This paper focuses on the problem of path planning for robotic manipulation tasks, specifically in contact-rich environments.
  • methods: The proposed method is called Language-Conditioned Collision Functions (LACO), which uses a single-view image, language prompt, and robot configuration to learn a collision function and enable flexible, conditional path planning.
  • results: The authors demonstrate the effectiveness of LACO in both simulation and real-world experiments, showing that it can facilitate complex, nuanced path plans that allow for safe collisions with objects in the environment.
    Abstract Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO) a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide, rather than prohibiting any collision.
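    A hypothetical sketch of the interface such a collision function implies: given an image embedding, a language-prompt embedding, and a robot configuration, an MLP predicts the probability of collision. The embedding extractors are left as placeholders, and nothing here reflects LACO's actual architecture.

        import torch
        import torch.nn as nn

        class ToyCollisionFunction(nn.Module):
            """p(collision | image, language prompt, robot configuration) -- a toy stand-in."""
            def __init__(self, img_dim=512, txt_dim=512, q_dim=7, hidden=256):
                super().__init__()
                self.mlp = nn.Sequential(
                    nn.Linear(img_dim + txt_dim + q_dim, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, 1),
                )

            def forward(self, img_emb, txt_emb, joint_config):
                x = torch.cat([img_emb, txt_emb, joint_config], dim=-1)
                return torch.sigmoid(self.mlp(x))   # collision probability in [0, 1]

        model = ToyCollisionFunction()
        img = torch.randn(4, 512)     # single-view image embedding (placeholder)
        txt = torch.randn(4, 512)     # language prompt embedding, e.g. "avoid the mug"
        q   = torch.randn(4, 7)       # 7-DoF robot joint configuration
        p_collision = model(img, txt, q)
        # A planner can then reject configurations whose predicted collision probability
        # with prohibited objects exceeds a threshold, while still allowing contact with
        # objects that the prompt marks as safe to touch.
        print(p_collision.squeeze(-1))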

ReZero: Region-customizable Sound Extraction

  • paper_url: http://arxiv.org/abs/2308.16892
  • repo_url: None
  • paper_authors: Rongzhi Gu, Yi Luo
  • for: This paper addresses the multi-channel region-wise sound extraction (R-SE) task.
  • methods: It defines different types of spatial regions (e.g., angular windows, spheres, cones), provides methods for region feature extraction and aggregation, and introduces a multi-channel extension of the band-split RNN (BSRNN) model tailored to the R-SE task.
  • results: Experiments show that ReZero is effective across different microphone array geometries and system configurations, achieving strong extraction performance on both simulated and real-recorded data. Detailed results and demos are available at https://innerselfm.github.io/rezero/.
    Abstract We introduce region-customizable sound extraction (ReZero), a general and flexible framework for the multi-channel region-wise sound extraction (R-SE) task. R-SE task aims at extracting all active target sounds (e.g., human speech) within a specific, user-defined spatial region, which is different from conventional and existing tasks where a blind separation or a fixed, predefined spatial region are typically assumed. The spatial region can be defined as an angular window, a sphere, a cone, or other geometric patterns. Being a solution to the R-SE task, the proposed ReZero framework includes (1) definitions of different types of spatial regions, (2) methods for region feature extraction and aggregation, and (3) a multi-channel extension of the band-split RNN (BSRNN) model specified for the R-SE task. We design experiments for different microphone array geometries, different types of spatial regions, and comprehensive ablation studies on different system configurations. Experimental results on both simulated and real-recorded data demonstrate the effectiveness of ReZero. Demos are available at https://innerselfm.github.io/rezero/.
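    The notion of a user-defined spatial region can be made concrete with the simplest case, an angular window: keep sources whose direction of arrival falls inside the window and suppress the rest. The snippet below only illustrates that region definition on simulated source angles; it is not the extraction network.

        import numpy as np

        def in_angular_window(source_azimuths_deg, center_deg, width_deg):
            """Return a boolean mask of sources inside the angular window region."""
            # Wrap angular differences into [-180, 180) before comparing to the half-width.
            diff = (np.asarray(source_azimuths_deg) - center_deg + 180.0) % 360.0 - 180.0
            return np.abs(diff) <= width_deg / 2.0

        # Three simulated talkers at different azimuths; the user asks for the region
        # "a 60-degree window centred at 90 degrees".
        azimuths = [20.0, 85.0, 250.0]
        mask = in_angular_window(azimuths, center_deg=90.0, width_deg=60.0)
        print(list(zip(azimuths, mask)))   # only the talker at 85 degrees is a target
        # An R-SE system would be trained to output the mixture of all sources for which
        # the mask is True, given the region description as conditioning input.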

The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

  • paper_url: http://arxiv.org/abs/2308.16884
  • repo_url: None
  • paper_authors: Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa
  • for: The goal is to expand the language coverage of natural language understanding (NLU) benchmarks and to evaluate language models across many languages.
  • methods: The paper introduces a multiple-choice machine reading comprehension (MRC) dataset covering 122 language variants. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers.
  • results: Despite significant cross-lingual transfer in English-centric large language models (LLMs), much smaller multilingual masked language models (MLMs) pretrained on balanced multilingual data still understand far more languages. Larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages.
    Abstract We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multiple-choice answers. The questions were carefully curated to discriminate between models with different levels of general language comprehension. The English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables direct comparison of model performance across all languages. We use this dataset to evaluate the capabilities of multilingual masked language models (MLMs) and large language models (LLMs). We present extensive results and find that despite significant cross-lingual transfer in English-centric LLMs, much smaller MLMs pretrained on balanced multilingual data still understand far more languages. We also observe that larger vocabulary size and conscious vocabulary construction correlate with better performance on low-resource languages. Overall, Belebele opens up new avenues for evaluating and analyzing the multilingual capabilities of NLP systems.

Adaptation Speed Analysis for Fairness-aware Causal Models

  • paper_url: http://arxiv.org/abs/2308.16879
  • repo_url: None
  • paper_authors: Yujie Lin, Chen Zhao, Minglai Shao, Xujiang Zhao, Haifeng Chen
  • for: This paper studies how two models adapt to a domain shift in the presence of a sensitive (bias) variable within a structural causal model (SCM) with a cause-bias-effect structure.
  • methods: Two models with opposite cause-effect directions are used to align the original distribution p with the modified distribution p* produced by an unknown intervention; their adaptation speeds are compared across four shift scenarios.
  • results: The paper compares the adaptation speeds of the two models across the four shift scenarios and proves the connection between their adaptation speeds across all interventions.
    Abstract For example, in machine translation tasks, to achieve bidirectional translation between two languages, the source corpus is often used as the target corpus, which involves the training of two models with opposite directions. The question of which one can adapt most quickly to a domain shift is of significant importance in many fields. Specifically, consider an original distribution p that changes due to an unknown intervention, resulting in a modified distribution p*. In aligning p with p*, several factors can affect the adaptation rate, including the causal dependencies between variables in p. In real-life scenarios, however, we have to consider the fairness of the training process, and it is particularly crucial to involve a sensitive variable (bias) present between a cause and an effect variable. To explore this scenario, we examine a simple structural causal model (SCM) with a cause-bias-effect structure, where variable A acts as a sensitive variable between cause (X) and effect (Y). The two models, respectively, exhibit consistent and contrary cause-effect directions in the cause-bias-effect SCM. After conducting unknown interventions on variables within the SCM, we can simulate some kinds of domain shifts for analysis. We then compare the adaptation speeds of two models across four shift scenarios. Additionally, we prove the connection between the adaptation speeds of the two models across all interventions.
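    A small simulation in the spirit of the setup: a linear Gaussian SCM X -> A -> Y (cause, bias/sensitive variable, effect), an intervention that shifts the cause, and a comparison of how quickly two regressions with opposite directions re-fit on data from the shifted distribution. It illustrates the question being asked, not the paper's analysis; all coefficients are made up.

        import numpy as np

        def sample_scm(n, x_shift=0.0, rng=None):
            # Cause-bias-effect SCM: X -> A -> Y with additive Gaussian noise.
            rng = rng or np.random.default_rng(0)
            X = rng.normal(loc=x_shift, size=n)          # cause (intervened on via x_shift)
            A = 0.8 * X + 0.3 * rng.normal(size=n)       # sensitive/bias variable
            Y = 1.5 * A + 0.3 * rng.normal(size=n)       # effect
            return X, A, Y

        def adaptation_steps(w_init, xs, ys, lr=0.05, tol=1e-3, max_steps=5000):
            # Full-batch gradient steps for a 1-D regression y ~ w*x until the gradient
            # is small; fewer steps means faster adaptation after the shift.
            w = w_init
            for step in range(1, max_steps + 1):
                grad = np.mean((w * xs - ys) * xs)
                w -= lr * grad
                if abs(grad) < tol:
                    return step, w
            return max_steps, w

        rng = np.random.default_rng(1)
        X0, A0, Y0 = sample_scm(20_000, x_shift=0.0, rng=rng)     # original p
        X1, A1, Y1 = sample_scm(20_000, x_shift=2.0, rng=rng)     # shifted p* (intervention on X)

        # Fit both directions on p, then count how long each takes to re-fit on p*.
        w_causal = np.sum(X0 * Y0) / np.sum(X0 * X0)       # model 1: Y from X (causal direction)
        w_anti   = np.sum(Y0 * X0) / np.sum(Y0 * Y0)       # model 2: X from Y (anti-causal)
        steps_causal, _ = adaptation_steps(w_causal, X1, Y1)
        steps_anti,   _ = adaptation_steps(w_anti,   Y1, X1)
        print("steps to adapt (Y|X):", steps_causal, " (X|Y):", steps_anti)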

The Gender-GAP Pipeline: A Gender-Aware Polyglot Pipeline for Gender Characterisation in 55 Languages

  • paper_url: http://arxiv.org/abs/2308.16871
  • repo_url: None
  • paper_authors: Benjamin Muller, Belen Alastruey, Prangthip Hansanti, Elahe Kalbassi, Christophe Ropers, Eric Michael Smith, Adina Williams, Luke Zettlemoyer, Pierre Andrews, Marta R. Costa-jussà
  • for: This work examines gender bias in language generation systems and points to one possible source: gender representation disparities in training and evaluation data.
  • methods: A multilingual lexicon of gendered person-nouns is used to automatically quantify gender representation in large-scale datasets; the pipeline is evaluated on WMT training and development data.
  • results: Existing datasets are skewed towards masculine representation, which may indirectly optimize language generation systems to favor one gender over the others. The authors suggest applying the gender quantification pipeline to current datasets and, ideally, rebalancing them.
    Abstract Gender biases in language generation systems are challenging to mitigate. One possible source for these biases is gender representation disparities in the training and evaluation data. Despite recent progress in documenting this problem and many attempts at mitigating it, we still lack shared methodology and tooling to report gender representation in large datasets. Such quantitative reporting will enable further mitigation, e.g., via data augmentation. This paper describes the Gender-GAP Pipeline (for Gender-Aware Polyglot Pipeline), an automatic pipeline to characterize gender representation in large-scale datasets for 55 languages. The pipeline uses a multilingual lexicon of gendered person-nouns to quantify the gender representation in text. We showcase it to report gender representation in WMT training data and development data for the News task, confirming that current data is skewed towards masculine representation. Having unbalanced datasets may indirectly optimize our systems towards outperforming one gender over the others. We suggest introducing our gender quantification pipeline in current datasets and, ideally, modifying them toward a balanced representation.
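    The core of such a pipeline is counting matches against a lexicon of gendered person-nouns. A minimal monolingual sketch follows; the tiny English word list is illustrative only and is not the paper's 55-language lexicon.

        import re
        from collections import Counter

        # Toy lexicon of gendered person-nouns (illustrative; the real pipeline
        # uses a curated multilingual lexicon covering 55 languages).
        LEXICON = {
            "masculine": {"man", "men", "father", "son", "boy", "he", "him", "his"},
            "feminine":  {"woman", "women", "mother", "daughter", "girl", "she", "her", "hers"},
        }

        def gender_counts(text):
            tokens = re.findall(r"[a-z']+", text.lower())
            counts = Counter()
            for gender, words in LEXICON.items():
                counts[gender] = sum(1 for tok in tokens if tok in words)
            total = sum(counts.values()) or 1
            return {g: c / total for g, c in counts.items()}, counts

        demo = "The man told his son that the doctor would see her mother tomorrow."
        shares, raw = gender_counts(demo)
        print(raw, {g: round(s, 2) for g, s in shares.items()})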

Learning Driver Models for Automated Vehicles via Knowledge Sharing and Personalization

  • paper_url: http://arxiv.org/abs/2308.16870
  • repo_url: None
  • paper_authors: Wissam Kontar, Xinzhi Zhong, Soyoung Ahn
  • for: This paper proposes a framework for training driver models for automated vehicles (AVs) via knowledge sharing between vehicles and personalization. The innate variability of the transportation system makes it exceptionally hard to expose AVs to all possible driving scenarios during empirical testing, so AVs may be blind to encounters critical to their safe and efficient operation; sharing knowledge across vehicles increases exposure to real-world driving scenarios.
  • methods: A federated learning approach lets multiple vehicles collaboratively train a driver model by sharing knowledge and borrowing strength across vehicles, while each vehicle retains a personalized model tailored to its unique conditions and properties. Raw data is never shared between vehicles, preserving privacy and security.
  • results: The method's performance is demonstrated in experimental simulations. The approach has broad applications in transportation engineering, including intelligent transportation systems, traffic management, and vehicle-to-vehicle communication. Code and a sample dataset are available at https://github.com/wissamkontar.
    Abstract This paper describes a framework for learning Automated Vehicles (AVs) driver models via knowledge sharing between vehicles and personalization. The innate variability in the transportation system makes it exceptionally challenging to expose AVs to all possible driving scenarios during empirical experimentation or testing. Consequently, AVs could be blind to certain encounters that are deemed detrimental to their safe and efficient operation. It is then critical to share knowledge across AVs that increase exposure to driving scenarios occurring in the real world. This paper explores a method to collaboratively train a driver model by sharing knowledge and borrowing strength across vehicles while retaining a personalized model tailored to the vehicle's unique conditions and properties. Our model brings a federated learning approach to collaborate between multiple vehicles while circumventing the need to share raw data between them. We showcase our method's performance in experimental simulations. Such an approach to learning finds several applications across transportation engineering including intelligent transportation systems, traffic management, and vehicle-to-vehicle communication. Code and sample dataset are made available at the project page https://github.com/wissamkontar.
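    A toy sketch of the collaboration pattern, federated averaging of a shared driver-model component plus a per-vehicle personalized term, using simple linear car-following-style models on synthetic data; only model weights leave a vehicle, never raw data. This illustrates the training scheme, not the paper's model.

        import numpy as np

        rng = np.random.default_rng(0)
        NUM_VEHICLES, ROUNDS, LOCAL_STEPS, LR = 5, 20, 10, 0.1

        def local_data(vehicle_id, n=256):
            # Synthetic driving samples: features = [gap, relative speed], target = acceleration.
            X = rng.normal(size=(n, 2))
            personal_style = 0.3 * (vehicle_id - 2)          # each vehicle drives differently
            y = X @ np.array([0.8, -0.5]) + personal_style + 0.05 * rng.normal(size=n)
            return X, y

        w_global = np.zeros(2)                               # shared driver-model weights
        b_personal = np.zeros(NUM_VEHICLES)                  # personalized per-vehicle term

        for _ in range(ROUNDS):
            updates = []
            for v in range(NUM_VEHICLES):
                X, y = local_data(v)
                w, b = w_global.copy(), b_personal[v]
                for _ in range(LOCAL_STEPS):                 # local SGD on the vehicle
                    err = X @ w + b - y
                    w -= LR * X.T @ err / len(y)
                    b -= LR * err.mean()
                updates.append(w)
                b_personal[v] = b                            # personalization stays local
            w_global = np.mean(updates, axis=0)              # server averages shared weights only

        print("shared weights:", np.round(w_global, 2), "personal terms:", np.round(b_personal, 2))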

IoMT-Blockchain based Secured Remote Patient Monitoring Framework for Neuro-Stimulation Device

  • paper_url: http://arxiv.org/abs/2308.16857
  • repo_url: None
  • paper_authors: Md Sakib Ullah Sourav, Mohammad Sultan Mahmud, Md Simul Hasan Talukder, Rejwan Bin Sulaiman, Abdullah Yasin
  • for: This paper aims to improve the accuracy, dependability, and productivity of electronic equipment in healthcare through the Internet of Medical Things (IoMT), and to use blockchain (BC) to mitigate problems such as centralized storage and data manipulation.
  • methods: The work presents an IoMT-based, non-invasive remote neuro-stimulation system built around a hardware-based transcranial direct current stimulation (tDCS) device operated over the internet via an Android application, incorporating literature best practices to address the issues of IoMT-BC systems.
  • results: The study indicates that combining IoMT and blockchain can improve the accuracy and reliability of the neuro-stimulation system and enables real-time remote patient monitoring.
    Abstract Biomedical Engineering's Internet of Medical Things (IoMT) is helping to improve the accuracy, dependability, and productivity of electronic equipment in the healthcare business. Real-time sensory data from patients may be delivered and subsequently analyzed through rapid development of wearable IoMT devices, such as neuro-stimulation devices with a range of functions. Data from the Internet of Things is gathered, analyzed, and stored in a single location. However, single-point failure, data manipulation, privacy difficulties, and other challenges might arise as a result of centralization. Due to its decentralized nature, blockchain (BC) can alleviate these issues. The viability of establishing a non-invasive remote neurostimulation system employing IoMT-based transcranial Direct Current Stimulation is investigated in this work (tDCS). A hardware-based prototype tDCS device has been developed that can be operated over the internet using an android application. Our suggested framework addresses the problems of IoMTBC-based systems, meets the criteria of real-time remote patient monitoring systems, and incorporates literature best practices in the relevant fields.

Towards Improving the Expressiveness of Singing Voice Synthesis with BERT Derived Semantic Information

  • paper_url: http://arxiv.org/abs/2308.16836
  • repo_url: None
  • paper_authors: Shaohuan Zhou, Shun Lei, Weiya You, Deyi Tuo, Yuren You, Zhiyong Wu, Shiyin Kang, Helen Meng
  • for: A high-quality singing voice synthesis (SVS) system with more expressive synthesized singing.
  • methods: Semantic embeddings of the lyrics derived from a bidirectional encoder representation from Transformers (BERT) model are used as additional input, together with purpose-built components such as an energy predictor and a redesigned pitch predictor that predicts the real-to-note pitch ratio.
  • results: The system synthesizes higher-quality singing voice than previous SVS models, as confirmed by both objective and subjective experiments.
    Abstract This paper presents an end-to-end high-quality singing voice synthesis (SVS) system that uses bidirectional encoder representation from Transformers (BERT) derived semantic embeddings to improve the expressiveness of the synthesized singing voice. Based on the main architecture of recently proposed VISinger, we put forward several specific designs for expressive singing voice synthesis. First, different from the previous SVS models, we use text representation of lyrics extracted from pre-trained BERT as additional input to the model. The representation contains information about semantics of the lyrics, which could help SVS system produce more expressive and natural voice. Second, we further introduce an energy predictor to stabilize the synthesized voice and model the wider range of energy variations that also contribute to the expressiveness of singing voice. Last but not the least, to attenuate the off-key issues, the pitch predictor is re-designed to predict the real to note pitch ratio. Both objective and subjective experimental results indicate that the proposed SVS system can produce singing voice with higher-quality outperforming VISinger.

Can Programming Languages Boost Each Other via Instruction Tuning?

  • paper_url: http://arxiv.org/abs/2308.16824
  • repo_url: https://github.com/nl2code/codem
  • paper_authors: Daoguang Zan, Ailun Yu, Bo Shen, Jiaxin Zhang, Taihong Chen, Bing Geng, Bei Chen, Jichuan Ji, Yafen Yao, Yongji Wang, Qianxiang Wang
  • for: This work investigates whether programming languages can boost each other during the instruction fine-tuning phase of code large language models.
  • methods: Extensive experiments are conducted with 8 popular programming languages (Python, JavaScript, TypeScript, C, C++, Java, Go, HTML) on StarCoder.
  • results: Programming languages can significantly improve each other. For example, CodeM-Python 15B trained on Python raises Java pass@1 by an absolute 17.95% on HumanEval-X, and even CodeM-HTML 7B trained on an HTML corpus improves Java pass@1 by an absolute 15.24%. The training data is released at https://github.com/NL2Code/CodeM.
    Abstract When human programmers have mastered a programming language, it would be easier when they learn a new programming language. In this report, we focus on exploring whether programming languages can boost each other during the instruction fine-tuning phase of code large language models. We conduct extensive experiments of 8 popular programming languages (Python, JavaScript, TypeScript, C, C++, Java, Go, HTML) on StarCoder. Results demonstrate that programming languages can significantly improve each other. For example, CodeM-Python 15B trained on Python is able to increase Java by an absolute 17.95% pass@1 on HumanEval-X. More surprisingly, we found that CodeM-HTML 7B trained on the HTML corpus can improve Java by an absolute 15.24% pass@1. Our training data is released at https://github.com/NL2Code/CodeM.

Latent Variable Multi-output Gaussian Processes for Hierarchical Datasets

  • paper_url: http://arxiv.org/abs/2308.16822
  • repo_url: None
  • paper_authors: Chunchao Ma, Arthur Leroy, Mauricio Alvarez
  • for: This paper proposes an extension of multi-output Gaussian processes (MOGPs) for hierarchical datasets, i.e., datasets whose observations are related through a tree structure.
  • methods: The model defines a tailored kernel function that accounts for the hierarchical structure of the data, capturing different levels of correlation, and introduces latent variables with a dedicated kernel to express the underlying dependencies between outputs.
  • results: An extensive experimental study on both synthetic and real-world data from genomics and motion capture supports the approach, and the latent-variable formulation is expected to significantly improve scalability as the number of tasks increases.
    Abstract Multi-output Gaussian processes (MOGPs) have been introduced to deal with multiple tasks by exploiting the correlations between different outputs. Generally, MOGPs models assume a flat correlation structure between the outputs. However, such a formulation does not account for more elaborate relationships, for instance, if several replicates were observed for each output (which is a typical setting in biological experiments). This paper proposes an extension of MOGPs for hierarchical datasets (i.e. datasets for which the relationships between observations can be represented within a tree structure). Our model defines a tailored kernel function accounting for hierarchical structures in the data to capture different levels of correlations while leveraging the introduction of latent variables to express the underlying dependencies between outputs through a dedicated kernel. This latter feature is expected to significantly improve scalability as the number of tasks increases. An extensive experimental study involving both synthetic and real-world data from genomics and motion capture is proposed to support our claims.
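    One simple way to express "different levels of correlation within a tree" in a multi-output GP is to add a kernel contribution for each level of the hierarchy that two outputs share (a root term, a group term, and a replicate-specific term). The sketch below builds such a covariance matrix with an RBF input kernel; it conveys the flavour of a hierarchy-aware kernel rather than the paper's exact latent-variable construction.

        import numpy as np

        def rbf(x1, x2, lengthscale=1.0, variance=1.0):
            d = x1[:, None] - x2[None, :]
            return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

        # Hierarchy: outputs are replicates grouped under parents (a two-level tree).
        # groups[i] is the parent group of output i.
        groups = np.array([0, 0, 1, 1, 1])        # 5 outputs, 2 groups
        x = np.linspace(0, 1, 10)                 # shared input locations

        def hierarchical_cov(x, groups, v_root=0.3, v_group=0.5, v_leaf=0.2):
            P, N = len(groups), len(x)
            K = np.zeros((P * N, P * N))
            Kx = rbf(x, x)
            for i in range(P):
                for j in range(P):
                    level = v_root                             # shared by all outputs
                    if groups[i] == groups[j]:
                        level += v_group                       # shared within the group
                    if i == j:
                        level += v_leaf                        # replicate-specific part
                    K[i*N:(i+1)*N, j*N:(j+1)*N] = level * Kx
            return K + 1e-6 * np.eye(P * N)                    # jitter for stability

        K = hierarchical_cov(x, groups)
        print(K.shape, "symmetric:", np.allclose(K, K.T))
        # Outputs in the same group co-vary more strongly than outputs in different groups:
        print(K[0, 10] > K[0, 30])   # cov(out0, out1 same group) > cov(out0, out3 other group)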

Irregular Traffic Time Series Forecasting Based on Asynchronous Spatio-Temporal Graph Convolutional Network

  • paper_url: http://arxiv.org/abs/2308.16818
  • repo_url: None
  • paper_authors: Weijia Zhang, Le Zhang, Jindong Han, Hao Liu, Jingbo Zhou, Yu Mei, Hui Xiong
  • for: This paper aims to deliver accurate traffic forecasting at intersections governed by intelligent traffic signals, improving the effectiveness of intelligent traffic signal control systems.
  • methods: An Asynchronous Spatio-tEmporal graph convolutional nEtwoRk (ASeer) models the asynchronous spatial dependency between lanes via a traffic diffusion graph and captures temporal dependency within irregular traffic sequences using a learnable personalized time encoding; a transformable time-aware convolution network and a semi-autoregressive prediction network handle the irregular, variable-length sequences.
  • results: Experiments show that ASeer produces accurate traffic forecasts and outperforms existing methods on six metrics.
    Abstract Accurate traffic forecasting at intersections governed by intelligent traffic signals is critical for the advancement of an effective intelligent traffic signal control system. However, due to the irregular traffic time series produced by intelligent intersections, the traffic forecasting task becomes much more intractable and imposes three major new challenges: 1) asynchronous spatial dependency, 2) irregular temporal dependency among traffic data, and 3) variable-length sequence to be predicted, which severely impede the performance of current traffic forecasting methods. To this end, we propose an Asynchronous Spatio-tEmporal graph convolutional nEtwoRk (ASeer) to predict the traffic states of the lanes entering intelligent intersections in a future time window. Specifically, by linking lanes via a traffic diffusion graph, we first propose an Asynchronous Graph Diffusion Network to model the asynchronous spatial dependency between the time-misaligned traffic state measurements of lanes. After that, to capture the temporal dependency within irregular traffic state sequence, a learnable personalized time encoding is devised to embed the continuous time for each lane. Then we propose a Transformable Time-aware Convolution Network that learns meta-filters to derive time-aware convolution filters with transformable filter sizes for efficient temporal convolution on the irregular sequence. Furthermore, a Semi-Autoregressive Prediction Network consisting of a state evolution unit and a semiautoregressive predictor is designed to effectively and efficiently predict variable-length traffic state sequences. Extensive experiments on two real-world datasets demonstrate the effectiveness of ASeer in six metrics.
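    One ingredient named above, the learnable personalized time encoding for irregularly sampled lane measurements, can be sketched as sinusoidal features of the continuous timestamp with per-lane learnable frequencies. This is a generic illustration under assumed shapes, not ASeer's exact module.

        import torch
        import torch.nn as nn

        class PersonalizedTimeEncoding(nn.Module):
            """Map continuous timestamps to embeddings, with per-lane learnable frequencies."""
            def __init__(self, num_lanes, dim=16):
                super().__init__()
                self.freq = nn.Parameter(torch.randn(num_lanes, dim))   # lane-specific frequencies
                self.phase = nn.Parameter(torch.zeros(num_lanes, dim))

            def forward(self, lane_ids, timestamps):
                # lane_ids: (B,), timestamps: (B,) continuous times of the measurements.
                f = self.freq[lane_ids]                    # (B, dim)
                p = self.phase[lane_ids]
                return torch.sin(timestamps[:, None] * f + p)

        enc = PersonalizedTimeEncoding(num_lanes=4)
        lanes = torch.tensor([0, 0, 3])                    # measurements from two different lanes
        times = torch.tensor([12.5, 47.0, 13.1])           # irregular, misaligned timestamps
        print(enc(lanes, times).shape)                     # torch.Size([3, 16])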

Rank Collapse Causes Over-Smoothing and Over-Correlation in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.16800
  • repo_url: None
  • paper_authors: Andreas Roth, Thomas Liebig
  • for: This paper aims to provide new theoretical insights into the issues of over-smoothing and feature over-correlation in deep graph neural networks.
  • methods: The paper uses a theoretical approach to demonstrate the prevalence of invariant subspaces in deep graph neural networks, and shows how this can lead to over-smoothing and over-correlation.
  • results: The paper’s results include a better understanding of the causes of over-smoothing and over-correlation, and the proposal of a sum of Kronecker products as a beneficial property that can prevent these issues. Additionally, the paper demonstrates the inability of existing models to capture linearly independent features in the non-linear case.
    Abstract Our study reveals new theoretical insights into over-smoothing and feature over-correlation in deep graph neural networks. We show the prevalence of invariant subspaces, demonstrating a fixed relative behavior that is unaffected by feature transformations. Our work clarifies recent observations related to convergence to a constant state and a potential over-separation of node states, as the amplification of subspaces only depends on the spectrum of the aggregation function. In linear scenarios, this leads to node representations being dominated by a low-dimensional subspace with an asymptotic convergence rate independent of the feature transformations. This causes a rank collapse of the node representations, resulting in over-smoothing when smooth vectors span this subspace, and over-correlation even when over-smoothing is avoided. Guided by our theory, we propose a sum of Kronecker products as a beneficial property that can provably prevent over-smoothing, over-correlation, and rank collapse. We empirically extend our insights to the non-linear case, demonstrating the inability of existing models to capture linearly independent features.
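    A small numerical illustration of the phenomenon: repeatedly applying a symmetrically normalized aggregation matrix to node features drives the representation toward the dominant subspace of the aggregation operator, and inserting arbitrary feature transformations between layers does not change this relative behaviour. The graph, depths, and metrics are arbitrary; this is a demo, not the paper's proofs.

        import numpy as np

        rng = np.random.default_rng(0)
        n, d = 30, 8
        # Random undirected graph and its symmetrically normalized adjacency (with self-loops).
        A = (rng.random((n, n)) < 0.2).astype(float)
        A = np.maximum(A, A.T) + np.eye(n)
        Dinv = np.diag(1.0 / np.sqrt(A.sum(1)))
        A_hat = Dinv @ A @ Dinv

        X = rng.normal(size=(n, d))                       # initial node features
        for layer in range(60):
            W = rng.normal(size=(d, d)) / np.sqrt(d)      # fresh feature transformation per layer
            X = A_hat @ X @ W                             # linear GNN layer (no nonlinearity)
            if layer % 15 == 14:
                # The singular value ratio shows the representation collapsing toward rank one
                # (over-smoothing), while the feature columns become increasingly correlated
                # (over-correlation), regardless of the per-layer transformations W.
                s = np.linalg.svd(X, compute_uv=False)
                corr = np.corrcoef(X, rowvar=False)
                off_diag = np.abs(corr[~np.eye(d, dtype=bool)]).mean()
                print(f"layer {layer+1:2d}: sigma2/sigma1 = {s[1]/s[0]:.2e}, "
                      f"mean |column correlation| = {off_diag:.3f}")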

Agent Teaming Situation Awareness (ATSA): A Situation Awareness Framework for Human-AI Teaming

  • paper_url: http://arxiv.org/abs/2308.16785
  • repo_url: None
  • paper_authors: Qi Gao, Wei Xu, Mowei Shen, Zaifeng Gao
  • for: The paper reviews leading situation awareness (SA) theoretical models and proposes a new framework for SA in the human-AI teaming (HAT) context.
  • methods: A literature review is used to identify the key features and processes of HAT and to develop the new framework.
  • results: The proposed Agent Teaming Situation Awareness (ATSA) framework unifies human and AI behavior and emphasizes cohesive and effective HAT through structures and components such as teaming understanding, teaming control, and the world.
    Abstract The rapid advancements in artificial intelligence (AI) have led to a growing trend of human-AI teaming (HAT) in various fields. As machines continue to evolve from mere automation to a state of autonomy, they are increasingly exhibiting unexpected behaviors and human-like cognitive/intelligent capabilities, including situation awareness (SA). This shift has the potential to enhance the performance of mixed human-AI teams over all-human teams, underscoring the need for a better understanding of the dynamic SA interactions between humans and machines. To this end, we provide a review of leading SA theoretical models and a new framework for SA in the HAT context based on the key features and processes of HAT. The Agent Teaming Situation Awareness (ATSA) framework unifies human and AI behavior, and involves bidirectional, and dynamic interaction. The framework is based on the individual and team SA models and elaborates on the cognitive mechanisms for modeling HAT. Similar perceptual cycles are adopted for the individual (including both human and AI) and the whole team, which is tailored to the unique requirements of the HAT context. ATSA emphasizes cohesive and effective HAT through structures and components, including teaming understanding, teaming control, and the world, as well as adhesive transactive part. We further propose several future research directions to expand on the distinctive contributions of ATSA and address the specific and pressing next steps.

StratMed: Relevance Stratification for Low-resource Medication Recommendation

  • paper_url: http://arxiv.org/abs/2308.16781
  • repo_url: None
  • paper_authors: Xiang Li
  • for: This paper proposes an AI-based medication recommendation method that combines longitudinal patient history with medical knowledge to help physicians prescribe safer and more accurate medication combinations.
  • methods: An innovative relevance stratification mechanism harmonizes the long-tail distribution of medical data and balances the safety and accuracy of medication combinations. Specifically, a deep-learning-based pre-training method obtains entity representations, a pyramid-like data stratification method reinforces the features of unpopular entities to obtain more generalized entity relationships, and two graph structures express medication precision and safety at the same level to produce visit representations.
  • results: Experiments on the MIMIC-III dataset show that the method outperforms current state-of-the-art methods on four evaluation metrics, including safety and accuracy.
    Abstract With the growing imbalance between limited medical resources and escalating demands, AI-based clinical tasks have become paramount. Medication recommendation, as a sub-domain, aims to amalgamate longitudinal patient history with medical knowledge, assisting physicians in prescribing safer and more accurate medication combinations. Existing methods overlook the inherent long-tail distribution in medical data, lacking balanced representation between head and tail data, which leads to sub-optimal model performance. To address this challenge, we introduce StratMed, a model that incorporates an innovative relevance stratification mechanism. It harmonizes discrepancies in data long-tail distribution and strikes a balance between the safety and accuracy of medication combinations. Specifically, we first construct a pre-training method using deep learning networks to obtain entity representation. After that, we design a pyramid-like data stratification method to obtain more generalized entity relationships by reinforcing the features of unpopular entities. Based on this relationship, we designed two graph structures to express medication precision and safety at the same level to obtain visit representations. Finally, the patient's historical clinical information is fitted to generate medication combinations for the current health condition. Experiments on the MIMIC-III dataset demonstrate that our method has outperformed current state-of-the-art methods in four evaluation metrics (including safety and accuracy).

Efficacy of Neural Prediction-Based NAS for Zero-Shot NAS Paradigm

  • paper_url: http://arxiv.org/abs/2308.16775
  • repo_url: https://github.com/minh1409/dft-npzs-nas
  • paper_authors: Minh Le, Nhan Nguyen, Ngoc Hoang Luong
  • for: This paper focuses on addressing the limitation of performance indicators in prediction-based Neural Architecture Search (NAS), specifically the inability to evaluate architecture performance across varying search spaces.
  • methods: The proposed approach uses Fourier sum of sines encoding for convolutional kernels, which enables the construction of a computational feed-forward graph with a structure similar to the architecture under evaluation. An accompanying multi-layer perceptron (MLP) then ranks these architectures based on their encodings.
  • results: The approach proposed in this paper surpasses previous methods using graph convolutional networks in terms of correlation on the NAS-Bench-201 dataset and exhibits a higher convergence rate. Moreover, the extracted feature representation trained on each NAS-Benchmark is transferable to other NAS-Benchmarks, showing promising generalizability across multiple search spaces.
    Abstract In prediction-based Neural Architecture Search (NAS), performance indicators derived from graph convolutional networks have shown significant success. These indicators, achieved by representing feed-forward structures as component graphs through one-hot encoding, face a limitation: their inability to evaluate architecture performance across varying search spaces. In contrast, handcrafted performance indicators (zero-shot NAS), which use the same architecture with random initialization, can generalize across multiple search spaces. Addressing this limitation, we propose a novel approach for zero-shot NAS using deep learning. Our method employs Fourier sum of sines encoding for convolutional kernels, enabling the construction of a computational feed-forward graph with a structure similar to the architecture under evaluation. These encodings are learnable and offer a comprehensive view of the architecture's topological information. An accompanying multi-layer perceptron (MLP) then ranks these architectures based on their encodings. Experimental results show that our approach surpasses previous methods using graph convolutional networks in terms of correlation on the NAS-Bench-201 dataset and exhibits a higher convergence rate. Moreover, our extracted feature representation trained on each NAS-Benchmark is transferable to other NAS-Benchmarks, showing promising generalizability across multiple search spaces. The code is available at: https://github.com/minh1409/DFT-NPZS-NAS
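    The "Fourier sum of sines encoding" can be pictured as mapping an operation's numeric attributes through a bank of sine features with learnable frequencies, phases, and amplitudes, then scoring an architecture from the aggregated encodings with an MLP. The sketch below is a loose, hypothetical reading of that idea (the attribute choice, aggregation, and scoring head are all assumptions), not the released implementation.

        import torch
        import torch.nn as nn

        class SumOfSinesEncoder(nn.Module):
            """Encode a vector of operation attributes with a learnable sum of sines."""
            def __init__(self, in_dim=4, num_waves=16):
                super().__init__()
                self.freq = nn.Parameter(torch.randn(num_waves, in_dim))
                self.phase = nn.Parameter(torch.zeros(num_waves))
                self.amp = nn.Parameter(torch.ones(num_waves) / num_waves)

            def forward(self, attrs):                       # attrs: (num_ops, in_dim)
                waves = torch.sin(attrs @ self.freq.T + self.phase)   # (num_ops, num_waves)
                return self.amp * waves                     # learnable amplitude per wave

        class ToyPredictor(nn.Module):
            """Score an architecture from the encodings of its operations."""
            def __init__(self):
                super().__init__()
                self.encoder = SumOfSinesEncoder()
                self.mlp = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

            def forward(self, op_attrs):
                enc = self.encoder(op_attrs)                # one encoding per operation
                return self.mlp(enc.mean(dim=0))            # aggregate over the graph, then score

        # Two toy "architectures": each row is one op's attributes,
        # e.g. [kernel_size, in_channels, out_channels, stride] (illustrative).
        arch_a = torch.tensor([[3., 16., 32., 1.], [3., 32., 64., 2.]])
        arch_b = torch.tensor([[5., 16., 16., 1.], [1., 16., 64., 1.], [3., 64., 64., 1.]])
        model = ToyPredictor()
        print(model(arch_a).item(), model(arch_b).item())   # higher score = predicted better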

Towards Low-Barrier Cybersecurity Research and Education for Industrial Control Systems

  • paper_url: http://arxiv.org/abs/2308.16769
  • repo_url: None
  • paper_authors: Colman McGuan, Chansu Yu, Qin Lin
  • for: This work aims to provide an accessible testbed for validating and comparing intrusion detection algorithms that protect industrial control systems (ICS).
  • methods: Built on a 3D high-fidelity simulator, the integrated framework automatically launches cyberattacks, collects data, trains machine learning models, and evaluates them on practical chemical and manufacturing processes.
  • results: The proposed Minimal Threshold and Window SVM (MinTWin SVM) model minimizes false positives while remaining responsive to physical process anomalies. The dataset is also used in an undergraduate machine learning course on ICS cybersecurity, giving students hands-on experience with a practical ICS dataset.
    Abstract The protection of Industrial Control Systems (ICS) that are employed in public critical infrastructures is of utmost importance due to catastrophic physical damages cyberattacks may cause. The research community requires testbeds for validation and comparing various intrusion detection algorithms to protect ICS. However, there exist high barriers to entry for research and education in the ICS cybersecurity domain due to expensive hardware, software, and inherent dangers of manipulating real-world systems. To close the gap, built upon recently developed 3D high-fidelity simulators, we further showcase our integrated framework to automatically launch cyberattacks, collect data, train machine learning models, and evaluate for practical chemical and manufacturing processes. On our testbed, we validate our proposed intrusion detection model called Minimal Threshold and Window SVM (MinTWin SVM) that utilizes unsupervised machine learning via a one-class SVM in combination with a sliding window and classification threshold. Results show that MinTWin SVM minimizes false positives and is responsive to physical process anomalies. Furthermore, we incorporate our framework with ICS cybersecurity education by using our dataset in an undergraduate machine learning course where students gain hands-on experience in practicing machine learning theory with a practical ICS dataset. All of our implementations have been open-sourced.
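    The detection recipe named in the results bullet, a one-class SVM over sliding windows of sensor readings with an alarm raised only when enough windows are flagged, can be sketched with scikit-learn as follows; the synthetic signal, window length, and threshold are placeholders rather than the paper's settings.

        import numpy as np
        from sklearn.svm import OneClassSVM

        def sliding_windows(signal, width):
            # Stack overlapping windows of the 1-D process measurement as feature vectors.
            return np.stack([signal[i:i + width] for i in range(len(signal) - width + 1)])

        rng = np.random.default_rng(0)
        WIDTH, THRESHOLD = 20, 5          # window length and minimal count of flagged windows

        # Normal operation: a noisy periodic process variable (e.g., a tank level).
        t = np.arange(2000)
        normal = np.sin(2 * np.pi * t / 50) + 0.05 * rng.normal(size=t.size)
        model = OneClassSVM(kernel="rbf", nu=0.01, gamma="scale")
        model.fit(sliding_windows(normal, WIDTH))          # unsupervised: normal data only

        # Test trace: normal behaviour, then an attack that freezes the sensor value.
        test = np.concatenate([normal[:500], np.full(200, 0.8) + 0.05 * rng.normal(size=200)])
        flags = model.predict(sliding_windows(test, WIDTH)) == -1   # -1 = anomalous window
        recent = np.convolve(flags.astype(int), np.ones(WIDTH, dtype=int), mode="same")
        alarm = recent >= THRESHOLD                        # alarm only when enough windows agree
        print("anomalous windows:", int(flags.sum()), "| first alarm index:",
              int(np.argmax(alarm)) if alarm.any() else None)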

Ladder-of-Thought: Using Knowledge as Steps to Elevate Stance Detection

  • paper_url: http://arxiv.org/abs/2308.16763
  • repo_url: None
  • paper_authors: Kairui Hu, Ming Yan, Joey Tianyi Zhou, Ivor W. Tsang, Wen Haw Chong, Yong Keong Yap
  • for: To strengthen the reasoning ability of large language models (LLMs) for stance detection and to improve the performance of smaller LMs.
  • methods: A dual-phase cascaded optimization framework called Ladder-of-Thought (LoT) directs the model to incorporate high-quality external knowledge, strengthening the intermediate rationales it generates.
  • results: On stance detection, LoT achieves a 16% improvement over ChatGPT and a 10% improvement over ChatGPT with CoT.
    Abstract Chain-of-Thought Prompting (CoT) reinforces the reasoning capabilities of Large Language Models (LLMs) through the generation of intermediate rationales. However, these enhancements predominantly benefit large-scale models, leaving small LMs without significant performance improvements when directly applying CoT. Despite the advanced reasoning capabilities of LLMs, CoT relies primarily on their pre-trained internal knowledge. The external knowledge that is previously unknown to the model remains unexploited. This omission becomes pronounced in tasks such as stance detection, where the external background knowledge plays a pivotal role. Additionally, the large-scale architecture of LLMs inevitably present efficiency challenges during deployment. To address these challenges, we introduce the Ladder-of-Thought (LoT) for stance detection. Grounded in a dual-phase Cascaded Optimization framework, LoT directs the model to incorporate high-quality external knowledge, enhancing the intermediate rationales it generates. These bolstered rationales subsequently serve as the foundation for more precise predictions - akin to how a ladder facilitates reaching elevated goals. LoT achieves a balance between efficiency and accuracy, making it an adaptable and efficient framework for stance detection. Our empirical evaluations underscore LoT's effectiveness, marking a 16% improvement over ChatGPT and a 10% enhancement compared to ChatGPT with CoT.
    摘要 思维链提示(CoT)通过生成中间推理来增强大语言模型(LLM)的推理能力,但这些改进主要惠及大规模模型,小型语言模型直接应用 CoT 难以获得显著的性能提升。尽管 LLM 具有较强的推理能力,CoT 仍主要依赖其预训练得到的内部知识,模型此前未知的外部知识没有被利用。这一缺失在立场检测等任务中尤为明显,因为外部背景知识在其中起着关键作用。此外,LLM 的大规模架构在部署时不可避免地带来效率挑战。为了解决这些挑战,我们提出了面向立场检测的思维阶梯(LoT)。LoT 基于双阶段级联优化框架,引导模型融入高质量的外部知识,从而提升其生成的中间推理。这些得到强化的推理随后成为更精确预测的基础,正如梯子有助于到达更高的目标。LoT 在效率与准确性之间取得平衡,是一个适应性强且高效的立场检测框架。我们的实证评估印证了 LoT 的有效性,相比 ChatGPT 提升了 16%,相比使用 CoT 的 ChatGPT 提升了 10%。
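As a rough illustration of the two-phase idea above, the sketch below first generates a knowledge-grounded rationale and then predicts the stance from it. The `generate` helper and prompt wording are hypothetical, and the actual LoT framework fine-tunes both phases in a cascaded optimization rather than prompting zero-shot.

```python
def generate(prompt: str) -> str:
    """Placeholder for a small language model; swap in a real model call."""
    return "..."

def ladder_of_thought(text: str, target: str, knowledge: list[str]) -> str:
    # Phase 1: produce an intermediate rationale grounded in retrieved background knowledge.
    rationale = generate(
        "Background:\n" + "\n".join(knowledge) +
        f"\n\nText: {text}\nTarget: {target}\n"
        "Explain the relevant background and how the text relates to the target."
    )
    # Phase 2: predict the stance using the text, target, and the bolstered rationale.
    return generate(
        f"Text: {text}\nTarget: {target}\nRationale: {rationale}\n"
        "Stance (favor / against / neutral):"
    )
```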

Context Aware Query Rewriting for Text Rankers using LLM

  • paper_url: http://arxiv.org/abs/2308.16753
  • repo_url: None
  • paper_authors: Abhijit Anand, Venktesh V, Vinay Setty, Avishek Anand
  • for: 提高文档排序任务中的查询模糊匹配问题的解决方案。
  • methods: 使用生成模型(LLMs)生成 pseudo 文档,以优化查询模糊匹配问题。
  • results: 在训练阶段使用 CAR 方法重写查询后再微调排序器,相比使用原始查询的基线,在段落排序任务上最多提升 33%,在文档排序任务上最多提升 28%。
    Abstract Query rewriting refers to an established family of approaches that are applied to underspecified and ambiguous queries to overcome the vocabulary mismatch problem in document ranking. Queries are typically rewritten during query processing time for better query modelling for the downstream ranker. With the advent of large-language models (LLMs), there have been initial investigations into using generative approaches to generate pseudo documents to tackle this inherent vocabulary gap. In this work, we analyze the utility of LLMs for improved query rewriting for text ranking tasks. We find that there are two inherent limitations of using LLMs as query re-writers -- concept drift when using only queries as prompts and large inference costs during query processing. We adopt a simple, yet surprisingly effective, approach called context aware query rewriting (CAR) to leverage the benefits of LLMs for query understanding. Firstly, we rewrite ambiguous training queries by context-aware prompting of LLMs, where we use only relevant documents as context.Unlike existing approaches, we use LLM-based query rewriting only during the training phase. Eventually, a ranker is fine-tuned on the rewritten queries instead of the original queries during training. In our extensive experiments, we find that fine-tuning a ranker using re-written queries offers a significant improvement of up to 33% on the passage ranking task and up to 28% on the document ranking task when compared to the baseline performance of using original queries.
    摘要 查询重写(query rewriting)是指一类已经确立的方法,通常应用于描述不充分或含义模糊的查询,用于解决文档排序中的词汇不匹配问题。查询一般在查询处理阶段被重写,以便为下游排序器提供更好的查询建模。随着大语言模型(LLM)的出现,已有初步研究探索利用生成式方法生成伪文档来弥补这种固有的词汇差距。在本工作中,我们分析了利用 LLM 改进文本排序任务中查询重写的效用。我们发现将 LLM 用作查询重写器存在两个固有限制:一是仅以查询作为提示时会出现概念漂移;二是查询处理阶段的推理成本很高。我们采用一种简单但出奇有效的方法,即上下文感知查询重写(CAR),以利用 LLM 的优势来更好地理解查询。首先,我们通过上下文感知的 LLM 提示来重写含糊的训练查询,并且只使用相关文档作为上下文。与现有方法不同,我们仅在训练阶段使用基于 LLM 的查询重写。最终,排序器在训练时基于重写后的查询(而非原始查询)进行微调。在广泛的实验中,我们发现使用重写后的查询微调排序器,相比使用原始查询的基线,在段落排序任务上最多提升 33%,在文档排序任务上最多提升 28%。
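A rough sketch of the training-time rewriting step described above: ambiguous training queries are rewritten by an LLM prompted with their relevant documents as context, and the rewritten queries then replace the originals when fine-tuning the ranker. The prompt wording, the `call_llm` helper, and the toy training example are hypothetical.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for any instruction-tuned LLM; only used offline at training time."""
    return "..."

def rewrite_query(query: str, relevant_docs: list[str]) -> str:
    context = "\n\n".join(relevant_docs[:3])  # a few relevant documents as context
    prompt = (
        f"Relevant documents:\n{context}\n\n"
        f"Original query: {query}\n"
        "Rewrite the query so it unambiguously expresses the information need above:"
    )
    return call_llm(prompt)

# Hypothetical training data: (query, relevant_docs, candidate_doc, label) tuples.
raw_training_examples = [
    ("jaguar speed", ["The jaguar (Panthera onca) can sprint at about 80 km/h ..."], "doc-1", 1),
]

# Queries are rewritten once, offline; the ranker is then fine-tuned on the rewritten
# queries, so no LLM call is needed at query-processing time.
train_triples = [(rewrite_query(q, docs), d, y) for q, docs, d, y in raw_training_examples]
```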

Socratis: Are large multimodal models emotionally aware?

  • paper_url: http://arxiv.org/abs/2308.16741
  • repo_url: None
  • paper_authors: Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko
  • for: 提高 Multimodal 语言模型对情感的认知和生成能力
  • methods: 使用多种情感标签和理由描述来评估模型的表现
  • results: 人类对人写的理由更加喜欢,而不是机器生成的理由,而且现有的captioning metric不能与人类喜好相吻合
    Abstract Existing emotion prediction benchmarks contain coarse emotion labels which do not consider the diversity of emotions that an image and text can elicit in humans due to various reasons. Learning diverse reactions to multimodal content is important as intelligent machines take a central role in generating and delivering content to society. To address this gap, we propose Socratis, a \underline{soc}ietal \underline{r}e\underline{a}c\underline{ti}on\underline{s} benchmark, where each image-caption (IC) pair is annotated with multiple emotions and the reasons for feeling them. Socratis contains 18K free-form reactions for 980 emotions on 2075 image-caption pairs from 5 widely-read news and image-caption (IC) datasets. We benchmark the capability of state-of-the-art multimodal large language models to generate the reasons for feeling an emotion given an IC pair. Based on a preliminary human study, we observe that humans prefer human-written reasons over 2 times more often than machine-generated ones. This shows our task is harder than standard generation tasks because it starkly contrasts recent findings where humans cannot tell apart machine vs human-written news articles, for instance. We further see that current captioning metrics based on large vision-language models also fail to correlate with human preferences. We hope that these findings and our benchmark will inspire further research on training emotionally aware models.
    摘要 现有的情绪预测 benchmark 包含粗糙的情绪标签,不考虑图文内容对人类的多样化情绪响应。学习多样化情绪对图文内容是重要的,因为智能机器在生成和传递内容方面发挥了中心作用。为解决这个差距,我们提议了 Socratis benchmark,每个图文笔记 (IC) 对象被标注为多种情绪和其原因。Socratis 包含 18,000 个自由格式的反应,用于 980 种情绪的 2,075 个图文笔记对。我们使用现代大语言模型测试能否生成情绪的原因,并观察到人类更加偏好人工写的原因,相比于机器生成的原因。此外,我们还发现现有的captioning metric 基于大视语言模型并不与人类偏好相关。我们希望这些发现和我们的 benchmark 能够激发更多的情绪意识模型训练研究。

Robust Networked Federated Learning for Localization

  • paper_url: http://arxiv.org/abs/2308.16737
  • repo_url: None
  • paper_authors: Reza Mirzaeifard, Naveen K. D. Venkategowda, Stefan Werner
  • For: 本研究旨在解决联邦学习环境下的定位问题,该问题本质上是非凸且非光滑的,且数据分布在多个设备上。
  • Methods: 我们提出一种在分布式次梯度框架中采用 $L_1$ 范数的稳健方法,以应对联邦学习环境中的异常数据问题。
  • Results: 我们的方法可以收敛到驻点,并在实验中证明其优于现有的定位方法,尤其是在异常值较多的环境中。
    Abstract This paper addresses the problem of localization, which is inherently non-convex and non-smooth in a federated setting where the data is distributed across a multitude of devices. Due to the decentralized nature of federated environments, distributed learning becomes essential for scalability and adaptability. Moreover, these environments are often plagued by outlier data, which presents substantial challenges to conventional methods, particularly in maintaining estimation accuracy and ensuring algorithm convergence. To mitigate these challenges, we propose a method that adopts an $L_1$-norm robust formulation within a distributed sub-gradient framework, explicitly designed to handle these obstacles. Our approach addresses the problem in its original form, without resorting to iterative simplifications or approximations, resulting in enhanced computational efficiency and improved estimation accuracy. We demonstrate that our method converges to a stationary point, highlighting its effectiveness and reliability. Through numerical simulations, we confirm the superior performance of our approach, notably in outlier-rich environments, which surpasses existing state-of-the-art localization methods.
    摘要 本文研究联邦学习环境下的定位问题,该问题本质上是非凸且非光滑的,且数据分布在大量设备上。由于联邦环境的去中心化特性,分布式学习对于可扩展性和适应性至关重要。此外,这类环境常常包含异常数据,给传统方法在估计精度和算法收敛方面带来巨大挑战。为缓解这些问题,我们提出一种在分布式次梯度框架中采用 $L_1$ 范数稳健形式的方法,直接处理原始问题而无需迭代简化或近似,从而提高计算效率和估计精度。我们证明该方法收敛到驻点,并通过数值仿真验证其在异常值较多的环境中优于现有最先进的定位方法。
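A toy sketch of the kind of distributed sub-gradient update with an $L_1$-norm loss described above, applied to range-based localization with one corrupted measurement. The ring topology, mixing weights, and step size are illustrative assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(1)
true_pos = np.array([2.0, 3.0])
anchors = rng.uniform(0, 6, size=(4, 5, 2))         # 4 devices, 5 anchors each
dists = np.linalg.norm(anchors - true_pos, axis=2) + 0.05 * rng.normal(size=(4, 5))
dists[0, 0] += 5.0                                   # one grossly corrupted (outlier) measurement

# Simple ring topology: each device mixes with its two neighbours (doubly stochastic weights).
W = np.array([[.5, .25, 0, .25], [.25, .5, .25, 0], [0, .25, .5, .25], [.25, 0, .25, .5]])

x = rng.normal(size=(4, 2))                          # local position estimates
for t in range(1, 501):
    g = np.zeros_like(x)
    for k in range(4):                               # subgradient of the local L1-norm loss
        diff = x[k] - anchors[k]
        r = np.linalg.norm(diff, axis=1)
        g[k] = np.sum(np.sign(r - dists[k])[:, None] * diff / (r[:, None] + 1e-9), axis=0)
    x = W @ x - (0.5 / np.sqrt(t)) * g               # consensus step followed by subgradient step

print(x.mean(axis=0))  # estimates should cluster near true_pos despite the outlier
```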

Post-Deployment Adaptation with Access to Source Data via Federated Learning and Source-Target Remote Gradient Alignment

  • paper_url: http://arxiv.org/abs/2308.16735
  • repo_url: https://github.com/felixwag/staralign
  • paper_authors: Felix Wagner, Zeju Li, Pramit Saha, Konstantinos Kamnitsas
  • for: 本文旨在解决深度神经网络在医疗影像部署中的分布偏移问题,特别是在仅有少量或没有标注目标数据的部署后适应(PDA)场景下。
  • methods: 本文提出了一种新的适应框架 called FedPDA,它利用远程学习来帮助已经部署的模型适应目标数据分布。此外,文章还提出了一种新的优化方法 StarAlign,用于将源数据和目标数据之间的梯度进行对齐,以便学习一个特定的目标模型。
  • results: 文章通过使用多个医疗机构的数据库进行肿瘤检测和皮肤病分类任务,证明了 StarAlign 方法的有效性,与之前的工作相比,其表现更好。
    Abstract Deployment of Deep Neural Networks in medical imaging is hindered by distribution shift between training data and data processed after deployment, causing performance degradation. Post-Deployment Adaptation (PDA) addresses this by tailoring a pre-trained, deployed model to the target data distribution using limited labelled or entirely unlabelled target data, while assuming no access to source training data as they cannot be deployed with the model due to privacy concerns and their large size. This makes reliable adaptation challenging due to limited learning signal. This paper challenges this assumption and introduces FedPDA, a novel adaptation framework that brings the utility of learning from remote data from Federated Learning into PDA. FedPDA enables a deployed model to obtain information from source data via remote gradient exchange, while aiming to optimize the model specifically for the target domain. Tailored for FedPDA, we introduce a novel optimization method StarAlign (Source-Target Remote Gradient Alignment) that aligns gradients between source-target domain pairs by maximizing their inner product, to facilitate learning a target-specific model. We demonstrate the method's effectiveness using multi-center databases for the tasks of cancer metastases detection and skin lesion classification, where our method compares favourably to previous work. Code is available at: https://github.com/FelixWag/StarAlign
    摘要 深度神经网络在医疗影像中的部署面临训练数据与部署后处理数据之间的分布偏移问题,导致性能下降。部署后适应(PDA)通过使用有限的标注或完全无标注的目标数据,将已部署的预训练模型调整到目标数据分布,同时假设无法访问源训练数据(出于隐私考虑及其庞大体量,源数据无法随模型一同部署)。这使得可靠的适应变得困难,因为学习信号有限。本文挑战这一假设,提出 FedPDA,一种将联邦学习中从远程数据学习的能力引入 PDA 的新型适应框架。FedPDA 使已部署的模型能够通过远程梯度交换获取源数据的信息,同时针对目标领域优化模型。为配合 FedPDA,我们提出了一种新的优化方法 StarAlign(Source-Target Remote Gradient Alignment),通过最大化源-目标域梯度对的内积来对齐梯度,从而学习一个针对目标域的模型。我们在多中心数据库上的肿瘤转移检测和皮肤病变分类任务中验证了该方法的有效性,其表现优于以往工作。代码见:https://github.com/FelixWag/StarAlign
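The gradient-alignment idea can be sketched as adding the negated inner product of source and target gradients to the target loss. Whether this is exactly the authors' objective is an assumption; the official repository should be consulted for the real implementation.

```python
import torch
import torch.nn.functional as F

def staralign_step(model, optimizer, source_batch, target_batch, lam=0.1):
    """One update sketch: minimise the target loss while maximising the inner product
    between source-domain and target-domain gradients (exchanged remotely in FedPDA)."""
    xs, ys = source_batch
    xt, yt = target_batch
    params = [p for p in model.parameters() if p.requires_grad]

    g_src = torch.autograd.grad(F.cross_entropy(model(xs), ys), params, create_graph=True)
    g_tgt = torch.autograd.grad(F.cross_entropy(model(xt), yt), params, create_graph=True)
    inner = sum((a * b).sum() for a, b in zip(g_src, g_tgt))

    loss = F.cross_entropy(model(xt), yt) - lam * inner   # align gradients across domains
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random tensors standing in for source/target batches.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32, 2))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
src = (torch.randn(8, 32), torch.randint(0, 2, (8,)))
tgt = (torch.randn(8, 32), torch.randint(0, 2, (8,)))
staralign_step(model, opt, src, tgt)
```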

Proof of Deep Learning: Approaches, Challenges, and Future Directions

  • paper_url: http://arxiv.org/abs/2308.16730
  • repo_url: None
  • paper_authors: Mahmoud Salhab, Khaleel Mershad
  • for: 本研究主要旨在调查各种Proof of Deep Learning(PoDL)机制,了解它们的优缺点,以及它们在不同应用场景中的可能性。
  • methods: 本研究使用了多种方法,包括Literature Review、Algorithm Analysis和Future Research Direction等。
  • results: 本研究结果显示,PoDL机制可以充分利用计算能力,同时保持区块链的安全性和完整性。但是,PoDL还需要进一步的研究和开发,以便在实际应用中得到更好的效果。
    Abstract The rise of computational power has led to unprecedented performance gains for deep learning models. As more data becomes available and model architectures become more complex, the need for more computational power increases. On the other hand, since the introduction of Bitcoin as the first cryptocurrency and the establishment of the concept of blockchain as a distributed ledger, many variants and approaches have been proposed. However, many of them have one thing in common, which is the Proof of Work (PoW) consensus mechanism. PoW is mainly used to support the process of new block generation. While PoW has proven its robustness, its main drawback is that it requires a significant amount of processing power to maintain the security and integrity of the blockchain. This is due to applying brute force to solve a hashing puzzle. To utilize the computational power available in useful and meaningful work while keeping the blockchain secure, many techniques have been proposed, one of which is known as Proof of Deep Learning (PoDL). PoDL is a consensus mechanism that uses the process of training a deep learning model as proof of work to add new blocks to the blockchain. In this paper, we survey the various approaches for PoDL. We discuss the different types of PoDL algorithms, their advantages and disadvantages, and their potential applications. We also discuss the challenges of implementing PoDL and future research directions.
    摘要 随着计算能力的提高,深度学习模型的性能获得了前所未有的提升。随着可用数据的增多和模型结构变得更加复杂,对计算能力的需求也在增加。另一方面,自比特币作为第一种加密货币出现、区块链作为分布式账本的概念确立以来,人们提出了许多变体和方法,其中大多数具有一个共同之处,即工作量证明(PoW)共识机制。PoW 主要用于支持新区块的生成过程。虽然 PoW 已经证明了其稳健性,但其主要缺点是需要大量的处理能力来维护区块链的安全性和完整性,因为它依靠暴力求解哈希难题。为了在保证区块链安全的同时,将计算能力用于有用且有意义的工作,人们提出了许多技术,其中之一便是深度学习证明(PoDL)。PoDL 是一种以训练深度学习模型的过程作为工作量证明来向区块链添加新区块的共识机制。在这篇文章中,我们综述了 PoDL 的各种方法,讨论了不同类型的 PoDL 算法及其优缺点和潜在应用,并讨论了实现 PoDL 的挑战和未来研究方向。

Terrain Diffusion Network: Climatic-Aware Terrain Generation with Geological Sketch Guidance

  • paper_url: http://arxiv.org/abs/2308.16725
  • repo_url: None
  • paper_authors: Zexin Hu, Kun Hu, Clinton Mo, Lei Pan, Zhiyong Wang
  • for: 这篇论文的目的是提出一种新的地形扩散网络(TDN),用于生成更加逼真的地形,并提供更高程度的用户可控性。
  • methods: 该方法采用多层级去噪方案,并以用户草图为指导,兼顾受侵蚀和构造活动影响的细粒度细节,使生成的地形更加逼真。此外,该方法还利用预训练的地形自编码器构建地形与草图潜在空间,以提高生成效率。
  • results: 在基于 NASA Topology Images 构建的新数据集上进行的广泛实验表明,该方法达到了最先进的性能,能够生成更加逼真的地形。
    Abstract Sketch-based terrain generation seeks to create realistic landscapes for virtual environments in various applications such as computer games, animation and virtual reality. Recently, deep learning based terrain generation has emerged, notably the ones based on generative adversarial networks (GAN). However, these methods often struggle to fulfill the requirements of flexible user control and maintain generative diversity for realistic terrain. Therefore, we propose a novel diffusion-based method, namely terrain diffusion network (TDN), which actively incorporates user guidance for enhanced controllability, taking into account terrain features like rivers, ridges, basins, and peaks. Instead of adhering to a conventional monolithic denoising process, which often compromises the fidelity of terrain details or the alignment with user control, a multi-level denoising scheme is proposed to generate more realistic terrains by taking into account fine-grained details, particularly those related to climatic patterns influenced by erosion and tectonic activities. Specifically, three terrain synthesisers are designed for structural, intermediate, and fine-grained level denoising purposes, which allow each synthesiser concentrate on a distinct terrain aspect. Moreover, to maximise the efficiency of our TDN, we further introduce terrain and sketch latent spaces for the synthesizers with pre-trained terrain autoencoders. Comprehensive experiments on a new dataset constructed from NASA Topology Images clearly demonstrate the effectiveness of our proposed method, achieving the state-of-the-art performance. Our code and dataset will be publicly available.
    摘要 基于草图的地形生成旨在为电子游戏、动画和虚拟现实等应用中的虚拟环境创建逼真的地貌。近年来,基于深度学习的地形生成不断涌现,其中以基于生成对抗网络(GAN)的方法为代表。然而,这些方法往往难以兼顾灵活的用户控制和逼真地形所需的生成多样性。因此,我们提出一种新的基于扩散的方法,即地形扩散网络(TDN),它主动引入用户指导以增强可控性,并考虑河流、山脊、盆地和山峰等地形特征。不同于常规的单一去噪过程(该过程往往牺牲地形细节的保真度或与用户控制的一致性),我们提出多层级去噪方案,通过兼顾细粒度细节(尤其是受侵蚀和构造活动影响的气候模式相关细节)来生成更逼真的地形。具体而言,我们设计了结构级、中间级和细粒度级三个地形合成器,使每个合成器专注于不同的地形层面。此外,为最大化 TDN 的效率,我们进一步为合成器引入由预训练地形自编码器得到的地形与草图潜在空间。在基于 NASA Topology Images 构建的新数据集上的全面实验清楚地展示了所提方法的有效性,达到了最先进的性能。我们的代码和数据集将公开。

CReHate: Cross-cultural Re-annotation of English Hate Speech Dataset

  • paper_url: http://arxiv.org/abs/2308.16705
  • repo_url: None
  • paper_authors: Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Juho Kim, Alice Oh
  • for: This paper aims to address cultural biases in hate speech detection models and datasets by introducing a cross-cultural re-annotation of the SBIC dataset and analyzing differences in perceptions of hate speech among individuals from five distinct countries.
  • methods: The paper uses a cross-cultural re-annotation of the SBIC dataset, which includes annotations from Australia, Singapore, South Africa, the United Kingdom, and the United States. The authors also employ transfer learning to develop a culturally sensitive hate speech classifier.
  • results: The authors find significant differences in the perception of hate speech among individuals from different countries, with only 59.4% of the samples achieving consensus among all countries. They also develop a culturally sensitive hate speech classifier that can capture the perspectives of different nationalities.
    Abstract English datasets predominantly reflect the perspectives of certain nationalities, which can lead to cultural biases in models and datasets. This is particularly problematic in tasks heavily influenced by subjectivity, such as hate speech detection. To delve into how individuals from different countries perceive hate speech, we introduce CReHate, a cross-cultural re-annotation of the sampled SBIC dataset. This dataset includes annotations from five distinct countries: Australia, Singapore, South Africa, the United Kingdom, and the United States. Our thorough statistical analysis highlights significant differences based on nationality, with only 59.4% of the samples achieving consensus among all countries. We also introduce a culturally sensitive hate speech classifier via transfer learning, adept at capturing perspectives of different nationalities. These findings underscore the need to re-evaluate certain aspects of NLP research, especially with regard to the nuanced nature of hate speech in the English language.
    摘要 现有的英语数据集主要反映特定国家人群的视角,这可能在模型和数据集中造成文化偏见。在仇恨言论检测等高度受主观性影响的任务中,这一问题尤为突出。为了探究来自不同国家的个体如何看待仇恨言论,我们提出了 CReHate,对采样自 SBIC 的数据集进行跨文化重新标注,标注者来自澳大利亚、新加坡、南非、英国和美国五个国家。细致的统计分析显示,不同国籍之间存在显著差异,仅有 59.4% 的样本在所有国家之间达成一致。我们还通过迁移学习构建了一个具有文化敏感性的仇恨言论分类器,能够捕捉不同国籍人群的视角。这些发现强调了有必要重新审视 NLP 研究中的某些方面,特别是英语仇恨言论的细微差别。
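The 59.4% consensus figure above is simply the share of samples on which annotators from all five countries agree; a tiny illustration with hypothetical labels:

```python
import pandas as pd

# Hypothetical layout: one binary hate/not-hate label per country for each SBIC post.
labels = pd.DataFrame(
    {
        "AU": [1, 0, 1, 0],
        "SG": [1, 0, 0, 0],
        "ZA": [1, 1, 1, 0],
        "UK": [1, 0, 1, 0],
        "US": [1, 0, 1, 1],
    }
)

consensus = labels.nunique(axis=1).eq(1)  # True where all five countries assign the same label
print(f"cross-country consensus: {consensus.mean():.1%}")  # the paper reports 59.4% on CReHate
```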

Fault Injection and Safe-Error Attack for Extraction of Embedded Neural Network Models

  • paper_url: http://arxiv.org/abs/2308.16703
  • repo_url: None
  • paper_authors: Kevin Hector, Pierre-Alain Moellic, Mathieu Dumont, Jean-Max Dutertre
  • for: 本研究主要针对于嵌入式深度神经网络模型在IoT设备上的安全性问题,特别是模型抽取攻击。
  • methods: 本研究使用了常见的缺陷插入攻击策略——安全错误攻击(SEA)来实现模型抽取攻击。攻击者具有有限的训练数据访问权限。
  • results: 研究发现,使用约1500个手动设计的输入可以成功抽取嵌入式深度神经网络模型中的至少90%最重要比特数据,以训练一个与受害模型具有相似准确率的假模型。
    Abstract Model extraction emerges as a critical security threat with attack vectors exploiting both algorithmic and implementation-based approaches. The main goal of an attacker is to steal as much information as possible about a protected victim model, so that he can mimic it with a substitute model, even with a limited access to similar training data. Recently, physical attacks such as fault injection have shown worrying efficiency against the integrity and confidentiality of embedded models. We focus on embedded deep neural network models on 32-bit microcontrollers, a widespread family of hardware platforms in IoT, and the use of a standard fault injection strategy - Safe Error Attack (SEA) - to perform a model extraction attack with an adversary having a limited access to training data. Since the attack strongly depends on the input queries, we propose a black-box approach to craft a successful attack set. For a classical convolutional neural network, we successfully recover at least 90% of the most significant bits with about 1500 crafted inputs. These information enable to efficiently train a substitute model, with only 8% of the training dataset, that reaches high fidelity and near identical accuracy level than the victim model.
    摘要 模型提取已成为一种严重的安全威胁,其攻击手段既包括算法层面也包括实现层面。攻击者的主要目标是尽可能多地窃取受保护的受害模型的信息,以便在仅能有限访问类似训练数据的情况下,用替代模型对其进行模仿。近来,故障注入等物理攻击已对嵌入式模型的完整性和机密性表现出令人担忧的效率。我们聚焦于部署在 32 位微控制器(物联网中广泛使用的硬件平台)上的嵌入式深度神经网络模型,并使用一种标准的故障注入策略,即安全错误攻击(SEA),在攻击者仅能有限访问训练数据的情况下实施模型提取攻击。由于攻击强烈依赖输入查询,我们提出一种黑盒方法来构造有效的攻击输入集。对于一个经典的卷积神经网络,我们使用约 1500 个构造输入成功恢复了至少 90% 的最高有效位。这些信息使得仅用 8% 的训练数据即可高效地训练出一个与受害模型精度相近、高保真的替代模型。

Using Large Language Models to Automate Category and Trend Analysis of Scientific Articles: An Application in Ophthalmology

  • paper_url: http://arxiv.org/abs/2308.16688
  • repo_url: None
  • paper_authors: Hina Raja, Asim Munawar, Mohammad Delsoz, Mohammad Elahi, Yeganeh Madadi, Amr Hassan, Hashem Abu Serhan, Onur Inam, Luis Hermandez, Sang Tran, Wuqas Munir, Alaa Abd-Alrazaq, Hao Chen, SiamakYousefi
  • for: 这 paper 的目的是提出一种自动化文献分类方法,利用大型自然语言处理(NLP)技术和大语言模型(LLM)。
  • methods: 该方法基于 NLP 技术,包括高级 ZSL LLM 模型,对科学论文的文本内容进行处理和分析。
  • results: 实验结果表明,LLM 可以高效地自动分类大量的眼科论文,无需人工干预。在 RenD 数据集上,模型达到了平均准确率 0.86 和平均 F1 分数 0.85。
    Abstract Purpose: In this paper, we present an automated method for article classification, leveraging the power of Large Language Models (LLM). The primary focus is on the field of ophthalmology, but the model is extendable to other fields. Methods: We have developed a model based on Natural Language Processing (NLP) techniques, including advanced LLMs, to process and analyze the textual content of scientific papers. Specifically, we have employed zero-shot learning (ZSL) LLM models and compared against Bidirectional and Auto-Regressive Transformers (BART) and its variants, and Bidirectional Encoder Representations from Transformers (BERT), and its variant such as distilBERT, SciBERT, PubmedBERT, BioBERT. Results: The classification results demonstrate the effectiveness of LLMs in categorizing large number of ophthalmology papers without human intervention. Results: To evalute the LLMs, we compiled a dataset (RenD) of 1000 ocular disease-related articles, which were expertly annotated by a panel of six specialists into 15 distinct categories. The model achieved mean accuracy of 0.86 and mean F1 of 0.85 based on the RenD dataset. Conclusion: The proposed framework achieves notable improvements in both accuracy and efficiency. Its application in the domain of ophthalmology showcases its potential for knowledge organization and retrieval in other domains too. We performed trend analysis that enables the researchers and clinicians to easily categorize and retrieve relevant papers, saving time and effort in literature review and information gathering as well as identification of emerging scientific trends within different disciplines. Moreover, the extendibility of the model to other scientific fields broadens its impact in facilitating research and trend analysis across diverse disciplines.
    摘要 目的:在这篇论文中,我们提出了一种自动化文章分类方法,利用大型自然语言处理(NLP)模型的力量。我们的研究领域为眼科领域,但模型可以扩展到其他领域。方法:我们开发了基于NLP技术的模型,包括高级Zero-shot学习(ZSL)模型、bi-directional和自然语言模型(BART)和其变体、Bidirectional Encoder Representations from Transformers(BERT)和其变体如distilBERT、SciBERT、PubmedBERT、BioBERT。结果:我们对1000篇眼科疾病相关文章进行了自动分类,得到了人工干预无需的高精度分类结果。结果:为评估LLMs,我们编译了1000篇眼科疾病相关文章的 dataset(RenD),由6名专家 manually标注为15种不同类别。模型在RenD dataset上取得了0.86的 mean accuracy和0.85的 mean F1。结论:我们提出的框架实现了显著的提高 both accuracy和 efficiency。在眼科领域中应用该模型,可以帮助研究者和临床医生快速地分类和检索相关文章,节省时间和劳动力,并且可以快速地发现不同领域的科学趋势。此外,模型的扩展性使其在其他科学领域中有广泛的影响,推动了研究和趋势分析的进程。
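A generic sketch of zero-shot LLM classification of abstracts as described above; the category list, prompt, and `call_llm` helper are hypothetical and do not reproduce the paper's 15 RenD classes or exact prompts.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a zero-shot LLM (e.g., an instruction-tuned chat model)."""
    return "Glaucoma"

# Illustrative subset of categories, not the 15 expert-defined RenD classes.
CATEGORIES = ["Glaucoma", "Retina", "Cornea", "Cataract", "Neuro-ophthalmology"]

def classify_abstract(abstract: str) -> str:
    prompt = (
        "Assign the ophthalmology article below to exactly one category from this list: "
        + ", ".join(CATEGORIES)
        + f"\n\nAbstract: {abstract}\n\nCategory:"
    )
    answer = call_llm(prompt).strip()
    return answer if answer in CATEGORIES else "Unknown"

print(classify_abstract("We study intraocular pressure control after trabeculectomy ..."))
```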

Everyone Can Attack: Repurpose Lossy Compression as a Natural Backdoor Attack

  • paper_url: http://arxiv.org/abs/2308.16684
  • repo_url: None
  • paper_authors: Sze Jue Yang, Quang Nguyen, Chee Seng Chan, Khoa Doan
  • for: 这个论文主要关注的是机器学习模型中的潜在攻击问题,具体来说是 silent backdoor 攻击。
  • methods: 这个论文使用了一种广泛使用的lossy图像压缩算法来实现攻击,而且这种攻击不需要特殊的技能和努力,只需要点击“转换”或“保存为”按钮即可。
  • results: 这个论文的实验结果表明,这种攻击可以在多个 benchmark 数据集中 achieved 100% 攻击成功率,而且在干净标签设定下,只需要杂 poisoning 率才能达到近百分之十的攻击成功率。
    Abstract The vulnerabilities to backdoor attacks have recently threatened the trustworthiness of machine learning models in practical applications. Conventional wisdom suggests that not everyone can be an attacker since the process of designing the trigger generation algorithm often involves significant effort and extensive experimentation to ensure the attack's stealthiness and effectiveness. Alternatively, this paper shows that there exists a more severe backdoor threat: anyone can exploit an easily-accessible algorithm for silent backdoor attacks. Specifically, this attacker can employ the widely-used lossy image compression from a plethora of compression tools to effortlessly inject a trigger pattern into an image without leaving any noticeable trace; i.e., the generated triggers are natural artifacts. One does not require extensive knowledge to click on the "convert" or "save as" button while using tools for lossy image compression. Via this attack, the adversary does not need to design a trigger generator as seen in prior works and only requires poisoning the data. Empirically, the proposed attack consistently achieves 100% attack success rate in several benchmark datasets such as MNIST, CIFAR-10, GTSRB and CelebA. More significantly, the proposed attack can still achieve almost 100% attack success rate with very small (approximately 10%) poisoning rates in the clean label setting. The generated trigger of the proposed attack using one lossy compression algorithm is also transferable across other related compression algorithms, exacerbating the severity of this backdoor threat. This work takes another crucial step toward understanding the extensive risks of backdoor attacks in practice, urging practitioners to investigate similar attacks and relevant backdoor mitigation methods.
    摘要 Recently, backdoor attacks have posed a significant threat to the trustworthiness of machine learning models in practical applications. Conventional wisdom suggests that only a select few can launch attacks, as designing the trigger generation algorithm requires significant effort and extensive experimentation to ensure stealth and effectiveness. However, this paper reveals a more severe backdoor threat: anyone can exploit an easily accessible algorithm for silent backdoor attacks. Specifically, the attacker can use widely-used lossy image compression tools to effortlessly inject a trigger pattern into an image without leaving any noticeable trace; i.e., the generated triggers are natural artifacts. One does not require extensive knowledge to click on the "convert" or "save as" button while using these tools. Through this attack, the adversary does not need to design a trigger generator as seen in prior works and only requires poisoning the data. Our empirical results consistently achieve a 100% attack success rate in several benchmark datasets, including MNIST, CIFAR-10, GTSRB, and CelebA. Moreover, the proposed attack can still achieve almost 100% attack success rate with very small (approximately 10%) poisoning rates in the clean label setting. The generated trigger using one lossy compression algorithm is also transferable across other related compression algorithms, exacerbating the severity of this backdoor threat. This work takes another crucial step toward understanding the extensive risks of backdoor attacks in practice, urging practitioners to investigate similar attacks and relevant backdoor mitigation methods.
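The core idea above, using an off-the-shelf lossy codec as the trigger, can be sketched as follows; the JPEG quality factor, poisoning rate, and dirty-label setup are illustrative, and the paper also evaluates a clean-label variant.

```python
import io
import random
from PIL import Image

def jpeg_trigger(img: Image.Image, quality: int = 10) -> Image.Image:
    """Round-trip an image through aggressive JPEG compression; the artifacts act as the trigger."""
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def poison_dataset(samples, target_class, rate=0.1, seed=0):
    """samples: list of (PIL.Image, label). Compress a fraction and flip their label to the target class."""
    rng = random.Random(seed)
    poisoned = []
    for img, label in samples:
        if rng.random() < rate:
            poisoned.append((jpeg_trigger(img), target_class))  # dirty-label poisoning
        else:
            poisoned.append((img, label))
    return poisoned

# Toy usage on a single synthetic image.
img = Image.new("RGB", (32, 32), color=(120, 60, 200))
poisoned = poison_dataset([(img, 3)], target_class=0, rate=1.0)
```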

Fault Injection on Embedded Neural Networks: Impact of a Single Instruction Skip

  • paper_url: http://arxiv.org/abs/2308.16665
  • repo_url: None
  • paper_authors: Clement Gaine, Pierre-Alain Moellic, Olivier Potin, Jean-Max Dutertre
  • for: 这篇论文的目的是为了研究基于32位微控制器平台的神经网络模型的安全性,并通过电磁干扰和激光干扰来模拟硬件干扰的影响。
  • methods: 该论文使用了两种干扰方式,电磁干扰和激光干扰,并在Cortex M4 32位微控制器平台上进行了实验。而不同于大多数现有的内部参数或输入值修改方法,该论文的目标是通过控制流指令跳过来模拟内存干扰的影响。
  • results: 该论文发现了一些修改攻击的潜在威胁,可以让攻击者通过修改神经网络模型的控制流来改变模型的预测结果,并且可以根据不同的恶意目标来选择合适的攻击方法。
    Abstract With the large-scale integration and use of neural network models, especially in critical embedded systems, their security assessment to guarantee their reliability is becoming an urgent need. More particularly, models deployed in embedded platforms, such as 32-bit microcontrollers, are physically accessible by adversaries and therefore vulnerable to hardware disturbances. We present the first set of experiments on the use of two fault injection means, electromagnetic and laser injections, applied on neural networks models embedded on a Cortex M4 32-bit microcontroller platform. Contrary to most of state-of-the-art works dedicated to the alteration of the internal parameters or input values, our goal is to simulate and experimentally demonstrate the impact of a specific fault model that is instruction skip. For that purpose, we assessed several modification attacks on the control flow of a neural network inference. We reveal integrity threats by targeting several steps in the inference program of typical convolutional neural network models, which may be exploited by an attacker to alter the predictions of the target models with different adversarial goals.
    摘要 随着神经网络模型的大规模集成和应用,特别是在关键嵌入式系统中,为保证其可靠性而进行的安全评估已成为迫切需求。更具体地说,部署在 32 位微控制器等嵌入式平台上的模型可被攻击者物理接触,因而容易受到硬件扰动的影响。我们首次展示了在 Cortex M4 32 位微控制器平台上,针对嵌入式神经网络模型使用电磁注入和激光注入两种故障注入手段的实验。与大多数针对内部参数或输入值篡改的现有工作不同,我们的目标是模拟并以实验证明一种特定的故障模型,即指令跳过,所带来的影响。为此,我们评估了针对神经网络推理控制流的多种篡改攻击,并通过攻击典型卷积神经网络模型推理程序中的多个步骤,揭示了可被攻击者利用、以不同对抗目标改变目标模型预测结果的完整性威胁。

Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering

  • paper_url: http://arxiv.org/abs/2308.16622
  • repo_url: None
  • paper_authors: Lars-Peter Meyer, Johannes Frey, Kurt Junghanns, Felix Brei, Kirill Bulert, Sabine Gründer-Fahrer, Michael Martin
  • for: 本研究旨在评估和监测大语言模型(LLMs)的性能,特别是在知识图工程(KGE)领域。
  • methods: 本研究提出了一个基准框架,包括三个挑战,用于测试LLMs的 sintaxis和错误 corrections、事实提取和数据集生成能力。
  • results: 研究发现,当使用零 shot 提示时,LLMs 对知识图生成仍然不具备能力,因此提出了一个LLM-KG-Bench框架,用于自动评估和存储 LLM 响应,以及统计数据和视觉化工具来支持提问工程和模型性能跟踪。
    Abstract As the field of Large Language Models (LLMs) evolves at an accelerated pace, the critical need to assess and monitor their performance emerges. We introduce a benchmarking framework focused on knowledge graph engineering (KGE) accompanied by three challenges addressing syntax and error correction, facts extraction and dataset generation. We show that while being a useful tool, LLMs are yet unfit to assist in knowledge graph generation with zero-shot prompting. Consequently, our LLM-KG-Bench framework provides automatic evaluation and storage of LLM responses as well as statistical data and visualization tools to support tracking of prompt engineering and model performance.
    摘要 随着大语言模型(LLMs)的发展速度加剧,评估和监测其性能的需求日益突出。我们提出了一个专注于知识图工程(KGE)的 benchmarcking 框架,并提出了三个挑战,其中一个是语法和错误修正,另外两个是事实提取和数据集生成。我们发现,虽然 LLMS 是一个有用的工具,但它们无法在零shot提示下帮助知识图生成。因此,我们的 LLM-KG-Bench 框架提供了自动评估和存储 LLMS 回应,以及统计数据和可视化工具,以支持提问工程和模型性能追踪。

High Accuracy Location Information Extraction from Social Network Texts Using Natural Language Processing

  • paper_url: http://arxiv.org/abs/2308.16615
  • repo_url: None
  • paper_authors: Lossan Bonde, Severin Dembele
  • for: 这篇论文是为了预测恐怖活动的目的。
  • methods: 这篇论文使用了社交媒体文本来提取必要的信息,以建立一个适合的数据集来预测恐怖活动。
  • results: 实验表明,现有的解决方案具有低精度,而我们的解决方案可以准确地识别地点信息。
    Abstract Terrorism has become a worldwide plague with severe consequences for the development of nations. Besides killing innocent people daily and preventing educational activities from taking place, terrorism is also hindering economic growth. Machine Learning (ML) and Natural Language Processing (NLP) can contribute to fighting terrorism by predicting in real-time future terrorist attacks if accurate data is available. This paper is part of a research project that uses text from social networks to extract necessary information to build an adequate dataset for terrorist attack prediction. We collected a set of 3000 social network texts about terrorism in Burkina Faso and used a subset to experiment with existing NLP solutions. The experiment reveals that existing solutions have poor accuracy for location recognition, which our solution resolves. We will extend the solution to extract dates and action information to achieve the project's goal.
    摘要 恐怖主义已成为全球的恶性疾病,对国家发展造成严重的影响。除了每天杀害无辜的人和破坏教育活动外,恐怖主义还妨碍经济增长。机器学习(ML)和自然语言处理(NLP)可以帮助斗争恐怖主义,预测未来恐怖袭击的可能性,只要有准确的数据。这篇论文是一项研究项目的一部分,使用社交媒体文本提取必要的信息建立恐怖袭击预测数据集。我们收集了3000个社交媒体文本关于恐怖主义在布基纳法索的样本,使用一个子集进行了现有NLP解决方案的实验。实验表明,现有的解决方案在位置识别方面有较差的准确率,我们的解决方案可以解决这个问题。我们将延续解决方案,以提取日期和动作信息,实现项目的目标。
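For context, the kind of off-the-shelf NER baseline whose weak location accuracy motivates the paper can be run in a few lines (assuming spaCy and its small English pipeline are installed); this is a baseline illustration, not the authors' improved solution.

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # off-the-shelf English pipeline; the paper builds a stronger custom model

def extract_locations(text: str) -> list[str]:
    doc = nlp(text)
    # GPE = countries/cities, LOC = other locations, FAC = facilities
    return [ent.text for ent in doc.ents if ent.label_ in {"GPE", "LOC", "FAC"}]

print(extract_locations("An attack was reported near Kaya, in the Centre-Nord region of Burkina Faso."))
```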

Towards Long-Tailed Recognition for Graph Classification via Collaborative Experts

  • paper_url: http://arxiv.org/abs/2308.16609
  • repo_url: None
  • paper_authors: Siyu Yi, Zhengyang Mao, Wei Ju, Yongdao Zhou, Luchen Liu, Xiao Luo, Ming Zhang
  • for: 本文旨在为图级别分类学习有效的图级表示,特别是在呈长尾分布的图数据上进行分类。
  • methods: 本文提出了基于协作多专家学习的长尾图级分类框架 CoMe,包括均衡对比表示学习、基于困难类别挖掘的单专家分类器训练、门控融合以及专家间的解耦知识蒸馏等方法。
  • results: 在七个广泛使用的基准数据集上的实验结果表明,我们的方法 CoMe 优于最先进的基线,在长尾分布下的图级别分类任务上表现优异。
    Abstract Graph classification, aiming at learning the graph-level representations for effective class assignments, has received outstanding achievements, which heavily relies on high-quality datasets that have balanced class distribution. In fact, most real-world graph data naturally presents a long-tailed form, where the head classes occupy much more samples than the tail classes, it thus is essential to study the graph-level classification over long-tailed data while still remaining largely unexplored. However, most existing long-tailed learning methods in visions fail to jointly optimize the representation learning and classifier training, as well as neglect the mining of the hard-to-classify classes. Directly applying existing methods to graphs may lead to sub-optimal performance, since the model trained on graphs would be more sensitive to the long-tailed distribution due to the complex topological characteristics. Hence, in this paper, we propose a novel long-tailed graph-level classification framework via Collaborative Multi-expert Learning (CoMe) to tackle the problem. To equilibrate the contributions of head and tail classes, we first develop balanced contrastive learning from the view of representation learning, and then design an individual-expert classifier training based on hard class mining. In addition, we execute gated fusion and disentangled knowledge distillation among the multiple experts to promote the collaboration in a multi-expert framework. Comprehensive experiments are performed on seven widely-used benchmark datasets to demonstrate the superiority of our method CoMe over state-of-the-art baselines.

The Quest of Finding the Antidote to Sparse Double Descent

  • paper_url: http://arxiv.org/abs/2308.16596
  • repo_url: None
  • paper_authors: Victor Quétu, Marta Milovanović
  • for: 本文旨在高效地找到深度学习模型的最优规模,以在保持高性能的同时避免稀疏双下降(sparse double descent)现象。
  • methods: 本文首先采用一种简单的 $\ell_2$ 正则化方法,随后引入基于知识蒸馏的学习方案来应对稀疏双下降现象。
  • results: 实验结果表明,该方法可以避免稀疏双下降现象,并在图像分类任务中取得更好的性能。
    Abstract In energy-efficient schemes, finding the optimal size of deep learning models is very important and has a broad impact. Meanwhile, recent studies have reported an unexpected phenomenon, the sparse double descent: as the model's sparsity increases, the performance first worsens, then improves, and finally deteriorates. Such a non-monotonic behavior raises serious questions about the optimal model's size to maintain high performance: the model needs to be sufficiently over-parametrized, but having too many parameters wastes training resources. In this paper, we aim to find the best trade-off efficiently. More precisely, we tackle the occurrence of the sparse double descent and present some solutions to avoid it. Firstly, we show that a simple $\ell_2$ regularization method can help to mitigate this phenomenon but sacrifices the performance/sparsity compromise. To overcome this problem, we then introduce a learning scheme in which distilling knowledge regularizes the student model. Supported by experimental results achieved using typical image classification setups, we show that this approach leads to the avoidance of such a phenomenon.
    摘要 在能效方案中,确定深度学习模型的最优规模非常重要,并具有广泛的影响。与此同时,最近的研究报告了一种出人意料的现象,即稀疏双下降:随着模型稀疏度的增加,性能先恶化、再改善、最终再次恶化。这种非单调行为使得为保持高性能而选择最优模型规模成为一个严重的问题:模型需要充分地过参数化,但参数过多又会浪费训练资源。在本文中,我们旨在高效地找到最佳折中。更具体地说,我们研究稀疏双下降现象,并提出一些避免它的方案。首先,我们表明一种简单的 $\ell_2$ 正则化方法有助于缓解这一现象,但会牺牲性能与稀疏度之间的折中。为克服这一问题,我们随后引入一种通过知识蒸馏来正则化学生模型的学习方案。基于典型图像分类设置取得的实验结果表明,该方法能够避免稀疏双下降现象。
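A standard knowledge-distillation objective of the kind referred to above, where a dense teacher regularizes the sparse student; the temperature and weighting are illustrative and may differ from the paper's formulation.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Cross-entropy on the labels plus a KL term pulling the (pruned) student toward the dense teacher."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # scale by T^2 to keep gradient magnitudes comparable
    return alpha * hard + (1.0 - alpha) * soft

# Toy usage with random logits.
s, t = torch.randn(8, 10), torch.randn(8, 10)
y = torch.randint(0, 10, (8,))
print(distillation_loss(s, t, y).item())
```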

CL-MAE: Curriculum-Learned Masked Autoencoders

  • paper_url: http://arxiv.org/abs/2308.16572
  • repo_url: None
  • paper_authors: Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B. Moeslund, Radu Tudor Ionescu
  • for: 提高自我超vised学习的表示学习能力
  • methods: 使用curriculum学习方法,逐渐增加masking策略的复杂度,从而训练模型学习更加复杂和可传播的表示
  • results: 训练CL-MAE模型在ImageNet上,并在五个下游任务上显示出优于MAE模型的表示学习能力
    Abstract Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches (tokens) in input images, with the masking strategy remaining unchanged during training. In this paper, we propose a curriculum learning approach that updates the masking strategy to continually increase the complexity of the self-supervised reconstruction task. We conjecture that, by gradually increasing the task complexity, the model can learn more sophisticated and transferable representations. To facilitate this, we introduce a novel learnable masking module that possesses the capability to generate masks of different complexities, and integrate the proposed module into masked autoencoders (MAE). Our module is jointly trained with the MAE, while adjusting its behavior during training, transitioning from a partner to the MAE (optimizing the same reconstruction loss) to an adversary (optimizing the opposite loss), while passing through a neutral state. The transition between these behaviors is smooth, being regulated by a factor that is multiplied with the reconstruction loss of the masking module. The resulting training procedure generates an easy-to-hard curriculum. We train our Curriculum-Learned Masked Autoencoder (CL-MAE) on ImageNet and show that it exhibits superior representation learning capabilities compared to MAE. The empirical results on five downstream tasks confirm our conjecture, demonstrating that curriculum learning can be successfully used to self-supervise masked autoencoders.
    摘要 掩码图像建模已被证明是一种强大的预训练任务,可以生成能够有效泛化到多个下游任务的稳健表示。通常,这种方法会随机掩盖输入图像中的图块(token),并且掩码策略在训练过程中保持不变。在本文中,我们提出一种课程学习方法,通过不断更新掩码策略来逐步提高自监督重建任务的复杂度。我们推测,通过逐渐增加任务复杂度,模型可以学到更复杂、更可迁移的表示。为此,我们引入一种新颖的可学习掩码模块,能够生成不同复杂度的掩码,并将其集成到掩码自编码器(MAE)中。该模块与 MAE 联合训练,其行为在训练过程中逐步调整:从 MAE 的合作者(优化相同的重建损失)经过中立状态过渡为对手(优化相反的损失)。这一过渡是平滑的,由一个与掩码模块重建损失相乘的因子调节,由此形成一个由易到难的课程。我们在 ImageNet 上训练了课程学习掩码自编码器(CL-MAE),并证明其表示学习能力优于 MAE。在五个下游任务上的实证结果印证了我们的推测,表明课程学习可以成功用于掩码自编码器的自监督训练。
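The partner-to-adversary transition described above can be sketched as a factor that multiplies the masking module's reconstruction loss and sweeps from +1 to -1; the linear schedule here is an assumption made only for illustration.

```python
import torch

def curriculum_factor(epoch: int, total_epochs: int) -> float:
    """Sweep from +1 (masking module cooperates with the MAE) through 0 (neutral)
    to -1 (masking module acts as an adversary), yielding an easy-to-hard curriculum."""
    return 1.0 - 2.0 * epoch / max(total_epochs - 1, 1)

def masking_module_loss(recon_loss: torch.Tensor, epoch: int, total_epochs: int) -> torch.Tensor:
    # The MAE itself always minimises recon_loss; the learnable masking module minimises
    # factor * recon_loss, so a negative factor pushes it to generate harder masks.
    return curriculum_factor(epoch, total_epochs) * recon_loss

print([round(curriculum_factor(e, 5), 2) for e in range(5)])  # [1.0, 0.5, 0.0, -0.5, -1.0]
```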

The Power of MEME: Adversarial Malware Creation with Model-Based Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.16562
  • repo_url: https://github.com/stratosphereips/meme_malware_rl
  • paper_authors: Maria Rigaki, Sebastian Garcia
  • for: This paper is written for researchers and practitioners in the field of malware detection and defense, particularly those interested in the use of machine learning and automation for malware detection.
  • methods: The paper proposes a new algorithm called MEME (Malware Evasion and Model Extraction) attacks, which combines model-based reinforcement learning and adversarial modification of Windows executable binary samples to evade malware detection.
  • results: The paper evaluates the MEME algorithm against two state-of-the-art attacks in adversarial malware creation and shows that MEME outperforms the state-of-the-art methods in terms of evasion capabilities, producing evasive malware with an evasion rate in the range of 32-73%. The paper also shows that the surrogate models produced by MEME have a high agreement with the target models, with a prediction label agreement between 97-99%.
    Abstract Due to the proliferation of malware, defenders are increasingly turning to automation and machine learning as part of the malware detection tool-chain. However, machine learning models are susceptible to adversarial attacks, requiring the testing of model and product robustness. Meanwhile, attackers also seek to automate malware generation and evasion of antivirus systems, and defenders try to gain insight into their methods. This work proposes a new algorithm that combines Malware Evasion and Model Extraction (MEME) attacks. MEME uses model-based reinforcement learning to adversarially modify Windows executable binary samples while simultaneously training a surrogate model with a high agreement with the target model to evade. To evaluate this method, we compare it with two state-of-the-art attacks in adversarial malware creation, using three well-known published models and one antivirus product as targets. Results show that MEME outperforms the state-of-the-art methods in terms of evasion capabilities in almost all cases, producing evasive malware with an evasion rate in the range of 32-73%. It also produces surrogate models with a prediction label agreement with the respective target models between 97-99%. The surrogate could be used to fine-tune and improve the evasion rate in the future.
    摘要 由于恶意软件的泛滥,防御者越来越多地将自动化与机器学习纳入恶意软件检测工具链。然而,机器学习模型容易受到对抗攻击,因此需要测试模型和产品的鲁棒性。与此同时,攻击者也在设法自动化恶意软件的生成并绕过杀毒系统,防御者则试图了解其手法。本工作提出一种将恶意软件逃逸与模型提取相结合的新算法 MEME。MEME 使用基于模型的强化学习对 Windows 可执行二进制样本进行对抗性修改,同时训练一个与目标模型高度一致的替代模型以实现逃逸。为评估该方法,我们以三个公开发布的知名模型和一个杀毒产品为目标,与两种最先进的对抗性恶意软件生成攻击进行比较。结果显示,MEME 在几乎所有情况下的逃逸能力均优于最先进方法,生成的逃逸型恶意软件逃逸率在 32-73% 之间;同时得到的替代模型与相应目标模型的预测标签一致率在 97-99% 之间,未来可用于微调并进一步提升逃逸率。

On a Connection between Differential Games, Optimal Control, and Energy-based Models for Multi-Agent Interactions

  • paper_url: http://arxiv.org/abs/2308.16539
  • repo_url: None
  • paper_authors: Christopher Diehl, Tobias Klosek, Martin Krüger, Nils Murzyn, Torsten Bertram
  • for: This paper is written for modeling multi-agent interactions in real-world robotics applications using game theory.
  • methods: The paper uses a combination of differential games, optimal control, and energy-based models to address challenges in applying game theory to real-world robotics.
  • results: The paper introduces a new end-to-end learning application that combines neural networks for game-parameter inference with a differentiable game-theoretic optimization layer, and demonstrates empirical evidence that the game-theoretic layer improves the predictive performance of various neural network backbones using simulated mobile robot pedestrian interactions and real-world automated driving data.
    Abstract Game theory offers an interpretable mathematical framework for modeling multi-agent interactions. However, its applicability in real-world robotics applications is hindered by several challenges, such as unknown agents' preferences and goals. To address these challenges, we show a connection between differential games, optimal control, and energy-based models and demonstrate how existing approaches can be unified under our proposed Energy-based Potential Game formulation. Building upon this formulation, this work introduces a new end-to-end learning application that combines neural networks for game-parameter inference with a differentiable game-theoretic optimization layer, acting as an inductive bias. The experiments using simulated mobile robot pedestrian interactions and real-world automated driving data provide empirical evidence that the game-theoretic layer improves the predictive performance of various neural network backbones.
    摘要 博弈论为多智能体交互建模提供了一个可解释的数学框架。然而,智能体偏好与目标未知等挑战限制了其在现实机器人应用中的适用性。为应对这些挑战,我们揭示了微分博弈、最优控制与基于能量的模型之间的联系,并展示了现有方法如何在我们提出的基于能量的势博弈(Energy-based Potential Game)形式下得到统一。在此基础上,本文提出一种新的端到端学习应用,将用于博弈参数推断的神经网络与一个可微的博弈论优化层相结合,后者充当归纳偏置。基于仿真的移动机器人与行人交互以及真实世界自动驾驶数据的实验提供了经验证据,表明该博弈论层能提升多种神经网络骨干的预测性能。

The AI Revolution: Opportunities and Challenges for the Finance Sector

  • paper_url: http://arxiv.org/abs/2308.16538
  • repo_url: None
  • paper_authors: Carsten Maple, Lukasz Szpruch, Gregory Epiphaniou, Kalina Staykova, Simran Singh, William Penwarden, Yisi Wen, Zijian Wang, Jagdish Hariharan, Pavle Avramovic
  • for: 本研究探讨了人工智能(AI)在金融领域的应用,描述了其可能性,并讨论了其挑战。
  • methods: 本研究使用了多种方法,包括客户服务改进、诈骗检测、风险管理和信贷评估等。
  • results: 本研究发现,AI在金融领域的应用可以提高客户服务质量、提高风险管理和信贷评估等方面的效率,但同时也存在许多挑战,如透明度、解释性、公平性和信任worthiness等问题。
    Abstract This report examines Artificial Intelligence (AI) in the financial sector, outlining its potential to revolutionise the industry and identify its challenges. It underscores the criticality of a well-rounded understanding of AI, its capabilities, and its implications to effectively leverage its potential while mitigating associated risks. The potential of AI potential extends from augmenting existing operations to paving the way for novel applications in the finance sector. The application of AI in the financial sector is transforming the industry. Its use spans areas from customer service enhancements, fraud detection, and risk management to credit assessments and high-frequency trading. However, along with these benefits, AI also presents several challenges. These include issues related to transparency, interpretability, fairness, accountability, and trustworthiness. The use of AI in the financial sector further raises critical questions about data privacy and security. A further issue identified in this report is the systemic risk that AI can introduce to the financial sector. Being prone to errors, AI can exacerbate existing systemic risks, potentially leading to financial crises. Regulation is crucial to harnessing the benefits of AI while mitigating its potential risks. Despite the global recognition of this need, there remains a lack of clear guidelines or legislation for AI use in finance. This report discusses key principles that could guide the formation of effective AI regulation in the financial sector, including the need for a risk-based approach, the inclusion of ethical considerations, and the importance of maintaining a balance between innovation and consumer protection. The report provides recommendations for academia, the finance industry, and regulators.
    摘要 AI has the potential to transform the financial sector, with applications in customer service, fraud detection, risk management, credit assessments, and high-frequency trading. However, AI also raises several challenges, including issues related to transparency, interpretability, fairness, accountability, and trustworthiness. Additionally, the use of AI in the financial sector raises critical questions about data privacy and security.The report also highlights the systemic risk that AI can introduce to the financial sector, as it can exacerbate existing systemic risks and potentially lead to financial crises. To address these risks, the report proposes key principles for effective AI regulation in the financial sector, including a risk-based approach, ethical considerations, and a balance between innovation and consumer protection.The report provides recommendations for academia, the finance industry, and regulators, emphasizing the need for a comprehensive understanding of AI's potential and challenges to ensure the responsible use of AI in the financial sector.

Conditioning Score-Based Generative Models by Neuro-Symbolic Constraints

  • paper_url: http://arxiv.org/abs/2308.16534
  • repo_url: None
  • paper_authors: Davide Scassola, Sebastiano Saccani, Ginevra Carbone, Luca Bortolussi
  • for: 该论文旨在提出一种无需额外训练的方法,可以从无条件的基于分数的生成模型中采样出满足用户定义逻辑约束的样本。
  • methods: 该方法首先解释了如何使用学习得到的分数来随机抽取不归一化分布的样本,然后定义了一种灵活且数字化的符号逻辑框架,用于编码软逻辑约束。最后,该方法结合了这两个元素,实现了一种通用但是近似的随机抽取算法。
  • results: 该论文通过对各种约束和数据进行实验,包括表格数据、图像和时间序列,证明了该方法的有效性。
    Abstract Score-based and diffusion models have emerged as effective approaches for both conditional and unconditional generation. Still conditional generation is based on either a specific training of a conditional model or classifier guidance, which requires training a noise-dependent classifier, even when the classifier for uncorrupted data is given. We propose an approach to sample from unconditional score-based generative models enforcing arbitrary logical constraints, without any additional training. Firstly, we show how to manipulate the learned score in order to sample from an un-normalized distribution conditional on a user-defined constraint. Then, we define a flexible and numerically stable neuro-symbolic framework for encoding soft logical constraints. Combining these two ingredients we obtain a general, but approximate, conditional sampling algorithm. We further developed effective heuristics aimed at improving the approximation. Finally, we show the effectiveness of our approach for various types of constraints and data: tabular data, images and time series.
    摘要 基于分数的模型和扩散模型已成为条件生成与无条件生成的有效方法。然而,条件生成仍然依赖于条件模型的专门训练或分类器引导,后者需要训练一个依赖噪声的分类器,即使已经给定了针对未受扰数据的分类器。我们提出一种无需任何额外训练、即可从无条件的基于分数的生成模型中采样并强制满足任意逻辑约束的方法。我们首先展示如何操控已学习的分数,以便从以用户定义约束为条件的未归一化分布中采样。随后,我们定义了一种灵活且数值稳定的神经符号框架,用于编码软逻辑约束。将这两个要素结合起来,我们得到一种通用但近似的条件采样算法。我们还进一步设计了旨在改进近似质量的有效启发式方法。最后,我们在表格数据、图像和时间序列等多种约束和数据类型上验证了该方法的有效性。
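The generic guided-sampling idea behind this line of work, adding the gradient of a log soft-constraint term to the learned score during Langevin updates, can be sketched as below; this is not the authors' exact algorithm or their neuro-symbolic constraint encoding.

```python
import torch

def constrained_langevin(score_fn, constraint_fn, x, n_steps=200, step=1e-2, guidance=1.0):
    """Unadjusted Langevin sketch: add the gradient of a log soft-constraint term to the learned score.
    score_fn(x) approximates grad_x log p(x); constraint_fn(x) > 0 where the symbolic constraint holds."""
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        log_c = torch.nn.functional.logsigmoid(constraint_fn(x)).sum()  # soft, differentiable constraint
        grad_c = torch.autograd.grad(log_c, x)[0]
        with torch.no_grad():
            drift = score_fn(x) + guidance * grad_c
            x = x + step * drift + (2 * step) ** 0.5 * torch.randn_like(x)
    return x

# Toy usage: sample from a standard Gaussian constrained to the half-space x_0 > 1.
score_fn = lambda x: -x                          # exact score of N(0, I)
constraint_fn = lambda x: 5.0 * (x[:, 0] - 1.0)  # soft indicator of the constraint x_0 > 1
samples = constrained_langevin(score_fn, constraint_fn, torch.randn(128, 2))
print(samples[:, 0].mean())
```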

Developing Social Robots with Empathetic Non-Verbal Cues Using Large Language Models

  • paper_url: http://arxiv.org/abs/2308.16529
  • repo_url: None
  • paper_authors: Yoon Kyung Lee, Yoonwon Jung, Gyuyi Kang, Sowon Hahn
  • for: The paper aims to enhance the empathetic capacities of social robots by incorporating non-verbal cues.
  • methods: The authors use a Large Language Model (LLM) to generate four types of empathetic non-verbal cues (Speech, Action, Facial expression, and Emotion, abbreviated SAFE) in a social robot.
  • results: Preliminary results show that the robot recognizes and responds to social cues, such as nodding gestures and positive emotions, in a more authentic and context-aware manner.
    Abstract We propose augmenting the empathetic capacities of social robots by integrating non-verbal cues. Our primary contribution is the design and labeling of four types of empathetic non-verbal cues, abbreviated as SAFE: Speech, Action (gesture), Facial expression, and Emotion, in a social robot. These cues are generated using a Large Language Model (LLM). We developed an LLM-based conversational system for the robot and assessed its alignment with social cues as defined by human counselors. Preliminary results show distinct patterns in the robot's responses, such as a preference for calm and positive social emotions like 'joy' and 'lively', and frequent nodding gestures. Despite these tendencies, our approach has led to the development of a social robot capable of context-aware and more authentic interactions. Our work lays the groundwork for future studies on human-robot interactions, emphasizing the essential role of both verbal and non-verbal cues in creating social and empathetic robots.
    摘要 我们提出通过整合非语言线索来增强社交机器人的共情能力。我们的主要贡献是在社交机器人中设计并标注了四类共情非语言线索,简称 SAFE:语音(Speech)、动作/手势(Action)、面部表情(Facial expression)和情感(Emotion),这些线索由大语言模型(LLM)生成。我们为机器人开发了基于 LLM 的对话系统,并依据人类咨询师定义的社交线索评估其对齐程度。初步结果显示机器人的回应呈现出明显的模式,例如偏好"喜悦""活泼"等平静而积极的社交情绪,以及频繁的点头手势。尽管存在这些倾向,我们的方法已经实现了一个能够进行情境感知且更真实互动的社交机器人。这项工作为后续的人机交互研究奠定了基础,强调语言与非语言线索在构建具有社交性和共情能力的机器人中的关键作用。

Curvature-based Pooling within Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.16516
  • repo_url: https://gitlab.com/cedric_sanders/masterarbeit
  • paper_authors: Cedric Sanders, Andreas Roth, Thomas Liebig
  • for: 本文旨在缓解图神经网络(GNN)中的过平滑(over-smoothing)和过挤压(over-squashing)问题,从而提升其在图学习任务中的能力。
  • methods: 本文提出了一种新的池化方法 CurvPool,它利用图的曲率自适应地识别导致过平滑和过挤压的结构,并据此对节点进行聚类与池化。
  • results: 对比实验表明,CurvPool 在图分类任务中表现出色,在分类精度、计算复杂度和灵活性方面均优于或匹敌其他相关方法。
    Abstract Over-squashing and over-smoothing are two critical issues, that limit the capabilities of graph neural networks (GNNs). While over-smoothing eliminates the differences between nodes making them indistinguishable, over-squashing refers to the inability of GNNs to propagate information over long distances, as exponentially many node states are squashed into fixed-size representations. Both phenomena share similar causes, as both are largely induced by the graph topology. To mitigate these problems in graph classification tasks, we propose CurvPool, a novel pooling method. CurvPool exploits the notion of curvature of a graph to adaptively identify structures responsible for both over-smoothing and over-squashing. By clustering nodes based on the Balanced Forman curvature, CurvPool constructs a graph with a more suitable structure, allowing deeper models and the combination of distant information. We compare it to other state-of-the-art pooling approaches and establish its competitiveness in terms of classification accuracy, computational complexity, and flexibility. CurvPool outperforms several comparable methods across all considered tasks. The most consistent results are achieved by pooling densely connected clusters using the sum aggregation, as this allows additional information about the size of each pool.
    摘要 过挤压(over-squashing)和过平滑(over-smoothing)是限制图神经网络(GNN)能力的两个关键问题:过平滑会抹去节点之间的差异,使其难以区分;过挤压则指 GNN 难以在长距离上传播信息,因为指数级数量的节点状态被压缩进固定大小的表示中。两者成因相似,都主要由图的拓扑结构引起。为了在图分类任务中缓解这些问题,我们提出了一种新的池化方法 CurvPool。CurvPool 利用图曲率的概念,自适应地识别导致过平滑和过挤压的结构;通过基于 Balanced Forman 曲率对节点进行聚类,CurvPool 构造出结构更合适的图,从而支持更深的模型并融合远距离信息。我们将其与其他先进的池化方法进行比较,证明其在分类精度、计算复杂度和灵活性方面具有竞争力,并在所有任务上优于多种可比方法。最稳定的结果来自对稠密连接的簇使用求和聚合进行池化,因为这样还能保留每个池大小的信息。
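
A minimal sketch of the pooling step, assuming a curvature score is already available per edge: nodes joined by high-curvature (densely connected) edges are merged into one pooled node whose feature is the sum of its members. The threshold, the networkx-based clustering, and the assumption that node ids are consecutive integers indexing the feature matrix are illustrative choices, not the paper's exact CurvPool construction.

```python
import torch
import networkx as nx

def curvature_clusters(G, edge_curvature, threshold=0.0):
    """Connected components of the subgraph keeping only edges whose (precomputed)
    Balanced-Forman-style curvature exceeds `threshold`; each component becomes a pool."""
    H = nx.Graph()
    H.add_nodes_from(G.nodes)
    H.add_edges_from((u, v) for u, v in G.edges if edge_curvature[(u, v)] > threshold)
    return [sorted(c) for c in nx.connected_components(H)]

def curv_pool(node_features, clusters):
    """Sum aggregation inside each cluster (the variant reported as most consistent),
    yielding one pooled node per cluster; assumes node ids index `node_features` rows."""
    return torch.stack([node_features[c].sum(dim=0) for c in clusters])
```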

Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations

  • paper_url: http://arxiv.org/abs/2308.16505
  • repo_url: None
  • paper_authors: Xu Huang, Jianxun Lian, Yuxuan Lei, Jing Yao, Defu Lian, Xing Xie
  • for: 本研究旨在结合推荐模型和大语言模型(LLM)创造一个多功能且交互的推荐系统,以提高推荐系统的功能和用户体验。
  • methods: 本研究以 LLM 作为智能核心,并将多种推荐模型作为工具,实现交互式推荐。研究提出了一个高效的框架 RecAgent,并设计了简洁的工作流程,包括记忆总线、动态示例增强的任务规划以及反思机制。
  • results: 实验结果表明,RecAgent 在多个公共数据集上作为对话式推荐系统取得了令人满意的性能,优于通用的 LLM。
    Abstract Recommender models excel at providing domain-specific item recommendations by leveraging extensive user behavior data. Despite their ability to act as lightweight domain experts, they struggle to perform versatile tasks such as providing explanations and engaging in conversations. On the other hand, large language models (LLMs) represent a significant step towards artificial general intelligence, showcasing remarkable capabilities in instruction comprehension, commonsense reasoning, and human interaction. However, LLMs lack the knowledge of domain-specific item catalogs and behavioral patterns, particularly in areas that diverge from general world knowledge, such as online e-commerce. Finetuning LLMs for each domain is neither economic nor efficient. In this paper, we bridge the gap between recommender models and LLMs, combining their respective strengths to create a versatile and interactive recommender system. We introduce an efficient framework called RecAgent, which employs LLMs as the brain and recommender models as tools. We first outline a minimal set of essential tools required to transform LLMs into RecAgent. We then propose an efficient workflow within RecAgent for task execution, incorporating key components such as a memory bus, dynamic demonstration-augmented task planning, and reflection. RecAgent enables traditional recommender systems, such as those ID-based matrix factorization models, to become interactive systems with a natural language interface through the integration of LLMs. Experimental results on several public datasets show that RecAgent achieves satisfying performance as a conversational recommender system, outperforming general-purpose LLMs.
    摘要 推荐模型借助大量用户行为数据,擅长提供特定领域的物品推荐,可视作轻量级的领域专家;但它们难以胜任提供解释、进行对话等多样化任务。另一方面,大语言模型(LLM)在指令理解、常识推理和人机交互方面展现出惊人能力,却缺乏特定领域的物品目录和行为模式知识,尤其是在与通用世界知识差异较大的领域(如在线电商)。而针对每个领域分别微调 LLM 既不经济也不高效。本文弥合推荐模型与 LLM 之间的差距,结合二者的优势,构建一个多才多艺且可交互的推荐系统。我们提出了一个高效的框架 RecAgent,其中 LLM 充当大脑,推荐模型充当工具。我们首先给出将 LLM 转化为 RecAgent 所需的最小工具集,然后提出了 RecAgent 中高效的任务执行工作流程,包括记忆总线、动态示例增强的任务规划以及反思等关键组件。通过与 LLM 的集成,RecAgent 使传统推荐系统(如基于 ID 的矩阵分解模型)成为具有自然语言界面的交互式系统。在多个公共数据集上的实验结果表明,RecAgent 作为对话式推荐系统取得了令人满意的性能,优于通用 LLM。
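
The sketch below illustrates the "LLM as brain, recommender models as tools" loop at its simplest: the LLM plans tool calls, the tools run, and the LLM composes the answer while a memory list plays the role of the memory bus. All names (`llm`, the tool callables) are hypothetical placeholders and the plan-parsing is deliberately naive; the paper's RecAgent workflow (dynamic demonstration-augmented planning, reflection) is richer than this.

```python
from typing import Callable, Dict, List

def run_rec_agent(llm: Callable[[str], str],
                  tools: Dict[str, Callable[[str], str]],
                  user_query: str,
                  memory: List[str]) -> str:
    """One interaction turn: plan tool calls, execute them, then compose the reply."""
    plan = llm(
        "Available tools: " + ", ".join(tools) + "\n"
        "Recent memory: " + " | ".join(memory[-5:]) + "\n"
        f"User request: {user_query}\n"
        "List the tools to call, one per line, as 'tool_name: input'."
    )
    observations = []
    for line in plan.splitlines():                     # naive plan parsing
        if ":" not in line:
            continue
        name, arg = (part.strip() for part in line.split(":", 1))
        if name in tools:
            observations.append(f"{name} -> {tools[name](arg)}")
    answer = llm(
        "Tool results:\n" + "\n".join(observations) +
        f"\nUsing only these results, answer the user request: {user_query}"
    )
    memory.append(f"user: {user_query} | agent: {answer}")   # memory bus update
    return answer
```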

Individually Rational Collaborative Vehicle Routing through Give-And-Take Exchanges

  • paper_url: http://arxiv.org/abs/2308.16501
  • repo_url: None
  • paper_authors: Paul Mingzheng Tang, Ba Phong Tran, Hoong Chuin Lau
  • for: 本研究旨在自动化物流公司之间在市场平台上的订单交换,以最大化总收益。
  • methods: 我们提出了一种多智能体方法,从个体理性的角度研究协作车辆路径问题(CVRP):将车辆路径问题(VRP)的原则应用于来自不同物流公司的车辆对,在满足标准 VRP 约束和个体理性约束的前提下优化整体路线,并通过"给予-索取"(Give-and-Take)方式促进相互竞争的物流智能体之间的合作。
  • results: 我们使用大型物流公司的真实数据进行了大量实验,结果表明该算法能够快速找到大量最优解,凸显其实际适用性以及变革物流行业的潜力。
    Abstract In this paper, we are concerned with the automated exchange of orders between logistics companies in a marketplace platform to optimize total revenues. We introduce a novel multi-agent approach to this problem, focusing on the Collaborative Vehicle Routing Problem (CVRP) through the lens of individual rationality. Our proposed algorithm applies the principles of Vehicle Routing Problem (VRP) to pairs of vehicles from different logistics companies, optimizing the overall routes while considering standard VRP constraints plus individual rationality constraints. By facilitating cooperation among competing logistics agents through a Give-and-Take approach, we show that it is possible to reduce travel distance and increase operational efficiency system-wide. More importantly, our approach ensures individual rationality and faster convergence, which are important properties of ensuring the long-term sustainability of the marketplace platform. We demonstrate the efficacy of our approach through extensive experiments using real-world test data from major logistics companies. The results reveal our algorithm's ability to rapidly identify numerous optimal solutions, underscoring its practical applicability and potential to transform the logistics industry.
    摘要 本文关注市场平台上物流公司之间的自动订单交换,以优化总收益。我们针对这一问题提出了一种新的多智能体方法,从个体理性的视角研究协作车辆路径问题(CVRP)。所提算法将车辆路径问题(VRP)的原则应用于来自不同物流公司的车辆对,在满足标准 VRP 约束以及个体理性约束的同时优化整体路线。通过"给予-索取"(Give-and-Take)的方式促进相互竞争的物流智能体之间的合作,我们证明可以在系统层面缩短行驶距离并提升运营效率。更重要的是,该方法保证了个体理性和更快的收敛,这些性质对于市场平台的长期可持续性至关重要。我们使用大型物流公司的真实测试数据进行了大量实验,结果表明该算法能够快速找到大量最优解,凸显其实际适用性及变革物流行业的潜力。
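
A minimal reading of the individual-rationality constraint can be written as a simple acceptance test for a proposed give-and-take swap: each company re-evaluates its own route with the exchanged orders, and the swap is accepted only if neither company's cost increases. The `RouteCost` callables and the list-based order representation are illustrative assumptions, not the paper's optimisation model.

```python
from typing import Callable, List

Order = int
RouteCost = Callable[[List[Order]], float]

def exchange_is_individually_rational(cost_a: RouteCost, orders_a: List[Order],
                                      cost_b: RouteCost, orders_b: List[Order],
                                      give_a_to_b: List[Order],
                                      give_b_to_a: List[Order]) -> bool:
    """Accept a give-and-take swap only if neither company is worse off, i.e. each
    company's own routing cost does not increase after the exchange (a minimal
    reading of individual rationality; the paper also optimises system-wide distance)."""
    new_a = [o for o in orders_a if o not in give_a_to_b] + give_b_to_a
    new_b = [o for o in orders_b if o not in give_b_to_a] + give_a_to_b
    return cost_a(new_a) <= cost_a(orders_a) and cost_b(new_b) <= cost_b(orders_b)
```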

Generalised Winograd Schema and its Contextuality

  • paper_url: http://arxiv.org/abs/2308.16498
  • repo_url: None
  • paper_authors: Kin Ian Lo, Mehrnoosh Sadrzadeh, Shane Mansfield
  • for: 这篇论文旨在研究语言歧义与量子情境性(contextuality)之间的关系。
  • methods: 该论文使用层论(sheaf-theoretic)模型研究语言歧义,并将 Winograd schema 类比为量子物理实验来检验情境性。
  • results: 研究发现原始的 Winograd schema 过于简单,不足以产生情境性;作者提出了一种推广该 schema 的新机制,使其类似于 Bell-CHSH 测量场景,并在众包收集的人类判断数据上观察到违反 Bell-CHSH 不等式 0.192,即在指代消解场景中表现出情境性。
    Abstract Ambiguities in natural language give rise to probability distributions over interpretations. The distributions are often over multiple ambiguous words at a time; a multiplicity which makes them a suitable topic for sheaf-theoretic models of quantum contextuality. Previous research showed that different quantitative measures of contextuality correlate well with Psycholinguistic research on lexical ambiguities. In this work, we focus on coreference ambiguities and investigate the Winograd Schema Challenge (WSC), a test proposed by Levesque in 2011 to evaluate the intelligence of machines. The WSC consists of a collection of multiple-choice questions that require disambiguating pronouns in sentences structured according to the Winograd schema, in a way that makes it difficult for machines to determine the correct referents but remains intuitive for human comprehension. In this study, we propose an approach that analogously models the Winograd schema as an experiment in quantum physics. However, we argue that the original Winograd Schema is inherently too simplistic to facilitate contextuality. We introduce a novel mechanism for generalising the schema, rendering it analogous to a Bell-CHSH measurement scenario. We report an instance of this generalised schema, complemented by the human judgements we gathered via a crowdsourcing platform. The resulting model violates the Bell-CHSH inequality by 0.192, thus exhibiting contextuality in a coreference resolution setting.
    摘要 自然语言中的歧义会在不同解释上产生概率分布,而且这些分布往往同时涉及多个歧义词,这种多重性使其适合用量子情境性的层论模型来刻画。先前的研究表明,多种情境性的量化度量与词汇歧义的心理语言学研究结果有良好的相关性。在本工作中,我们关注指代歧义,并研究 Levesque 于 2011 年提出的用于评估机器智能的 Winograd Schema Challenge(WSC)。WSC 由一组多项选择题构成,要求消解按照 Winograd schema 构造的句子中的代词,机器难以确定正确的指代对象,而人类理解起来却很直观。本研究提出一种将 Winograd schema 类比为量子物理实验来建模的方法。然而,我们认为原始的 Winograd schema 本身过于简单,不足以产生情境性。为此,我们引入一种推广该 schema 的新机制,使其类似于 Bell-CHSH 测量场景。我们报告了该推广 schema 的一个实例,并通过众包平台收集了相应的人类判断。所得模型违反 Bell-CHSH 不等式 0.192,从而在指代消解场景中表现出情境性。
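
For readers unfamiliar with the Bell-CHSH test used here, the quantity being checked is S = E(a,b) + E(a,b') + E(a',b) - E(a',b'), where each E is a correlator estimated from judgement statistics for one pair of measurement settings; non-contextual (classical) models satisfy |S| <= 2. The correlator values below are purely illustrative, not the paper's measured data (which yield a violation of 0.192).

```python
def chsh_value(e_ab, e_ab2, e_a2b, e_a2b2):
    """CHSH combination S = E(a,b) + E(a,b') + E(a',b) - E(a',b');
    non-contextual models obey |S| <= 2."""
    return e_ab + e_ab2 + e_a2b - e_a2b2

# Illustrative correlators only (hypothetical numbers, not the study's data).
s = chsh_value(0.70, 0.60, 0.65, -0.25)
print(f"S = {s:.2f}; violation over the classical bound: {max(0.0, s - 2):.2f}")
```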

Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception

  • paper_url: http://arxiv.org/abs/2308.16493
  • repo_url: None
  • paper_authors: Riley Tavassoli, Mani Amani, Reza Akhavian
  • for: This paper aims to improve the scene understanding of vision-language models (VLMs) by aligning the embedding spaces of different modalities, such as inertial measurement unit (IMU) data, with the vision embedding space.
  • methods: The proposed method combines supervised and contrastive training to align the embedding spaces of different modalities with the vision embedding space, without requiring retraining of the VLM. The IMU embeddings are given directly to the model, allowing for nonlinear interactions between the query, image, and IMU signal.
  • results: The proposed method is evaluated through experiments on human activity recognition using IMU data and visual inputs. The results show that using multiple modalities as input improves the VLM's scene understanding and enhances its overall performance in various tasks, demonstrating the effectiveness of the proposed method.
    Abstract Vision-language models (VLMs) have shown powerful capabilities in visual question answering and reasoning tasks by combining visual representations with the abstract skill set large language models (LLMs) learn during pretraining. Vision, while the most popular modality to augment LLMs with, is only one representation of a scene. In human-robot interaction scenarios, robot perception requires accurate scene understanding by the robot. In this paper, we define and demonstrate a method of aligning the embedding spaces of different modalities (in this case, inertial measurement unit (IMU) data) to the vision embedding space through a combination of supervised and contrastive training, enabling the VLM to understand and reason about these additional modalities without retraining. We opt to give the model IMU embeddings directly over using a separate human activity recognition model that feeds directly into the prompt to allow for any nonlinear interactions between the query, image, and IMU signal that would be lost by mapping the IMU data to a discrete activity label. Further, we demonstrate our methodology's efficacy through experiments involving human activity recognition using IMU data and visual inputs. Our results show that using multiple modalities as input improves the VLM's scene understanding and enhances its overall performance in various tasks, thus paving the way for more versatile and capable language models in multi-modal contexts.
    摘要 视觉语言模型(VLM)通过将视觉表示与大语言模型(LLM)在预训练中习得的抽象能力相结合,在视觉问答和推理任务中展现出强大能力。视觉虽然是用来增强 LLM 的最常见模态,但它只是场景的一种表示。在人机交互场景中,机器人感知要求机器人能够准确理解场景。本文定义并展示了一种方法,通过监督训练与对比训练相结合,将其他模态(此处为惯性测量单元 IMU 数据)的嵌入空间与视觉嵌入空间对齐,使 VLM 无需重新训练即可理解并推理这些附加模态。我们选择将 IMU 嵌入直接输入模型,而不是使用单独的人类活动识别模型将离散活动标签写入提示,以保留查询、图像与 IMU 信号之间可能存在的非线性交互。此外,我们通过基于 IMU 数据与视觉输入的人类活动识别实验验证了方法的有效性。结果表明,使用多种模态作为输入能够提升 VLM 的场景理解及其在各类任务上的整体表现,从而为多模态环境下更通用、更强大的语言模型铺平道路。
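
A compact sketch of the alignment idea: a small trainable IMU encoder maps inertial windows into the frozen vision embedding space, trained with an InfoNCE-style objective that pairs each IMU clip with its own visual embedding. The encoder architecture, dimensions, and temperature are placeholder assumptions; the paper combines supervised and contrastive training, which this sketch only approximates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IMUEncoder(nn.Module):
    """Placeholder encoder mapping an IMU window (batch, T, channels) into the
    (frozen) vision embedding space of the VLM."""
    def __init__(self, hidden=128, embed_dim=768):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.LazyLinear(hidden), nn.ReLU(),
                                 nn.Linear(hidden, embed_dim))

    def forward(self, imu):
        return F.normalize(self.net(imu), dim=-1)

def imu_vision_alignment_loss(imu_emb, vision_emb, temperature=0.07):
    """InfoNCE: each IMU embedding should be closest to the vision embedding of the
    same clip and far from the other clips in the batch (vision encoder stays frozen)."""
    logits = imu_emb @ F.normalize(vision_emb, dim=-1).t() / temperature
    targets = torch.arange(imu_emb.size(0), device=imu_emb.device)
    return F.cross_entropy(logits, targets)
```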

In-class Data Analysis Replications: Teaching Students while Testing Science

  • paper_url: http://arxiv.org/abs/2308.16491
  • repo_url: None
  • paper_authors: Kristina Gligoric, Tiziano Piccardi, Jake Hofman, Robert West
  • for: 这个论文目的是为了探讨在数据分析教程中包含复制任务的可行性,以及这种方法对学生、教师和科学家的影响。
  • methods: 这个研究使用了在EPFL教授的应用数据分析课程(CS-401)中包含复制任务的方法,并通过在课程进行的问卷调查来收集数据。
  • results: 研究发现学生可以复制已经发表的科学论文,大多数情况下是质量的,一些情况下是准确的。学生对复制任务的期望和实际经验之间存在差异,这些差异共同证明了对critical thinking的激励作用。此外,教师可以了解在教室中包含复制任务的成本和问题,以及这种方法对传统任务的比较。研究还发现了对科学社区的具体利益,如复制报告和科学工作中避免的复制障碍。
    Abstract Science is facing a reproducibility crisis. Previous work has proposed incorporating data analysis replications into classrooms as a potential solution. However, despite the potential benefits, it is unclear whether this approach is feasible, and if so, what the involved stakeholders-students, educators, and scientists-should expect from it. Can students perform a data analysis replication over the course of a class? What are the costs and benefits for educators? And how can this solution help benchmark and improve the state of science? In the present study, we incorporated data analysis replications in the project component of the Applied Data Analysis course (CS-401) taught at EPFL (N=354 students). Here we report pre-registered findings based on surveys administered throughout the course. First, we demonstrate that students can replicate previously published scientific papers, most of them qualitatively and some exactly. We find discrepancies between what students expect of data analysis replications and what they experience by doing them along with changes in expectations about reproducibility, which together serve as evidence of attitude shifts to foster students' critical thinking. Second, we provide information for educators about how much overhead is needed to incorporate replications into the classroom and identify concerns that replications bring as compared to more traditional assignments. Third, we identify tangible benefits of the in-class data analysis replications for scientific communities, such as a collection of replication reports and insights about replication barriers in scientific work that should be avoided going forward. Overall, we demonstrate that incorporating replication tasks into a large data science class can increase the reproducibility of scientific work as a by-product of data science instruction, thus benefiting both science and students.
    摘要 科学面临着可重现危机。 previous work提议在课程中包含数据分析重复,以解决这个问题。然而,尚未确定这种方法是否实施可行,以及参与者们(学生、教师和科学家)应该期望什么。学生在课程中完成数据分析重复是否可能?教师所承担的成本和利益是什么?这种解决方案可以如何帮助评估和改进科学的状况?在 presente study中,我们在EPFL教授的应用数据分析课程(CS-401)中 integrate了数据分析重复。我们通过课程中的问naire进行了预先注册的发现,发现学生可以重复已发表的科学论文,大多数是Qualitatively相同,一些是精确相同。我们发现学生对数据分析重复的预期与实际经验存在差异,这些差异共同证明了学生的批判思维的提高。其次,我们为教师提供了包括 integrate replications into the classroom overhead和replications bring 相比传统任务的担忧。 finally,我们发现在课程中的数据分析重复提供了科学社区的 tangible benefits,如replication reports和对重复过程中的障碍的洞察,这些材料可以为未来的科学工作提供指导。总之,我们的研究表明,在课程中包含数据分析重复任务可以提高科学工作的可重现性,并为学生和科学社区带来利益。

Latent Painter

  • paper_url: http://arxiv.org/abs/2308.16490
  • repo_url: None
  • paper_authors: Shih-Chieh Su
  • for: 用于生成创意艺术动画
  • methods: 以潜变量(latent)作为画布,以扩散模型的预测结果作为作画规划,生成绘画动画;并可在两幅生成图像之间(包括来自不同检查点集合的图像之间)进行过渡
  • results: 能够生成具有变换性的精细艺术动画
    Abstract Latent diffusers revolutionized the generative AI and inspired creative art. When denoising the latent, the predicted original image at each step collectively animates the formation. However, the animation is limited by the denoising nature of the diffuser, and only renders a sharpening process. This work presents Latent Painter, which uses the latent as the canvas, and the diffuser predictions as the plan, to generate painting animation. Latent Painter also transits one generated image to another, which can happen between images from two different sets of checkpoints.
    摘要 潜变量扩散模型革新了生成式 AI 并激发了艺术创作。在对潜变量去噪时,每一步预测出的原始图像共同构成了画面形成过程的动画;然而,这种动画受限于扩散模型的去噪本质,只能呈现出一个逐渐锐化的过程。本文提出 Latent Painter,它以潜变量作为画布、以扩散模型的预测作为作画规划,生成绘画式动画。Latent Painter 还能将一幅生成图像过渡到另一幅,包括来自两组不同检查点的图像之间的过渡。

Test-Time Adaptation for Point Cloud Upsampling Using Meta-Learning

  • paper_url: http://arxiv.org/abs/2308.16484
  • repo_url: None
  • paper_authors: Ahmed Hatem, Yiming Qian, Yang Wang
  • for: 提升点云上采样模型的泛化能力
  • methods: 使用元学习在测试时根据测试数据的特性对模型进行自适应调整
  • results: 提升了现有最先进模型在标准基准上的表现
    Abstract Affordable 3D scanners often produce sparse and non-uniform point clouds that negatively impact downstream applications in robotic systems. While existing point cloud upsampling architectures have demonstrated promising results on standard benchmarks, they tend to experience significant performance drops when the test data have different distributions from the training data. To address this issue, this paper proposes a test-time adaption approach to enhance model generality of point cloud upsampling. The proposed approach leverages meta-learning to explicitly learn network parameters for test-time adaption. Our method does not require any prior information about the test data. During meta-training, the model parameters are learned from a collection of instance-level tasks, each of which consists of a sparse-dense pair of point clouds from the training data. During meta-testing, the trained model is fine-tuned with a few gradient updates to produce a unique set of network parameters for each test instance. The updated model is then used for the final prediction. Our framework is generic and can be applied in a plug-and-play manner with existing backbone networks in point cloud upsampling. Extensive experiments demonstrate that our approach improves the performance of state-of-the-art models.
    摘要 价格低廉的 3D 扫描仪通常产生稀疏且不均匀的点云,这会对机器人系统中的下游应用产生负面影响。现有的点云上采样架构虽然在标准基准上取得了可观的结果,但当测试数据与训练数据分布不同时,性能往往显著下降。为解决这一问题,本文提出一种测试时自适应方法,以增强点云上采样模型的泛化能力。该方法利用元学习显式地学习用于测试时自适应的网络参数,且不需要任何关于测试数据的先验信息。在元训练阶段,模型参数从一组实例级任务中学习,每个任务由训练数据中的一对稀疏-稠密点云构成;在元测试阶段,已训练的模型经过少量梯度更新进行微调,为每个测试实例生成一组专属的网络参数,再用更新后的模型完成最终预测。该框架具有通用性,可以即插即用地与现有点云上采样骨干网络结合。大量实验表明,该方法能够提升现有最先进模型的性能。
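
The test-time side of the method can be sketched as a per-instance fine-tuning loop: copy the meta-trained upsampler, take a handful of gradient steps on a self-supervised objective computed from the sparse test cloud alone, then predict with the adapted copy. `self_sup_loss` stands in for the paper's instance-level objective, and the optimiser and step counts are arbitrary illustrative choices.

```python
import copy
import torch

def test_time_adapt(meta_model, sparse_pc, self_sup_loss, steps=5, lr=1e-4):
    """Per-instance adaptation: fine-tune a copy of the meta-trained upsampler on a
    self-supervised loss over the test point cloud, then run the adapted copy."""
    model = copy.deepcopy(meta_model)          # keep the meta-trained weights intact
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        opt.zero_grad()
        loss = self_sup_loss(model, sparse_pc)
        loss.backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        return model(sparse_pc)                # instance-specific upsampled output
```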

Point-TTA: Test-Time Adaptation for Point Cloud Registration Using Multitask Meta-Auxiliary Learning

  • paper_url: http://arxiv.org/abs/2308.16481
  • repo_url: None
  • paper_authors: Ahmed Hatem, Yiming Qian, Yang Wang
  • for: 提升点云配准模型的泛化能力与性能
  • methods: 提出了一种用于点云配准的测试时自适应框架,通过三个与主任务联合优化的自监督辅助任务在测试时对模型进行自适应,并采用元辅助学习方式进行训练
  • results: 实验结果表明,该方法提升了点云配准的泛化能力与性能,优于其他现有方法
    Abstract We present Point-TTA, a novel test-time adaptation framework for point cloud registration (PCR) that improves the generalization and the performance of registration models. While learning-based approaches have achieved impressive progress, generalization to unknown testing environments remains a major challenge due to the variations in 3D scans. Existing methods typically train a generic model and the same trained model is applied on each instance during testing. This could be sub-optimal since it is difficult for the same model to handle all the variations during testing. In this paper, we propose a test-time adaptation approach for PCR. Our model can adapt to unseen distributions at test-time without requiring any prior knowledge of the test data. Concretely, we design three self-supervised auxiliary tasks that are optimized jointly with the primary PCR task. Given a test instance, we adapt our model using these auxiliary tasks and the updated model is used to perform the inference. During training, our model is trained using a meta-auxiliary learning approach, such that the adapted model via auxiliary tasks improves the accuracy of the primary task. Experimental results demonstrate the effectiveness of our approach in improving generalization of point cloud registration and outperforming other state-of-the-art approaches.
    摘要 我们提出 Point-TTA,一种用于点云配准(PCR)的新型测试时自适应框架,可提升配准模型的泛化能力与性能。尽管基于学习的方法已取得显著进展,但由于 3D 扫描数据的多样性,向未知测试环境泛化仍是一大挑战。现有方法通常训练一个通用模型,并在测试时将同一模型应用于所有实例,这可能并非最优,因为同一模型难以应对测试中的所有变化。本文提出一种针对 PCR 的测试时自适应方法:模型可以在测试时适应未见过的分布,而无需任何关于测试数据的先验知识。具体而言,我们设计了三个与主 PCR 任务联合优化的自监督辅助任务;给定一个测试实例,我们利用这些辅助任务对模型进行自适应,并用更新后的模型完成推理。在训练阶段,我们采用元辅助学习方式训练模型,使得经辅助任务自适应后的模型能够提升主任务的精度。实验结果表明,该方法能有效提升点云配准的泛化能力,并优于其他最先进的方法。

Transformer Compression via Subspace Projection

  • paper_url: http://arxiv.org/abs/2308.16475
  • repo_url: None
  • paper_authors: Yuxuan Hu, Jing Zhang, Chen Zhao, Cuiping Li, Hong Chen
  • for: 压缩 transformer 模型,减少隐藏尺寸
  • methods: 将整个 Transformer 模型投影到一个子空间中,使权重矩阵与特征在降维后的空间中进行矩阵运算
  • results: 实验结果显示,TCSP 可实现 44% 的压缩率,精度下降不超过 1.6%,优于或匹敌先前的压缩方法;同时,TCSP 与其他针对滤波器和注意力头尺寸的压缩方法兼容
    Abstract We propose TCSP, a novel method for compressing a transformer model by focusing on reducing the hidden size of the model. By projecting the whole transform model into a subspace, we enable matrix operations between the weight matrices in the model and features in a reduced-dimensional space, leading to significant reductions in model parameters and computing resources. To establish this subspace, we decompose the feature matrix, derived from different layers of sampled data instances, into a projection matrix. For evaluation, TCSP is applied to compress T5 and BERT models on the GLUE and SQuAD benchmarks. Experimental results demonstrate that TCSP achieves a compression ratio of 44\% with at most 1.6\% degradation in accuracy, surpassing or matching prior compression methods. Furthermore, TCSP exhibits compatibility with other methods targeting filter and attention head size compression.
    摘要 我们提出 TCSP,一种通过降低隐藏维度来压缩 Transformer 模型的新方法。通过将整个 Transformer 模型投影到一个子空间中,模型的权重矩阵便可与降维空间中的特征进行矩阵运算,从而显著减少模型参数量和计算资源。为了构建这一子空间,我们将由不同层上采样数据实例得到的特征矩阵分解为一个投影矩阵。在评估中,我们将 TCSP 应用于在 GLUE 和 SQuAD 基准上压缩 T5 与 BERT 模型。实验结果表明,TCSP 实现了 44% 的压缩率,精度下降至多 1.6%,优于或匹敌先前的压缩方法。此外,TCSP 还与其他针对滤波器和注意力头尺寸的压缩方法兼容。
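
At a sketch level, the "decompose the feature matrix into a projection" step can be illustrated with a truncated SVD of sampled hidden features, whose top right-singular vectors give a d x k projection; a square hidden-to-hidden weight can then be mapped into the k-dimensional subspace. Reusing one projection for both inputs and outputs, and SVD as the decomposition, are simplifying assumptions rather than the exact TCSP procedure.

```python
import torch

def feature_subspace(features, k):
    """`features` is an (n_samples, d) matrix of hidden states collected from the
    model; the top-k right singular vectors span the retained subspace (d x k)."""
    _, _, vh = torch.linalg.svd(features, full_matrices=False)
    return vh[:k].t()

def compress_hidden_linear(weight, bias, proj):
    """Map a square d x d hidden-to-hidden layer into the subspace: inputs arrive as
    x @ proj, outputs stay projected, so W' = P^T W P and b' = P^T b (both k-sized).
    Reusing one `proj` on both sides is an illustrative simplification."""
    return proj.t() @ weight @ proj, proj.t() @ bias
```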

Enhancing Subtask Performance of Multi-modal Large Language Model

  • paper_url: http://arxiv.org/abs/2308.16474
  • repo_url: None
  • paper_authors: Yongqiang Zhao, Zhenyu Li, Feng Zhang, Xinhai Xu, Donghong Liu
  • for: This paper aims to improve the performance of multi-modal large language models (MLLMs) by selecting multiple pre-trained models to complete the same subtask and combining their results to obtain the optimal outcome.
  • methods: The proposed approach involves selecting multiple pre-trained models focused on the same subtask based on distinct evaluation approaches, invoking these models in parallel to process input data, and comparing the results from the multiple pre-trained models using a large language model (LLM) to choose the best outcome.
  • results: The proposed approach is shown to be effective in improving the performance of MLLMs through extensive experiments using GPT-4 annotated datasets and human-annotated datasets, with results from various evaluation metrics demonstrating the approach's effectiveness.
    Abstract Multi-modal Large Language Model (MLLM) refers to a model expanded from a Large Language Model (LLM) that possesses the capability to handle and infer multi-modal data. Current MLLMs typically begin by using LLMs to decompose tasks into multiple subtasks, then employing individual pre-trained models to complete specific subtasks, and ultimately utilizing LLMs to integrate the results of each subtasks to obtain the results of the task. In real-world scenarios, when dealing with large projects, it is common practice to break down the project into smaller sub-projects, with different teams providing corresponding solutions or results. The project owner then decides which solution or result to use, ensuring the best possible outcome for each subtask and, consequently, for the entire project. Inspired by this, this study considers selecting multiple pre-trained models to complete the same subtask. By combining the results from multiple pre-trained models, the optimal subtask result is obtained, enhancing the performance of the MLLM. Specifically, this study first selects multiple pre-trained models focused on the same subtask based on distinct evaluation approaches, and then invokes these models in parallel to process input data and generate corresponding subtask results. Finally, the results from multiple pre-trained models for the same subtask are compared using the LLM, and the best result is chosen as the outcome for that subtask. Extensive experiments are conducted in this study using GPT-4 annotated datasets and human-annotated datasets. The results of various evaluation metrics adequately demonstrate the effectiveness of the proposed approach in this paper.
    摘要 多模态大语言模型(MLLM)是指在大语言模型(LLM)基础上扩展、具备处理和推理多模态数据能力的模型。现有的 MLLM 通常先用 LLM 将任务分解为多个子任务,再由单个预训练模型完成各子任务,最后由 LLM 整合各子任务的结果得到任务结果。在现实中的大型项目中,通常会将项目拆分为若干子项目,由不同团队给出相应的方案或结果,再由项目负责人决定采用哪个方案,从而为每个子任务乃至整个项目争取最佳结果。受此启发,本研究考虑为同一子任务选择多个预训练模型,通过组合多个模型的结果获得该子任务的最优结果,从而提升 MLLM 的性能。具体而言,本研究首先依据不同的评估方式为同一子任务挑选多个预训练模型,然后并行调用这些模型处理输入数据并生成相应的子任务结果,最后利用 LLM 比较同一子任务下多个模型的结果,选出最佳结果作为该子任务的输出。我们在 GPT-4 标注数据集和人工标注数据集上进行了大量实验,多种评估指标的结果充分证明了所提方法的有效性。
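
The core selection step can be sketched as: run every candidate pre-trained model on the same subtask input, then ask an LLM judge to pick the best output. `llm_judge` and the index-parsing convention are hypothetical placeholders; in practice the candidate calls would run in parallel and the judging prompt would follow the paper's evaluation setup.

```python
from typing import Callable, List

def best_subtask_result(models: List[Callable[[str], str]],
                        llm_judge: Callable[[str], str],
                        subtask_input: str) -> str:
    """Run several pre-trained models on the same subtask, then let an LLM judge
    compare the candidates and return the chosen one."""
    candidates = [m(subtask_input) for m in models]        # parallelisable in practice
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = llm_judge(
        f"Subtask input: {subtask_input}\nCandidate results:\n{numbered}\n"
        "Reply with only the index of the best result."
    )
    digits = "".join(ch for ch in verdict if ch.isdigit())
    idx = int(digits) if digits and int(digits) < len(candidates) else 0
    return candidates[idx]
```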

MaintainoMATE: A GitHub App for Intelligent Automation of Maintenance Activities

  • paper_url: http://arxiv.org/abs/2308.16464
  • repo_url: None
  • paper_authors: Anas Nadeem, Muhammad Usman Sarwar, Muhammad Zubair Malik
  • for: 本研究旨在提高软件开发项目中维护任务的效率,特别是自动化issue tracking系统上的issue报告处理。
  • methods: 本研究使用BERT模型来自动分类issue报告并将其分配给相关的开发者。
  • results: 实验表明,MaintainoMATE 在为 issue 报告分配标签时可达到约 80% 的 F1 分数,与现有最先进方法相当;在将 issue 报告分配给相关开发者时 F1 分数达 54%,相比现有方法有显著提升。
    Abstract Software development projects rely on issue tracking systems at the core of tracking maintenance tasks such as bug reports, and enhancement requests. Incoming issue-reports on these issue tracking systems must be managed in an effective manner. First, they must be labelled and then assigned to a particular developer with relevant expertise. This handling of issue-reports is critical and requires thorough scanning of the text entered in an issue-report making it a labor-intensive task. In this paper, we present a unified framework called MaintainoMATE, which is capable of automatically categorizing the issue-reports in their respective category and further assigning the issue-reports to a developer with relevant expertise. We use the Bidirectional Encoder Representations from Transformers (BERT), as an underlying model for MaintainoMATE to learn the contextual information for automatic issue-report labeling and assignment tasks. We deploy the framework used in this work as a GitHub application. We empirically evaluate our approach on GitHub issue-reports to show its capability of assigning labels to the issue-reports. We were able to achieve an F1-score close to 80\%, which is comparable to existing state-of-the-art results. Similarly, our initial evaluations show that we can assign relevant developers to the issue-reports with an F1 score of 54\%, which is a significant improvement over existing approaches. Our initial findings suggest that MaintainoMATE has the potential of improving software quality and reducing maintenance costs by accurately automating activities involved in the maintenance processes. Our future work would be directed towards improving the issue-assignment module.
    摘要 软件开发项目的核心依赖 issue 跟踪系统来管理缺陷报告、功能增强请求等维护任务。进入这些系统的 issue 报告必须得到有效处理:首先要为其打上标签,然后分配给具备相关专长的开发者。这一处理过程至关重要,且需要仔细审读 issue 报告中的文本,是一项劳动密集的工作。本文提出一个统一框架 MaintainoMATE,能够自动将 issue 报告归入相应类别,并进一步将其分配给具备相关专长的开发者。我们以 BERT(Bidirectional Encoder Representations from Transformers)作为 MaintainoMATE 的底层模型,为自动打标签和自动分配任务学习上下文信息,并将该框架部署为一个 GitHub 应用。我们在 GitHub 的 issue 报告上对该方法进行了实证评估:在标签分配上取得接近 80% 的 F1 分数,与现有最先进结果相当;在为 issue 报告分配相关开发者上取得 54% 的 F1 分数,显著优于现有方法。初步结果表明,MaintainoMATE 有望通过精准自动化维护流程中的各项活动来提升软件质量并降低维护成本。我们未来的工作将致力于改进 issue 分配模块。
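
A minimal sketch of the BERT-based labelling component using the Hugging Face `transformers` API; the checkpoint, label set, and single-text input format are placeholder assumptions, and MaintainoMATE's actual training data and assignment module are not reproduced here. The classification head would still need to be fine-tuned on labelled issue reports before use.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder label set and base checkpoint (not the paper's exact configuration).
LABELS = ["bug", "enhancement", "question"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS))   # fine-tune this head on labelled issues

def label_issue(title: str, body: str) -> str:
    """Classify one issue report by feeding its title and body to the BERT classifier."""
    inputs = tokenizer(title + " " + body, truncation=True, max_length=512,
                       return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]
```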

BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge

  • paper_url: http://arxiv.org/abs/2308.16458
  • repo_url: https://github.com/gersteinlab/biocoder
  • paper_authors: Xiangru Tang, Bill Qian, Rick Gao, Jiakang Chen, Xinyun Chen, Mark Gerstein
  • for: 本研究构建了名为 BioCoder 的基准,用于评估现有预训练模型生成生物信息学代码的表现。
  • methods: BioCoder 汇集了来自 GitHub 的 Python 和 Java 代码以及 Rosalind Project 的示例,并提供基于模糊测试的评估框架来衡量模型表现。
  • results: 研究发现,要在生物信息学代码生成中取得出色表现,模型需要具备领域知识、务实的代码生成能力以及上下文理解能力。
    Abstract Pre-trained language models like ChatGPT have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks. Moreover, in bioinformatics, generating functional programs poses additional notable challenges due to the amount of domain knowledge, the need for complicated data operations, and intricate functional dependencies between the operations. Here, we present BioCoder, a benchmark developed to evaluate existing pre-trained models in generating bioinformatics code. In relation to function-code generation, BioCoder covers potential package dependencies, class declarations, and global variables. It incorporates 1026 functions and 1243 methods in Python and Java from GitHub and 253 examples from the Rosalind Project. BioCoder incorporates a fuzz-testing framework for evaluation, and we have applied it to evaluate many models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, and ChatGPT. Our detailed analysis of these models emphasizes the importance of domain knowledge, pragmatic code generation, and contextual understanding. Our dataset, benchmark, Docker images, and scripts required for testing are all available at https://github.com/gersteinlab/biocoder.
    摘要 ChatGPT 等预训练语言模型显著提升了代码生成能力。随着模型规模的扩大,其输出需要处理的任务也日益复杂。而在生物信息学领域,生成可用的功能程序还面临额外的显著挑战:需要大量领域知识、复杂的数据操作,以及操作之间错综的函数依赖关系。为此,我们提出 BioCoder,一个用于评估现有预训练模型生成生物信息学代码能力的基准。在函数级代码生成方面,BioCoder 覆盖了可能的包依赖、类声明和全局变量,包含来自 GitHub 的 Python 与 Java 代码中的 1026 个函数和 1243 个方法,以及来自 Rosalind Project 的 253 个示例。BioCoder 还提供了一个模糊测试评估框架,我们用它评估了 InCoder、CodeGen、CodeGen2、SantaCoder、StarCoder、StarCoder+、InstructCodeT5+ 和 ChatGPT 等多个模型。对这些模型的详细分析凸显了领域知识、务实的代码生成和上下文理解的重要性。我们的数据集、基准、Docker 镜像以及测试所需的脚本均可在 https://github.com/gersteinlab/biocoder 获取。

Contrastive Representation Learning Based on Multiple Node-centered Subgraphs

  • paper_url: http://arxiv.org/abs/2308.16441
  • repo_url: None
  • paper_authors: Dong Li, Wenjun Wang, Minglai Shao, Chen Zhao
  • for: 以自监督方式学习图上的节点表示。
  • methods: 提出一种基于多个以节点为中心的子图的对比表示学习方法,为中心节点精心设计一系列区域子图,并通过对比损失最大化同一节点不同子图之间的互信息。
  • results: 在多个真实世界数据集和不同下游任务上取得了最先进的结果。
    Abstract As the basic element of graph-structured data, node has been recognized as the main object of study in graph representation learning. A single node intuitively has multiple node-centered subgraphs from the whole graph (e.g., one person in a social network has multiple social circles based on his different relationships). We study this intuition under the framework of graph contrastive learning, and propose a multiple node-centered subgraphs contrastive representation learning method to learn node representation on graphs in a self-supervised way. Specifically, we carefully design a series of node-centered regional subgraphs of the central node. Then, the mutual information between different subgraphs of the same node is maximized by contrastive loss. Experiments on various real-world datasets and different downstream tasks demonstrate that our model has achieved state-of-the-art results.
    摘要 作为图结构数据的基本元素,节点一直被视为图表示学习的主要研究对象。直观上,单个节点在整张图中对应多个以其为中心的子图(例如,社交网络中的一个人会因不同的人际关系而拥有多个社交圈)。我们在图对比学习的框架下研究这一直觉,提出一种基于多个节点中心子图的对比表示学习方法,以自监督方式学习图上的节点表示。具体而言,我们为中心节点精心设计了一系列以其为中心的区域子图,并通过对比损失最大化同一节点不同子图之间的互信息。在多个真实世界数据集和不同下游任务上的实验表明,我们的模型取得了最先进的结果。
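
An InfoNCE-style stand-in for "maximise the mutual information between different subgraphs of the same node": given one embedding per node-centred subgraph, embeddings from the same node are treated as positives and everything else in the batch as negatives. The tensor layout, temperature, and loss form are illustrative assumptions rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def multi_subgraph_contrastive_loss(subgraph_embs, temperature=0.5):
    """`subgraph_embs` has shape (num_nodes, num_subgraphs, dim): one embedding per
    node-centered regional subgraph. Subgraphs of the same node are positives;
    subgraphs of other nodes are negatives."""
    n, s, d = subgraph_embs.shape
    z = F.normalize(subgraph_embs.reshape(n * s, d), dim=-1)
    sim = z @ z.t() / temperature
    sim.fill_diagonal_(float("-inf"))              # never contrast a view with itself
    node_id = torch.arange(n, device=subgraph_embs.device).repeat_interleave(s)
    pos_mask = node_id.unsqueeze(0) == node_id.unsqueeze(1)
    pos_mask.fill_diagonal_(False)
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(log_prob[pos_mask]).mean()
```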

BenchTemp: A General Benchmark for Evaluating Temporal Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.16385
  • repo_url: https://github.com/qianghuangwhu/benchtemp
  • paper_authors: Qiang Huang, Jiawei Jiang, Xi Susie Rao, Ce Zhang, Zhichao Han, Zitao Zhang, Xin Wang, Yongjun He, Quanqing Xu, Yang Zhao, Chuang Hu, Shuo Shang, Bo Du
  • for: 评估Temporal Graph Neural Networks (TGNNs)的性能,提供一个通用的评估平台。
  • methods: 使用BenchTemp benchmark suite,包括各种任务和设置,对TGNN模型进行比较。
  • results: 对多种代表性 TGNN 模型在有效性和效率两类指标上进行了广泛的比较。
    Abstract To handle graphs in which features or connectivities are evolving over time, a series of temporal graph neural networks (TGNNs) have been proposed. Despite the success of these TGNNs, the previous TGNN evaluations reveal several limitations regarding four critical issues: 1) inconsistent datasets, 2) inconsistent evaluation pipelines, 3) lacking workload diversity, and 4) lacking efficient comparison. Overall, there lacks an empirical study that puts TGNN models onto the same ground and compares them comprehensively. To this end, we propose BenchTemp, a general benchmark for evaluating TGNN models on various workloads. BenchTemp provides a set of benchmark datasets so that different TGNN models can be fairly compared. Further, BenchTemp engineers a standard pipeline that unifies the TGNN evaluation. With BenchTemp, we extensively compare the representative TGNN models on different tasks (e.g., link prediction and node classification) and settings (transductive and inductive), w.r.t. both effectiveness and efficiency metrics. We have made BenchTemp publicly available at https://github.com/qianghuangwhu/benchtemp.
    摘要 为了处理特征或连接随时间演化的图,研究者提出了一系列时序图神经网络(TGNN)。尽管这些 TGNN 取得了成功,但以往的 TGNN 评估暴露出四个关键问题:1)数据集不一致;2)评估流程不一致;3)缺乏工作负载多样性;4)缺乏高效的比较。总体而言,目前缺少一项能把各种 TGNN 模型放到同一基准上进行全面比较的实证研究。为此,我们提出 BenchTemp,一个用于在多种工作负载上评估 TGNN 模型的通用基准。BenchTemp 提供了一组基准数据集,使不同的 TGNN 模型可以被公平比较;同时构建了统一 TGNN 评估的标准流程。借助 BenchTemp,我们在不同任务(如链接预测和节点分类)与设置(直推式与归纳式)下,针对有效性和效率两类指标对代表性 TGNN 模型进行了广泛比较。BenchTemp 已公开发布于 https://github.com/qianghuangwhu/benchtemp。

A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications

  • paper_url: http://arxiv.org/abs/2308.16375
  • repo_url: None
  • paper_authors: Yi Zhang, Yuying Zhao, Zhaoqing Li, Xueqi Cheng, Yu Wang, Olivera Kotevska, Philip S. Yu, Tyler Derr
  • for: The paper aims to provide a comprehensive overview of attacks on graph data and privacy preservation techniques in graph neural networks (GNNs).
  • methods: The paper categorizes privacy preservation techniques in GNNs and reviews datasets and applications for analyzing/solving privacy issues in GNNs.
  • results: The paper outlines potential directions for future research to build better privacy-preserving GNNs.
    Abstract Graph Neural Networks (GNNs) have gained significant attention owing to their ability to handle graph-structured data and the improvement in practical applications. However, many of these models prioritize high utility performance, such as accuracy, with a lack of privacy consideration, which is a major concern in modern society where privacy attacks are rampant. To address this issue, researchers have started to develop privacy-preserving GNNs. Despite this progress, there is a lack of a comprehensive overview of the attacks and the techniques for preserving privacy in the graph domain. In this survey, we aim to address this gap by summarizing the attacks on graph data according to the targeted information, categorizing the privacy preservation techniques in GNNs, and reviewing the datasets and applications that could be used for analyzing/solving privacy issues in GNNs. We also outline potential directions for future research in order to build better privacy-preserving GNNs.
    摘要 图神经网络(GNN)因其处理图结构数据的能力以及在实际应用中的改进而受到广泛关注。然而,许多模型只注重准确率等高效用性能,缺乏对隐私的考虑;在隐私攻击猖獗的当下,这已成为一个重大问题。为解决这一问题,研究者开始开发保护隐私的 GNN。尽管已有进展,目前仍缺乏对图领域中攻击手段及隐私保护技术的全面综述。本综述旨在填补这一空白:按照攻击所针对的信息对图数据上的攻击进行总结,对 GNN 中的隐私保护技术进行分类,并梳理可用于分析/解决 GNN 隐私问题的数据集与应用。我们还勾勒出未来研究的潜在方向,以构建更好的隐私保护 GNN。