cs.AI - 2023-11-06

Multimodal Stress Detection Using Facial Landmarks and Biometric Signals

  • paper_url: http://arxiv.org/abs/2311.03606
  • repo_url: None
  • paper_authors: Majid Hosseini, Morteza Bodaghi, Ravi Teja Bhupatiraju, Anthony Maida, Raju Gottumukkala
    for: 这种研究旨在提高人们的压力测量和情绪状况的评估,通过结合多种感知技术。methods: 这种研究使用多模态学习方法,结合脸部特征和生物指标信号进行压力检测。results: 研究发现,使用晚期融合技术可以达到94.39%的准确率,而使用早期融合技术可以超越这一成果,达到98.38%的准确率。
    Abstract The development of various sensing technologies is improving measurements of stress and the well-being of individuals. Although progress has been made with single signal modalities like wearables and facial emotion recognition, integrating multiple modalities provides a more comprehensive understanding of stress, given that stress manifests differently across different people. Multi-modal learning aims to capitalize on the strength of each modality rather than relying on a single signal. Given the complexity of processing and integrating high-dimensional data from limited subjects, more research is needed. Numerous research efforts have been focused on fusing stress and emotion signals at an early stage, e.g., feature-level fusion using basic machine learning methods and 1D-CNN Methods. This paper proposes a multi-modal learning approach for stress detection that integrates facial landmarks and biometric signals. We test this multi-modal integration with various early-fusion and late-fusion techniques to integrate the 1D-CNN model from biometric signals and 2-D CNN using facial landmarks. We evaluate these architectures using a rigorous test of models' generalizability using the leave-one-subject-out mechanism, i.e., all samples related to a single subject are left out to train the model. Our findings show that late-fusion achieved 94.39\% accuracy, and early-fusion surpassed it with a 98.38\% accuracy rate. This research contributes valuable insights into enhancing stress detection through a multi-modal approach. The proposed research offers important knowledge in improving stress detection using a multi-modal approach.
    摘要 发展不同感知技术对个人压力测量带来了改进。虽然单模态如穿戴式设备和表情识别已经取得了进步,但是将多个模态融合提供了更全面的压力测量,因为压力在不同人群中表现不同。多模态学习希望利用每个模态的优势而不仅仅依靠单一信号。由于处理和 инте格高维数据的限制,更多的研究是必要的。许多研究团队已经关注将压力和情绪信号在早期融合,例如特征级别融合使用基本机器学习方法和1D-CNN方法。这篇论文提出了一种多模态学习方法, integrating facial landmarks and biometric signals for stress detection.我们使用不同的早期融合和晚期融合技术来融合1D-CNN模型和2D-CNN使用facial landmarks。我们使用离散一个主题机制进行模型评估,即所有与一个主题相关的样本被去除,以训练模型。我们的发现显示,晚期融合达到94.39%的准确率,而早期融合超过了它,达到98.38%的准确率。这项研究为压力检测提供了价值的新发现,并且提供了改进压力检测的多模态方法的重要知识。

Brief for the Canada House of Commons Study on the Implications of Artificial Intelligence Technologies for the Canadian Labor Force: Generative Artificial Intelligence Shatters Models of AI and Labor

  • paper_url: http://arxiv.org/abs/2311.03595
  • repo_url: None
  • paper_authors: Morgan R. Frank
  • for: 探讨当前生产力技术的发展对工作 Market的影响,并提出政策建议以适应未来工作环境。
  • methods: 利用数据分析和预测技术来研究生产力技术对工作市场的影响,并对现有的自动化预测模型进行批判性分析。
  • results: 发现生产力技术可能会对一些以前被认为免疫自动化的职业产生影响,政策 makers应该促进工人的职业适应性,并鼓励教育机构开发适应AI技术的教育程序。
    Abstract Exciting advances in generative artificial intelligence (AI) have sparked concern for jobs, education, productivity, and the future of work. As with past technologies, generative AI may not lead to mass unemployment. But, unlike past technologies, generative AI is creative, cognitive, and potentially ubiquitous which makes the usual assumptions of automation predictions ill-suited for today. Existing projections suggest that generative AI will impact workers in occupations that were previously considered immune to automation. As AI's full set of capabilities and applications emerge, policy makers should promote workers' career adaptability. This goal requires improved data on job separations and unemployment by locality and job titles in order to identify early-indicators for the workers facing labor disruption. Further, prudent policy should incentivize education programs to accommodate learning with AI as a tool while preparing students for the demands of the future of work.
    摘要 新的生成人工智能技术已经引发了工作、教育、生产效率和未来工作的担忧。与过去的技术不同,生成人工智能具有创造力、认知能力和潜在的 ubique 特点,使得传统的自动化预测模型成为不适用的。现有的预测结果表明,生成人工智能可能会影响工作者在之前被认为是自动化免疫的职业。为实现工作者职业适应性,政策制定者应该促进地方各地的职业分类和失业数据的收集,以识别受到劳动干预的工作者。此外,政策应该鼓励教育计划,以便在使用人工智能为工具的同时,准备学生未来的工作需求。

  • paper_url: http://arxiv.org/abs/2311.03583
  • repo_url: None
  • paper_authors: Abbas Mehrabian, Ankit Anand, Hyunjik Kim, Nicolas Sonnerat, Matej Balog, Gheorghe Comanici, Tudor Berariu, Andrew Lee, Anian Ruoss, Anna Bulanova, Daniel Toyama, Sam Blackwell, Bernardino Romera Paredes, Petar Veličković, Laurent Orseau, Joonkyung Lee, Anurag Murty Naredla, Doina Precup, Adam Zsolt Wagner
  • for: 这个论文解决了一个中央极点图论题,这个问题是根据1975年erdős的 conjecture,找到一个给定大小的图最多的边数而不包含3-或4-цикル。
  • methods: 这个论文使用了AlphaZero和tabu搜索两种方法,并通过引入课程来提高state-of-the-art下界。
  • results: 这个论文通过引入课程和提高搜索策略,提高了几个不同大小的图的下界。此外,这个论文还提出了一种灵活的图生成环境和一种 permutation-invariant的网络架构来学习搜索在图空间中。
    Abstract This work studies a central extremal graph theory problem inspired by a 1975 conjecture of Erd\H{o}s, which aims to find graphs with a given size (number of nodes) that maximize the number of edges without having 3- or 4-cycles. We formulate this problem as a sequential decision-making problem and compare AlphaZero, a neural network-guided tree search, with tabu search, a heuristic local search method. Using either method, by introducing a curriculum -- jump-starting the search for larger graphs using good graphs found at smaller sizes -- we improve the state-of-the-art lower bounds for several sizes. We also propose a flexible graph-generation environment and a permutation-invariant network architecture for learning to search in the space of graphs.
    摘要 Translation note:* "sequential decision-making problem" becomes "连续决策问题" (liánxù juéxì wèn tí)* "AlphaZero" becomes "AlphaZero" (ā lfah zhī)* "tabu search" becomes "tabu搜索" (tā bù sōu suǒ)* "curriculum" becomes "课程" (kèxíng)* "permutation-invariant network architecture" becomes "对称的网络架构" (duìxiàng de wǎngluò jiàgòu)

AI-Enabled Unmanned Vehicle-Assisted Reconfigurable Intelligent Surfaces: Deployment, Prototyping, Experiments, and Opportunities

  • paper_url: http://arxiv.org/abs/2311.04241
  • repo_url: None
  • paper_authors: Li-Hsiang Shen, Kai-Ten Feng, Ta-Sung Lee, Yuan-Chun Lin, Shih-Cheng Lin, Chia-Chan Chang, Sheng-Fuh Chang
  • For: This paper focuses on the deployment of Reconfigurable Intelligent Surfaces (RIS) in wireless communication networks, specifically in the context of the sixth-generation (6G) technology. The paper explores the use of RIS to extend service coverage, reduce power consumption, and enhance spectral efficiency.* Methods: The paper discusses the theoretical and hardware aspects of RIS deployment, as well as the use of artificial intelligence (AI) and machine learning to optimize the deployment process. The authors propose a federated multi-agent reinforcement learning scheme to optimize the placement and configuration of RISs.* Results: The paper presents experimental results of the proposed i-Dris system, which achieves a transmission throughput of up to 980 Mbps under a bandwidth of 100 MHz with comparatively low complexity and rapid deployment. The results show that the i-Dris system outperforms existing works in this area.Here’s the simplified Chinese text for the three key points:* For: 这篇论文关注 sixth-generation (6G) 技术中的 Reconfigurable Intelligent Surfaces (RIS) 的部署,以扩展服务覆盖区域、降低功率消耗和提高频率效率。* Methods: 论文讨论了 RIS 部署的理论和硬件方面,以及使用人工智能 (AI) 和机器学习来优化部署过程。作者提议了一种联邦多代理人强化学习方案来优化 RIS 的分布和配置。* Results: 论文发表了 i-Dris 系统的实验结果,该系统可以在带宽 100 MHz 下实现传输吞吐量达 980 Mbps,与其他现有的方法相比,i-Dris 系统具有较低的复杂性和较快的部署速度。
    Abstract The requirement of wireless data demands is increasingly high as the sixth-generation (6G) technology evolves. Reconfigurable intelligent surface (RIS) is promisingly deemed to be one of 6G techniques for extending service coverage, reducing power consumption, and enhancing spectral efficiency. In this article, we have provided some fundamentals of RIS deployment in theory and hardware perspectives as well as utilization of artificial intelligence (AI) and machine learning. We conducted an intelligent deployment of RIS (i-Dris) prototype, including dual-band auto-guided vehicle (AGV) assisted RISs associated with an mmWave base station (BS) and a receiver. The RISs are deployed on the AGV with configured incident/reflection angles. While, both the mmWave BS and receiver are associated with an edge server monitoring downlink packets for obtaining system throughput. We have designed a federated multi-agent reinforcement learning scheme associated with several AGV-RIS agents and sub-agents per AGV-RIS consisting of the deployment of position, height, orientation and elevation angles. The experimental results presented the stationary measurement in different aspects and scenarios. The i-Dris can reach up to 980 Mbps transmission throughput under a bandwidth of 100 MHz with comparably low complexity as well as rapid deployment, which outperforms the other existing works. At last, we highlight some opportunities and future issues in leveraging RIS-empowered wireless communication networks.
    摘要 sixth-generation (6G) 技术的无线数据需求在不断增长,而Reconfigurable intelligent surface (RIS) 被认为是6G技术的一种扩展服务覆盖、降低功率消耗和提高频率效率的方法。在这篇文章中,我们提供了RIS部署的理论和硬件视图,以及人工智能(AI)和机器学习的应用。我们开发了一个名为“智能RIS部署”(i-Dris)的原型,包括与mmWave基站(BS)和接收器相连的双频自导车(AGV)助手RIS。RIS被部署在AGV上,并配置了入射/反射角度。而BS和接收器均与边缘服务器监控下链路包,以获取系统吞吐量。我们设计了多个AGV-RIS代理和子代理,包括RIS部署的位置、高度、orientation和倾斜角度。实验结果显示,i-Dris可以在不同方面和场景下实现静态测量,并且具有相对较低的复杂性和快速部署,超过了其他已有作品。最后,我们提出了利用RIS empowered无线通信网络的机遇和未来问题。

Inclusive Portraits: Race-Aware Human-in-the-Loop Technology

  • paper_url: http://arxiv.org/abs/2311.03567
  • repo_url: None
  • paper_authors: Claudia Flores-Saviaga, Christopher Curtis, Saiph Savage
  • for: 这篇论文旨在提出一种基于人类社会理论的人类在循环(HITL)系统,以提高人脸验证服务的性能,特别是为人类颜色的服务。
  • methods: 该论文提出了一种名为“含容图像(Inclusive Portraits,IP)”的新方法,它将人类社会理论与人脸验证服务相结合,以提高服务的可靠性和准确性。
  • results: 实验结果表明,将种族考虑到HITL系统中可以显著提高服务的性能,特别是为人类颜色的服务。此外,该研究还发现,考虑工作者个人特点在HITL系统的设计中是非常重要的。
    Abstract AI has revolutionized the processing of various services, including the automatic facial verification of people. Automated approaches have demonstrated their speed and efficiency in verifying a large volume of faces, but they can face challenges when processing content from certain communities, including communities of people of color. This challenge has prompted the adoption of "human-in-the-loop" (HITL) approaches, where human workers collaborate with the AI to minimize errors. However, most HITL approaches do not consider workers' individual characteristics and backgrounds. This paper proposes a new approach, called Inclusive Portraits (IP), that connects with social theories around race to design a racially-aware human-in-the-loop system. Our experiments have provided evidence that incorporating race into human-in-the-loop (HITL) systems for facial verification can significantly enhance performance, especially for services delivered to people of color. Our findings also highlight the importance of considering individual worker characteristics in the design of HITL systems, rather than treating workers as a homogenous group. Our research has significant design implications for developing AI-enhanced services that are more inclusive and equitable.
    摘要

Low-Rank MDPs with Continuous Action Spaces

  • paper_url: http://arxiv.org/abs/2311.03564
  • repo_url: None
  • paper_authors: Andrew Bennett, Nathan Kallus, Miruna Oprescu
  • for: 本文研究了将低级Markov决策过程(MDP)扩展到连续动作空间上,以提高RL学习的可靠性和可扩展性。
  • methods: 本文提出了多种具体的扩展方法,包括使用约束优化和离散化方法,以及对FLAMBE算法(Agarwal et al., 2020)进行修改。
  • results: 研究表明,无需修改FLAMBE算法,在transition函数具有Holder细化程度对动作的情况下,可以获得类似的PAC界限,而无需知道奖励函数。此外,当政策集合具有固定最小浓度或奖励函数具有Holder细化程度时,可以获得一个几乎同样的PAC界限。
    Abstract Low-Rank Markov Decision Processes (MDPs) have recently emerged as a promising framework within the domain of reinforcement learning (RL), as they allow for provably approximately correct (PAC) learning guarantees while also incorporating ML algorithms for representation learning. However, current methods for low-rank MDPs are limited in that they only consider finite action spaces, and give vacuous bounds as $|\mathcal{A}| \to \infty$, which greatly limits their applicability. In this work, we study the problem of extending such methods to settings with continuous actions, and explore multiple concrete approaches for performing this extension. As a case study, we consider the seminal FLAMBE algorithm (Agarwal et al., 2020), which is a reward-agnostic method for PAC RL with low-rank MDPs. We show that, without any modifications to the algorithm, we obtain similar PAC bound when actions are allowed to be continuous. Specifically, when the model for transition functions satisfies a Holder smoothness condition w.r.t. actions, and either the policy class has a uniformly bounded minimum density or the reward function is also Holder smooth, we obtain a polynomial PAC bound that depends on the order of smoothness.
    摘要

Context Unlocks Emotions: Text-based Emotion Classification Dataset Auditing with Large Language Models

  • paper_url: http://arxiv.org/abs/2311.03551
  • repo_url: None
  • paper_authors: Daniel Yang, Aditya Kommineni, Mohammad Alshehri, Nilamadhab Mohanty, Vedant Modi, Jonathan Gratch, Shrikanth Narayanan
  • for: 提高文本数据的情感分类模型性能
  • methods: 使用大语言模型生成文本上的补充 контекスト信息
  • results: 提高文本输入与人工标注的情感标签的匹配率, tanto from empirical evaluation and human evaluation
    Abstract The lack of contextual information in text data can make the annotation process of text-based emotion classification datasets challenging. As a result, such datasets often contain labels that fail to consider all the relevant emotions in the vocabulary. This misalignment between text inputs and labels can degrade the performance of machine learning models trained on top of them. As re-annotating entire datasets is a costly and time-consuming task that cannot be done at scale, we propose to use the expressive capabilities of large language models to synthesize additional context for input text to increase its alignment with the annotated emotional labels. In this work, we propose a formal definition of textual context to motivate a prompting strategy to enhance such contextual information. We provide both human and empirical evaluation to demonstrate the efficacy of the enhanced context. Our method improves alignment between inputs and their human-annotated labels from both an empirical and human-evaluated standpoint.
    摘要 文本数据中缺乏上下文信息可能使文本情感分类数据集的标注过程变得困难。因此,这些数据集的标签通常不会考虑所有可能的情感词汇。这种文本输入和标签之间的不一致可能使机器学习模型在这些数据集上训练时表现下降。然而,重新标注整个数据集是一项成本高、时间consuming的任务,不能在大规模进行。因此,我们提议使用大语言模型的表达能力来生成更多的上下文信息,以增强输入文本与注解的情感标签的协调。在这种情况下,我们提出了文本上下文的正式定义,并提出了一种提问策略来增强文本上下文信息。我们通过人类和实验评估来证明我们的方法可以提高输入文本与其人类注解标签之间的协调。

United We Stand, Divided We Fall: UnityGraph for Unsupervised Procedure Learning from Videos

  • paper_url: http://arxiv.org/abs/2311.03550
  • repo_url: None
  • paper_authors: Siddhant Bansal, Chetan Arora, C. V. Jawahar
  • for: 本研究旨在解决现有方法无法捕捉多个视频中关键步骤的问题,通过提出一种无监督图格学习(GPL)框架。
  • methods: GPL使用一种新的 UnityGraph 表示所有任务视频的图形,以获得内视频和多视频上下文。然后,使用 Node2Vec 算法更新 UnityGraph 中的坐标,以实现无监督性的同步。最后,使用 KMeans 聚类算法确定关键步骤。
  • results: 对于 ProceL、CrossTask 和 EgoProceL 测试集,GPL 实现了平均提高2% 和 3.6% 相比于现有方法。
    Abstract Given multiple videos of the same task, procedure learning addresses identifying the key-steps and determining their order to perform the task. For this purpose, existing approaches use the signal generated from a pair of videos. This makes key-steps discovery challenging as the algorithms lack inter-videos perspective. Instead, we propose an unsupervised Graph-based Procedure Learning (GPL) framework. GPL consists of the novel UnityGraph that represents all the videos of a task as a graph to obtain both intra-video and inter-videos context. Further, to obtain similar embeddings for the same key-steps, the embeddings of UnityGraph are updated in an unsupervised manner using the Node2Vec algorithm. Finally, to identify the key-steps, we cluster the embeddings using KMeans. We test GPL on benchmark ProceL, CrossTask, and EgoProceL datasets and achieve an average improvement of 2% on third-person datasets and 3.6% on EgoProceL over the state-of-the-art.
    摘要 <>将文本翻译成简化中文。<>给定多个视频任务,程序学习关注于标识任务中关键步骤的顺序执行。现有的方法使用视频对的信号来实现这一目标,但这会使关键步骤发现困难,因为算法缺乏视频间视角。我们提出一种不supervised图像基本学习(GPL)框架。GPL包括一种新的 UnityGraph,它将所有任务视频表示为一个图来获取任务内和任务间上下文。然后,使用Node2Vec算法更新 UnityGraph 中的表示,以获取同样的关键步骤的相似嵌入。最后,使用 KMeans 聚类算法确定关键步骤。我们在 ProceL、CrossTask 和 EgoProceL 数据集上测试 GPL,并在第三人数据集上提高了2%,在 EgoProceL 数据集上提高了3.6% 以上。

InterVLS: Interactive Model Understanding and Improvement with Vision-Language Surrogates

  • paper_url: http://arxiv.org/abs/2311.03547
  • repo_url: None
  • paper_authors: Jinbin Huang, Wenbin He, Liang Gou, Liu Ren, Chris Bryan
  • for: 帮助用户更好地理解深度学习模型和改进它们的性能
  • methods: 使用文本对齐的概念发现和模型无关的直线函数来衡量概念的影响
  • results: 在用户研究中,InterVLS有效地帮助用户 identific 模型中最有影响的概念,获得性能概念和调整概念影响以改进模型
    Abstract Deep learning models are widely used in critical applications, highlighting the need for pre-deployment model understanding and improvement. Visual concept-based methods, while increasingly used for this purpose, face challenges: (1) most concepts lack interpretability, (2) existing methods require model knowledge, often unavailable at run time. Additionally, (3) there lacks a no-code method for post-understanding model improvement. Addressing these, we present InterVLS. The system facilitates model understanding by discovering text-aligned concepts, measuring their influence with model-agnostic linear surrogates. Employing visual analytics, InterVLS offers concept-based explanations and performance insights. It enables users to adjust concept influences to update a model, facilitating no-code model improvement. We evaluate InterVLS in a user study, illustrating its functionality with two scenarios. Results indicates that InterVLS is effective to help users identify influential concepts to a model, gain insights and adjust concept influence to improve the model. We conclude with a discussion based on our study results.
    摘要 深度学习模型在关键应用中广泛使用,高亮了预部署模型理解和改进的需求。基于视觉概念的方法在这个目的上增加使用,但面临以下挑战:(1)大多数概念无法解释,(2)现有方法往往需要模型知识,而运行时这些知识通常不可用,(3)无法使用无代码方法进行后续模型改进。为解决这些问题,我们提出了InterVLS。该系统通过发现与文本对齐的概念,使用模型无关的线性代理来衡量这些概念的影响。通过视觉分析,InterVLS提供了基于概念的解释和性能印象。它允许用户根据概念的影响来更新模型,从而实现无代码模型改进。我们在用户研究中评估了InterVLS,并通过两个场景 illustrate its 功能。结果表明,InterVLS能够帮助用户Identify模型中的重要概念,获得情况和更改概念的影响来改进模型。我们根据我们的研究结果进行了讨论。

PcLast: Discovering Plannable Continuous Latent States

  • paper_url: http://arxiv.org/abs/2311.03534
  • repo_url: None
  • paper_authors: Anurag Koul, Shivakanth Sujit, Shaoru Chen, Ben Evans, Lili Wu, Byron Xu, Rajan Chari, Riashat Islam, Raihan Seraj, Yonathan Efroni, Lekan Molu, Miro Dudik, John Langford, Alex Lamb
  • for: goal-conditioned planning
  • methods: multi-step inverse dynamics, latent representation, and associating reachable states together in $\ell_2$ space
  • results: significant improvements in sampling efficiency and yields layered state abstractions that enable computationally efficient hierarchical planning.Here’s the Chinese translation of the three points:
  • for: 目的conditioned планинг
  • methods: 多步反动力学学习、占据表示和在 $\ell_2$ 空间相关联可达状态
  • results: 提高采样效率,并生成Computational efficient的层次 планинг。
    Abstract Goal-conditioned planning benefits from learned low-dimensional representations of rich, high-dimensional observations. While compact latent representations, typically learned from variational autoencoders or inverse dynamics, enable goal-conditioned planning they ignore state affordances, thus hampering their sample-efficient planning capabilities. In this paper, we learn a representation that associates reachable states together for effective onward planning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information); and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based and reward-free settings show significant improvements in sampling efficiency, and yields layered state abstractions that enable computationally efficient hierarchical planning.
    摘要 goal-conditioned 规划受惠于学习的低维 Observations 的归一化表示。 although compact latent representations, typically learned from variational autoencoders or inverse dynamics, enable goal-conditioned planning, they ignore state affordances, thus hampering their sample-efficient planning capabilities. In this paper, we learn a representation that associates reachable states together for effective onward planning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information); and then transform this representation to associate reachable states together in $\ell_2$ space. Our proposals are rigorously tested in various simulation testbeds. Numerical results in reward-based and reward-free settings show significant improvements in sampling efficiency, and yields layered state abstractions that enable computationally efficient hierarchical planning.Here's the breakdown of the translation:* goal-conditioned 规划 (goal-conditioned planning)* 学习 (learned)* 低维 Observations (low-dimensional observations)* 归一化表示 (latent representation)* compact latent representations (compact latent representations)* 通常来自 (typically learned)* variational autoencoders (variational autoencoders)* inverse dynamics (inverse dynamics)* 忽略 (ignore)* state affordances (state affordances)* 因此 (thus)* 阻碍 (hampering)* 效果 (effective)* onward planning (onward planning)* 学习 (learn)* 多步 inverse dynamics (multi-step inverse dynamics)* 去掉干扰信息 (to remove distracting information)* 转换 (transform)* 在 $\ell_2$ 空间 associate (associate)* reachable states together (reachable states together)* 数学结果 (numerical results)* 在 reward-based 和 reward-free 设定下 show (show)* 显著提高 (significant improvements)* 采样效率 (sampling efficiency)* 层次状态抽象 (layered state abstractions)* 计算效率 (computationally efficient)* 层次规划 (hierarchical planning)

Brain Networks and Intelligence: A Graph Neural Network Based Approach to Resting State fMRI Data

  • paper_url: http://arxiv.org/abs/2311.03520
  • repo_url: None
  • paper_authors: Bishal Thapaliya, Esra Akbas, Jiayu Chen, Raam Sapkota, Bhaskar Ray, Pranav Suresh, Vince Calhoun, Jingyu Liu
    for:This paper aims to develop a novel modeling architecture called BrainRGIN for predicting intelligence (fluid, crystallized, and total intelligence) using graph neural networks on resting-state functional magnetic resonance imaging (rsfMRI) derived static functional network connectivity matrices.methods:The proposed BrainRGIN architecture incorporates a clustering-based embedding and graph isomorphism network in the graph convolutional layer, TopK pooling, and attention-based readout functions to predict intelligence. The approach uses rsfMRI data to capture the functional organization of the brain without relying on specific tasks or stimuli.results:The proposed BrainRGIN model achieved lower mean squared errors and higher correlation scores than existing relevant graph architectures and other traditional machine learning models for all of the intelligence prediction tasks. The middle frontal gyrus was found to contribute significantly to both fluid and crystallized intelligence, while total composite scores identified a diverse set of brain regions as relevant, highlighting the complex nature of total intelligence.
    Abstract Resting-state functional magnetic resonance imaging (rsfMRI) is a powerful tool for investigating the relationship between brain function and cognitive processes as it allows for the functional organization of the brain to be captured without relying on a specific task or stimuli. In this paper, we present a novel modeling architecture called BrainRGIN for predicting intelligence (fluid, crystallized, and total intelligence) using graph neural networks on rsfMRI derived static functional network connectivity matrices. Extending from the existing graph convolution networks, our approach incorporates a clustering-based embedding and graph isomorphism network in the graph convolutional layer to reflect the nature of the brain sub-network organization and efficient network expression, in combination with TopK pooling and attention-based readout functions. We evaluated our proposed architecture on a large dataset, specifically the Adolescent Brain Cognitive Development Dataset, and demonstrated its effectiveness in predicting individual differences in intelligence. Our model achieved lower mean squared errors and higher correlation scores than existing relevant graph architectures and other traditional machine learning models for all of the intelligence prediction tasks. The middle frontal gyrus exhibited a significant contribution to both fluid and crystallized intelligence, suggesting their pivotal role in these cognitive processes. Total composite scores identified a diverse set of brain regions to be relevant which underscores the complex nature of total intelligence.
    摘要 RESTING-STATE FUNCTIONAL MAGNETIC RESONANCE IMAGING (RSFMRI) 是一种强大的工具,可以评估大脑功能和认知过程之间的关系,因为它可以在不基于特定任务或刺激的情况下捕捉大脑的功能组织结构。在这篇论文中,我们提出了一种新的模型建立方式,称为BrainRGIN,可以使用图 neural networks 预测智商(流动、晶化和总智商)。 extending from the existing graph convolution networks,我们的方法包括嵌入和图同构网络,以及 TopK pooling 和注意力基本函数。我们在大量数据集,即青春期大脑认知发展数据集上评估了我们的提议的建立方式,并证明了它在智商预测任务中的有效性。我们的模型比拥有相关的图建立方式和传统机器学习模型都具有更低的平均平方误差和更高的相关性分数。中顶前颞卷积区显示在流动和晶化智商中具有重要作用,这表明它们在这些认知过程中发挥着关键作用。总合分数表明大脑各区域具有不同的重要性,这反映了智商的复杂结构。

MFAAN: Unveiling Audio Deepfakes with a Multi-Feature Authenticity Network

  • paper_url: http://arxiv.org/abs/2311.03509
  • repo_url: None
  • paper_authors: Karthik Sivarama Krishnan, Koushik Sivarama Krishnan
  • for: 防止深伪音频干预信息传播
  • methods: 多元特征数据网络(MFAAN),融合多种音频表现,包括MFCC、LFCC和Chroma-STFT,实现多元认知音频内容,精确地识别伪造音频
  • results: 在两个 benchmark 数据集上,MFAAN 表现出色,实现准确率98.93%和94.47%,说明 MFAAN 的可靠性和应用价值
    Abstract In the contemporary digital age, the proliferation of deepfakes presents a formidable challenge to the sanctity of information dissemination. Audio deepfakes, in particular, can be deceptively realistic, posing significant risks in misinformation campaigns. To address this threat, we introduce the Multi-Feature Audio Authenticity Network (MFAAN), an advanced architecture tailored for the detection of fabricated audio content. MFAAN incorporates multiple parallel paths designed to harness the strengths of different audio representations, including Mel-frequency cepstral coefficients (MFCC), linear-frequency cepstral coefficients (LFCC), and Chroma Short Time Fourier Transform (Chroma-STFT). By synergistically fusing these features, MFAAN achieves a nuanced understanding of audio content, facilitating robust differentiation between genuine and manipulated recordings. Preliminary evaluations of MFAAN on two benchmark datasets, 'In-the-Wild' Audio Deepfake Data and The Fake-or-Real Dataset, demonstrate its superior performance, achieving accuracies of 98.93% and 94.47% respectively. Such results not only underscore the efficacy of MFAAN but also highlight its potential as a pivotal tool in the ongoing battle against deepfake audio content.
    摘要 现代数字时代,深度模仿技术的普及带来了信息传递的威胁。特别是音频深度模仿,可能造成误导性的虚假信息。为解决这一问题,我们介绍了多元特征音频真实性网络(MFAAN),这是一种专门为检测假造音频内容而设计的高级架构。MFAAN通过并行的多个路径,利用不同的音频表示方法,包括Mel-frequency cepstral coefficients(MFCC)、linear-frequency cepstral coefficients(LFCC)和Chroma Short Time Fourier Transform(Chroma-STFT)。通过这种综合融合,MFAAN实现了对音频内容的细致理解,从而实现了对假造和真实录音的分辨率。初步的评估结果表明,MFAAN在两个标准数据集上('In-the-Wild' Audio Deepfake Data和The Fake-or-Real Dataset)达到了98.93%和94.47%的准确率,这不仅证明了MFAAN的有效性,还 highlighted its potential作为对深度模仿音频内容的战斗工具。

Astrocytes as a mechanism for meta-plasticity and contextually-guided network function

  • paper_url: http://arxiv.org/abs/2311.03508
  • repo_url: None
  • paper_authors: Lulu Gong, Fabio Pasqualetti, Thomas Papouin, ShiNung Ching
  • for: 这种研究旨在探讨astrocyte在脑中的功能,以及它们如何与神经元和 synapse 交互以实现学习。
  • methods: 这个研究使用了形式分析和逻辑推理来描述astrocyte如何影响神经元和synapse的adaptation,以及如何在不同的时间尺度上实现学习。
  • results: 研究发现,在存在时间尺度分开的astrocyte干扰下,神经元和synapse可以更好地适应不同的任务参数,并且可以在多个随机变化的上下文中学习。这种方法比传统的神经网络和非网络算法更可靠。
    Abstract Astrocytes are a highly expressed and highly enigmatic cell-type in the mammalian brain. Traditionally viewed as a mediator of basic physiological sustenance, it is increasingly recognized that astrocytes may play a more direct role in neural computation. A conceptual challenge to this idea is the fact that astrocytic activity takes a very different form than that of neurons, and in particular, occurs at orders-of-magnitude slower time-scales. In the current paper, we engage how such time-scale separation may endow astrocytes with the capability to enable learning in context-dependent settings, where fluctuations in task parameters may occur much more slowly than within-task requirements. This idea is based on the recent supposition that astrocytes, owing to their sensitivity to a host of physiological covariates, may be particularly well poised to modulate the dynamics of neural circuits in functionally salient ways. We pose a general model of neural-synaptic-astrocyte interaction and use formal analysis to characterize how astrocytic modulation may constitute a form of meta-plasticity, altering the ways in which synapses and neurons adapt as a function of time. We then embed this model in a bandit-based reinforcement learning task environment, and show how the presence of time-scale separated astrocytic modulation enables learning over multiple fluctuating contexts. Indeed, these networks learn far more reliably versus dynamically homogenous networks and conventional non-network-based bandit algorithms. Our results indicate how the presence of neural-astrocyte interaction in the brain may benefit learning over different time-scale and the conveyance of task relevant contextual information onto circuit dynamics.
    摘要 astrocytes是大脑中高度表达和高度神秘的细胞类型。传统上视为神经元的调节剂,但现在越来越认为astrocytes可能直接参与神经计算。一个挑战是astrocyte活动的时间尺度与神经元活动完全不同,astrocyte活动更加慢,甚至是神经元活动的数个量级慢。在当前文章中,我们探讨了如何这种时间尺度差异可能使astrocytes具有学习能力。我们提出了神经元-synapse-astrocyte交互的概念模型,并使用正式分析来描述如何astrocyte干涉可能导致神经细胞和 synapse 的适应性改变。然后,我们将这个模型嵌入到了一个基于奖励学习的bandit任务环境中,并证明了在多个随机变化的上下文中,astrocyte干涉的存在可以使网络学习更加可靠。我们的结果表明,在脑中具有神经-astrocyte交互的存在可以提高学习的可靠性和将任务相关的上下文信息传递到神经细胞动力学中。

Environmental-Impact Based Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.04240
  • repo_url: None
  • paper_authors: Farinaz Alamiyan-Harandi, Pouria Ramazi
  • for: 提高社会冲击协力和强化个体影响集体结果
  • methods: 环境影响多智能 reinforcement learning(EMuReL)方法,每个代理都估计每个其他代理在环境中的环境影响
  • results: 在清理(resp. 收割)环境测试环境中,使用EMuReL训练的代理协作更有效,获得$54%$ ($39%$)和$20%$ ($44%$)更多的总奖励,同时保持同等水平的合作水平。
    Abstract To promote cooperation and strengthen the individual impact on the collective outcome in social dilemmas, we propose the Environmental-impact Multi-Agent Reinforcement Learning (EMuReL) method where each agent estimates the "environmental impact" of every other agent, that is, the difference in the current environment state compared to the hypothetical environment in the absence of that other agent. Inspired by the Inequity Aversion model, the agent then compares its own reward with those of its fellows multiplied by their environmental impacts. If its reward exceeds the scaled reward of one of its fellows, the agent takes "social responsibility" toward that fellow by reducing its own reward. Therefore, the less influential an agent is in reaching the current state, the more social responsibility is taken by other agents. Experiments in the Cleanup (resp. Harvest) test environment demonstrate that agents trained based on EMuReL learn to cooperate more effectively and obtain $54\%$ ($39\%$) and $20\%$ ($44\%$) more total rewards while preserving the same cooperation levels compared to when they are trained based on the two state-of-the-art reward reshaping methods inequity aversion and social influence.
    摘要

Kindness in Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2311.04239
  • repo_url: None
  • paper_authors: Farinaz Alamiyan-Harandi, Mersad Hassanjani, Pouria Ramazi
  • for: 本研究旨在帮助合作agent在多智能体强化学习中增强协作能力,通过基于人类行为概念的KindMARL方法。
  • methods: KindMARL方法基于对agent动作的反思,对环境影响的估计,并通过对每个奖励的比较来评估同伙的善意。
  • results: 实验结果表明,基于KindMARL方法培训的合作agent在Cleanup和Harvest环境中赢得了89%和37%的总奖励,比基于不平等恐惧和社会影响方法的培训更高。此外,KindMARL方法在交通灯控制问题中也得到了效果。
    Abstract In human societies, people often incorporate fairness in their decisions and treat reciprocally by being kind to those who act kindly. They evaluate the kindness of others' actions not only by monitoring the outcomes but also by considering the intentions. This behavioral concept can be adapted to train cooperative agents in Multi-Agent Reinforcement Learning (MARL). We propose the KindMARL method, where agents' intentions are measured by counterfactual reasoning over the environmental impact of the actions that were available to the agents. More specifically, the current environment state is compared with the estimation of the current environment state provided that the agent had chosen another action. The difference between each agent's reward, as the outcome of its action, with that of its fellow, multiplied by the intention of the fellow is then taken as the fellow's "kindness". If the result of each reward-comparison confirms the agent's superiority, it perceives the fellow's kindness and reduces its own reward. Experimental results in the Cleanup and Harvest environments show that training based on the KindMARL method enabled the agents to earn 89\% (resp. 37\%) and 44% (resp. 43\%) more total rewards than training based on the Inequity Aversion and Social Influence methods. The effectiveness of KindMARL is further supported by experiments in a traffic light control problem.
    摘要 在人类社会中,人们常常在做出决定时包含公平性,并且往往以 reciprocal 的方式对待那些行为 kindly。他们评估他人的善良行为不仅从结果来评估,还从计划的意图来评估。这种行为概念可以适应培养合作代理人在多代理人学习环境(MARL)中。我们提出了 KindMARL 方法,其中代理人的意图通过对环境状态的 counterfactual 推理来衡量。具体来说,当前环境状态与代理人选择的行动可能的环境状态进行比较,然后计算每个代理人的奖励,并且将奖励与其他代理人的奖励进行比较。如果每个代理人的奖励与其他代理人的奖励之比大于或等于一定的阈值,那么该代理人将认为对方有善良行为,并将其奖励减少。实验结果表明,基于 KindMARL 方法进行训练后,代理人能够获得 89% (resp. 37%) 和 44% (resp. 43%) 更多的总奖励,相比于基于不平等恐惧和社会影响方法进行训练。 KindMARL 的效果还得到了在交通灯控制问题上的实验支持。

Multi-Resolution Diffusion for Privacy-Sensitive Recommender Systems

  • paper_url: http://arxiv.org/abs/2311.03488
  • repo_url: None
  • paper_authors: Derek Lilienthal, Paul Mello, Magdalini Eirinaki, Stas Tiomkin
    for: 这个论文旨在提出一种基于噪声模型的推荐系统,以保护用户隐私和安全。methods: 该方法使用扩散模型来生成可信度高的假数据,以取代或补充原始数据。results: 该方法比基于生成对抗网络、变量自动编码器和最近提出的扩散模型表现出色,平均提高了4.30%的Recall@$n$和4.65%的NDCG@$n$。
    Abstract While recommender systems have become an integral component of the Web experience, their heavy reliance on user data raises privacy and security concerns. Substituting user data with synthetic data can address these concerns, but accurately replicating these real-world datasets has been a notoriously challenging problem. Recent advancements in generative AI have demonstrated the impressive capabilities of diffusion models in generating realistic data across various domains. In this work we introduce a Score-based Diffusion Recommendation Model (SDRM), which captures the intricate patterns of real-world datasets required for training highly accurate recommender systems. SDRM allows for the generation of synthetic data that can replace existing datasets to preserve user privacy, or augment existing datasets to address excessive data sparsity. Our method outperforms competing baselines such as generative adversarial networks, variational autoencoders, and recently proposed diffusion models in synthesizing various datasets to replace or augment the original data by an average improvement of 4.30% in Recall@$n$ and 4.65% in NDCG@$n$.
    摘要 “优化推荐系统的重要组成部分是推荐系统,但它们对用户数据的依赖带来隐私和安全问题。使用生成的数据来取代用户数据可以解决这些问题,但实际生成这些真实世界数据集的问题是极其困难的。现代生成AI技术已经展示了对于不同领域的数据生成的杰出能力。在这个研究中,我们提出了一个Score-based Diffusion Recommendation Model(SDRM),可以实现真实世界数据集中的复杂模式,并且可以用来取代或补充原始数据,以保持用户隐私和增强推荐系统的准确度。我们的方法在对不同数据集进行生成和补充时,较前者4.30%和4.65%的NDCG@$n$和Recall@$n$的平均提升。”

CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations

  • paper_url: http://arxiv.org/abs/2311.03485
  • repo_url: None
  • paper_authors: Xuzhe Dang, Stefan Edelkamp, Nicolas Ribault
  • for: 本文提出了一种新的方法,用于通过CLIP模型学习机器人动作的奖励函数。传统的奖励函数设计经常靠manual feature engineering,可能难以泛化到多种任务。我们的方法跳过这个挑战,利用CLIP模型对状态特征和图像输入进行有效处理。
  • methods: 我们的模型使用了CLIP模型,将两个连续的观察对比,并且能够准确地确定执行的动作。
  • results: 我们通过实验评估,证明了我们的方法在机器人动作中 precisely 地推断动作和其批处增强了人工奖励学习的训练。
    Abstract This paper presents a novel method for learning reward functions for robotic motions by harnessing the power of a CLIP-based model. Traditional reward function design often hinges on manual feature engineering, which can struggle to generalize across an array of tasks. Our approach circumvents this challenge by capitalizing on CLIP's capability to process both state features and image inputs effectively. Given a pair of consecutive observations, our model excels in identifying the motion executed between them. We showcase results spanning various robotic activities, such as directing a gripper to a designated target and adjusting the position of a cube. Through experimental evaluations, we underline the proficiency of our method in precisely deducing motion and its promise to enhance reinforcement learning training in the realm of robotics.
    摘要 这篇论文提出了一种新的奖函数学习方法,通过使用基于 CLIP 的模型来实现。传统的奖函数设计常常靠于手动特征工程,这可能难以泛化到多种任务上。我们的方法则利用 CLIP 模型能够有效处理状态特征和图像输入,从而缺乏手动特征工程的限制。给定两个连续观察结果,我们的模型能够准确地识别执行的动作。我们在不同的 робо类活动中,如指定目标上的抓取器和立方体的位置调整等,展示了我们的方法的精准性和其在机器人学习训练中的潜在优势。

Multi Loss-based Feature Fusion and Top Two Voting Ensemble Decision Strategy for Facial Expression Recognition in the Wild

  • paper_url: http://arxiv.org/abs/2311.03478
  • repo_url: None
  • paper_authors: Guangyao Zhou, Yuanlun Xie, Wenhong Tian
  • for: 本研究旨在提高在野外情绪识别(FER)的性能,涉及到图像质量和计算机视觉领域。
  • methods: 本研究使用了内部特征结合和多个网络之间的特征结合,以及集成策略。特别是,提出了一个新的单模型named R18+FAML,以及一个集成模型named R18+FAML-FGA-T2V,以提高FER在野外的性能。
  • results: 经验表明,我们的单模型R18+FAML和集成模型R18+FAML-FGA-T2V在三个挑战性的FER数据集上达到了$\left( 90.32, 62.17, 65.83 \right)%$和$\left( 91.59, 63.27, 66.63 \right)%$的准确率,分别超过了当前最佳结果。
    Abstract Facial expression recognition (FER) in the wild is a challenging task affected by the image quality and has attracted broad interest in computer vision. There is no research using feature fusion and ensemble strategy for FER simultaneously. Different from previous studies, this paper applies both internal feature fusion for a single model and feature fusion among multiple networks, as well as the ensemble strategy. This paper proposes one novel single model named R18+FAML, as well as one ensemble model named R18+FAML-FGA-T2V to improve the performance of the FER in the wild. Based on the structure of ResNet18 (R18), R18+FAML combines internal Feature fusion and three Attention blocks using Multiple Loss functions (FAML) to improve the diversity of the feature extraction. To improve the performance of R18+FAML, we propose a Feature fusion among networks based on the Genetic Algorithm (FGA), which can fuse the convolution kernels for feature extraction of multiple networks. On the basis of R18+FAML and FGA, we propose one ensemble strategy, i.e., the Top Two Voting (T2V) to support the classification of FER, which can consider more classification information comprehensively. Combining the above strategies, R18+FAML-FGA-T2V can focus on the main expression-aware areas. Extensive experiments demonstrate that our single model R18+FAML and the ensemble model R18+FAML-FGA-T2V achieve the accuracies of $\left( 90.32, 62.17, 65.83 \right)\%$ and $\left( 91.59, 63.27, 66.63 \right)\%$ on three challenging unbalanced FER datasets RAF-DB, AffectNet-8 and AffectNet-7 respectively, both outperforming the state-of-the-art results.
    摘要 Facial expression recognition (FER) in the wild is a challenging task affected by image quality and has attracted broad interest in computer vision. There is no research using feature fusion and ensemble strategy for FER simultaneously. Different from previous studies, this paper applies both internal feature fusion for a single model and feature fusion among multiple networks, as well as the ensemble strategy. This paper proposes one novel single model named R18+FAML, as well as one ensemble model named R18+FAML-FGA-T2V to improve the performance of the FER in the wild. Based on the structure of ResNet18 (R18), R18+FAML combines internal Feature fusion and three Attention blocks using Multiple Loss functions (FAML) to improve the diversity of the feature extraction. To improve the performance of R18+FAML, we propose a Feature fusion among networks based on the Genetic Algorithm (FGA), which can fuse the convolution kernels for feature extraction of multiple networks. On the basis of R18+FAML and FGA, we propose one ensemble strategy, i.e., the Top Two Voting (T2V) to support the classification of FER, which can consider more classification information comprehensively. Combining the above strategies, R18+FAML-FGA-T2V can focus on the main expression-aware areas. Extensive experiments demonstrate that our single model R18+FAML and the ensemble model R18+FAML-FGA-T2V achieve the accuracies of $(90.32, 62.17, 65.83)\%$ and $(91.59, 63.27, 66.63)\%$ on three challenging unbalanced FER datasets RAF-DB, AffectNet-8 and AffectNet-7 respectively, both outperforming the state-of-the-art results.

FinA: Fairness of Adverse Effects in Decision-Making of Human-Cyber-Physical-System

  • paper_url: http://arxiv.org/abs/2311.03468
  • repo_url: None
  • paper_authors: Tianyu Zhao, Salma Elmalaki
  • for: This paper focuses on ensuring fairness in decision-making systems within Human-Cyber-Physical-Systems (HCPS), particularly in the context of diverse individuals with varying behaviors and expectations.
  • methods: The paper introduces the concept of Fairness-in-Adverse-Effects (FinA) and proposes a comprehensive set of five formulations to address the challenge of fairness, taking into account both instantaneous and long-term aspects of adverse effects.
  • results: The evaluation conducted within the domain of smart homes demonstrates that the adoption of FinA significantly enhances the overall perception of fairness among individuals, with an average improvement of 66.7% compared to the state-of-the-art method.
    Abstract Ensuring fairness in decision-making systems within Human-Cyber-Physical-Systems (HCPS) is a pressing concern, particularly when diverse individuals, each with varying behaviors and expectations, coexist within the same application space, influenced by a shared set of control actions in the system. The long-term adverse effects of these actions further pose the challenge, as historical experiences and interactions shape individual perceptions of fairness. This paper addresses the challenge of fairness from an equity perspective of adverse effects, taking into account the dynamic nature of human behavior and evolving preferences while recognizing the lasting impact of adverse effects. We formally introduce the concept of Fairness-in-Adverse-Effects (FinA) within the HCPS context. We put forth a comprehensive set of five formulations for FinA, encompassing both the instantaneous and long-term aspects of adverse effects. To empirically validate the effectiveness of our FinA approach, we conducted an evaluation within the domain of smart homes, a pertinent HCPS application. The outcomes of our evaluation demonstrate that the adoption of FinA significantly enhances the overall perception of fairness among individuals, yielding an average improvement of 66.7% when compared to the state-of-the-art method.
    摘要 ( Ensuring fairness in decision-making systems within Human-Cyber-Physical-Systems (HCPS) is a pressing concern, particularly when diverse individuals, each with varying behaviors and expectations, coexist within the same application space, influenced by a shared set of control actions in the system. The long-term adverse effects of these actions further pose the challenge, as historical experiences and interactions shape individual perceptions of fairness. This paper addresses the challenge of fairness from an equity perspective of adverse effects, taking into account the dynamic nature of human behavior and evolving preferences while recognizing the lasting impact of adverse effects. We formally introduce the concept of Fairness-in-Adverse-Effects (FinA) within the HCPS context. We put forth a comprehensive set of five formulations for FinA, encompassing both the instantaneous and long-term aspects of adverse effects. To empirically validate the effectiveness of our FinA approach, we conducted an evaluation within the domain of smart homes, a pertinent HCPS application. The outcomes of our evaluation demonstrate that the adoption of FinA significantly enhances the overall perception of fairness among individuals, yielding an average improvement of 66.7% when compared to the state-of-the-art method.)Here's the translation in Simplified Chinese:保持 Human-Cyber-Physical-Systems (HCPS) 中的决策系统 fairness 是一项急需解决的问题,特别是当多个不同的个体,每个人有不同的行为和期望,共同存在同一个应用空间中,受到共享的控制动作影响。长期的不良影响还提出了挑战,因为历史经验和互动对每个人的公正感产生影响。本文从Equity 的视角来Address 这个公正感 Challenge,考虑到人类行为的动态性和不断改变的偏好,同时认可长期的不良影响。我们在 HCPS 上正式引入 Fairness-in-Adverse-Effects (FinA) 概念,并提出了 five 种 FinA 形式,涵盖了 immediate 和长期的不良影响方面。为了证明我们 FinA 方法的有效性,我们在智能家居领域进行了评估。评估结果表明,通过 FinA 的采用,人们对公正感的总体评价得到了66.7%的平均提高,相比之下与当前方法的提高率。

Exploitation-Guided Exploration for Semantic Embodied Navigation

  • paper_url: http://arxiv.org/abs/2311.03357
  • repo_url: None
  • paper_authors: Justin Wasserman, Girish Chowdhary, Abhinav Gupta, Unnat Jain
  • for: 这个论文主要针对embodied navigation和sim-to-robot transfer的问题进行研究,探讨了一种可靠的方法来 sintactically combine these components。
  • methods: 作者提出了Exploitation-Guided Exploration(XGX)方法,其中有一个分配探索和利用的两个模块,当目标变得可见时,利用模块会取代探索模块,并继续驱动一个被 override 的政策优化。
  • results: XGX方法在Object Navigation任务上达到了70%的状态听报到的性能,比之前的最佳基准值提高了3%。此外,通过 Targeted analysis 表明,XGX方法在目标conditined exploration中更高效。最后,作者在硬件机器上进行了sim-to-real transfer,并证明XGX方法在实际场景中表现出两倍于最佳基准值的性能。
    Abstract In the recent progress in embodied navigation and sim-to-robot transfer, modular policies have emerged as a de facto framework. However, there is more to compositionality beyond the decomposition of the learning load into modular components. In this work, we investigate a principled way to syntactically combine these components. Particularly, we propose Exploitation-Guided Exploration (XGX) where separate modules for exploration and exploitation come together in a novel and intuitive manner. We configure the exploitation module to take over in the deterministic final steps of navigation i.e. when the goal becomes visible. Crucially, an exploitation module teacher-forces the exploration module and continues driving an overridden policy optimization. XGX, with effective decomposition and novel guidance, improves the state-of-the-art performance on the challenging object navigation task from 70% to 73%. Along with better accuracy, through targeted analysis, we show that XGX is also more efficient at goal-conditioned exploration. Finally, we show sim-to-real transfer to robot hardware and XGX performs over two-fold better than the best baseline from simulation benchmarking. Project page: xgxvisnav.github.io
    摘要 Recent progress in embodied navigation and sim-to-robot transfer has led to the emergence of modular policies as a de facto framework. However, there is more to compositionality than just decomposing the learning load into modular components. In this work, we investigate a principled way to syntactically combine these components. Specifically, we propose Exploitation-Guided Exploration (XGX), where separate modules for exploration and exploitation are combined in a novel and intuitive manner. We configure the exploitation module to take over in the deterministic final steps of navigation when the goal becomes visible, and the crucial aspect of this approach is that the exploitation module teacher-forces the exploration module and continues driving an overridden policy optimization. XGX, with effective decomposition and novel guidance, improves the state-of-the-art performance on the challenging object navigation task from 70% to 73%. In addition to better accuracy, we show through targeted analysis that XGX is also more efficient at goal-conditioned exploration. Furthermore, we demonstrate sim-to-real transfer to robot hardware and XGX performs over two-fold better than the best baseline from simulation benchmarking. For more information, please visit the project page at xgxvisnav.github.io.Note: Please note that the translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. If you need Traditional Chinese, please let me know.

SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis

  • paper_url: http://arxiv.org/abs/2311.03355
  • repo_url: https://github.com/prismformore/seggen
  • paper_authors: Hanrong Ye, Jason Kuen, Qing Liu, Zhe Lin, Brian Price, Dan Xu
  • for: 提高图像分割模型的性能,尤其是在 semantic segmentation、panoptic segmentation 和 instance segmentation 领域。
  • methods: 使用 two 种数据生成策略:MaskSyn 和 ImgSyn,它们可以增加数据的多样性,以便更好地训练图像分割模型。
  • results: 在 ADE20K 和 COCO 测试集上,使用 SegGen 生成的数据可以大幅提高现有的图像分割模型的性能,包括 Mask2Former R50 和 Mask2Former Swin-L。特别是,ADE20K mIoU 中 Mask2Former R50 的性能从 47.2 提高到 49.9 (+2.7),而 Mask2Former Swin-L 的性能从 56.1 提高到 57.4 (+1.3)。这些出色的结果表明 SegGen 可以在有限的人工标注数据上提高图像分割模型的性能,同时也使得模型在未看到的领域上更加稳定。
    Abstract We propose SegGen, a highly-effective training data generation method for image segmentation, which pushes the performance limits of state-of-the-art segmentation models to a significant extent. SegGen designs and integrates two data generation strategies: MaskSyn and ImgSyn. (i) MaskSyn synthesizes new mask-image pairs via our proposed text-to-mask generation model and mask-to-image generation model, greatly improving the diversity in segmentation masks for model supervision; (ii) ImgSyn synthesizes new images based on existing masks using the mask-to-image generation model, strongly improving image diversity for model inputs. On the highly competitive ADE20K and COCO benchmarks, our data generation method markedly improves the performance of state-of-the-art segmentation models in semantic segmentation, panoptic segmentation, and instance segmentation. Notably, in terms of the ADE20K mIoU, Mask2Former R50 is largely boosted from 47.2 to 49.9 (+2.7); Mask2Former Swin-L is also significantly increased from 56.1 to 57.4 (+1.3). These promising results strongly suggest the effectiveness of our SegGen even when abundant human-annotated training data is utilized. Moreover, training with our synthetic data makes the segmentation models more robust towards unseen domains. Project website: https://seggenerator.github.io
    摘要 我们提出了SegGen,一种高效的训练数据生成方法,可以大幅提高现代分割模型的性能。SegGen通过两种数据生成策略:MaskSyn和ImgSyn。(一)MaskSyn通过我们提出的文本到mask生成模型和mask到图生成模型,可以增加分割掩码的多样性,为模型提供更丰富的指导。(二)ImgSyn通过现有掩码生成新的图像,可以强化图像的多样性,为模型输入提供更多的选择。在ADE20K和COCO评测标准上,我们的数据生成方法可以明显提高现代分割模型的semantic segmentation、panoptic segmentation和instance segmentation性能。特别是在ADE20K mIoU上,Mask2Former R50的性能从47.2提高到49.9(+2.7),Mask2Former Swin-L也从56.1提高到57.4(+1.3)。这些优秀的结果表明我们的SegGen在有 suficient human-annotated训练数据的情况下也能够取得显著的效果。此外,通过我们生成的 sintetic数据,分割模型可以更好地鲁ilde对未看过的领域。项目网站:https://seggenerator.github.io

GLaMM: Pixel Grounding Large Multimodal Model

  • paper_url: http://arxiv.org/abs/2311.03356
  • repo_url: None
  • paper_authors: Hanoona Rasheed, Muhammad Maaz, Sahal Shaji, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M. Anwer, Erix Xing, Ming-Hsuan Yang, Fahad S. Khan
  • for: 这 paper 的目的是提出一种基于视觉Domain的大型多modal模型(LMMs),可以生成与对应的语言响应。
  • methods: 这 paper 使用了一种新的模型 called Grounding LMM(GLaMM),它可以根据用户提供的文本和/或区域提示来生成对应的语言响应和物体分割emas。
  • results: GLaMM可以在多种下游任务上表现出色,包括图像和区域水平的Captioning、图像和区域水平的描述、和视觉语言对话。此外,GLaMM还可以在一些新的任务上表现出色,如 Referring Expression Segmentation 和 Grounded Conversation Generation。
    Abstract Large Multimodal Models (LMMs) extend Large Language Models to the vision domain. Initial efforts towards LMMs used holistic images and text prompts to generate ungrounded textual responses. Very recently, region-level LMMs have been used to generate visually grounded responses. However, they are limited to only referring a single object category at a time, require users to specify the regions in inputs, or cannot offer dense pixel-wise object grounding. In this work, we present Grounding LMM (GLaMM), the first model that can generate natural language responses seamlessly intertwined with corresponding object segmentation masks. GLaMM not only grounds objects appearing in the conversations but is flexible enough to accept both textual and optional visual prompts (region of interest) as input. This empowers users to interact with the model at various levels of granularity, both in textual and visual domains. Due to the lack of standard benchmarks for the novel setting of generating visually grounded detailed conversations, we introduce a comprehensive evaluation protocol with our curated grounded conversations. Our proposed Grounded Conversation Generation (GCG) task requires densely grounded concepts in natural scenes at a large-scale. To this end, we propose a densely annotated Grounding-anything Dataset (GranD) using our proposed automated annotation pipeline that encompasses 7.5M unique concepts grounded in a total of 810M regions available with segmentation masks. Besides GCG, GLaMM also performs effectively on several downstream tasks e.g., referring expression segmentation, image and region-level captioning and vision-language conversations. Project Page: https://mbzuai-oryx.github.io/groundingLMM.
    摘要 大型多modal模型(LMM)拓展了大型语言模型到视觉领域。初期尝试的LMM使用整体图像和文本提示生成不关联的文本响应。最近,区域级LMM已经用于生成视觉关联的响应,但它们只能同时参考一个物体类别,需要用户在输入中指定区域,或者无法提供密集像素级对象关根。在这项工作中,我们提出了固化LMM(GLaMM),第一个可以生成自然语言响应同时与相应的对象分割mask相匹配。GLaMM不仅可以在对话中固化出现的对象,还可以随意接受文本和可选的视觉提示(区域兴趣点)作为输入。这使得用户可以与模型在文本和视觉领域进行交互,并且可以在不同的级别进行交互。由于生成视觉关联的详细对话的标准benchmark尚未出现,我们提出了全面的评价协议,并针对我们精心准备的grounded conversations进行评价。我们的提议的Grounded Conversation Generation(GCG)任务需要在自然场景中densely grounded的概念,并在大规模上进行评价。为此,我们提出了高度注解的Grounding-anything Dataset(GranD),使用我们提出的自动注解管道,涵盖了7.5万个唯一的概念,在810万个区域中均有分割mask。除了GCG,GLaMM还在多个下游任务上表现出色,如图像和区域级captioning、视力语会话等。项目页面:https://mbzuai-oryx.github.io/groundingLMM。

Scalable and Transferable Black-Box Jailbreaks for Language Models via Persona Modulation

  • paper_url: http://arxiv.org/abs/2311.03348
  • repo_url: None
  • paper_authors: Rusheb Shah, Quentin Feuillade–Montixi, Soroush Pour, Arush Tagade, Stephen Casper, Javier Rando
  • for: This paper investigates the vulnerability of large language models to jailbreaking attacks using persona modulation, and demonstrates the ability to elicit harmful responses from the models.
  • methods: The paper uses a language model assistant to automate the generation of jailbreaks, and demonstrates the effectiveness of this approach in achieving harmful completions in GPT-4, Claude 2, and Vicuna.
  • results: The paper shows that persona modulation can achieve a harmful completion rate of 42.5% in GPT-4, which is 185 times larger than before modulation, and also demonstrates the transfer of these attacks to other models, such as Claude 2 and Vicuna.
    Abstract Despite efforts to align large language models to produce harmless responses, they are still vulnerable to jailbreak prompts that elicit unrestricted behaviour. In this work, we investigate persona modulation as a black-box jailbreaking method to steer a target model to take on personalities that are willing to comply with harmful instructions. Rather than manually crafting prompts for each persona, we automate the generation of jailbreaks using a language model assistant. We demonstrate a range of harmful completions made possible by persona modulation, including detailed instructions for synthesising methamphetamine, building a bomb, and laundering money. These automated attacks achieve a harmful completion rate of 42.5% in GPT-4, which is 185 times larger than before modulation (0.23%). These prompts also transfer to Claude 2 and Vicuna with harmful completion rates of 61.0% and 35.9%, respectively. Our work reveals yet another vulnerability in commercial large language models and highlights the need for more comprehensive safeguards.
    摘要 尽管努力对大语言模型进行安全配置,它们仍然易受到劫持提示的影响,导致发送危险指令的可能性增加。在这项工作中,我们研究人格调整作为黑盒子劫逃方法,以使目标模型采取愿意遵从危险指令的人格。而不是手动设计每个人格的提示,我们使用语言模型助手自动生成劫逃。我们示例了一系列由人格调整引起的危险结果,包括Synthesize毒品、制造炸弹和洗钱等详细指令。这些自动攻击的危险完成率为GPT-4的42.5%,比之前的0.23%高185倍。这些提示还传递到Claude 2和Vicuna,它们的危险完成率分别为61.0%和35.9%。我们的工作揭示了商业大语言模型又一个漏洞,并高亮了更加全面的安全措施的需要。

Multitask Kernel-based Learning with First-Order Logic Constraints

  • paper_url: http://arxiv.org/abs/2311.03340
  • repo_url: None
  • paper_authors: Michelangelo Diligenti, Marco Gori, Marco Maggini, Leonardo Rigutini
  • for: 这个论文旨在总结一个整合超级vised和无级ished例子以及背景知识的核心机器学习框架。
  • methods: 该论文使用多任务学习方案,其中多个预测函数定义在一个对象集中的特征空间上,并且可以是先知的或通过适当的核kernel-based学习器来 aproximate。
  • results: 该论文提出了一种将逻辑逻辑约束转换为连续实现的方法,并在多个例子中进行了实验,证明了该方法可以有效地解决semi-supervised学习问题。
    Abstract In this paper we propose a general framework to integrate supervised and unsupervised examples with background knowledge expressed by a collection of first-order logic clauses into kernel machines. In particular, we consider a multi-task learning scheme where multiple predicates defined on a set of objects are to be jointly learned from examples, enforcing a set of FOL constraints on the admissible configurations of their values. The predicates are defined on the feature spaces, in which the input objects are represented, and can be either known a priori or approximated by an appropriate kernel-based learner. A general approach is presented to convert the FOL clauses into a continuous implementation that can deal with the outputs computed by the kernel-based predicates. The learning problem is formulated as a semi-supervised task that requires the optimization in the primal of a loss function that combines a fitting loss measure on the supervised examples, a regularization term, and a penalty term that enforces the constraints on both the supervised and unsupervised examples. Unfortunately, the penalty term is not convex and it can hinder the optimization process. However, it is possible to avoid poor solutions by using a two stage learning schema, in which the supervised examples are learned first and then the constraints are enforced.
    摘要 在这篇论文中,我们提出了一个通用框架,用于将经过监督和无监督示例以及背景知识表示为一组第一频逻辑条件集成到核心机器中。具体来说,我们考虑了一种多任务学习方案,在其中多个定义在对象集中的预测符被同时学习从示例中,并且对预测符的值进行约束。这些预测符是定义在特征空间中,并且可以是知道的或者通过适当的核心学习器来 aproximated。我们提出了一种通用的方法,将逻辑条件集转换成可以处理核心预测符输出的连续实现方式。学习问题被定义为一种半监督学习任务,需要在超参中优化一个损失函数,该损失函数结合监督示例上的适应损失度量、正则化项和约束项。却可能是非 convex 的罚 penalty 项,这可能会阻碍优化过程。但是,我们可以通过一种两阶段学习策略来避免 poor solution,在其中首先学习监督示例,然后强制执行约束。

ProPath: Disease-Specific Protein Language Model for Variant Pathogenicity

  • paper_url: http://arxiv.org/abs/2311.03429
  • repo_url: None
  • paper_authors: Huixin Zhan, Zijun Zhang
  • for: 预测疾病相关的遗传学变革是现代遗传学中的一个核心挑战。
  • methods: 我们提出了一个疾病特有的蛋白语言模型,即ProPath,以捕捉罕见变异中的pseudo-log-likelihood比率。
  • results: 我们的结果显示,ProPath比预先训练的ESM1b提高了超过5%的AUC,并在两个数据集中均达到了最高表现。
    Abstract Clinical variant classification of pathogenic versus benign genetic variants remains a pivotal challenge in clinical genetics. Recently, the proposition of protein language models has improved the generic variant effect prediction (VEP) accuracy via weakly-supervised or unsupervised training. However, these VEPs are not disease-specific, limiting their adaptation at point-of-care. To address this problem, we propose a disease-specific \textsc{pro}tein language model for variant \textsc{path}ogenicity, termed ProPath, to capture the pseudo-log-likelihood ratio in rare missense variants through a siamese network. We evaluate the performance of ProPath against pre-trained language models, using clinical variant sets in inherited cardiomyopathies and arrhythmias that were not seen during training. Our results demonstrate that ProPath surpasses the pre-trained ESM1b with an over $5\%$ improvement in AUC across both datasets. Furthermore, our model achieved the highest performances across all baselines for both datasets. Thus, our ProPath offers a potent disease-specific variant effect prediction, particularly valuable for disease associations and clinical applicability.
    摘要 临床变体分类严重病原 versus benign 遗传变异仍然是临床遗传学的核心挑战。最近,蛋白语言模型的提议有助于无监督或弱监督训练下variant effet prediction(VEP)的准确性。然而,这些VEP不是疾病特定,限制其在临床应用中的适应性。为解决这个问题,我们提议一种疾病特定的蛋白语言模型,称为ProPath,以捕捉 Pseudo-log-likelihood ratio 在罕见 missense 变异中。我们通过对临床变体集和遗传性心脏病和心脏病例进行评估,发现ProPath 的性能超过了预训练的 ESM1b,在两个数据集上提高了超过 5% 的 AUC。此外,我们的模型在所有基线之上表现最高,尤其在疾病相关性和临床实用性方面。因此,我们的 ProPath 提供了一种有力的疾病特定变异效应预测,对于疾病相关性和临床应用非常有价值。

FLOGA: A machine learning ready dataset, a benchmark and a novel deep learning model for burnt area mapping with Sentinel-2

  • paper_url: http://arxiv.org/abs/2311.03339
  • repo_url: None
  • paper_authors: Maria Sdraka, Alkinoos Dimakos, Alexandros Malounis, Zisoula Ntasiou, Konstantinos Karantzalos, Dimitrios Michail, Ioannis Papoutsis
    for:This paper aims to provide an accurate and robust method for automatically extracting burnt areas from satellite imagery after wildfires.methods:The authors use a machine-learning ready dataset called FLOGA, which includes satellite imagery with different spatial and spectral resolutions, and ground truth annotations from domain experts. They compare the performance of multiple machine learning and deep learning algorithms for change detection, and propose a novel deep learning model called BAM-CD.results:The proposed BAM-CD model outperforms all other methods in terms of accuracy and robustness, providing an effective way to automatically extract burnt areas from satellite imagery.
    Abstract Over the last decade there has been an increasing frequency and intensity of wildfires across the globe, posing significant threats to human and animal lives, ecosystems, and socio-economic stability. Therefore urgent action is required to mitigate their devastating impact and safeguard Earth's natural resources. Robust Machine Learning methods combined with the abundance of high-resolution satellite imagery can provide accurate and timely mappings of the affected area in order to assess the scale of the event, identify the impacted assets and prioritize and allocate resources effectively for the proper restoration of the damaged region. In this work, we create and introduce a machine-learning ready dataset we name FLOGA (Forest wiLdfire Observations for the Greek Area). This dataset is unique as it comprises of satellite imagery acquired before and after a wildfire event, it contains information from Sentinel-2 and MODIS modalities with variable spatial and spectral resolution, and contains a large number of events where the corresponding burnt area ground truth has been annotated by domain experts. FLOGA covers the wider region of Greece, which is characterized by a Mediterranean landscape and climatic conditions. We use FLOGA to provide a thorough comparison of multiple Machine Learning and Deep Learning algorithms for the automatic extraction of burnt areas, approached as a change detection task. We also compare the results to those obtained using standard specialized spectral indices for burnt area mapping. Finally, we propose a novel Deep Learning model, namely BAM-CD. Our benchmark results demonstrate the efficacy of the proposed technique in the automatic extraction of burnt areas, outperforming all other methods in terms of accuracy and robustness. Our dataset and code are publicly available at: https://github.com/Orion-AI-Lab/FLOGA.
    摘要 过去一个 décennie 中,全球受到了越来越频繁和严重的野火威胁,对人类和动物生命、生态系统和社会经济稳定性构成了严重的威胁。因此,我们需要就野火的影响作出迫切的行动,以保护地球的自然资源。 robust machine learning 技术,结合高分辨率卫星图像的丰富存在,可以为评估事件规模、确定受影响资产和有效分配资源而提供准确和时间相关的地图。在这项工作中,我们创建了一个名为 FLOGA(希腊地区森林野火观察数据集)的机器学习准备数据集。FLOGA 数据集独特之处在于,它包含了在野火事件前后由卫星图像提供的信息,其中包括 Sentinel-2 和 MODIS Modalities 的变量空间和spectral 分辨率信息,同时包含大量已由领域专家标注的烧毁区域地面 truth。FLOGA 覆盖希腊更广泛的地区,这个地区具有地中海气候和地貌特点。我们使用 FLOGA 进行了多种机器学习和深度学习算法的自动烧毁区域抽取比较,并与基于特殊 spectral 指数的烧毁区域映射方法进行比较。最后,我们提出了一种新的深度学习模型,即 BAM-CD。我们的 refer 结果表明,提议的方法在自动烧毁区域抽取方面具有高度的准确性和稳定性。我们的数据集和代码在 GitHub 上公开提供:https://github.com/Orion-AI-Lab/FLOGA。

DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase

  • paper_url: http://arxiv.org/abs/2311.03319
  • repo_url: None
  • paper_authors: Dawei Li, Yaxuan Li, Dheeraj Mekala, Shuyao Li, Yulin wang, Xueqi Wang, William Hogan, Jingbo Shang
  • for: 实现低资源下的内容学习(In-Context Learning,ICL),获得更好的结果。
  • methods: 使用自己生成的内容来增强大语言模型的熟悉度,然后通过多数决来决定最终结果。
  • results: 与标准ICL方法和其他ensemble-based方法相比,DAIL在低资源情况下表现更好,并且可以提供更高的信任度。
    Abstract In-Context Learning (ICL) combined with pre-trained large language models has achieved promising results on various NLP tasks. However, ICL requires high-quality annotated demonstrations which might not be available in real-world scenarios. To overcome this limitation, we propose \textbf{D}ata \textbf{A}ugmentation for \textbf{I}n-Context \textbf{L}earning (\textbf{DAIL}). DAIL leverages the intuition that large language models are more familiar with the content generated by themselves. It first utilizes the language model to generate paraphrases of the test sample and employs majority voting to determine the final result based on individual predictions. Our extensive empirical evaluation shows that DAIL outperforms the standard ICL method and other ensemble-based methods in the low-resource scenario. Additionally, we explore the use of voting consistency as a confidence score of the model when the logits of predictions are inaccessible. We believe our work will stimulate further research on ICL in low-resource settings.
    摘要 受欢迎的自然语言处理任务(NLP)中,在线上学习(ICL)与预训练大型自然语言模型(LMM)的结合已经实现了一定的成功。然而,ICL需要高质量的注释演示,这可能在实际场景中不 disponibles。为了解决这个限制,我们提出了\textbf{数据增强 для在线学习(DAIL)}. DAIL利用了LMM生成的内容的直觉,首先使用LMM生成测试样本的重写,然后通过多数投票确定基于个体预测的最终结果。我们的广泛的实验证明,DAIL在低资源场景下表现比标准的ICL方法和其他ensemble方法更好。此外,我们还探讨了使用投票一致性作为模型的信任度,当预测的logs不可访问时。我们认为我们的工作将鼓励更多关于ICL在低资源设置下的研究。

Neural Structure Learning with Stochastic Differential Equations

  • paper_url: http://arxiv.org/abs/2311.03309
  • repo_url: None
  • paper_authors: Benjie Wang, Joel Jennings, Wenbo Gong
  • for: 这项研究旨在探讨如何从时间观察数据中找到变量之间的底层关系,以及如何使用连续时间的概率过程来描述这些系统的动态。
  • methods: 这项研究使用了神经网络随机分布函数(SDE)和可变推理来推导 posterior 分布 над 可能的结构。
  • results: 研究表明,使用 SCOTCH 方法可以在不同的时间间隔下 Learning 更好的结构,并且在实际数据上比基eline方法有更高的性能。
    Abstract Discovering the underlying relationships among variables from temporal observations has been a longstanding challenge in numerous scientific disciplines, including biology, finance, and climate science. The dynamics of such systems are often best described using continuous-time stochastic processes. Unfortunately, most existing structure learning approaches assume that the underlying process evolves in discrete-time and/or observations occur at regular time intervals. These mismatched assumptions can often lead to incorrect learned structures and models. In this work, we introduce a novel structure learning method, SCOTCH, which combines neural stochastic differential equations (SDE) with variational inference to infer a posterior distribution over possible structures. This continuous-time approach can naturally handle both learning from and predicting observations at arbitrary time points. Theoretically, we establish sufficient conditions for an SDE and SCOTCH to be structurally identifiable, and prove its consistency under infinite data limits. Empirically, we demonstrate that our approach leads to improved structure learning performance on both synthetic and real-world datasets compared to relevant baselines under regular and irregular sampling intervals.
    摘要 描述变量之间的下面关系是科学领域中长期存在的挑战,包括生物、金融和气候科学。这些系统的动态通常使用连续时间杂事件来描述。然而,大多数现有结构学习方法假设下面过程发展在离散时间和/或观测点发生在固定时间间隔。这些不一致的假设可能会导致错误地学习结构和模型。在这种工作中,我们介绍了一种新的结构学习方法,即SCOTCH,它将神经生成器泛化准则与Variational推断结合以推理 posterior 分布中可能的结构。这种连续时间方法可以自然地处理从和预测观测点的任意时间点学习和预测。从理论角度来看,我们设置了SDE和SCOTCH的可结构可识别条件,并证明在无穷数据极限下,SCOTCH是一个一致的方法。实际上,我们在 sintetic 和实际数据上比较SCOTCH和相关基线方法的结构学习性能,并证明SCOTCH在固定和不固定时间间隔下的观测点上具有更高的结构学习性能。

Learning Reusable Manipulation Strategies

  • paper_url: http://arxiv.org/abs/2311.03293
  • repo_url: None
  • paper_authors: Jiayuan Mao, Joshua B. Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling
    for: 这篇论文的目的是帮助机器学习人类 manipulate “trick” 的能力,包括通过单一示例学习和自动游戏来获得 manipulate 技能。methods: 这篇论文使用的方法包括将每个示例解释为机器人对物体和物体之间的接触模式的序列,从而学习细致的抽象器和聚合器。results: 这篇论文的结果表明,通过这种方法,机器人可以通过单一示例学习和自动游戏来获得 manipulate 技能,并且可以在不同的情况下 flexibly 应用这些技能。
    Abstract Humans demonstrate an impressive ability to acquire and generalize manipulation "tricks." Even from a single demonstration, such as using soup ladles to reach for distant objects, we can apply this skill to new scenarios involving different object positions, sizes, and categories (e.g., forks and hammers). Additionally, we can flexibly combine various skills to devise long-term plans. In this paper, we present a framework that enables machines to acquire such manipulation skills, referred to as "mechanisms," through a single demonstration and self-play. Our key insight lies in interpreting each demonstration as a sequence of changes in robot-object and object-object contact modes, which provides a scaffold for learning detailed samplers for continuous parameters. These learned mechanisms and samplers can be seamlessly integrated into standard task and motion planners, enabling their compositional use.
    摘要 人类具有吸引人的手备技巧学习能力,可以从单一示例中即使用汤匙来达到远距离对象的技巧。我们可以将这种技巧应用到新的场景中,包括不同的对象位置、大小和类别(如锹和锤)。此外,我们还可以灵活地组合不同的技巧来制定长期计划。在这篇论文中,我们提出了一种机器人学习机制的框架,通过单一示例和自动游戏来学习。我们的关键发现在于将每个示例解释为机器人对象和对象之间的接触模式序列,这提供了一个学习细节 sampler 的框架。这些学习的机制和 sampler 可以轻松地与标准任务和运动规划器集成,以便它们的组合使用。

GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values

  • paper_url: http://arxiv.org/abs/2311.03426
  • repo_url: None
  • paper_authors: Farnoosh Javadi, Walid Ahmed, Habib Hajimolahoseini, Foozhan Ataiefard, Mohammad Hassanpour, Saina Asani, Austin Wen, Omar Mohamed Awad, Kangling Liu, Yang Liu
  • for: 这篇论文是为了解决大型transformer模型面临的问题,包括过度参数化和耗时 computationally expensive pre-training。
  • methods: 这篇论文提出了一种通用的方法 called GQKVA,它利用query, key,和value grouping技术来优化transformer pre-training,以提高模型的速度和小型化。
  • results: 我们的实验表明,GQKVA可以在不同的variant中取得明显的交换,即在维持模型性能的情况下,降低模型的大小。我们的实验还显示,传统的多头注意方法不一定是最佳选择,因为有更轻量级和更快速的选择可以。我们在ViT上进行了试验,获得了约0.3%的增加精度,同时降低了模型的大小约4%。此外,我们最具攻击性的模型缩小实验中,模型的大小缩小约15%,但只有约1%的精度下降。
    Abstract Massive transformer-based models face several challenges, including slow and computationally intensive pre-training and over-parametrization. This paper addresses these challenges by proposing a versatile method called GQKVA, which generalizes query, key, and value grouping techniques. GQKVA is designed to speed up transformer pre-training while reducing the model size. Our experiments with various GQKVA variants highlight a clear trade-off between performance and model size, allowing for customized choices based on resource and time limitations. Our findings also indicate that the conventional multi-head attention approach is not always the best choice, as there are lighter and faster alternatives available. We tested our method on ViT, which achieved an approximate 0.3% increase in accuracy while reducing the model size by about 4% in the task of image classification. Additionally, our most aggressive model reduction experiment resulted in a reduction of approximately 15% in model size, with only around a 1% drop in accuracy.
    摘要 巨大的变换器模型面临多个挑战,包括耗时和计算 интенсив的预训练和过参数化。这篇论文提出了一种通用的方法called GQKVA,该方法旨在加速变换器预训练,同时减少模型的大小。我们的实验表明,GQKVA方法在不同的variant中存在明显的性能和模型大小之间的负反相关,可以根据资源和时间限制进行定制选择。我们的发现还表明,传统的多头注意方法并不总是最佳选择,因为有轻量级和快速的代替方案可用。我们在ViT上测试了我们的方法,实现了图像分类任务中约0.3%的提升精度,同时减少了模型的大小约4%。此外,我们最具攻击性的模型减少实验结果表明,可以减少约15%的模型大小,只有约1%的精度下降。

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

  • paper_url: http://arxiv.org/abs/2311.03285
  • repo_url: https://github.com/s-lora/s-lora
  • paper_authors: Ying Sheng, Shiyi Cao, Dacheng Li, Coleman Hooper, Nicholas Lee, Shuo Yang, Christopher Chou, Banghua Zhu, Lianmin Zheng, Kurt Keutzer, Joseph E. Gonzalez, Ion Stoica
  • For: The paper is written for the deployment of large language models using the “pretrain-then-finetune” paradigm, specifically focusing on the scalable serving of many LoRA adapters.* Methods: The paper proposes a system called S-LoRA, which stores all adapters in the main memory and fetches the adapters used by the currently running queries to the GPU memory. Unified Paging is used to manage dynamic adapter weights and KV cache tensors, and tensor parallelism and highly optimized custom CUDA kernels are employed for heterogeneous batching of LoRA computation.* Results: Compared to state-of-the-art libraries such as HuggingFace PEFT and vLLM, S-LoRA can improve the throughput by up to 4 times and increase the number of served adapters by several orders of magnitude, enabling scalable serving of many task-specific fine-tuned models and offering the potential for large-scale customized fine-tuning services.Here’s the simplified Chinese text for the three key points:* For: 这篇论文是为大语言模型的部署而写的,具体来说是关于大量LoRA适应器的批处理服务。* Methods: 这篇论文提出了一个名为S-LoRA的系统,它将所有适应器存储在主内存中,并在运行中查询使用的适应器被提取到GPU内存中。 Unified Paging技术用于管理动态适应器权重和KV缓存tensor,并使用tensor并行和高度优化的Custom CUDA核心来实现不同批处理LoRA计算。* Results: 相比之前的状态艺术库如HuggingFace PEFT和vLLM,S-LoRA可以提高 durchput 到最多4倍,并可以同时处理数量级别的适应器,这使得S-LoRA可以实现大规模的定制化 fine-tuning 服务,并且提供了大规模定制化 fine-tuning 服务的潜在性。
    Abstract The "pretrain-then-finetune" paradigm is commonly adopted in the deployment of large language models. Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning method, is often employed to adapt a base model to a multitude of tasks, resulting in a substantial collection of LoRA adapters derived from one base model. We observe that this paradigm presents significant opportunities for batched inference during serving. To capitalize on these opportunities, we present S-LoRA, a system designed for the scalable serving of many LoRA adapters. S-LoRA stores all adapters in the main memory and fetches the adapters used by the currently running queries to the GPU memory. To efficiently use the GPU memory and reduce fragmentation, S-LoRA proposes Unified Paging. Unified Paging uses a unified memory pool to manage dynamic adapter weights with different ranks and KV cache tensors with varying sequence lengths. Additionally, S-LoRA employs a novel tensor parallelism strategy and highly optimized custom CUDA kernels for heterogeneous batching of LoRA computation. Collectively, these features enable S-LoRA to serve thousands of LoRA adapters on a single GPU or across multiple GPUs with a small overhead. Compared to state-of-the-art libraries such as HuggingFace PEFT and vLLM (with naive support of LoRA serving), S-LoRA can improve the throughput by up to 4 times and increase the number of served adapters by several orders of magnitude. As a result, S-LoRA enables scalable serving of many task-specific fine-tuned models and offers the potential for large-scale customized fine-tuning services. The code is available at https://github.com/S-LoRA/S-LoRA
    摘要 “pretrain-then-finetune”模式广泛应用于语言模型的部署。低级别适应(LoRA),一种精炼 parameter 的 fine-tuning 方法,经常用于适应多种任务,从而生成了一大量的 LoRA 适应器。我们发现,这种模式具有批处理(batched inference)的广阔机会。为了利用这些机会,我们提出了 S-LoRA,一个用于批处理多个 LoRA 适应器的系统。S-LoRA 将所有适应器存储在主内存中,并在 GPU 内存中fetch 当前运行的查询所使用的适应器。为了高效使用 GPU 内存并避免分配不一致,S-LoRA 提出了统一分页(Unified Paging)技术。此外,S-LoRA 还使用了一种新的tensor并行执行策略和特化的 CUDA 加速器,以实现hetERogeneous批处理(heterogeneous batching)。这些特点使得 S-LoRA 可以在单个 GPU 或多个 GPU 上服务 thousands 个 LoRA 适应器,占用较小的负荷。相比之下,与 HuggingFace PEFT 和 vLLM(带有简单的 LoRA 服务支持)相比,S-LoRA 可以提高 durchput 高达 4 倍,并增加服务的适应器数目几个数量级。因此,S-LoRA 允许批处理多个任务特定的 fine-tuning 模型,并提供了大规模自定义 fine-tuning 服务。代码可以在 上找到。

An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets

  • paper_url: http://arxiv.org/abs/2311.03425
  • repo_url: None
  • paper_authors: Faris F. Gulamali, Ashwin S. Sawant, Lora Liharska, Carol R. Horowitz, Lili Chan, Patricia H. Kovatch, Ira Hofer, Karandeep Singh, Lynne D. Richardson, Emmanuel Mensah, Alexander W Charney, David L. Reich, Jianying Hu, Girish N. Nadkarni
  • for: 这篇论文旨在探讨健康域中诊断和预测算法的使用可能会对受欢迎群体产生偏见,以及如何使用深度学习方法来探测和改善这种偏见。
  • methods: 本论文提出了一种数据中心、模型无关、任务无关的方法来评估数据集偏见,通过分析不同群体在小样本大小下学习的关系(AEquity)来识别和修正健康域数据集中的种族偏见。
  • results: 研究发现,通过应用AEquity指标,可以在健康域数据集中识别和修正种族偏见,并在抑制偏见方面取得了显著的成果。
    Abstract The adoption of diagnosis and prognostic algorithms in healthcare has led to concerns about the perpetuation of bias against disadvantaged groups of individuals. Deep learning methods to detect and mitigate bias have revolved around modifying models, optimization strategies, and threshold calibration with varying levels of success. Here, we generate a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity). We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare - Chest X-rays diagnosis with deep convolutional neural networks and healthcare utilization prediction with multivariate logistic regression. AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
    摘要 随着医疗健康预测和诊断算法的推广,对受护理难度群体的偏见问题产生了关注。深度学习方法来检测和缓解偏见的发展具有不同水平的成功。在这篇文章中,我们提出了一种数据中心、模型无关、任务无关的方法来评估数据集偏见(AEquity)。我们首先研究了不同群体在小样本大小下学习的关系,然后应用系统性分析AEquity值在不同人口 subgroup中的异常情况,以确定和缓解医疗数据集中的种族偏见。AEquity是一种新的和通用的指标,可以在医疗领域应用来提高公平性,诊断和缓解数据集偏见。

Using Symmetries to Lift Satisfiability Checking

  • paper_url: http://arxiv.org/abs/2311.03424
  • repo_url: None
  • paper_authors: Pierre Carbonnelle, Gottfried Schenner, Maurice Bruynooghe, Bart Bogaerts, Marc Denecker
  • for: 这种方法可以用来压缩结构(也称为解释)到一个更小的领域中,而不会产生信息损失。
  • methods: 该方法包括将句子自动翻译成一个等价可满足的句子,并将这个句子在扩展 vocabulary 上进行满足性检查。
  • results: 实验表明,这种方法可以在生成配置问题中获得大量的加速,并且还有应用于软件验证复杂数据结构的场景。
    Abstract We analyze how symmetries can be used to compress structures (also known as interpretations) onto a smaller domain without loss of information. This analysis suggests the possibility to solve satisfiability problems in the compressed domain for better performance. Thus, we propose a 2-step novel method: (i) the sentence to be satisfied is automatically translated into an equisatisfiable sentence over a ``lifted'' vocabulary that allows domain compression; (ii) satisfiability of the lifted sentence is checked by growing the (initially unknown) compressed domain until a satisfying structure is found. The key issue is to ensure that this satisfying structure can always be expanded into an uncompressed structure that satisfies the original sentence to be satisfied. We present an adequate translation for sentences in typed first-order logic extended with aggregates. Our experimental evaluation shows large speedups for generative configuration problems. The method also has applications in the verification of software operating on complex data structures. Further refinements of the translation are left for future work.
    摘要 我们分析如何使用对称性来压缩结构(也称为解释)到一个更小的领域 без损失信息。这种分析表明可以在压缩领域中解决满足性问题以获得更好的性能。因此,我们提出了一种新的两步方法:(i)要满足的句子自动被翻译成一个等价满足句子,使用一个扩展了词汇的“升级” vocabulary,以便压缩领域。(ii)检查升级后的句子是否满足,通过在初始不知道的压缩领域中增长 until 找到一个满足结构。关键在于确保这个满足结构可以扩展到一个不压缩的结构,以满足原始要满足的句子。我们对类型化first-order logic中的句子进行了适当的翻译。我们的实验评估表明,这种方法在生成配置问题上可以获得大量的速度提升。此方法还有软件验证复杂数据结构的应用。未来的工作将更进一步地完善翻译。

From Coupled Oscillators to Graph Neural Networks: Reducing Over-smoothing via a Kuramoto Model-based Approach

  • paper_url: http://arxiv.org/abs/2311.03260
  • repo_url: None
  • paper_authors: Tuan Nguyen, Tan M. Nguyen, Hirotada Honda, Takashi Sano, Vinh Nguyen, Shugo Nakamura
  • for: 针对 Graph Neural Network (GNN) 中的过敏问题,提出了 Kuramoto Graph Neural Network (KuramotoGNN),一种新的连续深度 GNN 类型。
  • methods: KuramotoGNN 使用 Kuramoto 模型来 Mitigate 过敏问题,Kuramoto 模型捕捉了非线性共振 oscilators 的同步行为。
  • results: 对多种图深度学习benchmark任务进行实验,表明 KuramotoGNN 可以减少过敏问题,而且与基eline GNN 和现有方法相比,具有更好的性能。
    Abstract We propose the Kuramoto Graph Neural Network (KuramotoGNN), a novel class of continuous-depth graph neural networks (GNNs) that employs the Kuramoto model to mitigate the over-smoothing phenomenon, in which node features in GNNs become indistinguishable as the number of layers increases. The Kuramoto model captures the synchronization behavior of non-linear coupled oscillators. Under the view of coupled oscillators, we first show the connection between Kuramoto model and basic GNN and then over-smoothing phenomenon in GNNs can be interpreted as phase synchronization in Kuramoto model. The KuramotoGNN replaces this phase synchronization with frequency synchronization to prevent the node features from converging into each other while allowing the system to reach a stable synchronized state. We experimentally verify the advantages of the KuramotoGNN over the baseline GNNs and existing methods in reducing over-smoothing on various graph deep learning benchmark tasks.
    摘要 我们提出了库拉莫托图 neural network(KuramotoGNN),一种新的连续深度图 neural network(GNN),它使用库拉莫托模型来 Mitigate the over-smoothing phenomenon, 在 graph neural networks 中,节点特征会在层数增加时变得无法分辨。库拉莫托模型捕捉了非线性 coupled oscillators 的同步行为。从 coupled oscillators 的视角来看,我们首先表明了库拉莫托模型和基本 GNN 之间的连接,然后我们解释了 GNN 中的过滤现象可以被看作库拉莫托模型中的相同频率同步。库拉莫托GNN 将这种相同频率同步替换为频率同步,以防止节点特征 converges 到每个节点特征,同时允许系统达到一个稳定的同步状态。我们通过实验证明了库拉莫托GNN 在不同的图深度学习 benchmark 任务上的优势,比如减少过滤现象。

Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency

  • paper_url: http://arxiv.org/abs/2311.03253
  • repo_url: None
  • paper_authors: Zilin Xiao, Linjun Shou, Xingyao Zhang, Jie Wu, Ming Gong, Jian Pei, Daxin Jiang
    for:The paper aims to improve the coherence of entity disambiguation (ED) predictions by proposing a novel system called CoherentED.methods:CoherentED uses an unsupervised variational autoencoder (VAE) to extract latent topic vectors of context sentences, and incorporates an external category memory to retrieve relevant categories for undecided mentions. The system also employs step-by-step entity decisions to model entity-entity interactions and maintain maximum coherence at the category level.results:The proposed CoherentED model achieves new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points, particularly excelling in long-text scenarios.
    Abstract Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities using length-limited encoders. However, these methods often struggle to capture explicit discourse-level dependencies, resulting in incoherent predictions at the abstract level (e.g. topic or category). We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions. Our method first introduces an unsupervised variational autoencoder (VAE) to extract latent topic vectors of context sentences. This approach not only allows the encoder to handle longer documents more effectively, conserves valuable input space, but also keeps a topic-level coherence. Additionally, we incorporate an external category memory, enabling the system to retrieve relevant categories for undecided mentions. By employing step-by-step entity decisions, this design facilitates the modeling of entity-entity interactions, thereby maintaining maximum coherence at the category level. We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points. Our model demonstrates particularly outstanding performance on challenging long-text scenarios.
    摘要 To address this issue, we propose CoherentED, an ED system with novel designs that enhance the coherence of entity predictions. Our approach includes:1. Unsupervised variational autoencoder (VAE) to extract latent topic vectors of context sentences. This allows the encoder to handle longer documents more effectively, conserve valuable input space, and maintain topic-level coherence.2. External category memory to retrieve relevant categories for undecided mentions. This design facilitates the modeling of entity-entity interactions and maintains maximum coherence at the category level.3. Step-by-step entity decisions to model entity-entity interactions and maintain coherence at the category level.Our model achieves new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points. It particularly excels in challenging long-text scenarios.

Instructed Language Models with Retrievers Are Powerful Entity Linkers

  • paper_url: http://arxiv.org/abs/2311.03250
  • repo_url: https://github.com/mrzilinxiao/insgenentitylinking
  • paper_authors: Zilin Xiao, Ming Gong, Jie Wu, Xingyao Zhang, Linjun Shou, Jian Pei, Daxin Jiang
  • for: 提高语言模型在实体链接任务中的性能,使其能够准确地预测知识库中的实体。
  • methods: 提出了一种新的生成式实体链接方法,包括将语言模型通过序列对应的训练目标和指导进行强制实体链接训练,以及一种轻量级的潜在提及检索器来减轻模型的解oding负担。
  • results: 与前一代生成方法进行比较,INSGENEL表现出了+6.8 F1分的提升,同时具有更高的训练数据效率和训练计算资源利用率。
    Abstract Generative approaches powered by large language models (LLMs) have demonstrated emergent abilities in tasks that require complex reasoning abilities. Yet the generative nature still makes the generated content suffer from hallucinations, thus unsuitable for entity-centric tasks like entity linking (EL) requiring precise entity predictions over a large knowledge base. We present Instructed Generative Entity Linker (INSGENEL), the first approach that enables casual language models to perform entity linking over knowledge bases. Several methods to equip language models with EL capability were proposed in this work, including (i) a sequence-to-sequence training EL objective with instruction-tuning, (ii) a novel generative EL framework based on a light-weight potential mention retriever that frees the model from heavy and non-parallelizable decoding, achieving 4$\times$ speedup without compromise on linking metrics. INSGENEL outperforms previous generative alternatives with +6.8 F1 points gain on average, also with a huge advantage in training data efficiency and training compute consumption. In addition, our skillfully engineered in-context learning (ICL) framework for EL still lags behind INSGENEL significantly, reaffirming that the EL task remains a persistent hurdle for general LLMs.
    摘要 大型语言模型驱动的生成方法已经展示出了复杂逻辑能力的emergent能力。然而,生成性还使生成内容受到幻觉的影响,因此不适用于实体关注任务(EL),需要精确的实体预测 над大量知识库。我们提出了首个使用语言模型执行EL任务的Instructed Generative Entity Linker(INSGENEL)方法。我们提出了多种方法来让语言模型拥有EL能力,包括(i)在EL目标下进行序列到序列训练,并使用指令调整;(ii)一种基于轻量级可能提取器的新一代生成EL框架,解决了重量级和非平行化解码的问题,实现了4倍的速度提升而无需牺牲链接指标。INSGENEL在前一代生成方法上提高了平均6.8个F1分,同时具有较好的训练数据效率和训练计算耗用率。此外,我们的巧妙地设计的上下文学习(ICL)框架 для EL仍然落后INSGENEL,这再次证明了EL任务对普通LLMs是一个持续的挑战。

Advancing Post Hoc Case Based Explanation with Feature Highlighting

  • paper_url: http://arxiv.org/abs/2311.03246
  • repo_url: None
  • paper_authors: Eoin Kenny, Eoin Delaney, Mark Keane
  • for: 该论文的目的是提出一种新的可解释AI(XAI)技术,用于辅助人类和AI系统之间的合作。
  • methods: 该论文使用了两种总体算法(幽默和超像素基于算法)来隔离测试图像中的多个清晰特征部分,然后将其连接到训练数据中的相关案例,以提供更全面的解释。
  • results: 该论文的结果表明,该方法可以正确地调整用户对批处理的感受,并且在实际数据集上的ImageNet dataset上实现了这一效果,而不是只是显示解释而无法连接到特征部分。
    Abstract Explainable AI (XAI) has been proposed as a valuable tool to assist in downstream tasks involving human and AI collaboration. Perhaps the most psychologically valid XAI techniques are case based approaches which display 'whole' exemplars to explain the predictions of black box AI systems. However, for such post hoc XAI methods dealing with images, there has been no attempt to improve their scope by using multiple clear feature 'parts' of the images to explain the predictions while linking back to relevant cases in the training data, thus allowing for more comprehensive explanations that are faithful to the underlying model. Here, we address this gap by proposing two general algorithms (latent and super pixel based) which can isolate multiple clear feature parts in a test image, and then connect them to the explanatory cases found in the training data, before testing their effectiveness in a carefully designed user study. Results demonstrate that the proposed approach appropriately calibrates a users feelings of 'correctness' for ambiguous classifications in real world data on the ImageNet dataset, an effect which does not happen when just showing the explanation without feature highlighting.
    摘要 explainer AI (XAI) 被提议为在人机合作下执行下游任务的有价值工具。 可能最有心理有效的 XAI 技术是情况基 Approach,通过显示“整体”的示例来解释黑盒 AI 系统的预测。然而,对于图像处理的Post hoc XAI 方法,没有尝试使用多个明确的特征部分来解释预测结果,从而与相关的训练数据中的案例相关联,以提供更全面的解释, faithful 于下面模型。在这里,我们解决这个差距,并提出了两种通用算法(潜在和超Pixel 基于),可以在测试图像中孤立多个明确的特征部分,然后与训练数据中的解释案例相关联,并在用户研究中进行测试。结果表明,我们的方法可以正确地考虑用户对异常分类结果的情感,在实际世界数据集上的 ImageNet dataset 上,并不会发生只显示解释而无需特征高亮的情况。Note that Simplified Chinese is used in this translation, as it is the most widely used form of Chinese in mainland China and Singapore. If you prefer Traditional Chinese, I can provide that as well.

An Efficient Self-Supervised Cross-View Training For Sentence Embedding

  • paper_url: http://arxiv.org/abs/2311.03228
  • repo_url: https://github.com/mrpeerat/sct
  • paper_authors: Peerat Limkonchotiwat, Wuttikorn Ponwitayarat, Lalita Lowphansirikul, Can Udomcharoenchaikit, Ekapol Chuangsuwanich, Sarana Nutanong
  • for: 提高小型语言模型(PLM)的自监sentence表示学习性能。
  • methods: 提出了一种名为Self-supervised Cross-View Training(SCT)的框架,用于缩小大小PLM之间性能差异。
  • results: SCT在7个Semantic Textual Similarity(STS)benchmark上比基eline和state-of-the-art竞争对手perform better,特别是对PLM的参数量小于100M的情况下。
    Abstract Self-supervised sentence representation learning is the task of constructing an embedding space for sentences without relying on human annotation efforts. One straightforward approach is to finetune a pretrained language model (PLM) with a representation learning method such as contrastive learning. While this approach achieves impressive performance on larger PLMs, the performance rapidly degrades as the number of parameters decreases. In this paper, we propose a framework called Self-supervised Cross-View Training (SCT) to narrow the performance gap between large and small PLMs. To evaluate the effectiveness of SCT, we compare it to 5 baseline and state-of-the-art competitors on seven Semantic Textual Similarity (STS) benchmarks using 5 PLMs with the number of parameters ranging from 4M to 340M. The experimental results show that STC outperforms the competitors for PLMs with less than 100M parameters in 18 of 21 cases.
    摘要 自动监督句子表示学习是建立句子嵌入空间的任务,不需要人工标注努力。一种直观方法是通过对预训练语言模型(PLM)进行微调和表示学习方法,如对比学习。然而,这种方法在PLM的参数数量减少时表现迅速下降。在这篇论文中,我们提出了一个名为自动监督跨视图训练(SCT)的框架,以减少大小PLM的表现差距。为评估SCT的效果,我们与5个基线和当前顶峰竞争对手进行比较,在7个Semantic Textual Similarity(STS)标准套件上使用5个PLM的参数量从4M到340M。实验结果显示,STC在PLM参数量少于100M的18个情况中表现更好于竞争对手。

LDM3D-VR: Latent Diffusion Model for 3D VR

  • paper_url: http://arxiv.org/abs/2311.03226
  • repo_url: None
  • paper_authors: Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal
  • for: 这篇论文是为了提出一种基于潜在扩散模型的虚拟现实开发框架,包括LDM3D-pano和LDM3D-SR两种模型。这两种模型可以根据文本提示生成全景RGBD图像,并将低分辨率输入图像upscale到高分辨率RGBD图像。
  • methods: 这两种模型都是基于现有预训练模型的 fine-tuning,使用包括全景RGB图像、深度图像和标签在内的数据集进行训练。
  • results: 对于LDM3D-pano模型,研究人员通过对比与现有相关方法进行评估,发现它可以生成高质量的全景RGBD图像,而且比现有方法更具有创新性和灵活性。对于LDM3D-SR模型,研究人员发现它可以高效地upscale低分辨率RGBD图像到高分辨率,并且比现有方法更具有稳定性和可靠性。
    Abstract Latent diffusion models have proven to be state-of-the-art in the creation and manipulation of visual outputs. However, as far as we know, the generation of depth maps jointly with RGB is still limited. We introduce LDM3D-VR, a suite of diffusion models targeting virtual reality development that includes LDM3D-pano and LDM3D-SR. These models enable the generation of panoramic RGBD based on textual prompts and the upscaling of low-resolution inputs to high-resolution RGBD, respectively. Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions. Both models are evaluated in comparison to existing related methods.
    摘要 <>将文本翻译成简化中文。<>潜在扩散模型已经证明是视觉输出创造和操作的状态之一。然而,据我们知道,RGBD的深度图生成并非很广泛。我们介绍LDM3D-VR,一个针对虚拟现实开发的扩散模型集合,包括LDM3D-pano和LDM3D-SR。这两个模型允许通过文本提示生成宽角RGBD和低分辨率输入到高分辨率RGBD的upscaling。我们的模型来自现有预训练模型,并在包含宽角/高分辨率RGB图像、深度图和标签的数据集上进行了微调。两个模型与现有相关方法进行了评估。

ALYMPICS: Language Agents Meet Game Theory

  • paper_url: http://arxiv.org/abs/2311.03220
  • repo_url: None
  • paper_authors: Shaoguang Mao, Yuzhe Cai, Yan Xia, Wenshan Wu, Xun Wang, Fengyi Wang, Tao Ge, Furu Wei
  • for: 这篇论文是为了探讨语言模型代理(LLM)在游戏理论中的应用。
  • methods: 该论文使用LLM和自动化代理来模拟人类行为,并实现多代理合作,以构建人类交互的真实和动态模型。
  • results: 通过 manipulate 资源可用性和代理个性,我们观察了不同代理在竞争中如何参与和适应策略。LLM代理在游戏理论研究中提供了优势,包括模拟真实行为、提供可控、可扩展和可重现环境。
    Abstract This paper introduces Alympics, a platform that leverages Large Language Model (LLM) agents to facilitate investigations in game theory. By employing LLMs and autonomous agents to simulate human behavior and enable multi-agent collaborations, we can construct realistic and dynamic models of human interactions for game theory hypothesis formulating and testing. To demonstrate this, we present and implement a survival game involving unequal competition for limited resources. Through manipulation of resource availability and agent personalities, we observe how different agents engage in the competition and adapt their strategies. The use of LLM agents in game theory research offers significant advantages, including simulating realistic behavior, providing a controlled, scalable, and reproducible environment. Our work highlights the potential of LLM agents in enhancing the understanding of strategic decision-making within complex socioeconomic contexts. All codes will be made public soon.
    摘要 这篇论文介绍了Alympics平台,该平台利用大语言模型(LLM)代理来促进游戏理论研究。通过使用LLM和自动化代理来模拟人类行为并实现多代理合作,我们可以构建真实和动态的人类互动模型,用于游戏理论假设设计和测试。为了证明这一点,我们在一个有限资源的存储游戏中展示了不同代理的竞争和战略适应。通过资源可用性和代理个性的调整,我们观察到不同的代理如何参与竞争并适应策略。使用LLM代理在游戏理论研究中提供了显著优势,包括模拟真实行为、提供可控、可扩展和可重现的环境。我们的工作强调了LLM代理在复杂社会经济背景下的决策战略理解的潜在优势。所有代码即将公开。

Mini Minds: Exploring Bebeshka and Zlata Baby Models

  • paper_url: http://arxiv.org/abs/2311.03216
  • repo_url: https://github.com/upunaprosk/small-language-models
  • paper_authors: Irina Proskurina, Guillaume Metzler, Julien Velcin
  • for: 这个论文描述了卢梭大学2所提交给Strict-Smalltrack的BabyLM竞赛任务。该任务强调从头开始,使用有限数据量和人类语言学习来进行语言模型化。共享任务数据集有1000万个词语,与儿童词汇相当。
  • methods: 我们采用了建立搜索,使用masked语言模型损失来调整数据集上的配置。我们发现了一个优化的配置,并引入了两个小型语言模型(LMs),即Bebeshka和Zlata。这两个模型具有4层encoder和6层decoder,具有8个注意头和12个注意头。虽然这两个模型的规模远小于基elineLMs,但它们在性能上具有相似的表现。
  • results: 我们发现,这两个小型语言模型在包括道德判断任务中的语言理解任务中表现出色。我们还发现,这些任务的预测结果与人类价值观念相align。这些发现表明,小型语言模型在实际语言理解任务中具有潜在的应用前景。
    Abstract In this paper, we describe the University of Lyon 2 submission to the Strict-Small track of the BabyLM competition. The shared task is created with an emphasis on small-scale language modelling from scratch on limited-size data and human language acquisition. Dataset released for the Strict-Small track has 10M words, which is comparable to children's vocabulary size. We approach the task with an architecture search, minimizing masked language modelling loss on the data of the shared task. Having found an optimal configuration, we introduce two small-size language models (LMs) that were submitted for evaluation, a 4-layer encoder with 8 attention heads and a 6-layer decoder model with 12 heads which we term Bebeshka and Zlata, respectively. Despite being half the scale of the baseline LMs, our proposed models achieve comparable performance. We further explore the applicability of small-scale language models in tasks involving moral judgment, aligning their predictions with human values. These findings highlight the potential of compact LMs in addressing practical language understanding tasks.
    摘要 在这篇论文中,我们描述了里昂第二大学对Strict-Small赛道的提交。这个比赛强调从头开始,使用有限数据量和人类语言学习的小规模语言模型。比赛数据集的词汇量为1000万个词,与儿童 vocabulary 大小相当。我们通过架构搜索,在数据集上减少了遮盖语言模型损失。我们发现了一个优化的配置,并引入了两个小型语言模型(LM),即Bebeshka和Zlata。这两个模型各有4层encoder和6层decoder,每个encoder具有8个注意头,每个decoder具有12个注意头。尽管这两个模型的规模只是基elineLMs的一半,但它们在比赛中表现很强。我们进一步探索了小规模语言模型在道德判断任务中的可行性,并将其预测与人类价值观Alignment。这些发现表明了小规模语言模型在实际语言理解任务中的潜在潜力。

Pseudo-Labeling for Domain-Agnostic Bangla Automatic Speech Recognition

  • paper_url: http://arxiv.org/abs/2311.03196
  • repo_url: https://github.com/hishab-nlp/pseudo-labeling-for-domain-agnostic-bangla-asr
  • paper_authors: Rabindra Nath Nandi, Mehadi Hasan Menon, Tareq Al Muntasir, Sagor Sarker, Quazi Sarwar Muhtaseem, Md. Tariqul Islam, Shammur Absar Chowdhury, Firoj Alam
  • For: The paper aims to develop a large-scale domain-agnostic automatic speech recognition (ASR) dataset for low-resource languages, specifically Bangla.* Methods: The proposed methodology uses pseudo-labeling to develop a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios.* Results: The developed ASR system is benchmarked with publicly available datasets and compared with other available models, demonstrating its efficacy on a human-annotated domain-agnostic test set composed of news, telephony, and conversational data.
    Abstract One of the major challenges for developing automatic speech recognition (ASR) for low-resource languages is the limited access to labeled data with domain-specific variations. In this study, we propose a pseudo-labeling approach to develop a large-scale domain-agnostic ASR dataset. With the proposed methodology, we developed a 20k+ hours labeled Bangla speech dataset covering diverse topics, speaking styles, dialects, noisy environments, and conversational scenarios. We then exploited the developed corpus to design a conformer-based ASR system. We benchmarked the trained ASR with publicly available datasets and compared it with other available models. To investigate the efficacy, we designed and developed a human-annotated domain-agnostic test set composed of news, telephony, and conversational data among others. Our results demonstrate the efficacy of the model trained on psuedo-label data for the designed test-set along with publicly-available Bangla datasets. The experimental resources will be publicly available.(https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR)
    摘要 一个主要挑战是开发自动语音识别(ASR)系统的低资源语言是有限的标注数据的领域特定变化。在本研究中,我们提出了一种 Pseudo-labeling 方法,以开发大规模的领域不偏的 ASR 数据集。我们通过该方法,开发了20000+小时的标注的孟加拉语音数据集,覆盖了多样化的话题、说话风格、方言、噪音环境和对话场景。然后,我们利用开发的 corpus 设计了一个基于 Confomer 的 ASR 系统。我们对培金的 ASR 进行了评估,并与其他可用的模型进行了比较。为了调查效果,我们设计了和开发了一个人类标注的领域不偏测试集,包括新闻、电信和对话数据等。我们的结果表明,使用 Pseudo-labeling 方法对于我们设计的测试集以及公共可用的孟加拉语言数据集都有效。实验资源将公开。(参考:https://github.com/hishab-nlp/Pseudo-Labeling-for-Domain-Agnostic-Bangla-ASR)

Nexus at ArAIEval Shared Task: Fine-Tuning Arabic Language Models for Propaganda and Disinformation Detection

  • paper_url: http://arxiv.org/abs/2311.03184
  • repo_url: None
  • paper_authors: Yunze Xiao, Firoj Alam
  • for: 本研究旨在探讨自媒体内容中假信息和宣传内容的扩散,以及这些内容如何影响社会稳定和公众的信任。
  • methods: 本研究使用了基于变换器的 fine-tuning 和零或几次shot学习,以及 GPT-4 的使用。
  • results: 本研究在 ArAIEval 共享任务中的成绩为9名和10名。
    Abstract The spread of disinformation and propagandistic content poses a threat to societal harmony, undermining informed decision-making and trust in reliable sources. Online platforms often serve as breeding grounds for such content, and malicious actors exploit the vulnerabilities of audiences to shape public opinion. Although there have been research efforts aimed at the automatic identification of disinformation and propaganda in social media content, there remain challenges in terms of performance. The ArAIEval shared task aims to further research on these particular issues within the context of the Arabic language. In this paper, we discuss our participation in these shared tasks. We competed in subtasks 1A and 2A, where our submitted system secured positions 9th and 10th, respectively. Our experiments consist of fine-tuning transformer models and using zero- and few-shot learning with GPT-4.
    摘要 社会和谐受到假信息和宣传内容的威胁,这会损害了知情决策和可靠来源的信任。在线平台经常成为这种内容的殖民地,恶意者会利用听众的漏洞来形成公众意见。虽然已有研究尝试自动识别社交媒体中的假信息和宣传,但还有许多挑战。阿拉伯语 ArAIEval 分享任务旨在进一步研究这些问题。在这篇论文中,我们讲述了我们在这些分享任务中的参与。我们参加了1A和2A两个子任务,我们提交的系统在这两个子任务中分别获得了第9名和第10名。我们的实验包括细化变换器模型和使用零或几次shot学习与GPT-4。

ArAIEval Shared Task: Persuasion Techniques and Disinformation Detection in Arabic Text

  • paper_url: http://arxiv.org/abs/2311.03179
  • repo_url: None
  • paper_authors: Maram Hasanain, Firoj Alam, Hamdy Mubarak, Samir Abdaljalil, Wajdi Zaghouani, Preslav Nakov, Giovanni Da San Martino, Abed Alhakim Freihat
  • For: The paper is written for the first ArabicNLP 2023 conference and is focused on the ArAIEval shared task, which is a task of persuasion technique detection and disinformation detection in Arabic text.* Methods: The paper uses fine-tuning transformer models such as AraBERT as the core of the majority of the participating systems.* Results: The paper provides a description of the task setup, including the dataset construction and evaluation setup, and gives a brief overview of the participating systems. All datasets and evaluation scripts from the shared task are released to the research community.Here’s the information in Simplified Chinese text:* For: 这篇论文是为了第一届阿拉伯语自然语言处理会议(ArabicNLP 2023)的一部分,主要关注于 ArAIEval 共享任务,这个任务的目标是在 arabic 文本中检测吸引技巧和谎言检测。* Methods: 这篇论文主要使用 fine-tuning 转换器模型,如 AraBERT,作为大多数参与系统的核心。* Results: 论文提供了任务设置的描述,包括数据集构建和评估设置,并提供了参与系统的简短概述。所有数据集和评估脚本都被发布到研究社区。
    Abstract We present an overview of the ArAIEval shared task, organized as part of the first ArabicNLP 2023 conference co-located with EMNLP 2023. ArAIEval offers two tasks over Arabic text: (i) persuasion technique detection, focusing on identifying persuasion techniques in tweets and news articles, and (ii) disinformation detection in binary and multiclass setups over tweets. A total of 20 teams participated in the final evaluation phase, with 14 and 16 teams participating in Tasks 1 and 2, respectively. Across both tasks, we observed that fine-tuning transformer models such as AraBERT was at the core of the majority of the participating systems. We provide a description of the task setup, including a description of the dataset construction and the evaluation setup. We further give a brief overview of the participating systems. All datasets and evaluation scripts from the shared task are released to the research community. (https://araieval.gitlab.io/) We hope this will enable further research on these important tasks in Arabic.
    摘要 我们提供了阿拉伯语评价分享任务的概述,作为2023年阿拉伯语自然语言处理会议(ArabicNLP 2023)的一部分,并与EMNLP 2023会议共同举行。这个任务提供了两个任务,即在新闻文章和推文中发现吸引人技巧的检测,以及在推文中发现不实信息的检测。共有20个队伍参加了最终评估阶段,其中14个队伍参加了任务1,16个队伍参加了任务2。在两个任务中,大多数参与系统都是使用transformer模型进行精度调整,如AraBERT。我们提供了任务设置的描述,包括数据集的建构和评估设置的描述,以及参与系统的简要概述。此外,我们还发布了所有数据集和评估脚本,以便研究人员可以进一步进行研究。(https://araieval.gitlab.io/)。我们希望这将促进阿拉伯语的研究。

1D-Convolutional transformer for Parkinson disease diagnosis from gait

  • paper_url: http://arxiv.org/abs/2311.03177
  • repo_url: https://github.com/safwennaimi/1d-convolutional-transformer-for-parkinson-disease-diagnosis-from-gait
  • paper_authors: Safwen Naimi, Wassim Bouachir, Guillaume-Alexandre Bilodeau
  • for: 这个研究的目的是用深度神经网络模型诊断 Parkinson 病患的步态。
  • methods: 这个研究使用了一种混合 ConvNet-Transformer 架构,通过捕捉本地特征和长期空间时间关系来准确地诊断病种的严重程度。
  • results: 研究结果表明,这种混合架构可以从步态数据中准确地诊断 Parkinson 病患的不同阶段,具体来说是88%的准确率,超过了其他现有的人工智能方法。此外,这种方法可以普遍应用于其他分类问题,以联合地解决1D信号中的特征相关性和空间时间关系问题。
    Abstract This paper presents an efficient deep neural network model for diagnosing Parkinson's disease from gait. More specifically, we introduce a hybrid ConvNet-Transformer architecture to accurately diagnose the disease by detecting the severity stage. The proposed architecture exploits the strengths of both Convolutional Neural Networks and Transformers in a single end-to-end model, where the former is able to extract relevant local features from Vertical Ground Reaction Force (VGRF) signal, while the latter allows to capture long-term spatio-temporal dependencies in data. In this manner, our hybrid architecture achieves an improved performance compared to using either models individually. Our experimental results show that our approach is effective for detecting the different stages of Parkinson's disease from gait data, with a final accuracy of 88%, outperforming other state-of-the-art AI methods on the Physionet gait dataset. Moreover, our method can be generalized and adapted for other classification problems to jointly address the feature relevance and spatio-temporal dependency problems in 1D signals. Our source code and pre-trained models are publicly available at https://github.com/SafwenNaimi/1D-Convolutional-transformer-for-Parkinson-disease-diagnosis-from-gait.
    摘要 这篇论文提出了一种高效的深度神经网络模型,用于从步态诊断parkinson病。更具体地说,我们引入了一种混合ConvNet-Transformer架构,以准确地诊断病种的严重度。我们的提议的架构利用了Convolutional Neural Networks和Transformers两者的优势,在单一端到端模型中结合使用,以EXTract local特征和long-termspatio-temporal关系。这种方法实现了使用单个模型来诊断parkinson病的更高性能,我们的实验结果表明,我们的方法可以准确地从步态数据中识别不同的parkinson病stage,最终准确率为88%,超过了其他state-of-the-art AI方法在Physionet步态数据集上。此外,我们的方法可以普化和适应其他分类问题,以 JOINTLY地解决1D信号中的特征相关性和spatio-temporal关系问题。我们的源代码和预训练模型可以在https://github.com/SafwenNaimi/1D-Convolutional-transformer-for-Parkinson-disease-diagnosis-from-gait上获取。

Findings of the WMT 2023 Shared Task on Discourse-Level Literary Translation: A Fresh Orb in the Cosmos of LLMs

  • paper_url: http://arxiv.org/abs/2311.03127
  • repo_url: None
  • paper_authors: Longyue Wang, Zhaopeng Tu, Yan Gu, Siyou Liu, Dian Yu, Qingsong Ma, Chenyang Lyu, Liting Zhou, Chao-Hong Liu, Yufeng Ma, Weiyu Chen, Yvette Graham, Bonnie Webber, Philipp Koehn, Andy Way, Yulin Yuan, Shuming Shi
  • for: 本研究旨在探讨机器翻译Literary Translation task的问题,以便促进这一领域的进步。
  • methods: 作者们使用了一个新的文本 corpora和一套产业界承认的评价标准,以评估参与者提交的系统性能。
  • results: 研究发现了一系列有趣的发现,包括文学和语言领域的MT问题,以及一些可能有助于解决这些问题的策略。
    Abstract Translating literary works has perennially stood as an elusive dream in machine translation (MT), a journey steeped in intricate challenges. To foster progress in this domain, we hold a new shared task at WMT 2023, the first edition of the Discourse-Level Literary Translation. First, we (Tencent AI Lab and China Literature Ltd.) release a copyrighted and document-level Chinese-English web novel corpus. Furthermore, we put forth an industry-endorsed criteria to guide human evaluation process. This year, we totally received 14 submissions from 7 academia and industry teams. We employ both automatic and human evaluations to measure the performance of the submitted systems. The official ranking of the systems is based on the overall human judgments. In addition, our extensive analysis reveals a series of interesting findings on literary and discourse-aware MT. We release data, system outputs, and leaderboard at http://www2.statmt.org/wmt23/literary-translation-task.html.
    摘要 machine translation (MT) Literary translation 总是被看作是一个逃逸的梦,一个涉及繁复挑战的旅程。为了推动这个领域的进步,我们在 WMT 2023 上开设了一个新的共同任务,即第一届 Discourse-Level Literary Translation。首先,我们(Tencent AI Lab 和 China Literature Ltd.)发布了一个版权保护的、中英文网络小说 corpus。其次,我们提出了产业界认可的评价标准,以导引人类评价过程。这年,我们总共收到了 14 个来自 7 所学术和产业团队的提交。我们使用自动和人类评价两者来评价提交系统的性能。人类评价结果作为官方排名的基础。此外,我们进行了广泛的分析,发现了一系列有趣的文学和报道意识MT的发现。我们在 http://www2.statmt.org/wmt23/literary-translation-task.html 上发布了数据、系统输出和排名。

Pelvic floor MRI segmentation based on semi-supervised deep learning

  • paper_url: http://arxiv.org/abs/2311.03105
  • repo_url: None
  • paper_authors: Jianwei Zuo, Fei Feng, Zhuhui Wang, James A. Ashton-Miller, John O. L. Delancey, Jiajia Luo
    for:这篇论文的目的是提出一个半supervised的框架来进行骨盘器官分类。methods:这个框架包括两个阶段:第一个阶段是使用自我监督预训 tasks进行自我预训,然后使用标注数据进行精确训练分类模型。第二个阶段是使用自我预训的分类模型来生成伪标签 для无标的数据。最后,这两个阶段的数据都被用在半supervised的训练中。results:在评估中,我们的方法可以将骨盘器官分类的性能提高,特别是难以分类的器官,如uterus,可以提高Semantic segmentation的精度 by up to 3.70%.
    Abstract The semantic segmentation of pelvic organs via MRI has important clinical significance. Recently, deep learning-enabled semantic segmentation has facilitated the three-dimensional geometric reconstruction of pelvic floor organs, providing clinicians with accurate and intuitive diagnostic results. However, the task of labeling pelvic floor MRI segmentation, typically performed by clinicians, is labor-intensive and costly, leading to a scarcity of labels. Insufficient segmentation labels limit the precise segmentation and reconstruction of pelvic floor organs. To address these issues, we propose a semi-supervised framework for pelvic organ segmentation. The implementation of this framework comprises two stages. In the first stage, it performs self-supervised pre-training using image restoration tasks. Subsequently, fine-tuning of the self-supervised model is performed, using labeled data to train the segmentation model. In the second stage, the self-supervised segmentation model is used to generate pseudo labels for unlabeled data. Ultimately, both labeled and unlabeled data are utilized in semi-supervised training. Upon evaluation, our method significantly enhances the performance in the semantic segmentation and geometric reconstruction of pelvic organs, Dice coefficient can increase by 2.65% averagely. Especially for organs that are difficult to segment, such as the uterus, the accuracy of semantic segmentation can be improved by up to 3.70%.
    摘要 pelvic organs的semantic segmentation via MRI具有重要的临床意义。现在,通过深度学习启用的semantic segmentation,可以实现三维的预测和重建 pelvic floor organs,为临床人员提供准确和直观的诊断结果。然而,pelvic floor MRI segmentation的标注工作,通常由临床人员进行,是劳动密集和成本高的,导致标注数据的缺乏。不充分的标注数据限制了精准的pelvic floor organs的分割和重建。为解决这些问题,我们提议了一种 semi-supervised 框架 для pelvic organ segmentation。该框架的实现包括两个阶段。在第一阶段,它通过自我监督的预训练来完成图像恢复任务。然后,使用标注数据来训练分割模型的精度调整。在第二阶段,自我监督分割模型被用来生成 pseudo 标签 для无标注数据。最终,两者都被用于 semi-supervised 训练。经评估,我们的方法可以显著提高pelvic organs的semantic segmentation和三维重建的性能。平均可以提高 Dice 系数2.65%,特别是难以分割的器官,如uterus,可以提高semantic segmentation的准确率达3.70%。

A Simple yet Efficient Ensemble Approach for AI-generated Text Detection

  • paper_url: http://arxiv.org/abs/2311.03084
  • repo_url: None
  • paper_authors: Harika Abburi, Kalyani Roy, Michael Suesserman, Nirmala Pudota, Balaji Veeramani, Edward Bowen, Sanmitra Bhattacharya
  • for: 本研究旨在开发一种自动可 distinguish between artificially generated text和人类写作的方法,以防止大语言模型(LLMs)的潜在滥用,如假新闻生成、垃圾邮件创建和学术作业违规使用。
  • methods: 我们提出了一种简单 yet efficient的解决方案,通过将多个组件LLMs的预测 ensemble。与前一代方法相比,我们的压缩结合方法仅使用了两个组件LLMs,而且可以达到相同的性能水平。
  • results: 我们在四个生成文本分类 benchmark 上进行了实验,结果表明,与前一代方法相比,我们的方法在性能提升的范围内为0.5%-100%。此外,我们还研究了各个LLMs的训练数据对模型性能的影响,发现可以将商业束缚的GPT数据取代为其他开源语言模型生成的数据,这是一种可行的替代方案。最后,我们通过英文作文数据集的实验,证明了我们的结合方法可以有效地处理新数据。
    Abstract Recent Large Language Models (LLMs) have demonstrated remarkable capabilities in generating text that closely resembles human writing across wide range of styles and genres. However, such capabilities are prone to potential abuse, such as fake news generation, spam email creation, and misuse in academic assignments. Hence, it is essential to build automated approaches capable of distinguishing between artificially generated text and human-authored text. In this paper, we propose a simple yet efficient solution to this problem by ensembling predictions from multiple constituent LLMs. Compared to previous state-of-the-art approaches, which are perplexity-based or uses ensembles with a number of LLMs, our condensed ensembling approach uses only two constituent LLMs to achieve comparable performance. Experiments conducted on four benchmark datasets for generative text classification show performance improvements in the range of 0.5 to 100\% compared to previous state-of-the-art approaches. We also study the influence that the training data from individual LLMs have on model performance. We found that substituting commercially-restrictive Generative Pre-trained Transformer (GPT) data with data generated from other open language models such as Falcon, Large Language Model Meta AI (LLaMA2), and Mosaic Pretrained Transformers (MPT) is a feasible alternative when developing generative text detectors. Furthermore, to demonstrate zero-shot generalization, we experimented with an English essays dataset, and results suggest that our ensembling approach can handle new data effectively.
    摘要 现代大语言模型(LLM)在生成文本方面表现出了惊人的能力,能够生成具有人类特点的文本,覆盖了各种风格和类型。然而,这些能力也存在潜在的危险,如生成假新闻、垃圾邮件和学术作业上的违规使用。因此,建立自动 distinguishing 人造文本和人类写作文本的方法变得非常重要。在这篇论文中,我们提出了一种简单 yet efficient 的解决方案,通过多个组件 LLM 的 ensemble 来实现。与之前的状态态前方法相比,我们的压缩 ensemble 方法只需使用两个组件 LLM,却可以达到相同的性能。在四个生成文本分类 benchmark 数据集上进行了实验,并取得了0.5% 到100% 的性能提升。我们还研究了各个 LLM 的训练数据对模型性能的影响。我们发现可以将商业 restriction 的 Generative Pre-trained Transformer(GPT)数据替换为其他开源语言模型如 Falcon、Large Language Model Meta AI(LLaMA2)和 Mosaic Pretrained Transformers(MPT)生成的数据,这是一种可行的方法。此外,我们进行了零shot泛化的实验,并发现我们的 ensemble 方法可以有效地处理新的数据。

SugarViT – Multi-objective Regression of UAV Images with Vision Transformers and Deep Label Distribution Learning Demonstrated on Disease Severity Prediction in Sugar Beet

  • paper_url: http://arxiv.org/abs/2311.03076
  • repo_url: None
  • paper_authors: Maurice Günder, Facundo Ramón Ispizua Yamati, Abel Andree Barreto Alcántara, Anne-Katrin Mahlein, Rafet Sifa, Christian Bauckhage
  • for: 这个研究旨在开发一个基于人工智能的植物特征标注框架,用于自动化大规模的甘蔗叶斑病苗分类。
  • methods: 这个研究使用了深度标签分布学习(DLDL)、特殊的损失函数和自订的模型架构,开发了一个基于视觉 трансформер的病苗度评分模型called SugarViT。
  • results: 这个研究获得了一个可靠的病苗度评分模型,并且还能够融合 remote sensing 数据和实验站的环境参数来预测病苗度。这个模型可以应用于多种像素基于的分类和回归 зада务。
    Abstract Remote sensing and artificial intelligence are pivotal technologies of precision agriculture nowadays. The efficient retrieval of large-scale field imagery combined with machine learning techniques shows success in various tasks like phenotyping, weeding, cropping, and disease control. This work will introduce a machine learning framework for automatized large-scale plant-specific trait annotation for the use case disease severity scoring for Cercospora Leaf Spot (CLS) in sugar beet. With concepts of Deep Label Distribution Learning (DLDL), special loss functions, and a tailored model architecture, we develop an efficient Vision Transformer based model for disease severity scoring called SugarViT. One novelty in this work is the combination of remote sensing data with environmental parameters of the experimental sites for disease severity prediction. Although the model is evaluated on this special use case, it is held as generic as possible to also be applicable to various image-based classification and regression tasks. With our framework, it is even possible to learn models on multi-objective problems as we show by a pretraining on environmental metadata.
    摘要 <>现代农业精度农业技术中,远程感知和人工智能是非常重要的。大规模田地图像的有效回收,结合机器学习技术,在不同任务中具有成功,如型态识别、除草、种植和病虫控制。本文将介绍一种基于机器学习框架的自动化大规模植物特征注释技术,用于 sugar beet Cercospora Leaf Spot (CLS) 疾病严重度评估。通过深度标签分布学习(DLDL)、特殊的损失函数和适应性的模型架构,我们开发了一种高效的视Transformer 模型,称为 SugarViT。本文的一个新特点是结合远程感知数据和实验室测试站的环境参数,进行疾病严重度预测。尽管这个模型是在这个特定的应用场景中评估的,但它是可以通用的,也可以应用于多种图像基于分类和回归任务。此外,我们还表明了如何通过环境元数据预训练,学习多目标问题。

Distributed Agent-Based Collaborative Learning in Cross-Individual Wearable Sensor-Based Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2311.04236
  • repo_url: None
  • paper_authors: Ahmad Esmaeili, Zahra Ghorrati, Eric T. Matson
  • For: This paper is written for the field of personalized and context-aware Human Activity Recognition, with a focus on developing scalable, adaptable, and privacy-conscious methodologies using multi-agent systems.* Methods: The paper introduces a collaborative distributed learning approach rooted in multi-agent principles, where individual users of sensor-equipped devices function as agents within a distributed network, collectively contributing to the process of learning and classifying human activities.* Results: The proposed approach has been empirically tested on two publicly accessible human activity recognition datasets, showing the efficacy of inter-individual collaborative learning compared to centralized configurations, with both local and global generalization.
    Abstract The rapid growth of wearable sensor technologies holds substantial promise for the field of personalized and context-aware Human Activity Recognition. Given the inherently decentralized nature of data sources within this domain, the utilization of multi-agent systems with their inherent decentralization capabilities presents an opportunity to facilitate the development of scalable, adaptable, and privacy-conscious methodologies. This paper introduces a collaborative distributed learning approach rooted in multi-agent principles, wherein individual users of sensor-equipped devices function as agents within a distributed network, collectively contributing to the comprehensive process of learning and classifying human activities. In this proposed methodology, not only is the privacy of activity monitoring data upheld for each individual, eliminating the need for an external server to oversee the learning process, but the system also exhibits the potential to surmount the limitations of conventional centralized models and adapt to the unique attributes of each user. The proposed approach has been empirically tested on two publicly accessible human activity recognition datasets, specifically PAMAP2 and HARTH, across varying settings. The provided empirical results conclusively highlight the efficacy of inter-individual collaborative learning when contrasted with centralized configurations, both in terms of local and global generalization.
    摘要 “快速增长的戴式传感器技术持有大量潜在的个性化和上下文意识识别潜力。由于这个领域的数据来源本身具有分散化特点,因此使用多代理系统的特点可以推动开发可扩展、适应性强和隐私意识的方法。本文介绍一种基于多代理原则的分布式学习方法,在其中每个戴式设备上的用户作为分布式网络中的代理,共同参与人活动识别的全面学习过程。在该提议方法中,不仅保护了每个人的活动监测数据隐私,而且消除了中央服务器的学习过程监控需求,系统还能够超越传统中央化模型的局限性,适应每个用户的特点。这种方法在两个公共可访问的人类活动识别数据集(PAMAP2和HARTH)上进行了实验测试,结果表明在不同的设定下,分布式学习方法比中央化配置更高效, both locally and globally。”

Maximal Consistent Subsystems of Max-T Fuzzy Relational Equations

  • paper_url: http://arxiv.org/abs/2311.03059
  • repo_url: None
  • paper_authors: Ismaïl Baaj
  • for: 这个论文研究了一个$\max-T$不确定方程系统的不一致性,其中$T$是一个t-整数 among $\min$, 产品或Lukasiewicz的t-整数。
  • methods: 对于不一致的$\max-T$系统,我们直接构建了一个 canonical最大一致子系统(w.r.t. 包含关系),主要工具是计算不确定系统中Chebychev距离$\Delta = \inf_{c \in \mathcal{C} \Vert b - c \Vert$的分析式公式,其中$\mathcal{C}$是定义同样矩阵$A$的一致系统的二元组集。
  • results: 对于不一致的$\max-\min$系统,我们提供了一种高效的方法来获得所有一致子系统,并证明可以逐次获得所有最大一致子系统。
    Abstract In this article, we study the inconsistency of a system of $\max-T$ fuzzy relational equations of the form $A \Box_{T}^{\max} x = b$, where $T$ is a t-norm among $\min$, the product or Lukasiewicz's t-norm. For an inconsistent $\max-T$ system, we directly construct a canonical maximal consistent subsystem (w.r.t the inclusion order). The main tool used to obtain it is the analytical formula which compute the Chebyshev distance $\Delta = \inf_{c \in \mathcal{C} \Vert b - c \Vert$ associated to the inconsistent $\max-T$ system, where $\mathcal{C}$ is the set of second members of consistent systems defined with the same matrix $A$. Based on the same analytical formula, we give, for an inconsistent $\max-\min$ system, an efficient method to obtain all its consistent subsystems, and we show how to iteratively get all its maximal consistent subsystems.
    摘要 在这篇文章中,我们研究了 $\max-T$ 模糊关系方程的不一致性,其中 $T$ 是一种 t-整数(可能是乘法或卢卡氏 t-整数)。对于一个不一致的 $\max-T$ 系统,我们直接构造了一个宽义最大可consistent subsystem(w.r.t. 包含顺序)。我们使用了关于不一致 $\max-T$ 系统的分析式公式来计算 Chebyshev 距离 $\Delta = \inf_{c \in \mathcal{C} \Vert b - c \Vert$,其中 $\mathcal{C}$ 是定义同 $A$ 矩阵的一致系统的二元部分。基于同样的分析式公式,我们为一个不一致 $\max-\min$ 系统提供了一种高效的方法来获取所有一致子系统,并证明了可以递归地获取所有最大可consistent subsystem。

LitSumm: Large language models for literature summarisation of non-coding RNAs

  • paper_url: http://arxiv.org/abs/2311.03056
  • repo_url: https://github.com/rnacentral/litscan-summarization
  • paper_authors: Andrew Green, Carlos Ribas, Nancy Ontiveros-Palacios, Anton I. Petrov, Alex Bateman, Blake Sweeney
  • for: 本研究旨在解决生物医学文献筛选的挑战,即由于文献数量不断增加,而有限的审核人员无法适应所有相关文献。
  • methods: 本研究使用大型自然语言模型(LLM)生成非编码RNA文献摘要,并通过商业LLM和链接的提问和检查来生成高质量、准确的摘要。
  • results: 研究表明,使用现有的LLM可以自动生成高质量的非编码RNA文献摘要,并且与人类评价高度相关。此外,研究还发现自动评价方法与人类评价不 correlate。最后,研究应用其工具到选择的4,600多个ncRNA上,并将生成的摘要公开提供给RNAcentral资源。
    Abstract Motivation: Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature and all have to prioritise their efforts. Results: In this work, we take a first step to alleviating the lack of curator time in RNA science by generating summaries of literature for non-coding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We also applied the most commonly used automated evaluation approaches, finding that they do not correlate with human assessment. Finally, we apply our tool to a selection of over 4,600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided careful prompting and automated checking are applied. Availability: Code used to produce these summaries can be found here: https://github.com/RNAcentral/litscan-summarization and the dataset of contexts and summaries can be found here: https://huggingface.co/datasets/RNAcentral/litsumm-v1. Summaries are also displayed on the RNA report pages in RNAcentral (https://rnacentral.org/)
    摘要 目的:生命科学文献筛选是一项快速增长的挑战。随着发表速度的不断增加,与全球团队规模相比,生命科学文献筛选者的数量几乎固定,这对生物医学知识库开发者带来了主要的挑战。很少的知识库有资源可以涵盖整个相关文献,所以它们都必须优化努力。结果:在这项工作中,我们通过使用大型自然语言模型(LLM)生成非编码RNA文献摘要,以解决生命科学文献筛选者缺乏时间的问题。我们证明了可以使用商业LLM和链接的提示和检查生成高质量、精准的文献摘要,并且人工评估表明大多数摘要具有极高质量。我们还应用了常用的自动评估方法,发现它们与人工评估不符。最后,我们使用我们的工具对4600多个ncRNA进行摘要,并将生成的摘要公开于RNAcentral资源上。我们 conclued that使用当代LLM生成文献摘要是可能的,只要仔细制定提示和自动检查。可用性:用于生成这些摘要的代码可以在GitHub上找到:https://github.com/RNAcentral/litscan-summarization。我们生成的摘要和文献上下文可以在Hugging Face上找到:https://huggingface.co/datasets/RNAcentral/litsumm-v1。摘要也被显示在RNAcentral报告页面上(https://rnacentral.org/)。

Masking Hyperspectral Imaging Data with Pretrained Models

  • paper_url: http://arxiv.org/abs/2311.03053
  • repo_url: https://github.com/hifexplo/masking
  • paper_authors: Elias Arbash, Andréa de Lima Ribeiro, Sam Thiele, Nina Gnann, Behnood Rasti, Margret Fuchs, Pedram Ghamisi, Richard Gloaguen
  • for: 提高干扰谱数据处理性能,避免干扰谱数据中的背景频谱特征的影响。
  • methods: 提出了一种基于Segment Anything Model(SAM)和零shot Grounding Dino对象检测器的图像分割方法,并在 intersect和 exclude步骤中进行了筛选和排除。
  • results: 在三个复杂的应用场景中(塑料碎屑特征化、钻核扫描和垃圾监测)得到了明显的改善,包括计算成本、内存需求和总性能。
    Abstract The presence of undesired background areas associated with potential noise and unknown spectral characteristics degrades the performance of hyperspectral data processing. Masking out unwanted regions is key to addressing this issue. Processing only regions of interest yields notable improvements in terms of computational costs, required memory, and overall performance. The proposed processing pipeline encompasses two fundamental parts: regions of interest mask generation, followed by the application of hyperspectral data processing techniques solely on the newly masked hyperspectral cube. The novelty of our work lies in the methodology adopted for the preliminary image segmentation. We employ the Segment Anything Model (SAM) to extract all objects within the dataset, and subsequently refine the segments with a zero-shot Grounding Dino object detector, followed by intersection and exclusion filtering steps, without the need for fine-tuning or retraining. To illustrate the efficacy of the masking procedure, the proposed method is deployed on three challenging applications scenarios that demand accurate masking; shredded plastics characterization, drill core scanning, and litter monitoring. The numerical evaluation of the proposed masking method on the three applications is provided along with the used hyperparameters. The scripts for the method will be available at https://github.com/hifexplo/Masking.
    摘要 您的干扰背景预测数据处理性能的问题可以通过Masking来解决。我们的提案包括两个基本部分:首先生成区域兴趣标识符,然后仅应用干扰数据处理技术于新生成的干扰数据立方体中。我们的创新在于采用的Segment Anything Model(SAM)来提取数据集中的所有对象,然后使用零扩展Grounding Dino对象检测器进行筛选、排除掉重叠的步骤,无需进行微调或重新训练。为证明掩码方法的效果,我们在三个复杂的应用场景中运用了该方法,即杂物Characterization、钻探核心扫描和垃圾监测。我们提供了这三个应用场景的数值评估结果,并附上使用的超参数。方法的脚本将于https://github.com/hifexplo/Masking上公开。

Grouping Local Process Models

  • paper_url: http://arxiv.org/abs/2311.03040
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Viki Peeva, Wil M. P. van der Aalst
  • for: 本研究旨在提出一种三步管道来归类类似的本地过程模型(LPM),以解决模型爆炸和重复问题。
  • methods: 本研究使用了不同的过程模型相似度度量来归类LPM。
  • results: 实验结果表明, grouping 可以减少模型的数量,提高模型的精度和可读性。
    Abstract In recent years, process mining emerged as a proven technology to analyze and improve operational processes. An expanding range of organizations using process mining in their daily operation brings a broader spectrum of processes to be analyzed. Some of these processes are highly unstructured, making it difficult for traditional process discovery approaches to discover a start-to-end model describing the entire process. Therefore, the subdiscipline of Local Process Model (LPM) discovery tries to build a set of LPMs, i.e., smaller models that explain sub-behaviors of the process. However, like other pattern mining approaches, LPM discovery algorithms also face the problems of model explosion and model repetition, i.e., the algorithms may create hundreds if not thousands of models, and subsets of them are close in structure or behavior. This work proposes a three-step pipeline for grouping similar LPMs using various process model similarity measures. We demonstrate the usefulness of grouping through a real-life case study, and analyze the impact of different measures, the gravity of repetition in the discovered LPMs, and how it improves after grouping on multiple real event logs.
    摘要 Recently, process mining has emerged as a proven technology to analyze and improve operational processes. With an increasing number of organizations using process mining in their daily operations, a broader range of processes are being analyzed. However, some of these processes are highly unstructured, making it difficult for traditional process discovery approaches to discover a start-to-end model describing the entire process. Therefore, the subdiscipline of Local Process Model (LPM) discovery has emerged to build a set of LPMs, i.e., smaller models that explain sub-behaviors of the process. However, like other pattern mining approaches, LPM discovery algorithms also face the problems of model explosion and model repetition, i.e., the algorithms may create hundreds if not thousands of models, and subsets of them are close in structure or behavior. This work proposes a three-step pipeline for grouping similar LPMs using various process model similarity measures. We demonstrate the usefulness of grouping through a real-life case study, and analyze the impact of different measures, the gravity of repetition in the discovered LPMs, and how it improves after grouping on multiple real event logs.Note: The translation is in Simplified Chinese, which is the standard form of Chinese used in mainland China and Singapore. Traditional Chinese is used in Hong Kong, Macau, and Taiwan.

GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation

  • paper_url: http://arxiv.org/abs/2311.03035
  • repo_url: https://github.com/ackesnal/gtp-vit
  • paper_authors: Xuwei Xu, Sen Wang, Yudong Chen, Yanping Zheng, Zhewei Wei, Jiajun Liu
  • for: 提高资源受限设备上的预训练 ViT 的效率,使其能够快速推理图像。
  • methods: 提出了一种基于图的减少方法,即图像信息传递方法(GTP),通过在图像中减少不重要的图像信息,并将其传递给相邻的重要图像信息,以提高模型的效率。
  • results: 对 ImageNet-1K 进行了大量的实验,并证明了 GTP 可以减少 DeiT-S 和 DeiT-B 的计算复杂度,同时保持模型的性能水平。特别是,GTP 可以在不需要Finetune的情况下,对各种背景下的各种核心体系进行更快的推理。
    Abstract Vision Transformers (ViTs) have revolutionized the field of computer vision, yet their deployments on resource-constrained devices remain challenging due to high computational demands. To expedite pre-trained ViTs, token pruning and token merging approaches have been developed, which aim at reducing the number of tokens involved in the computation. However, these methods still have some limitations, such as image information loss from pruned tokens and inefficiency in the token-matching process. In this paper, we introduce a novel Graph-based Token Propagation (GTP) method to resolve the challenge of balancing model efficiency and information preservation for efficient ViTs. Inspired by graph summarization algorithms, GTP meticulously propagates less significant tokens' information to spatially and semantically connected tokens that are of greater importance. Consequently, the remaining few tokens serve as a summarization of the entire token graph, allowing the method to reduce computational complexity while preserving essential information of eliminated tokens. Combined with an innovative token selection strategy, GTP can efficiently identify image tokens to be propagated. Extensive experiments have validated GTP's effectiveness, demonstrating both efficiency and performance improvements. Specifically, GTP decreases the computational complexity of both DeiT-S and DeiT-B by up to 26% with only a minimal 0.3% accuracy drop on ImageNet-1K without finetuning, and remarkably surpasses the state-of-the-art token merging method on various backbones at an even faster inference speed. The source code is available at https://github.com/Ackesnal/GTP-ViT.
    摘要 Computer vision 领域中的 Vision Transformers (ViTs) 已经革命化了,但是在具有限制的资源设备上部署它们仍然是一个挑战。为了快速部署预训练的 ViTs,批量缩减和批量合并方法已经被开发出来,以减少计算中的符号数。然而,这些方法仍然有一些限制,例如图像信息的损失和符号匹配过程的不效率。在这篇论文中,我们介绍了一种新的图表-基于的符号传播(GTP)方法,以解决计算复杂性和信息保留之间的平衡问题。这种方法基于图 summarization 算法,细致地在空间和Semantic上传递不重要的符号信息到更重要的符号上。因此,剩下的符号只需要扮演一个图像的概要,允许方法减少计算复杂性而不失信息。与一种创新的符号选择策略结合使用,GTP可以高效地选择图像中的符号。经验证明了 GTP 的有效性,可以在 ImageNet-1K 上 без fine-tuning 下降到 0.3% 的准确率下减少 DeiT-S 和 DeiT-B 的计算复杂性,并在不同的核心上轻松突破当前的状况报告。详细的源代码可以在 上找到。

Beyond Words: A Mathematical Framework for Interpreting Large Language Models

  • paper_url: http://arxiv.org/abs/2311.03033
  • repo_url: None
  • paper_authors: Javier González, Aditya V. Nori
  • for: This paper aims to provide a mathematical framework for understanding and improving large language models (LLMs).
  • methods: The paper proposes a framework called Hex, which clarifies key terms and concepts in LLM research and offers a precise and consistent way to characterize LLMs.
  • results: The paper differentiates chain-of-thought reasoning from chain-of-thought prompting and establishes the conditions under which they are equivalent. The paper argues that its formal definitions and results are crucial for advancing the discussion on how to build generative AI systems that are safe, reliable, fair, and robust.Here’s the Simplified Chinese version of the three points:
  • for: 这篇论文目的是为大语言模型(LLM)提供数学基础。
  • methods: 论文提出了名为“Hex”的框架,它在 LLM 研究中清晰地解释了关键术语和概念,并提供了一种精确和一致的方式来描述 LLM。
  • results: 论文区分了链条思维和链条提示,并确定了它们在何种情况下是等价的。论文认为,其 формаль定义和结果对于构建安全、可靠、公平、Robust的生成 AI 系统是关键的。
    Abstract Large language models (LLMs) are powerful AI tools that can generate and comprehend natural language text and other complex information. However, the field lacks a mathematical framework to systematically describe, compare and improve LLMs. We propose Hex a framework that clarifies key terms and concepts in LLM research, such as hallucinations, alignment, self-verification and chain-of-thought reasoning. The Hex framework offers a precise and consistent way to characterize LLMs, identify their strengths and weaknesses, and integrate new findings. Using Hex, we differentiate chain-of-thought reasoning from chain-of-thought prompting and establish the conditions under which they are equivalent. This distinction clarifies the basic assumptions behind chain-of-thought prompting and its implications for methods that use it, such as self-verification and prompt programming. Our goal is to provide a formal framework for LLMs that can help both researchers and practitioners explore new possibilities for generative AI. We do not claim to have a definitive solution, but rather a tool for opening up new research avenues. We argue that our formal definitions and results are crucial for advancing the discussion on how to build generative AI systems that are safe, reliable, fair and robust, especially in domains like healthcare and software engineering.
    摘要

Federated Learning for Clinical Structured Data: A Benchmark Comparison of Engineering and Statistical Approaches

  • paper_url: http://arxiv.org/abs/2311.03417
  • repo_url: https://github.com/nliulab/fl-benchmark
  • paper_authors: Siqi Li, Di Miao, Qiming Wu, Chuan Hong, Danny D’Agostino, Xin Li, Yilin Ning, Yuqing Shang, Huazhu Fu, Marcus Eng Hock Ong, Hamed Haddadi, Nan Liu
  • for: 保护医疗合作中数据隐私
  • methods: 比较工程域和统计领域的 Federated learning 框架
  • results: 统计式 Federated learning 算法提供较为准确的点估计,但工程域基础的方法可以生成更高精度的预测结果。
    Abstract Federated learning (FL) has shown promising potential in safeguarding data privacy in healthcare collaborations. While the term "FL" was originally coined by the engineering community, the statistical field has also explored similar privacy-preserving algorithms. Statistical FL algorithms, however, remain considerably less recognized than their engineering counterparts. Our goal was to bridge the gap by presenting the first comprehensive comparison of FL frameworks from both engineering and statistical domains. We evaluated five FL frameworks using both simulated and real-world data. The results indicate that statistical FL algorithms yield less biased point estimates for model coefficients and offer convenient confidence interval estimations. In contrast, engineering-based methods tend to generate more accurate predictions, sometimes surpassing central pooled and statistical FL models. This study underscores the relative strengths and weaknesses of both types of methods, emphasizing the need for increased awareness and their integration in future FL applications.
    摘要 联合学习(FL)在医疗合作中保护数据隐私的潜力备受关注。虽然“FL”这个词汇最初是由工程师社群提出的,但随后的统计学界也开始探索类似的隐私保护算法。统计学上的FL算法与工程师社群的算法相比,尚未获得相同的认知程度。我们的目标是将这两种领域的FL框架进行首次全面比较,以评估它们在实际应用中的优劣。我们使用了五种FL框架,包括工程师社群和统计学界的方法,并使用实验和真实数据进行评估。结果显示,统计学上的FL算法对数据的批评估计获得较低的偏见,并且可以轻松地Estimate interval的信度。相比之下,工程师社群的方法具有较高的预测精度,有时超过中央联合和统计学上的FL模型。这个研究强调了两种方法之间的相对优劣,并强调未来FL应用中需要增加这两种方法的融合。

Visual-information-driven model for crowd simulation using temporal convolutional network

  • paper_url: http://arxiv.org/abs/2311.02996
  • repo_url: None
  • paper_authors: Xuanwen Liang, Eric Wai Ming Lee
  • for: 提高数据驱动人群模拟模型的适应性和现实感
  • methods: incorporate visual information, including scenario geometry and pedestrian locomotion, to improve the adaptability and realism of data-driven crowd simulation models
  • results: 在三个不同的公共步行者动态数据集上测试并评估了视觉驱动的人群模拟模型,并显示了该模型在所有三个几何场景中的改进适应性。
    Abstract Crowd simulations play a pivotal role in building design, influencing both user experience and public safety. While traditional knowledge-driven models have their merits, data-driven crowd simulation models promise to bring a new dimension of realism to these simulations. However, most of the existing data-driven models are designed for specific geometries, leading to poor adaptability and applicability. A promising strategy for enhancing the adaptability and realism of data-driven crowd simulation models is to incorporate visual information, including the scenario geometry and pedestrian locomotion. Consequently, this paper proposes a novel visual-information-driven (VID) crowd simulation model. The VID model predicts the pedestrian velocity at the next time step based on the prior social-visual information and motion data of an individual. A radar-geometry-locomotion method is established to extract the visual information of pedestrians. Moreover, a temporal convolutional network (TCN)-based deep learning model, named social-visual TCN, is developed for velocity prediction. The VID model is tested on three public pedestrian motion datasets with distinct geometries, i.e., corridor, corner, and T-junction. Both qualitative and quantitative metrics are employed to evaluate the VID model, and the results highlight the improved adaptability of the model across all three geometric scenarios. Overall, the proposed method demonstrates effectiveness in enhancing the adaptability of data-driven crowd models.
    摘要 人群模拟在建筑设计中发挥重要作用,影响用户体验和公共安全。传统的知识驱动模型有其优点,但数据驱动人群模拟模型可以带来新的现实感。然而,现有的数据驱动模型大多适用于特定的几何结构,导致适应性和可用性强度有限。为了提高数据驱动人群模拟模型的适应性和现实感,本文提出了一种视觉信息驱动(VID)人群模拟模型。VID模型根据先前的社交视觉信息和人员运动数据预测下一步人群速度。为了提取视觉信息,本文提出了一种雷达几何运动方法。此外,本文还开发了一种基于深度学习的社交视觉径向网络(Social-Visual TCN)模型,用于速度预测。VID模型在三个不同的公共人群动向数据集上进行测试,包括通道、角落和T字口。使用质量和量度指标评估VID模型,结果表明VID模型在所有三个几何场景中的适应性得到了提高。总的来说,本文提出的方法可以提高数据驱动人群模拟模型的适应性。

PowerFlowNet: Leveraging Message Passing GNNs for Improved Power Flow Approximation

  • paper_url: http://arxiv.org/abs/2311.03415
  • repo_url: None
  • paper_authors: Nan Lin, Stavros Orfanoudakis, Nathan Ordonez Cardenas, Juan S. Giraldo, Pedro P. Vergara
  • for: 提高现代电力网络的准确和高效运行和规划
  • methods: 使用图神经网络(GNNs)提高PF近似的速度和准确性
  • results: 在简单的IEEE 14-bus系统和实际的法国高压网络(6470rte)中,PowerFlowNet在性能和执行时间方面与新顿-拉普逊方法具有相似性,但是实现4倍 faster,并在其他传统近似方法(如DC relaxation方法)的基础上显著地超越它们。
    Abstract Accurate and efficient power flow (PF) analysis is crucial in modern electrical networks' efficient operation and planning. Therefore, there is a need for scalable algorithms capable of handling large-scale power networks that can provide accurate and fast solutions. Graph Neural Networks (GNNs) have emerged as a promising approach for enhancing the speed of PF approximations by leveraging their ability to capture distinctive features from the underlying power network graph. In this study, we introduce PowerFlowNet, a novel GNN architecture for PF approximation that showcases similar performance with the traditional Newton-Raphson method but achieves it 4 times faster in the simple IEEE 14-bus system and 145 times faster in the realistic case of the French high voltage network (6470rte). Meanwhile, it significantly outperforms other traditional approximation methods, such as the DC relaxation method, in terms of performance and execution time; therefore, making PowerFlowNet a highly promising solution for real-world PF analysis. Furthermore, we verify the efficacy of our approach by conducting an in-depth experimental evaluation, thoroughly examining the performance, scalability, interpretability, and architectural dependability of PowerFlowNet. The evaluation provides insights into the behavior and potential applications of GNNs in power system analysis.
    摘要 准确高效电流流动(PF)分析是现代电力网络的efficient操作和规划中的关键。因此,有一需要可扩展的算法,可以处理大规模的电力网络,提供准确和快速的解决方案。图neuronal networks(GNNs)已经出现为PF近似中的一种有前途的方法,通过它们能够从电力网络图中捕捉特征。在这种研究中,我们介绍PowerFlowNet,一种新的GNN架构,用于PF近似,它在简单的IEEE 14-bus系统和实际的法国高压网络(6470rte)中显示了与传统的Newton-Raphson方法类似的性能,但是实现速度为4倍快和145倍快。此外,它也明显超过了其他传统的近似方法,如DC缓和方法,在性能和执行时间方面,因此PowerFlowNet是一个非常有前途的PF分析解决方案。此外,我们通过进行深入的实验评估,全面检查PowerFlowNet的性能、可扩展性、可读性和架构可靠性。实验结果为我们提供了GNN在电力系统分析中的行为和应用前景。

A Generative Neural Network Approach for 3D Multi-Criteria Design Generation and Optimization of an Engine Mount for an Unmanned Air Vehicle

  • paper_url: http://arxiv.org/abs/2311.03414
  • repo_url: None
  • paper_authors: Christoph Petroll, Sebastian Eilermann, Philipp Hoefer, Oliver Niggemann
  • for: 这 paper 的目的是用生成神经网络进行功能兼ね合的 3D 设计重构和生成。
  • methods: 这 paper 使用 Conditional Variational Autoencoder (CVAE) 和 Marching cubes algorithm 来生成 meshes 并对其进行 simulated 评估。
  • results: paper 能够生成符合自定义功能条件的优化设计。
    Abstract One of the most promising developments in computer vision in recent years is the use of generative neural networks for functionality condition-based 3D design reconstruction and generation. Here, neural networks learn dependencies between functionalities and a geometry in a very effective way. For a neural network the functionalities are translated in conditions to a certain geometry. But the more conditions the design generation needs to reflect, the more difficult it is to learn clear dependencies. This leads to a multi criteria design problem due various conditions, which are not considered in the neural network structure so far. In this paper, we address this multi-criteria challenge for a 3D design use case related to an unmanned aerial vehicle (UAV) motor mount. We generate 10,000 abstract 3D designs and subject them all to simulations for three physical disciplines: mechanics, thermodynamics, and aerodynamics. Then, we train a Conditional Variational Autoencoder (CVAE) using the geometry and corresponding multicriteria functional constraints as input. We use our trained CVAE as well as the Marching cubes algorithm to generate meshes for simulation based evaluation. The results are then evaluated with the generated UAV designs. Subsequently, we demonstrate the ability to generate optimized designs under self-defined functionality conditions using the trained neural network.
    摘要 In this paper, we address this multi-criteria challenge for a 3D design use case related to an unmanned aerial vehicle (UAV) motor mount. We generate 10,000 abstract 3D designs and subject them all to simulations for three physical disciplines: mechanics, thermodynamics, and aerodynamics. Then, we train a Conditional Variational Autoencoder (CVAE) using the geometry and corresponding multicriteria functional constraints as input. We use our trained CVAE as well as the Marching cubes algorithm to generate meshes for simulation-based evaluation. The results are then evaluated with the generated UAV designs. Subsequently, we demonstrate the ability to generate optimized designs under self-defined functionality conditions using the trained neural network.

Discret2Di – Deep Learning based Discretization for Model-based Diagnosis

  • paper_url: http://arxiv.org/abs/2311.03413
  • repo_url: None
  • paper_authors: Lukas Moddemann, Henrik Sebastian Steude, Alexander Diedrich, Oliver Niggemann
  • for: 本文提出了一种自动学习逻辑表示法,用于进行基于一致性的诊断。
  • methods: 本文使用了机器学习技术,将时间序列转化为逻辑表示,并自动学习逻辑规则。
  • results: 本文通过实验示出,自动学习逻辑规则可以有效地进行基于一致性的诊断。In English, this translates to:
  • for: The paper proposes an automated learning method for logical expressions for consistency-based diagnosis.
  • methods: The paper uses machine learning techniques to convert time series into logical representations and automatically learn logical rules.
  • results: The paper shows through experiments that automated learning of logical rules can effectively perform consistency-based diagnosis.
    Abstract Consistency-based diagnosis is an established approach to diagnose technical applications, but suffers from significant modeling efforts, especially for dynamic multi-modal time series. Machine learning seems to be an obvious solution, which becomes less obvious when looking at details: Which notion of consistency can be used? If logical calculi are still to be used, how can dynamic time series be transferred into the discrete world? This paper presents the methodology Discret2Di for automated learning of logical expressions for consistency-based diagnosis. While these logical calculi have advantages by providing a clear notion of consistency, they have the key problem of relying on a discretization of the dynamic system. The solution presented combines machine learning from both the time series and the symbolic domain to automate the learning of logical rules for consistency-based diagnosis.
    摘要 系统稳定性分析是一种已经确立的方法,用于诊断技术应用程序,但是它受到了模型化努力的限制,特别是对动态多Modal时间序列的诊断。机器学习似乎是一个自然的解决方案,但是当考虑到细节时,问题变得不那么明显:哪种一致性概念可以使用?如果逻辑calculus仍然被用,如何将动态时间序列转化成离散世界?本文介绍了一种方法ологи法Discret2Di,用于自动学习逻辑表达式 для稳定性分析。这些逻辑calculus有优点,因为它们提供了明确的一致性概念,但它们的关键问题是基于离散系统的分析。本文的解决方案是结合时间序列和符号领域的机器学习来自动学习逻辑规则 для稳定性分析。

TabRepo: A Large Scale Repository of Tabular Model Evaluations and its AutoML Applications

  • paper_url: http://arxiv.org/abs/2311.02971
  • repo_url: https://github.com/autogluon/tabrepo
  • paper_authors: David Salinas, Nick Erickson
  • for: 这篇论文是为了介绍一个新的Tabular模型评估和预测数据集(TabRepo)。
  • methods: 论文使用了1206个模型在200个回归和分类 datasets上进行了预测和评估。
  • results: 论文表明,通过使用 TabRepo 可以免费地比较自动化机器学习(AutoML)系统和精细化优化hyperparameter,以及应用标准的传输学习技术可以超越当前的标准Tabular系统在准确性、运行时间和延迟方面。
    Abstract We introduce TabRepo, a new dataset of tabular model evaluations and predictions. TabRepo contains the predictions and metrics of 1206 models evaluated on 200 regression and classification datasets. We illustrate the benefit of our datasets in multiple ways. First, we show that it allows to perform analysis such as comparing Hyperparameter Optimization against current AutoML systems while also considering ensembling at no cost by using precomputed model predictions. Second, we show that our dataset can be readily leveraged to perform transfer-learning. In particular, we show that applying standard transfer-learning techniques allows to outperform current state-of-the-art tabular systems in accuracy, runtime and latency.
    摘要 我们介绍TabRepo,一个新的Tabular模型评估和预测Dataset。TabRepo包含1206个模型在200个回归和分类Dataset上的预测和度量。我们显示了我们的Dataset的价值,包括:1. 可以免费使用预computed模型预测来比较搜寻过程优化和现有的AutoML系统,以及考虑结合。2. 可以快速地将模型应用到新的Dataset上,并且使用标准的转移学习技术来超越目前的Tabular系统在准确、运行时间和延迟方面的表现。Here's the translation in Traditional Chinese:我们介绍TabRepo,一个新的Tabular模型评估和预测Dataset。TabRepo包含1206个模型在200个回归和分类Dataset上的预测和度量。我们显示了我们的Dataset的价值,包括:1. 可以免费使用预computed模型预测来比较搜寻过程优化和现有的AutoML系统,以及考虑结合。2. 可以快速地将模型应用到新的Dataset上,并且使用标准的转移学习技术来超越目前的Tabular系统在准确、运行时间和延遁方面的表现。

Retrieval-Augmented Code Generation for Universal Information Extraction

  • paper_url: http://arxiv.org/abs/2311.02962
  • repo_url: None
  • paper_authors: Yucan Guo, Zixuan Li, Xiaolong Jin, Yantao Liu, Yutao Zeng, Wenxuan Liu, Xiang Li, Pan Yang, Long Bai, Jiafeng Guo, Xueqi Cheng
  • for: 本文提出了一个基于大语言模型(LLMs)的通用扩展代码生成框架,用于信息抽取(IE)任务。
  • methods: 本文使用了Python类定义任务特定的结构知识表示,并运用了内容学习机制以将文本中的信息转换为代码。
  • results: 实验结果显示, Code4UIE 框架可以实现高效地对五种代表性的IE任务进行扩展代码生成。
    Abstract Information Extraction (IE) aims to extract structural knowledge (e.g., entities, relations, events) from natural language texts, which brings challenges to existing methods due to task-specific schemas and complex text expressions. Code, as a typical kind of formalized language, is capable of describing structural knowledge under various schemas in a universal way. On the other hand, Large Language Models (LLMs) trained on both codes and texts have demonstrated powerful capabilities of transforming texts into codes, which provides a feasible solution to IE tasks. Therefore, in this paper, we propose a universal retrieval-augmented code generation framework based on LLMs, called Code4UIE, for IE tasks. Specifically, Code4UIE adopts Python classes to define task-specific schemas of various structural knowledge in a universal way. By so doing, extracting knowledge under these schemas can be transformed into generating codes that instantiate the predefined Python classes with the information in texts. To generate these codes more precisely, Code4UIE adopts the in-context learning mechanism to instruct LLMs with examples. In order to obtain appropriate examples for different tasks, Code4UIE explores several example retrieval strategies, which can retrieve examples semantically similar to the given texts. Extensive experiments on five representative IE tasks across nine datasets demonstrate the effectiveness of the Code4UIE framework.
    摘要 信息提取(IE)的目标是从自然语言文本中提取结构知识(例如实体、关系、事件),这会对现有方法带来挑战,因为任务特定的 schema 和文本表达的复杂性。代码作为一种形式化语言,可以在不同的 schema 下描述结构知识,并且在各种任务下可以通过代码来实现这些知识。在这篇论文中,我们提出了一种基于大语言模型(LLM)的通用检索增强代码生成框架,称为 Code4UIE,用于IE任务。specifically,Code4UIE 使用 Python 类来定义任务特定的 schema,并通过在这些类中实例化文本中的信息来提取知识。为了生成代码更加准确,Code4UIE 采用了在上下文学习机制,使 LLM 通过示例进行指导。为了获得不同任务的合适示例,Code4UIE 探索了多种示例检索策略,可以从文本中检索到与给定文本相似的示例。经验表明,Code4UIE 框架在五种代表性IE任务中的九个数据集上得到了广泛的应用。

In-Context Learning for Knowledge Base Question Answering for Unmanned Systems based on Large Language Models

  • paper_url: http://arxiv.org/abs/2311.02956
  • repo_url: None
  • paper_authors: Yunlong Chen, Yaming Zhang, Jianfei Yu, Li Yang, Rui Xia
  • for: Answer factoid questions based on knowledge bases
  • methods: Use ChatGPT-based Cypher Query Language (CQL) generation framework to generate the most appropriate CQL based on Natural Language Questions (NLQ)
  • results: Achieved the second place in the CCKS 2023 Question Answering with Knowledge Graph Inference for Unmanned Systems competition, with an F1-score of 0.92676
    Abstract Knowledge Base Question Answering (KBQA) aims to answer factoid questions based on knowledge bases. However, generating the most appropriate knowledge base query code based on Natural Language Questions (NLQ) poses a significant challenge in KBQA. In this work, we focus on the CCKS2023 Competition of Question Answering with Knowledge Graph Inference for Unmanned Systems. Inspired by the recent success of large language models (LLMs) like ChatGPT and GPT-3 in many QA tasks, we propose a ChatGPT-based Cypher Query Language (CQL) generation framework to generate the most appropriate CQL based on the given NLQ. Our generative framework contains six parts: an auxiliary model predicting the syntax-related information of CQL based on the given NLQ, a proper noun matcher extracting proper nouns from the given NLQ, a demonstration example selector retrieving similar examples of the input sample, a prompt constructor designing the input template of ChatGPT, a ChatGPT-based generation model generating the CQL, and an ensemble model to obtain the final answers from diversified outputs. With our ChatGPT-based CQL generation framework, we achieved the second place in the CCKS 2023 Question Answering with Knowledge Graph Inference for Unmanned Systems competition, achieving an F1-score of 0.92676.
    摘要 知识库问答(KBQA)目标是基于知识库回答问题,但生成基于自然语言问题(NLQ)的知识库查询代码具有显著挑战。在这项工作中,我们将ocusonCCKS2023问答知识图推理对无人系统的竞赛。受最近大语言模型(LLMs)如ChatGPT和GPT-3在多种问答任务中的成功启发,我们提议一个基于ChatGPT的CypherQuery语言(CQL)生成框架,以生成基于给定NLQ的最佳CQL。我们的生成框架包括六部分:一个辅助模型预测基于给定NLQ的CQL语法信息,一个正式名词匹配器从NLQ中提取正式名词,一个示例选择器选择与输入样本相似的示例,一个提示构建器设计输入模板,一个基于ChatGPT的生成模型生成CQL,以及一个 ensemble模型以获得多元输出的最终答案。与我们的ChatGPT基于CQL生成框架,我们在CCKS2023问答知识图推理对无人系统竞赛中获得第二名,实现了F1分数0.92676。

Can LLMs Follow Simple Rules?

  • paper_url: http://arxiv.org/abs/2311.04235
  • repo_url: https://github.com/normster/llm_rules
  • paper_authors: Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, Lulwa Aljeraisy, Dan Hendrycks, David Wagner
  • for: 本研究旨在提供一个程式码框架,以评估自然语言处理器(LLM)在开发者提供的规则下运行。
  • methods: 本研究使用了15个简单文本场景,让模型遵循开发者提供的规则来互动 avec human user。每个场景都有一个简洁的评估程式,以判断模型是否违反了规则。
  • results: 研究发现所有评估过的 популярProprietary和开源模型都受到了访问者调制的手动输入攻击,而GPT-4是所有模型中最好的表现。此外,研究还评估了开放模型对于梯度基本攻击的漏洞。
    Abstract As Large Language Models (LLMs) are deployed with increasing real-world responsibilities, it is important to be able to specify and constrain the behavior of these systems in a reliable manner. Model developers may wish to set explicit rules for the model, such as "do not generate abusive content", but these may be circumvented by jailbreaking techniques. Evaluating how well LLMs follow developer-provided rules in the face of adversarial inputs typically requires manual review, which slows down monitoring and methods development. To address this issue, we propose Rule-following Language Evaluation Scenarios (RuLES), a programmatic framework for measuring rule-following ability in LLMs. RuLES consists of 15 simple text scenarios in which the model is instructed to obey a set of rules in natural language while interacting with the human user. Each scenario has a concise evaluation program to determine whether the model has broken any rules in a conversation. Through manual exploration of model behavior in our scenarios, we identify 6 categories of attack strategies and collect two suites of test cases: one consisting of unique conversations from manual testing and one that systematically implements strategies from the 6 categories. Across various popular proprietary and open models such as GPT-4 and Llama 2, we find that all models are susceptible to a wide variety of adversarial hand-crafted user inputs, though GPT-4 is the best-performing model. Additionally, we evaluate open models under gradient-based attacks and find significant vulnerabilities. We propose RuLES as a challenging new setting for research into exploring and defending against both manual and automatic attacks on LLMs.
    摘要 As Large Language Models (LLMs) are deployed with increasing real-world responsibilities, it is important to be able to specify and constrain the behavior of these systems in a reliable manner. Model developers may wish to set explicit rules for the model, such as "do not generate abusive content", but these may be circumvented by jailbreaking techniques. Evaluating how well LLMs follow developer-provided rules in the face of adversarial inputs typically requires manual review, which slows down monitoring and methods development. To address this issue, we propose Rule-following Language Evaluation Scenarios (RuLES), a programmatic framework for measuring rule-following ability in LLMs. RuLES consists of 15 simple text scenarios in which the model is instructed to obey a set of rules in natural language while interacting with the human user. Each scenario has a concise evaluation program to determine whether the model has broken any rules in a conversation. Through manual exploration of model behavior in our scenarios, we identify 6 categories of attack strategies and collect two suites of test cases: one consisting of unique conversations from manual testing and one that systematically implements strategies from the 6 categories. Across various popular proprietary and open models such as GPT-4 and Llama 2, we find that all models are susceptible to a wide variety of adversarial hand-crafted user inputs, though GPT-4 is the best-performing model. Additionally, we evaluate open models under gradient-based attacks and find significant vulnerabilities. We propose RuLES as a challenging new setting for research into exploring and defending against both manual and automatic attacks on LLMs.Here's the translation in Traditional Chinese:当大型语言模型(LLMs)在实际应用中推广时,重要的是能够Specify和限制这些系统的行为,以确保它们可靠地进行工作。开发模型的人可能会想要设定模型的Explicit规则,例如“不要生成攻击性内容”,但这些规则可能会被破坏。评估LLMs在面对恶意输入时遵循开发者提供的规则需要手动审查,这会让监控和开发方法变得 slower。为解决这个问题,我们提出了Rule-following Language Evaluation Scenarios(RuLES),一个程式设计的框架,用于评估LLMs的遵循能力。RuLES包括15个简单文本场景,在其中模型需要遵循开发者提供的规则,并与人类用户互动。每个场景有一个简洁的评估程式,用于决定模型在对话中是否破坏了任何规则。通过我们的手动探索模型行为的方式,我们识别出6种攻击策略,并收集了两个测试集:一个是从手动测试中获得的唯一对话,另一个是实现了6种攻击策略的系统性测试集。在各种受欢迎的专有和开源模型(如GPT-4和Llama 2)中,我们发现所有模型都受到了许多攻击性的手动输入的威胁。此外,我们在Gradient-based攻击下进行评估,发现开放模型存在重大的漏洞。我们提议RuLES作为一个挑战性的新设定,用于研究LLMs在面对手动和自动攻击时的探索和防御。

Contrastive Multi-Level Graph Neural Networks for Session-based Recommendation

  • paper_url: http://arxiv.org/abs/2311.02938
  • repo_url: None
  • paper_authors: Fuyun Wang, Xingyu Gao, Zhenyu Chen, Lei Lyu
  • for: This paper aims to improve session-based recommendation by exploiting complex and high-order item transition information.
  • methods: The proposed method, called contrastive multi-level graph neural networks (CM-GNN), uses a combination of local-level, global-level, and hyper-level graph convolutional networks, as well as an attention-based fusion module to capture pairwise relations and high-order information among item transitions.
  • results: The proposed method outperforms state-of-the-art session-based recommendation techniques in extensive experiments on multiple benchmark datasets.Here’s the Chinese translation of the three points:
  • for: 这篇论文目标是改进会话基于推荐,通过捕捉复杂和高阶ITEM转换信息。
  • methods: 提议方法是对比例多级图 neural networks (CM-GNN),使用了本地级、全球级和超过级图 convolutional networks,以及一个注意力基于的融合模块,来捕捉ITEM转换对的对应关系和高阶信息。
  • results: 提议方法在多个广泛使用的数据集上进行了广泛的实验,并证明了与会话基于推荐技术的状态OF-THE-ART的性能优于。
    Abstract Session-based recommendation (SBR) aims to predict the next item at a certain time point based on anonymous user behavior sequences. Existing methods typically model session representation based on simple item transition information. However, since session-based data consists of limited users' short-term interactions, modeling session representation by capturing fixed item transition information from a single dimension suffers from data sparsity. In this paper, we propose a novel contrastive multi-level graph neural networks (CM-GNN) to better exploit complex and high-order item transition information. Specifically, CM-GNN applies local-level graph convolutional network (L-GCN) and global-level network (G-GCN) on the current session and all the sessions respectively, to effectively capture pairwise relations over all the sessions by aggregation strategy. Meanwhile, CM-GNN applies hyper-level graph convolutional network (H-GCN) to capture high-order information among all the item transitions. CM-GNN further introduces an attention-based fusion module to learn pairwise relation-based session representation by fusing the item representations generated by L-GCN and G-GCN. CM-GNN averages the item representations obtained by H-GCN to obtain high-order relation-based session representation. Moreover, to convert the high-order item transition information into the pairwise relation-based session representation, CM-GNN maximizes the mutual information between the representations derived from the fusion module and the average pool layer by contrastive learning paradigm. We conduct extensive experiments on multiple widely used benchmark datasets to validate the efficacy of the proposed method. The encouraging results demonstrate that our proposed method outperforms the state-of-the-art SBR techniques.
    摘要 Session-based recommendation (SBR) 目标是在特定时间点预测下一个项目,基于匿名用户行为序列。现有方法通常是基于简单的项目转移信息来建模会话表示。然而,由于会话数据由有限数量的用户的短时间互动组成,使得基于单一维度的项目转移信息来建模会话表示存在数据稀缺。在这篇论文中,我们提出了一种新的对比式多级图 neural network (CM-GNN),用于更好地利用复杂的高阶项目转移信息。特别是,CM-GNN 在当前会话和所有会话上应用了本地级别的图干涉网络 (L-GCN) 和全级别网络 (G-GCN),以有效地捕捉会话中对所有会话的对比关系。此外,CM-GNN 还应用了高级别的图干涉网络 (H-GCN),以捕捉高阶的项目转移信息。CM-GNN 还引入了一种注意力基于的融合模块,用于学习对比关系基于会话表示。CM-GNN 将 obtained 由 L-GCN 和 G-GCN 生成的项目表示进行 fusion,并使用 H-GCN 生成的高阶关系基于SESSION表示。此外,为了将高阶项目转移信息转化为对比关系基于会话表示,CM-GNN 使用对比学习框架强制最大化对这两个表示之间的共 informations。我们在多个广泛使用的 benchmark 数据集上进行了广泛的实验,以验证我们提出的方法的有效性。结果表明,我们的提出的方法在对State-of-the-art SBR 技术进行比较时表现出色。

Deep Image Semantic Communication Model for Artificial Intelligent Internet of Things

  • paper_url: http://arxiv.org/abs/2311.02926
  • repo_url: https://github.com/meatery/semantic-segmentation
  • paper_authors: Li Ping Qian, Yi Zhang, Sikai Lyu, Huijie Zhu, Yuan Wu, Xuemin Sherman Shen, Xiaoniu Yang
  • for: 提出一种深度学习图像semantic通信模型,以提高AIoT设备中图像数据的有效传输和恢复。
  • methods: 提议在传输端使用高精度图像semantic分割算法提取图像semantic信息,以实现图像数据的显著压缩。在接收端,使用基于GAN的semantic图像恢复算法将semantic图像转换为详细的真实场景图像。
  • results: 对比WebP和CycleGAN,提议的图像semantic通信模型可以提高图像压缩率和恢复精度,平均提高71.93%和25.07%。此外,我们的demo实验表明,提议模型可以将图像传输延迟降低95.26%。
    Abstract With the rapid development of Artificial Intelligent Internet of Things (AIoT), the image data from AIoT devices has been witnessing the explosive increasing. In this paper, a novel deep image semantic communication model is proposed for the efficient image communication in AIoT. Particularly, at the transmitter side, a high-precision image semantic segmentation algorithm is proposed to extract the semantic information of the image to achieve significant compression of the image data. At the receiver side, a semantic image restoration algorithm based on Generative Adversarial Network (GAN) is proposed to convert the semantic image to a real scene image with detailed information. Simulation results demonstrate that the proposed image semantic communication model can improve the image compression ratio and recovery accuracy by 71.93% and 25.07% on average in comparison with WebP and CycleGAN, respectively. More importantly, our demo experiment shows that the proposed model reduces the total delay by 95.26% in the image communication, when comparing with the original image transmission.
    摘要 随着人工智能互联网物联网(AIoT)的快速发展,AIoT设备中的图像数据已经经历了急剧增长。在这篇论文中,我们提出了一种新的深度图像Semantic Communication模型,用于AIoT中高效的图像通信。特别是在发送端,我们提出了一种高精度图像semantic分割算法,以EXTRACT图像中的semantic信息,以实现图像数据的显著压缩。在接收端,我们提出了一种基于Generative Adversarial Network(GAN)的semantic图像恢复算法,以将semantic图像转换为详细信息的真实场景图像。实验结果表明,我们提出的图像Semantic Communication模型可以提高图像压缩比和恢复精度,比WebP和CycleGAN的平均提高71.93%和25.07%。此外,我们的demo实验表明,我们的模型可以将图像通信总延迟减少95.26%,比原始图像传输更加高效。

Virtual Action Actor-Critic Framework for Exploration (Student Abstract)

  • paper_url: http://arxiv.org/abs/2311.02916
  • repo_url: None
  • paper_authors: Bumgeun Park, Taeyoung Kim, Quoc-Vinh Lai-Dang, Dongsoo Har
  • for: 提高RL中agent的寻找效率
  • methods: 提出了一种新的actor-critic框架,即虚拟行为actor-critic(VAAC),以解决RL中agent的寻找效率挑战。
  • results: 实验结果显示,VAAC比现有算法更高效地进行寻找。
    Abstract Efficient exploration for an agent is challenging in reinforcement learning (RL). In this paper, a novel actor-critic framework namely virtual action actor-critic (VAAC), is proposed to address the challenge of efficient exploration in RL. This work is inspired by humans' ability to imagine the potential outcomes of their actions without actually taking them. In order to emulate this ability, VAAC introduces a new actor called virtual actor (VA), alongside the conventional actor-critic framework. Unlike the conventional actor, the VA takes the virtual action to anticipate the next state without interacting with the environment. With the virtual policy following a Gaussian distribution, the VA is trained to maximize the anticipated novelty of the subsequent state resulting from a virtual action. If any next state resulting from available actions does not exhibit high anticipated novelty, training the VA leads to an increase in the virtual policy entropy. Hence, high virtual policy entropy represents that there is no room for exploration. The proposed VAAC aims to maximize a modified Q function, which combines cumulative rewards and the negative sum of virtual policy entropy. Experimental results show that the VAAC improves the exploration performance compared to existing algorithms.
    摘要 RL中的agent寻找最有效的探索方式是一项挑战。本文提出了一种新的actor-critic框架,即虚拟动作actor-critic(VAAC),以解决RL中的探索挑战。这项工作受人类能够想象自己的行动结果而不需要实际行动的能力所启发。为了模仿这种能力,VAAC引入了一个新的actor,即虚拟actor(VA)。与传统actor-critic框架不同的是,VA不需要与环境交互就可以预测下一个状态。通过虚拟策略按照高维度分布随机选择虚拟动作,VA在预测下一个状态时尽可能增加其 novaativity。如果可用的动作中任何一个状态不具备高预测 novaativity,则训练VA会增加虚拟策略的 entropy。因此,高虚拟策略 entropy表示探索空间充满潜能。VAAC的目标是最大化修改后Q函数,该函数组合了总奖励和虚拟策略 entropy的负数。实验结果表明,VAAC在探索性能方面比现有算法提高了。

Imitation Learning based Alternative Multi-Agent Proximal Policy Optimization for Well-Formed Swarm-Oriented Pursuit Avoidance

  • paper_url: http://arxiv.org/abs/2311.02912
  • repo_url: None
  • paper_authors: Sizhao Li, Yuming Xiang, Rongpeng Li, Zhifeng Zhao, Honggang Zhang
  • for: 该论文主要研究了多机器人系统(MRS)的协同控制问题,尤其是在大规模Decentralized MRS中实现追逐避免任务的可能性。
  • methods: 该论文提出了一种基于仿写学的多代理控制算法(IA-MAPPO),用于在协同控制下实现追逐逃脱任务。该算法包括一个基于策略浸泡的MAPPO执行器,以及使用仿写学来减少通信开销和提高扩展性。
  • results: simulations results validate the effectiveness of IA-MAPPO, and extensive ablation experiments show that the performance is comparable to a centralized solution with significant decrease in communication overheads.
    Abstract Multi-Robot System (MRS) has garnered widespread research interest and fostered tremendous interesting applications, especially in cooperative control fields. Yet little light has been shed on the compound ability of formation, monitoring and defence in decentralized large-scale MRS for pursuit avoidance, which puts stringent requirements on the capability of coordination and adaptability. In this paper, we put forward a decentralized Imitation learning based Alternative Multi-Agent Proximal Policy Optimization (IA-MAPPO) algorithm to provide a flexible and communication-economic solution to execute the pursuit avoidance task in well-formed swarm. In particular, a policy-distillation based MAPPO executor is firstly devised to capably accomplish and swiftly switch between multiple formations in a centralized manner. Furthermore, we utilize imitation learning to decentralize the formation controller, so as to reduce the communication overheads and enhance the scalability. Afterwards, alternative training is leveraged to compensate the performance loss incurred by decentralization. The simulation results validate the effectiveness of IA-MAPPO and extensive ablation experiments further show the performance comparable to a centralized solution with significant decrease in communication overheads.
    摘要

ViDa: Visualizing DNA hybridization trajectories with biophysics-informed deep graph embeddings

  • paper_url: http://arxiv.org/abs/2311.03411
  • repo_url: https://github.com/chenwei-zhang/ViDa
  • paper_authors: Chenwei Zhang, Jordan Lovrod, Boyan Beronov, Khanh Dao Duc, Anne Condon
  • For: 这个论文是为了帮助生物化学家和分子编程师理解核酸反应的复杂激发路径,并可以用于多种应用。* Methods: 该论文使用了一种名为 kontinuous-time Markov chain(CTMC)的模型,并使用了一种新的视觉化方法called ViDa,以Visualize DNA reaction trajectories的二维嵌入。* Results: 该论文的结果表明,使用域Specific supervised terms可以提高visualization的质量,并成功分离不同的folding pathways,提供了有用的反应机理的启示。
    Abstract Visualization tools can help synthetic biologists and molecular programmers understand the complex reactive pathways of nucleic acid reactions, which can be designed for many potential applications and can be modelled using a continuous-time Markov chain (CTMC). Here we present ViDa, a new visualization approach for DNA reaction trajectories that uses a 2D embedding of the secondary structure state space underlying the CTMC model. To this end, we integrate a scattering transform of the secondary structure adjacency, a variational autoencoder, and a nonlinear dimensionality reduction method. We augment the training loss with domain-specific supervised terms that capture both thermodynamic and kinetic features. We assess ViDa on two well-studied DNA hybridization reactions. Our results demonstrate that the domain-specific features lead to significant quality improvements over the state-of-the-art in DNA state space visualization, successfully separating different folding pathways and thus providing useful insights into dominant reaction mechanisms.
    摘要 <>Visualization 工具可以帮助 sintetic biology 和分子程序员理解聚合物reactions的复杂reacting pathways,这些reactions可以设计为多种可能性,并且可以使用连续时间Markov链(CTMC)来模型。在这里,我们介绍了一种新的可见化方法,即 ViDa,它使用CTMC模型下的secondary structure状态空间的2D嵌入来可见化DNA反应轨迹。为此,我们将scattering transform of secondary structure adjacency、variational autoencoder和非线性维度减少方法相互融合。我们还添加了域pecific的supervised terms,以捕捉thermodynamic和kinetic特征。我们对两个已经广泛研究的DNA杂化反应进行评估。我们的结果表明,域pecific特征导致了state space可见化中的质量提升,成功地分离不同的folding pathways,从而提供了关键的反应机理的视角。Note: Simplified Chinese is used here, which is a more casual and informal style of Chinese. If you prefer Traditional Chinese or a more formal style, please let me know and I can translate it accordingly.

Deep Learning-Empowered Semantic Communication Systems with a Shared Knowledge Base

  • paper_url: http://arxiv.org/abs/2311.02884
  • repo_url: None
  • paper_authors: Peng Yi, Yang Cao, Xin Kang, Ying-Chang Liang
  • for: 提高6G网络中semantic communication系统的可解释性。
  • methods: 提出一种基于深度学习的semantic communication系统,利用共享知识库提高系统的可解释性。
  • results: 对比基eline方法,提出的方法可以减少数据传输量,同时保持语句相似性。Here’s a brief explanation of each point:
  • for: The paper aims to improve the explainability of semantic communication systems in future 6G networks.
  • methods: The proposed method uses a shared knowledge base to integrate messages and corresponding knowledge, enabling the system to transmit fewer symbols without sacrificing semantic performance.
  • results: The proposed approach outperforms existing baseline methods in terms of transmitted data size and sentence similarity, as demonstrated by simulation results.
    Abstract Deep learning-empowered semantic communication is regarded as a promising candidate for future 6G networks. Although existing semantic communication systems have achieved superior performance compared to traditional methods, the end-to-end architecture adopted by most semantic communication systems is regarded as a black box, leading to the lack of explainability. To tackle this issue, in this paper, a novel semantic communication system with a shared knowledge base is proposed for text transmissions. Specifically, a textual knowledge base constructed by inherently readable sentences is introduced into our system. With the aid of the shared knowledge base, the proposed system integrates the message and corresponding knowledge from the shared knowledge base to obtain the residual information, which enables the system to transmit fewer symbols without semantic performance degradation. In order to make the proposed system more reliable, the semantic self-information and the source entropy are mathematically defined based on the knowledge base. Furthermore, the knowledge base construction algorithm is developed based on a similarity-comparison method, in which a pre-configured threshold can be leveraged to control the size of the knowledge base. Moreover, the simulation results have demonstrated that the proposed approach outperforms existing baseline methods in terms of transmitted data size and sentence similarity.
    摘要 深度学习驱动的semantic通信被视为未来6G网络中的优秀候选人。虽然现有的semantic通信系统已经达到了传统方法的超越性,但大多数semantic通信系统的端到端架构被视为黑盒模型,导致无法解释性的问题。为解决这个问题,本文提出了一种基于文本传输的新的semantic通信系统。具体来说,我们提出了一个基于自然可读的句子构建的文本知识库。通过与共享知识库的集成,我们的系统可以通过获取剩余信息来传输更少的符号,而无需增加 semantic 性能下降。为使我们的系统更加可靠,我们在知识库中定义了semantic自信息和源 entropy。此外,我们还开发了基于相似比较方法的知识库构建算法,可以通过配置阈值来控制知识库的大小。最后,我们通过对比实验结果,证明了我们的方法可以比现有的基准方法更好地压缩数据和保持句子相似性。

DP-DCAN: Differentially Private Deep Contrastive Autoencoder Network for Single-cell Clustering

  • paper_url: http://arxiv.org/abs/2311.03410
  • repo_url: None
  • paper_authors: Huifa Li, Jie Fu, Zhili Chen, Xiaomin Yang, Haitao Liu, Xinpeng Ling
  • for: 本研究旨在提出一种基于深度学习的具有隐私保护特性的单元细胞 clustering 方法,以保护用户隐私。
  • methods: 该方法基于 autoencoder 网络,通过部分网络杂化来实现隐私保护。
  • results: 实验结果显示,DP-DCAN 比传统的DP方案具有更好的性能和更强的鲁棒性。In English, this translates to:
  • for: The paper aims to propose a deep learning-based single-cell clustering method with privacy protection, to protect user privacy.
  • methods: The method is based on an autoencoder network, and achieves privacy protection through partial network perturbation.
  • results: Experimental results show that DP-DCAN outperforms traditional DP schemes and has stronger robustness to adversarial attacks.
    Abstract Single-cell RNA sequencing (scRNA-seq) is important to transcriptomic analysis of gene expression. Recently, deep learning has facilitated the analysis of high-dimensional single-cell data. Unfortunately, deep learning models may leak sensitive information about users. As a result, Differential Privacy (DP) is increasingly used to protect privacy. However, existing DP methods usually perturb whole neural networks to achieve differential privacy, and hence result in great performance overheads. To address this challenge, in this paper, we take advantage of the uniqueness of the autoencoder that it outputs only the dimension-reduced vector in the middle of the network, and design a Differentially Private Deep Contrastive Autoencoder Network (DP-DCAN) by partial network perturbation for single-cell clustering. Since only partial network is added with noise, the performance improvement is obvious and twofold: one part of network is trained with less noise due to a bigger privacy budget, and the other part is trained without any noise. Experimental results of six datasets have verified that DP-DCAN is superior to the traditional DP scheme with whole network perturbation. Moreover, DP-DCAN demonstrates strong robustness to adversarial attacks. The code is available at https://github.com/LFD-byte/DP-DCAN.
    摘要 单元细胞RNAseq(scRNA-seq)对转录组分析表达物的研究具有重要意义。现在,深度学习技术已经使得高维单元细胞数据的分析变得更加容易。然而,深度学习模型可能泄露用户敏感信息,因此隐私保护成为了一项重要的问题。现有的隐私保护方法通常是整个神经网络的杂化,从而导致性能开销很大。为解决这个挑战,我们在这篇论文中利用自动encoder的独特性,即它只输出减少维度的向量,并设计了一种部分神经网络杂化的干扰隐私网络(DP-DCAN)。由于只有部分神经网络受到噪声杂化,性能提高是明显的两倍:一部分神经网络在噪声更小的情况下训练,另一部分则是没有噪声的训练。实验结果表明,DP-DCAN比传统的DP方案更加有优势,并且具有强大的鲁棒性。代码可以在https://github.com/LFD-byte/DP-DCAN上下载。

Visualizing DNA reaction trajectories with deep graph embedding approaches

  • paper_url: http://arxiv.org/abs/2311.03409
  • repo_url: https://github.com/chenwei-zhang/ViDa
  • paper_authors: Chenwei Zhang, Khanh Dao Duc, Anne Condon
  • for: 这个论文是为了设计新的聚合酶反应,以便在各种应用中使用。
  • methods: 这篇论文使用了深度图像嵌入模型和常见维度减少方法,将高维数据映射到2D的欧式空间中。
  • results: 我们的初步结果表明,ViDa可以成功地分离不同的折叠机制,从而为用户提供有用的信息,并且比现有的DNA动力学视化方法更好。
    Abstract Synthetic biologists and molecular programmers design novel nucleic acid reactions, with many potential applications. Good visualization tools are needed to help domain experts make sense of the complex outputs of folding pathway simulations of such reactions. Here we present ViDa, a new approach for visualizing DNA reaction folding trajectories over the energy landscape of secondary structures. We integrate a deep graph embedding model with common dimensionality reduction approaches, to map high-dimensional data onto 2D Euclidean space. We assess ViDa on two well-studied and contrasting DNA hybridization reactions. Our preliminary results suggest that ViDa's visualization successfully separates trajectories with different folding mechanisms, thereby providing useful insight to users, and is a big improvement over the current state-of-the-art in DNA kinetics visualization.
    摘要 生物化学家和分子程序员设计了新的核酸反应,有很多应用前景。为了帮助领域专家理解复杂的输出,我们需要一些好的可视化工具。我们现在提出了ViDa,一种新的方法用于可视化DNA反应折叠路径在二维空间中的能量阶段特征。我们将深度图嵌入模型与常见维度减少方法结合,将高维数据映射到二维欧氏空间中。我们对两种已经广泛研究和不同折叠机制的DNA协同反应进行了预liminary测试,结果表明ViDa的可视化成功地分离不同折叠机制的轨迹,为用户提供有用的信息,而且与当前DNA动力学可视化领域的状态 искусственный智能有很大改进。

Temporal Shift – Multi-Objective Loss Function for Improved Anomaly Fall Detection

  • paper_url: http://arxiv.org/abs/2311.02863
  • repo_url: None
  • paper_authors: Stefan Denkovski, Shehroz S. Khan, Alex Mihailidis
  • for: 预防跌倒lder adults中的伤害和死亡,精确的跌倒探测可以帮助降低这些风险。
  • methods: 使用 autoencoder 和其他相关的数据填充架构,进行跌倒探测。
  • results: 比较多种模型,发现 Temporal Shift 对于跌倒探测有着优秀的表现,尤其是在单一摄像头上,与传统的重建 alone 相比。
    Abstract Falls are a major cause of injuries and deaths among older adults worldwide. Accurate fall detection can help reduce potential injuries and additional health complications. Different types of video modalities can be used in a home setting to detect falls, including RGB, Infrared, and Thermal cameras. Anomaly detection frameworks using autoencoders and their variants can be used for fall detection due to the data imbalance that arises from the rarity and diversity of falls. However, the use of reconstruction error in autoencoders can limit the application of networks' structures that propagate information. In this paper, we propose a new multi-objective loss function called Temporal Shift, which aims to predict both future and reconstructed frames within a window of sequential frames. The proposed loss function is evaluated on a semi-naturalistic fall detection dataset containing multiple camera modalities. The autoencoders were trained on normal activities of daily living (ADL) performed by older adults and tested on ADLs and falls performed by young adults. Temporal shift shows significant improvement to a baseline 3D Convolutional autoencoder, an attention U-Net CAE, and a multi-modal neural network. The greatest improvement was observed in an attention U-Net model improving by 0.20 AUC ROC for a single camera when compared to reconstruction alone. With significant improvement across different models, this approach has the potential to be widely adopted and improve anomaly detection capabilities in other settings besides fall detection.
    摘要 falls 是全球older adults中的主要导致伤害和死亡的原因之一。准确的落下检测可以帮助减少可能的伤害和额外的健康问题。家庭设置中可以使用RGB、Infrared和Thermal相机进行落下检测。使用自适应网络的异常检测框架可以用于落下检测,因为落下的数据异常性和多样性导致数据不匹配。在这篇论文中,我们提出了一种新的多目标损失函数,即时间偏移,以预测序列帧中的未来帧和重建帧。我们的提案的损失函数被评估在具有多个相机模式的半自然的落下检测数据集上。我们使用了正常老年人进行日常活动的学习,并在老年人和年轻人之间进行测试。时间偏移显示在不同模型中具有显著改进,特别是使用注意力U-Net CAE模型,其在单个相机上提高了0.20 AUC ROC。在不同的模型中,这种方法有广泛的应用前景,可以提高异常检测Capability在其他设置中。

Training Multi-layer Neural Networks on Ising Machine

  • paper_url: http://arxiv.org/abs/2311.03408
  • repo_url: None
  • paper_authors: Xujie Song, Tong Liu, Shengbo Eben Li, Jingliang Duan, Wenxuan Wang, Keqiang Li
  • for: 这paper aimed to train multi-layer feedforward neural networks on Ising machines using an Ising learning algorithm.
  • methods: The algorithm incorporates two essential techniques: binary representation of topological network and order reduction of loss function. The QNN is formulated as a QCBO problem, which is then converted to a QUBO problem that can be efficiently solved on Ising machines.
  • results: The algorithm achieved a classification accuracy of 98.3% on MNIST dataset after annealing for 700 ms, with a success probability of 72% in finding the optimal solution. The algorithm has the potential to train deeper neural networks with more spins on the Ising machine.
    Abstract As a dedicated quantum device, Ising machines could solve large-scale binary optimization problems in milliseconds. There is emerging interest in utilizing Ising machines to train feedforward neural networks due to the prosperity of generative artificial intelligence. However, existing methods can only train single-layer feedforward networks because of the complex nonlinear network topology. This paper proposes an Ising learning algorithm to train quantized neural network (QNN), by incorporating two essential techinques, namely binary representation of topological network and order reduction of loss function. As far as we know, this is the first algorithm to train multi-layer feedforward networks on Ising machines, providing an alternative to gradient-based backpropagation. Firstly, training QNN is formulated as a quadratic constrained binary optimization (QCBO) problem by representing neuron connection and activation function as equality constraints. All quantized variables are encoded by binary bits based on binary encoding protocol. Secondly, QCBO is converted to a quadratic unconstrained binary optimization (QUBO) problem, that can be efficiently solved on Ising machines. The conversion leverages both penalty function and Rosenberg order reduction, who together eliminate equality constraints and reduce high-order loss function into a quadratic one. With some assumptions, theoretical analysis shows the space complexity of our algorithm is $\mathcal{O}(H^2L + HLN\log H)$, quantifying the required number of Ising spins. Finally, the algorithm effectiveness is validated with a simulated Ising machine on MNIST dataset. After annealing 700 ms, the classification accuracy achieves 98.3%. Among 100 runs, the success probability of finding the optimal solution is 72%. Along with the increasing number of spins on Ising machine, our algorithm has the potential to train deeper neural networks.
    摘要 As a dedicated quantum device, 积极机器(Ising machine)可以在毫秒内解决大规模的二进制优化问题。因为生成型人工智能的发展,有越来越多的人对使用积极机器来训练Feedforward Neural Network(FFNN)感兴趣。然而,现有的方法只能训练单层FFNN,因为积极机器的非线性网络架构造问题。本文提出了一个积极学习算法,用于训练量化神经网络(QNN),通过结合两种重要技术:一是二进制表示网络的拓扑网络,二是排序缩减损失函数的技术。根据我们所知,这是第一个可以在积极机器上训练多层FFNN的算法,提供了一个alternative的方法。首先,训练QNN是转化为二进制受控制的问题(QCBO),通过表示神经连接和活化函数为等式约束。所有量化变数都是基于二进制编码协议编码。其次,QCBO被转换为二进制不约束的问题(QUBO),可以高效地解决在积极机器上。转换是通过 penalty function和Rosenberg排序缩减技术,将等式约束和高阶损失函数转换为二进制问题。假设时,我们进行了一些假设,实际分析显示算法的空间复杂度为 $\mathcal{O}(H^2L + HLN\log H)$,这个结果表明了我们需要多少积极转换的磁矩。最后,我们验证了我们的算法在MNIST dataset上的效果,通过氧化700毫秒后,分类精度达到98.3%。在100次实验中,成功率为72%。随着积极转换的磁矩增加,我们的算法有可能训练更深的神经网络。

Co-training and Co-distillation for Quality Improvement and Compression of Language Models

  • paper_url: http://arxiv.org/abs/2311.02849
  • repo_url: None
  • paper_authors: Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Hongbo Zhang, Sung Ju Hwang, Alexander Min
  • for: 减少计算成本的预训练语言模型(PLM)的压缩,以便在资源有限或实时设置下使用。
  • methods: 基于两个模型同时受训和互相知识传递的框架,即Co-Training and Co-Distillation(CTCD)。
  • results: CTCD框架可以同时提高性能和执行速度,并且可以与现有的技术相结合,如建筑设计或数据增强,以实现更高的性能提升。小模型通过CTCD框架进行传递学习,可以超越原始大模型的性能。
    Abstract Knowledge Distillation (KD) compresses computationally expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models, allowing their use in resource-constrained or real-time settings. However, most smaller models fail to surpass the performance of the original larger model, resulting in sacrificing performance to improve inference speed. To address this issue, we propose Co-Training and Co-Distillation (CTCD), a novel framework that improves performance and inference speed together by co-training two models while mutually distilling knowledge. The CTCD framework successfully achieves this based on two significant findings: 1) Distilling knowledge from the smaller model to the larger model during co-training improves the performance of the larger model. 2) The enhanced performance of the larger model further boosts the performance of the smaller model. The CTCD framework shows promise as it can be combined with existing techniques like architecture design or data augmentation, replacing one-way KD methods, to achieve further performance improvement. Extensive ablation studies demonstrate the effectiveness of CTCD, and the small model distilled by CTCD outperforms the original larger model by a significant margin of 1.66 on the GLUE benchmark.
    摘要 知识塑化(KD)可以压缩 computationally expensive 预训练语言模型(PLMs),将它们的知识传递给更小的模型,使其在资源受限或实时设置中使用。然而,大多数更小的模型无法超越原始更大的模型的性能, resulting in sacrificing performance to improve inference speed. To address this issue, we propose Co-Training and Co-Distillation (CTCD), a novel framework that improves performance and inference speed together by co-training two models while mutually distilling knowledge. The CTCD framework successfully achieves this based on two significant findings:1. 压缩知识从更小的模型到更大的模型 durante co-training 可以提高更大模型的性能。2. 更大模型的性能的提高可以再次提高更小模型的性能。CTCD框架显示出了其可以与现有的技术,如建筑设计或数据扩展,结合使用,代替一个方向的 KD 方法,以达到更高的性能改进。广泛的抑制研究表明 CTCD 的效果,并且使用 CTCD 塑化的小模型在 GLUE benchmark 上超过原始更大模型的性能表现。

Kinematic-aware Prompting for Generalizable Articulated Object Manipulation with LLMs

  • paper_url: http://arxiv.org/abs/2311.02847
  • repo_url: https://github.com/gewu-lab/llm_articulated_object_manipulation
  • paper_authors: Wenke Xia, Dong Wang, Xincheng Pang, Zhigang Wang, Bin Zhao, Di Hu
  • for: 这个论文旨在提高家用助手机器人的通用适应性,使其能够在各种不同的链接物上进行有效的操作。
  • methods: 该论文提出了一种基于语言模型的努力学习框架,通过提供物体的骨骼知识来帮助语言模型生成低级别的运动轨迹点。
  • results: 研究表明,该框架不仅在8种已经见过的类型上比传统方法表现出色,而且在8种未见过的类型上也有强大的零shot能力。此外,在实际场景中对7种不同的物体进行了实验,证明了该框架的实用性。
    Abstract Generalizable articulated object manipulation is essential for home-assistant robots. Recent efforts focus on imitation learning from demonstrations or reinforcement learning in simulation, however, due to the prohibitive costs of real-world data collection and precise object simulation, it still remains challenging for these works to achieve broad adaptability across diverse articulated objects. Recently, many works have tried to utilize the strong in-context learning ability of Large Language Models (LLMs) to achieve generalizable robotic manipulation, but most of these researches focus on high-level task planning, sidelining low-level robotic control. In this work, building on the idea that the kinematic structure of the object determines how we can manipulate it, we propose a kinematic-aware prompting framework that prompts LLMs with kinematic knowledge of objects to generate low-level motion trajectory waypoints, supporting various object manipulation. To effectively prompt LLMs with the kinematic structure of different objects, we design a unified kinematic knowledge parser, which represents various articulated objects as a unified textual description containing kinematic joints and contact location. Building upon this unified description, a kinematic-aware planner model is proposed to generate precise 3D manipulation waypoints via a designed kinematic-aware chain-of-thoughts prompting method. Our evaluation spanned 48 instances across 16 distinct categories, revealing that our framework not only outperforms traditional methods on 8 seen categories but also shows a powerful zero-shot capability for 8 unseen articulated object categories. Moreover, the real-world experiments on 7 different object categories prove our framework's adaptability in practical scenarios. Code is released at \href{https://github.com/GeWu-Lab/LLM_articulated_object_manipulation/tree/main}{here}.
    摘要 通用的链接物体操作是家庭助手机器人的必备能力。最近的努力主要集中在示范学习或者在模拟环境中使用奖励学习,但由于实际世界数据收集和精准的物体模拟成本 prohibitively expensive,这些工作还未能实现广泛的适应性。近些年,许多研究尝试使用大语言模型(LLM)的强Context Learning能力来实现通用的机器人操作,但大多数研究都集中在高级任务规划,忽略低级机器人控制。在这项工作中,我们建立了基于物体链接结构的Prompting框架,通过提示LLMs with 链接知识来生成低级运动轨迹点。为了有效地提示LLMs链接结构的不同物体,我们设计了一个统一的链接知识解析器,该解析器将各种链接物体表示为一个统一的文本描述,包括链接 JOINTS 和 Contact Location。基于这个统一描述,我们提出了一种基于链接结构的逻辑链式思维Prompting方法,以生成精确的3D操作轨迹点。我们的评估涵盖了48个实例,8种已看到类和8种未看到类,结果显示,我们的框架不仅在8种已看到类上超越传统方法,还在0shot情况下展现出强大的适应能力。此外,我们在7种实际物体类型上进行了实际实验,证明了我们的框架在实际场景中的适应性。代码可以在 \href{https://github.com/GeWu-Lab/LLM_articulated_object_manipulation/tree/main}{这里} 找到。

Saturn: Efficient Multi-Large-Model Deep Learning

  • paper_url: http://arxiv.org/abs/2311.02840
  • repo_url: None
  • paper_authors: Kabir Nagrecha, Arun Kumar
  • for: 提高多大型模型训练效率 (improve the efficiency of multi-large-model training)
  • methods: 提出了一种新的数据系统,即Saturn,用于解决在多大型模型训练中的三个关键问题:并行技术选择、GPU分配和调度。 (propose a new data system called Saturn to solve three key interconnected systems challenges for users building large models in this setting)
  • results: 比较研究显示,Saturn 的共优化方法可以提高模型选择运行时间的效率,比现今常见深度学习实践下降39-49%。 (evaluations show that the joint-optimization approach of Saturn can improve the efficiency of model selection runtimes, reducing them by 39-49% compared to typical current deep learning practices)
    Abstract In this paper, we propose Saturn, a new data system to improve the efficiency of multi-large-model training (e.g., during model selection/hyperparameter optimization). We first identify three key interconnected systems challenges for users building large models in this setting -- parallelism technique selection, distribution of GPUs over jobs, and scheduling. We then formalize these as a joint problem, and build a new system architecture to tackle these challenges simultaneously. Our evaluations show that our joint-optimization approach yields 39-49% lower model selection runtimes than typical current DL practice.
    摘要 在这篇论文中,我们提出了一种新的数据系统,用于提高多大型模型训练的效率(例如, durante model selection/超参数优化)。我们首先认为有三个关联的系统挑战,用户在建立大型模型时面临——并行技术选择、GPU分配到任务以及调度。我们然后将这些问题形式化为一个共同问题,并构建了一个新的系统架构,以同时解决这些挑战。我们的评估结果表明,我们的联合优化方法可以提供39-49%比现今深度学习实践的模型选择运行时间更低。

Mesh Neural Cellular Automata

  • paper_url: http://arxiv.org/abs/2311.02820
  • repo_url: None
  • paper_authors: Ehsan Pajouheshgar, Yitao Xu, Alexander Mordvintsev, Eyvind Niklasson, Tong Zhang, Sabine Süsstrunk
  • for: 提高虚拟环境的实际感 (enhancing the realism of virtual environments)
  • methods: 直接synthesize 3D纹理 (directly synthesize 3D textures) without UV mapping
  • results: 可以在实时中synthesize 3D纹理 (can synthesize 3D textures in real time) on any mesh, with generalization and multi-modal supervision capabilities.
    Abstract Modeling and synthesizing textures are essential for enhancing the realism of virtual environments. Methods that directly synthesize textures in 3D offer distinct advantages to the UV-mapping-based methods as they can create seamless textures and align more closely with the ways textures form in nature. We propose Mesh Neural Cellular Automata (MeshNCA), a method for directly synthesizing dynamic textures on 3D meshes without requiring any UV maps. MeshNCA is a generalized type of cellular automata that can operate on a set of cells arranged on a non-grid structure such as vertices of a 3D mesh. While only being trained on an Icosphere mesh, MeshNCA shows remarkable generalization and can synthesize textures on any mesh in real time after the training. Additionally, it accommodates multi-modal supervision and can be trained using different targets such as images, text prompts, and motion vector fields. Moreover, we conceptualize a way of grafting trained MeshNCA instances, enabling texture interpolation. Our MeshNCA model enables real-time 3D texture synthesis on meshes and allows several user interactions including texture density/orientation control, a grafting brush, and motion speed/direction control. Finally, we implement the forward pass of our MeshNCA model using the WebGL shading language and showcase our trained models in an online interactive demo which is accessible on personal computers and smartphones. Our demo and the high resolution version of this PDF are available at https://meshnca.github.io/.
    摘要 模型和 sintesizing texture 是虚拟环境的重要组成部分。直接 sintesizing texture 在 3D 提供了明显的优势,因为它们可以创建无缝 texture 并更好地遵循自然界中 texture 的形成方式。我们提出了 Mesh Neural Cellular Automata (MeshNCA),一种直接 sintesizing dynamic texture 的方法,无需 UV 映射。MeshNCA 是一种通用的细胞自动机,可以在 3D 网格结构上的细胞集上进行操作。它只需在 icosphere 网格上训练,但它可以在实时中 sintesize texture 于任何网格。此外,它可以接受多modal 监督和使用不同的目标,如图像、文本提示和运动向量场。此外,我们还提出了将训练好的 MeshNCA 实例结合的思想,以实现 texture interpolate。我们的 MeshNCA 模型允许在 mesh 上实时 sintesize texture,并提供了许多用户交互,包括 texture 密度/方向控制、grafting 毛刷、速度/方向控制。最后,我们使用 WebGL 渲染语言实现了我们的 MeshNCA 模型的前向传播,并在个人电脑和手机上展示了我们训练好的模型。我们的 demo 和高分辨率版PDF 可以在 https://meshnca.github.io/ 上获取。

QualEval: Qualitative Evaluation for Model Improvement

  • paper_url: http://arxiv.org/abs/2311.02807
  • repo_url: https://github.com/vmurahari3/qualeval
  • paper_authors: Vishvak Murahari, Ameet Deshpande, Peter Clark, Tanmay Rajpurohit, Ashish Sabharwal, Karthik Narasimhan, Ashwin Kalyan
    for: 这种研究旨在提高人工智能系统中的大语言模型(LLM)评估方法,并提供一种自动化质量评估方法来加速模型改进。methods: 该研究使用了一种强大的语言理解器和一种新的灵活线性Programming solver来生成人类可读的报告,以提供model improvement的数据科学家。results: 研究表明,通过使用QualEval的报告,可以提高Llama 2模型在对话任务(DialogSum)中的绝对性能,相比基线方案,提高15%点。QualEval成功地加速了模型开发的pace,因此可以视为一个数据科学家在盒子中。
    Abstract Quantitative evaluation metrics have traditionally been pivotal in gauging the advancements of artificial intelligence systems, including large language models (LLMs). However, these metrics have inherent limitations. Given the intricate nature of real-world tasks, a single scalar to quantify and compare is insufficient to capture the fine-grained nuances of model behavior. Metrics serve only as a way to compare and benchmark models, and do not yield actionable diagnostics, thus making the model improvement process challenging. Model developers find themselves amid extensive manual efforts involving sifting through vast datasets and attempting hit-or-miss adjustments to training data or setups. In this work, we address the shortcomings of quantitative metrics by proposing QualEval, which augments quantitative scalar metrics with automated qualitative evaluation as a vehicle for model improvement. QualEval uses a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights that when applied, accelerate model improvement. The insights are backed by a comprehensive dashboard with fine-grained visualizations and human-interpretable analyses. We corroborate the faithfulness of QualEval by demonstrating that leveraging its insights, for example, improves the absolute performance of the Llama 2 model by up to 15% points relative on a challenging dialogue task (DialogSum) when compared to baselines. QualEval successfully increases the pace of model development, thus in essence serving as a data-scientist-in-a-box. Given the focus on critiquing and improving current evaluation metrics, our method serves as a refreshingly new technique for both model evaluation and improvement.
    摘要 To address these limitations, we propose QualEval, which combines automated qualitative evaluation with quantitative scalar metrics to facilitate model improvement. QualEval utilizes a powerful LLM reasoner and our novel flexible linear programming solver to generate human-readable insights that can be applied to accelerate model improvement. These insights are supported by a comprehensive dashboard with fine-grained visualizations and human-interpretable analyses.We demonstrate the effectiveness of QualEval by showing that it can improve the absolute performance of the Llama 2 model by up to 15% points relative to baselines on a challenging dialogue task (DialogSum). By increasing the pace of model development, QualEval serves as a data-scientist-in-a-box, providing a refreshingly new technique for both model evaluation and improvement. Given the focus on critiquing and improving current evaluation metrics, our method offers a much-needed alternative to traditional evaluation methods.

Incorporating Worker Perspectives into MTurk Annotation Practices for NLP

  • paper_url: http://arxiv.org/abs/2311.02802
  • repo_url: None
  • paper_authors: Olivia Huang, Eve Fleisig, Dan Klein
  • for: 本研究旨在改进当前在 Amazon Mechanical Turk (MTurk) 上进行自然语言处理数据收集的现行实践,以提高数据质量和考虑工作者权益。
  • methods: 本研究采用了 kritische 文献综述和MTurk工作者问卷,以解决关于最佳实践、公平支付、工作者隐私、数据质量和工作者奖励的开问。
  • results: 调查结果表明,工作者偏好可靠、合理的支付,而不是不确定、非常高的支付;报告经常谎报个人信息问题;表达对无解释工作拒绝的沮丧。此外,工作者认为一些质量控制方法,如需要最低响应时间或硬件资格要求,是偏袋和效果不佳。根据调查结果,本研究提出了将来的 NLP 研究如何更好地考虑MTurk工作者的经验,以尊重工作者权益并提高数据质量。
    Abstract Current practices regarding data collection for natural language processing on Amazon Mechanical Turk (MTurk) often rely on a combination of studies on data quality and heuristics shared among NLP researchers. However, without considering the perspectives of MTurk workers, these approaches are susceptible to issues regarding workers' rights and poor response quality. We conducted a critical literature review and a survey of MTurk workers aimed at addressing open questions regarding best practices for fair payment, worker privacy, data quality, and considering worker incentives. We found that worker preferences are often at odds with received wisdom among NLP researchers. Surveyed workers preferred reliable, reasonable payments over uncertain, very high payments; reported frequently lying on demographic questions; and expressed frustration at having work rejected with no explanation. We also found that workers view some quality control methods, such as requiring minimum response times or Master's qualifications, as biased and largely ineffective. Based on the survey results, we provide recommendations on how future NLP studies may better account for MTurk workers' experiences in order to respect workers' rights and improve data quality.
    摘要 现有的MTurk数据收集做法 frequently rely on combination of studies on data quality和 shared heuristics among NLP researchers. However, without considering the perspectives of MTurk workers, these approaches are susceptible to issues regarding workers' rights and poor response quality. We conducted a critical literature review and a survey of MTurk workers aimed at addressing open questions regarding best practices for fair payment, worker privacy, data quality, and considering worker incentives. We found that worker preferences are often at odds with received wisdom among NLP researchers. Surveyed workers preferred reliable, reasonable payments over uncertain, very high payments; reported frequently lying on demographic questions; and expressed frustration at having work rejected with no explanation. We also found that workers view some quality control methods, such as requiring minimum response times or Master's qualifications, as biased and largely ineffective. Based on the survey results, we provide recommendations on how future NLP studies may better account for MTurk workers' experiences in order to respect workers' rights and improve data quality.