cs.AI - 2023-07-04

The Inner Sentiments of a Thought

  • paper_url: http://arxiv.org/abs/2307.01784
  • repo_url: None
  • paper_authors: Chris Gagne, Peter Dayan
  • for: This paper explores how Transformer-based large language models (LLMs), which can generate highly realistic text, express and at least implicitly represent a wide range of sentiments, from the obvious to the subtle.
  • methods: The authors use the hidden representations of an LLM, applied to sentence prefixes of increasing length, to train predictors of the quantiles of the distribution of a sentence's final sentiment.
  • results: The distributional predictors are well calibrated and useful for analyzing the sentiment traits of sentences, for example showing how an ordinary conjunction such as "but" can dramatically shift an utterance toward extreme sentiment. The authors also use the predictors to generate sentences with sentiments in the tails of the distributions.
    Abstract Transformer-based large-scale language models (LLMs) are able to generate highly realistic text. They are duly able to express, and at least implicitly represent, a wide range of sentiments and color, from the obvious, such as valence and arousal to the subtle, such as determination and admiration. We provide a first exploration of these representations and how they can be used for understanding the inner sentimental workings of single sentences. We train predictors of the quantiles of the distributions of final sentiments of sentences from the hidden representations of an LLM applied to prefixes of increasing lengths. After showing that predictors of distributions of valence, determination, admiration, anxiety and annoyance are well calibrated, we provide examples of using these predictors for analyzing sentences, illustrating, for instance, how even ordinary conjunctions (e.g., "but") can dramatically alter the emotional trajectory of an utterance. We then show how to exploit the distributional predictions to generate sentences with sentiments in the tails of distributions. We discuss the implications of our results for the inner workings of thoughts, for instance for psychiatric dysfunction.
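As a rough illustration of the core training step, the sketch below fits a quantile (pinball-loss) head on top of LLM hidden states; the dimensions, quantile levels, and variable names are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class QuantileHead(nn.Module):
    """Predicts K quantiles of the final-sentiment distribution
    from an LLM's hidden representation of a sentence prefix."""
    def __init__(self, hidden_dim: int, n_quantiles: int = 9):
        super().__init__()
        self.taus = torch.linspace(0.1, 0.9, n_quantiles)  # target quantile levels
        self.net = nn.Linear(hidden_dim, n_quantiles)

    def forward(self, h):                  # h: (batch, hidden_dim)
        return self.net(h)                 # (batch, n_quantiles)

def pinball_loss(pred, target, taus):
    """Standard quantile-regression (pinball) loss."""
    diff = target.unsqueeze(1) - pred      # (batch, n_quantiles)
    return torch.mean(torch.maximum(taus * diff, (taus - 1) * diff))

# Toy usage: prefix embeddings -> quantiles of the final sentiment score.
head = QuantileHead(hidden_dim=768)
h = torch.randn(32, 768)                   # stand-in for LLM hidden states
y = torch.rand(32)                         # final sentiment of each sentence
loss = pinball_loss(head(h), y, head.taus)
loss.backward()
```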

GHOST: A Graph Neural Network Accelerator using Silicon Photonics

  • paper_url: http://arxiv.org/abs/2307.01782
  • repo_url: None
  • paper_authors: Salma Afifi, Febin Sunny, Amin Shafiee, Mahdi Nikdast, Sudeep Pasricha
  • for: This paper proposes a silicon-photonic hardware accelerator for running graph neural network (GNN) models on graph-structured data.
  • methods: The accelerator implements the three main stages of running GNNs (neighborhood update, message passing, and sampling) entirely in the optical domain, improving both efficiency and energy efficiency.
  • results: Simulations show that GHOST delivers at least 10.2x better throughput and 3.8x better energy efficiency than GPUs, TPUs, CPUs, and multiple state-of-the-art GNN hardware accelerators.
    Abstract Graph neural networks (GNNs) have emerged as a powerful approach for modelling and learning from graph-structured data. Multiple fields have since benefitted enormously from the capabilities of GNNs, such as recommendation systems, social network analysis, drug discovery, and robotics. However, accelerating and efficiently processing GNNs require a unique approach that goes beyond conventional artificial neural network accelerators, due to the substantial computational and memory requirements of GNNs. The slowdown of scaling in CMOS platforms also motivates a search for alternative implementation substrates. In this paper, we present GHOST, the first silicon-photonic hardware accelerator for GNNs. GHOST efficiently alleviates the costs associated with both vertex-centric and edge-centric operations. It implements separately the three main stages involved in running GNNs in the optical domain, allowing it to be used for the inference of various widely used GNN models and architectures, such as graph convolution networks and graph attention networks. Our simulation studies indicate that GHOST exhibits at least 10.2x better throughput and 3.8x better energy efficiency when compared to GPU, TPU, CPU and multiple state-of-the-art GNN hardware accelerators.

Physically Realizable Natural-Looking Clothing Textures Evade Person Detectors via 3D Modeling

  • paper_url: http://arxiv.org/abs/2307.01778
  • repo_url: https://github.com/WhoTHU/Adversarial_camou
  • paper_authors: Zhanhao Hu, Wenda Chu, Xiaopei Zhu, Hui Zhang, Bo Zhang, Xiaolin Hu
  • for: The goal of this paper is to craft clothing that evades person detectors and remains effective across multiple viewing angles.
  • methods: The paper uses 3D modeling to craft adversarial textures for clothes, a technique that has previously succeeded for rigid objects. Humans and clothes, however, are non-rigid, which makes physical realization of such textures difficult.
  • results: Experiments show that the AdvCaT textures evade multiple person detectors at multiple viewing angles and can be applied in the real world.
    Abstract Recent works have proposed to craft adversarial clothes for evading person detectors, while they are either only effective at limited viewing angles or very conspicuous to humans. We aim to craft adversarial texture for clothes based on 3D modeling, an idea that has been used to craft rigid adversarial objects such as a 3D-printed turtle. Unlike rigid objects, humans and clothes are non-rigid, leading to difficulties in physical realization. In order to craft natural-looking adversarial clothes that can evade person detectors at multiple viewing angles, we propose adversarial camouflage textures (AdvCaT) that resemble one kind of the typical textures of daily clothes, camouflage textures. We leverage the Voronoi diagram and Gumbel-softmax trick to parameterize the camouflage textures and optimize the parameters via 3D modeling. Moreover, we propose an efficient augmentation pipeline on 3D meshes combining topologically plausible projection (TopoProj) and Thin Plate Spline (TPS) to narrow the gap between digital and real-world objects. We printed the developed 3D texture pieces on fabric materials and tailored them into T-shirts and trousers. Experiments show high attack success rates of these clothes against multiple detectors.
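The Gumbel-softmax parameterization at the heart of AdvCaT can be sketched as a differentiable color assignment over Voronoi cells. The snippet below shows only that ingredient, with a hypothetical palette size and a placeholder detection loss; the Voronoi layout, 3D rendering, and TopoProj/TPS augmentation are omitted.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: each of `n_cells` Voronoi cells picks one of
# `n_colors` camouflage colors via a differentiable Gumbel-softmax draw.
n_cells, n_colors = 500, 4
logits = torch.zeros(n_cells, n_colors, requires_grad=True)  # learnable
palette = torch.rand(n_colors, 3)                            # RGB palette

# Soft, differentiable color assignment (hard=True gives a one-hot
# forward pass with straight-through gradients).
assign = F.gumbel_softmax(logits, tau=0.5, hard=True)        # (n_cells, n_colors)
cell_colors = assign @ palette                               # (n_cells, 3)

# A detector loss on the rendered texture would be backpropagated
# through `cell_colors` into `logits`.
loss = cell_colors.mean()   # placeholder for the detection loss
loss.backward()
```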

MOPO-LSI: A User Guide

  • paper_url: http://arxiv.org/abs/2307.01719
  • repo_url: None
  • paper_authors: Yong Zheng, Kumar Neelotpal Shukla, Jasmine Xu, David Wang, Michael O’Leary
  • for: This document is a user guide for the MOPO-LSI library, covering problem setup, workflow, and configuration parameters.
  • methods: The library implements multi-objective portfolio optimization for sustainable investments, configured through the problem setup, workflow, and hyper-parameters described in the guide.
  • results: The guide enables users to quickly adopt the library for multi-objective portfolio optimization.
    Abstract MOPO-LSI is an open-source Multi-Objective Portfolio Optimization Library for Sustainable Investments. This document provides a user guide for MOPO-LSI version 1.0, including problem setup, workflow and the hyper-parameters in configurations.

On the Constrained Time-Series Generation Problem

  • paper_url: http://arxiv.org/abs/2307.01717
  • repo_url: None
  • paper_authors: Andrea Coletta, Sriram Gopalakrishnan, Daniel Borrajo, Svitlana Vyetrenko
  • for: This paper addresses the need for effective constrained time-series generation in practical applications, such as improving the performance of machine learning algorithms, amplifying the occurrence of rare events, and creating counterfactual time-series scenarios.
  • methods: The paper frames the task as a constrained optimization problem and proposes a set of generative methods, including a guided diffusion model named "GuidedDiffTime", to generate realistic constrained time series.
  • results: Experiments show the approach generates constrained time series more effectively than existing methods, and GuidedDiffTime requires no re-training when constraints change, significantly reducing the carbon footprint.
    Abstract Synthetic time series are often used in practical applications to augment the historical time series dataset for better performance of machine learning algorithms, amplify the occurrence of rare events, and also create counterfactual scenarios described by the time series. Distributional-similarity (which we refer to as realism) as well as the satisfaction of certain numerical constraints are common requirements in counterfactual time series scenario generation requests. For instance, the US Federal Reserve publishes synthetic market stress scenarios given by the constrained time series for financial institutions to assess their performance in hypothetical recessions. Existing approaches for generating constrained time series usually penalize training loss to enforce constraints, and reject non-conforming samples. However, these approaches would require re-training if we change constraints, and rejection sampling can be computationally expensive, or impractical for complex constraints. In this paper, we propose a novel set of methods to tackle the constrained time series generation problem and provide efficient sampling while ensuring the realism of generated time series. In particular, we frame the problem using a constrained optimization framework and then we propose a set of generative methods including ``GuidedDiffTime'', a guided diffusion model to generate realistic time series. Empirically, we evaluate our work on several datasets for financial and energy data, where incorporating constraints is critical. We show that our approaches outperform existing work both qualitatively and quantitatively. Most importantly, we show that our ``GuidedDiffTime'' model is the only solution where re-training is not necessary for new constraints, resulting in a significant carbon footprint reduction.
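A minimal sketch of the general guided-diffusion pattern the paper builds on: the gradient of a differentiable constraint penalty steers each reverse step, so changing the constraint does not require re-training. The `reverse_step` and `constraint_penalty` callables below are placeholders, not the paper's GuidedDiffTime API.

```python
import torch

def guided_step(x_t, t, reverse_step, constraint_penalty, scale=1.0):
    """One reverse-diffusion step steered toward a constraint set.

    `reverse_step(x_t, t)` stands in for the model's usual denoising
    update and `constraint_penalty(x)` is a differentiable scalar that
    is zero when the constraint holds; both are placeholders.
    """
    x_t = x_t.detach().requires_grad_(True)
    x_next = reverse_step(x_t, t)                        # ordinary update
    grad = torch.autograd.grad(constraint_penalty(x_next), x_t)[0]
    # Nudge the trajectory toward the constraint set. Changing the
    # constraint only changes the penalty -- no re-training needed.
    return (x_next - scale * grad).detach()

# Example constraint: the generated series must end at or below a
# hypothetical stress level.
def ends_below(x, level=-2.0):
    return torch.relu(x[..., -1] - level).sum()
```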

Distributional Model Equivalence for Risk-Sensitive Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.01708
  • repo_url: None
  • paper_authors: Tyler Kastner, Murat A. Erdogdu, Amir-massoud Farahmand
  • for: Learning models for risk-sensitive reinforcement learning.
  • methods: The paper uses distributional reinforcement learning to introduce two new notions of model equivalence: one that is general but computationally intractable, and a practical variant that lets one choose which risk measures to plan optimally for.
  • results: Tabular and large-scale experiments demonstrate that the framework can augment any model-free risk-sensitive algorithm.
    Abstract We consider the problem of learning models for risk-sensitive reinforcement learning. We theoretically demonstrate that proper value equivalence, a method of learning models which can be used to plan optimally in the risk-neutral setting, is not sufficient to plan optimally in the risk-sensitive setting. We leverage distributional reinforcement learning to introduce two new notions of model equivalence, one which is general and can be used to plan for any risk measure, but is intractable; and a practical variation which allows one to choose which risk measures they may plan optimally for. We demonstrate how our framework can be used to augment any model-free risk-sensitive algorithm, and provide both tabular and large-scale experiments to demonstrate its ability.

Synthetic is all you need: removing the auxiliary data assumption for membership inference attacks against synthetic data

  • paper_url: http://arxiv.org/abs/2307.01701
  • repo_url: None
  • paper_authors: Florent Guépin, Matthieu Meeus, Ana-Maria Cretu, Yves-Alexandre de Montjoye
  • for: This paper aims to evaluate the privacy of synthetic data.
  • methods: It uses membership inference attacks (MIAs) based on shadow modeling to assess how well synthetic data protects the training records.
  • results: In three attack scenarios that use only synthetic data, the authors successfully carry out membership inference attacks across two real-world datasets and two synthetic data generators. These results show that the auxiliary-dataset assumption commonly made when auditing synthetic data can be relaxed, making the attacks practical.
    Abstract Synthetic data is emerging as the most promising solution to share individual-level data while safeguarding privacy. Membership inference attacks (MIAs), based on shadow modeling, have become the standard to evaluate the privacy of synthetic data. These attacks, however, currently assume the attacker to have access to an auxiliary dataset sampled from a similar distribution as the training dataset. This often is a very strong assumption that would make an attack unlikely to happen in practice. We here show how this assumption can be removed and how MIAs can be performed using only the synthetic data. More specifically, in three different attack scenarios using only synthetic data, our results demonstrate that MIAs are still successful, across two real-world datasets and two synthetic data generators. These results show how the strong hypothesis made when auditing synthetic data releases - access to an auxiliary dataset - can be relaxed to perform an actual attack.
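One way to picture a shadow-modeling MIA that needs no auxiliary data is to bootstrap the shadow training sets from the released synthetic data itself. The sketch below assumes a hypothetical `fit_generator` callable and a simple distance-to-closest-record feature; the paper's three attack scenarios differ in their details.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def synthetic_only_mia(synth, candidate, fit_generator, n_shadows=8):
    """Shadow-modeling MIA using only the released synthetic data.

    `fit_generator(data)` is a placeholder that trains a synthetic-data
    generator and returns a sampler; `candidate` is the record whose
    membership we want to test.
    """
    feats, labels = [], []
    for _ in range(n_shadows):
        base = synth[np.random.choice(len(synth), len(synth) // 2, replace=False)]
        for member in (True, False):
            train = np.vstack([base, candidate[None]]) if member else base
            sampler = fit_generator(train)
            out = sampler(1000)
            # Summary feature: how close the synthetic output
            # gets to the candidate record.
            feats.append([np.min(np.linalg.norm(out - candidate, axis=1))])
            labels.append(member)
    # The attack classifier is then applied to the same feature
    # computed on the real synthetic release.
    return RandomForestClassifier().fit(feats, labels)
```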

Online Learning and Solving Infinite Games with an ERM Oracle

  • paper_url: http://arxiv.org/abs/2307.01689
  • repo_url: None
  • paper_authors: Angelos Assos, Idan Attias, Yuval Dagan, Constantinos Daskalakis, Maxwell Fishelson
  • for: This paper addresses online learning for general concept classes, where existing algorithms rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA).
  • methods: The paper proposes an algorithm for the online binary classification setting that relies solely on ERM oracle calls, proving finite regret in the realizable setting and sublinearly growing regret in the agnostic setting, with bounds in terms of the Littlestone and threshold dimensions of the underlying concept class.
  • results: For nonparametric games, where the ERM oracle can be interpreted as a best-response oracle, the paper provides learning algorithms that rely only on best-response oracles and converge to approximate-minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has bounded fat-threshold dimension.
    Abstract While ERM suffices to attain near-optimal generalization error in the stochastic learning setting, this is not known to be the case in the online learning setting, where algorithms for general concept classes rely on computationally inefficient oracles such as the Standard Optimal Algorithm (SOA). In this work, we propose an algorithm for online binary classification setting that relies solely on ERM oracle calls, and show that it has finite regret in the realizable setting and sublinearly growing regret in the agnostic setting. We bound the regret in terms of the Littlestone and threshold dimensions of the underlying concept class. We obtain similar results for nonparametric games, where the ERM oracle can be interpreted as a best response oracle, finding the best response of a player to a given history of play of the other players. In this setting, we provide learning algorithms that only rely on best response oracles and converge to approximate-minimax equilibria in two-player zero-sum games and approximate coarse correlated equilibria in multi-player general-sum games, as long as the game has a bounded fat-threshold dimension. Our algorithms apply to both binary-valued and real-valued games and can be viewed as providing justification for the wide use of double oracle and multiple oracle algorithms in the practice of solving large games.

Serving Graph Neural Networks With Distributed Fog Servers For Smart IoT Services

  • paper_url: http://arxiv.org/abs/2307.01684
  • repo_url: None
  • paper_authors: Liekang Zeng, Xu Chen, Peng Huang, Ke Luo, Xiaoxi Zhang, Zhi Zhou
  • for: This paper presents a distributed real-time graph neural network (GNN) inference framework for serving GNN-based models in IoT-driven smart applications.
  • methods: The framework leverages fog computing, distributing GNN inference across multiple fog nodes close to IoT data sources to better exploit their diverse and dynamic resources, with heterogeneity-aware execution planning and GNN-specific compression techniques.
  • results: Prototype-based evaluation and case studies show that Fograph outperforms state-of-the-art cloud serving and fog deployment with up to 5.39x execution speedup and 6.84x throughput improvement.
    Abstract Graph Neural Networks (GNNs) have gained growing interest in miscellaneous applications owing to their outstanding ability in extracting latent representation on graph structures. To render GNN-based service for IoT-driven smart applications, traditional model serving paradigms usually resort to the cloud by fully uploading geo-distributed input data to remote datacenters. However, our empirical measurements reveal the significant communication overhead of such cloud-based serving and highlight the profound potential in applying the emerging fog computing. To maximize the architectural benefits brought by fog computing, in this paper, we present Fograph, a novel distributed real-time GNN inference framework that leverages diverse and dynamic resources of multiple fog nodes in proximity to IoT data sources. By introducing heterogeneity-aware execution planning and GNN-specific compression techniques, Fograph tailors its design to well accommodate the unique characteristics of GNN serving in fog environments. Prototype-based evaluation and case study demonstrate that Fograph significantly outperforms the state-of-the-art cloud serving and fog deployment by up to 5.39x execution speedup and 6.84x throughput improvement.

Learning Discrete Weights and Activations Using the Local Reparameterization Trick

  • paper_url: http://arxiv.org/abs/2307.01683
  • repo_url: None
  • paper_authors: Guy Berger, Aviv Navon, Ethan Fetaya
  • for: Lowering the computation and memory demands of neural network inference in computer vision and machine learning.
  • methods: The paper binarizes both network weights and activations, extending the local reparameterization trick, previously used to train networks with discrete weights, to discrete activations as well.
  • results: The approach trains networks with binary activations that achieve state-of-the-art results while further reducing runtime and memory footprint at inference time, enabling efficient deployment on low-resource devices.
    Abstract In computer vision and machine learning, a crucial challenge is to lower the computation and memory demands for neural network inference. A commonplace solution to address this challenge is through the use of binarization. By binarizing the network weights and activations, one can significantly reduce computational complexity by substituting the computationally expensive floating operations with faster bitwise operations. This leads to a more efficient neural network inference that can be deployed on low-resource devices. In this work, we extend previous approaches that trained networks with discrete weights using the local reparameterization trick to also allow for discrete activations. The original approach optimized a distribution over the discrete weights and uses the central limit theorem to approximate the pre-activation with a continuous Gaussian distribution. Here we show that the probabilistic modeling can also allow effective training of networks with discrete activation as well. This further reduces runtime and memory footprint at inference time with state-of-the-art results for networks with binary activations.
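The local reparameterization trick for a layer with stochastic binary weights can be sketched as follows: by the central limit theorem the pre-activation is approximately Gaussian, so sampling moves from weight space to activation space. This is a generic sketch of the trick for {-1, +1} weights, not the paper's full method (which additionally handles discrete activations).

```python
import torch
import torch.nn as nn

class StochasticBinaryLinear(nn.Module):
    """Linear layer with Bernoulli weights in {-1, +1}, trained via the
    local reparameterization trick: the pre-activation is approximated
    as a Gaussian whose mean and variance follow from the weight
    distribution, so sampling happens in activation space."""
    def __init__(self, in_f, out_f):
        super().__init__()
        self.theta = nn.Parameter(torch.zeros(out_f, in_f))  # logits

    def forward(self, x):
        p = torch.sigmoid(self.theta)          # P(w = +1)
        mean_w = 2 * p - 1                     # E[w]
        var_w = 1 - mean_w ** 2                # Var[w] for w in {-1, +1}
        mu = x @ mean_w.t()                    # E[pre-activation]
        var = (x ** 2) @ var_w.t()             # Var[pre-activation]
        eps = torch.randn_like(mu)
        return mu + eps * torch.sqrt(var + 1e-8)
```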

RaidEnv: Exploring New Challenges in Automated Content Balancing for Boss Raid Games

  • paper_url: http://arxiv.org/abs/2307.01676
  • repo_url: None
  • paper_authors: Hyeon-Chang Jeon, In-Chang Baek, Cheong-mok Bae, Taehwa Park, Wonsang You, Taegwan Ha, Hoyun Jung, Jinha Noh, Seungwon Oh, Kyung-Joong Kim
  • for: This study provides a new game simulator and two benchmarks for automated game-content balancing.
  • methods: The study applies artificial intelligence techniques to automatically adjust game content, evaluated on diverse and customizable content for the boss raid scenario in MMORPG games.
  • results: The study delivers a new game research platform that broadens the scope of automated game balancing research, together with two evaluation metrics to guide AI in automatic content balancing, within a realistic game production pipeline.
    Abstract The balance of game content significantly impacts the gaming experience. Unbalanced game content diminishes engagement or increases frustration because of repetitive failure. Although game designers intend to adjust the difficulty of game content, this is a repetitive, labor-intensive, and challenging process, especially for commercial-level games with extensive content. To address this issue, the game research community has explored automated game balancing using artificial intelligence (AI) techniques. However, previous studies have focused on limited game content and did not consider the importance of the generalization ability of playtesting agents when encountering content changes. In this study, we propose RaidEnv, a new game simulator that includes diverse and customizable content for the boss raid scenario in MMORPG games. Additionally, we design two benchmarks for the boss raid scenario that can aid in the practical application of game AI. These benchmarks address two open problems in automatic content balancing, and we introduce two evaluation metrics to provide guidance for AI in automatic content balancing. This novel game research platform expands the frontiers of automatic game balancing problems and offers a framework within a realistic game production pipeline.

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

  • paper_url: http://arxiv.org/abs/2307.02499
  • repo_url: https://github.com/x-plug/mplug-docowl
  • paper_authors: Jiabo Ye, Anwen Hu, Haiyang Xu, Qinghao Ye, Ming Yan, Yuhao Dan, Chenlin Zhao, Guohai Xu, Chenliang Li, Junfeng Tian, Qian Qi, Ji Zhang, Fei Huang
  • for: This paper studies OCR-free document understanding, aiming to improve the document understanding capability of existing multimodal models.
  • methods: The work builds on the mPLUG-Owl model, strengthening it with a purpose-built instruction tuning dataset and a unified instruction tuning strategy.
  • results: Experiments show the model excels at OCR-free document understanding tasks and generalizes well to various downstream tasks without task-specific fine-tuning.
    Abstract Document understanding refers to automatically extract, analyze and comprehend information from various types of digital documents, such as a web page. Existing Multi-model Large Language Models (MLLMs), including mPLUG-Owl, have demonstrated promising zero-shot capabilities in shallow OCR-free text recognition, indicating their potential for OCR-free document understanding. Nevertheless, without in-domain training, these models tend to ignore fine-grained OCR features, such as sophisticated tables or large blocks of text, which are essential for OCR-free document understanding. In this paper, we propose mPLUG-DocOwl based on mPLUG-Owl for OCR-free document understanding. Specifically, we first construct a instruction tuning dataset featuring a wide range of visual-text understanding tasks. Then, we strengthen the OCR-free document understanding ability by jointly train the model on language-only, general vision-and-language, and document instruction tuning dataset with our unified instruction tuning strategy. We also build an OCR-free document instruction understanding evaluation set LLMDoc to better compare models' capabilities on instruct compliance and document understanding. Experimental results show that our model outperforms existing multi-modal models, demonstrating its strong ability of document understanding. Besides, without specific fine-tuning, mPLUG-DocOwl generalizes well on various downstream tasks. Our code, models, training data and evaluation set are available at https://github.com/X-PLUG/mPLUG-DocOwl.

SwinGNN: Rethinking Permutation Invariance in Diffusion Models for Graph Generation

  • paper_url: http://arxiv.org/abs/2307.01646
  • repo_url: https://github.com/qiyan98/swingnn
  • paper_authors: Qi Yan, Zhengyang Liang, Yang Song, Renjie Liao, Lele Wang
  • for: This paper studies diffusion models built on permutation-equivariant networks, which can learn permutation-invariant distributions over graphs. Compared with their non-invariant counterparts, these invariant models face greater learning challenges: their effective target distributions have more modes, and their optimal one-step denoising scores are score functions of Gaussian mixtures with more components.
  • methods: The paper proposes a non-invariant diffusion model, SwinGNN, which uses an efficient edge-to-edge 2-WL message-passing network together with shifted-window self-attention inspired by SwinTransformers. Systematic ablations further identify several training and sampling techniques that substantially improve sample quality.
  • results: SwinGNN achieves state-of-the-art performance on synthetic and real-world protein and molecule datasets. The code is released at https://github.com/qiyan98/SwinGNN.
    Abstract Diffusion models based on permutation-equivariant networks can learn permutation-invariant distributions for graph data. However, in comparison to their non-invariant counterparts, we have found that these invariant models encounter greater learning challenges since 1) their effective target distributions exhibit more modes; 2) their optimal one-step denoising scores are the score functions of Gaussian mixtures with more components. Motivated by this analysis, we propose a non-invariant diffusion model, called $\textit{SwinGNN}$, which employs an efficient edge-to-edge 2-WL message passing network and utilizes shifted window based self-attention inspired by SwinTransformers. Further, through systematic ablations, we identify several critical training and sampling techniques that significantly improve the sample quality of graph generation. At last, we introduce a simple post-processing trick, $\textit{i.e.}$, randomly permuting the generated graphs, which provably converts any graph generative model to a permutation-invariant one. Extensive experiments on synthetic and real-world protein and molecule datasets show that our SwinGNN achieves state-of-the-art performances. Our code is released at https://github.com/qiyan98/SwinGNN.
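The post-processing trick is simple enough to state in a few lines: relabel each generated graph with a uniformly random node permutation, which provably makes the output distribution permutation-invariant.

```python
import numpy as np

def randomly_permute(adj: np.ndarray) -> np.ndarray:
    """Apply a uniformly random node relabeling to a generated graph.
    Adding this step makes any generator's output distribution
    permutation-invariant, since all orderings of a given graph
    become equally likely."""
    perm = np.random.permutation(adj.shape[0])
    return adj[np.ix_(perm, perm)]

# Usage: adj = generator_sample(); adj = randomly_permute(adj)
```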

Insert-expansions for Tool-enabled Conversational Agents

  • paper_url: http://arxiv.org/abs/2307.01644
  • repo_url: None
  • paper_authors: Andreas Göldi, Roman Rietsche
  • for: This paper focuses on an advanced implementation of chain-of-thought prompting in large language models, specifically the use of tools (or "plug-ins") within the explicit reasoning paths that this prompting method generates.
  • methods: The authors use conversation analysis to study how users provide necessary details and refine their requests, characterizing this interaction as insert-expansion: an intermediary conversation designed to facilitate the preferred response.
  • results: Two empirical studies using direct comparison find benefits of this "user-as-a-tool" approach in the recommendation domain.
    Abstract This paper delves into an advanced implementation of Chain-of-Thought-Prompting in Large Language Models, focusing on the use of tools (or "plug-ins") within the explicit reasoning paths generated by this prompting method. We find that tool-enabled conversational agents often become sidetracked, as additional context from tools like search engines or calculators diverts from original user intents. To address this, we explore a concept wherein the user becomes the tool, providing necessary details and refining their requests. Through Conversation Analysis, we characterize this interaction as insert-expansion - an intermediary conversation designed to facilitate the preferred response. We explore possibilities arising from this 'user-as-a-tool' approach in two empirical studies using direct comparison, and find benefits in the recommendation domain.

Heuristic Algorithms for the Approximation of Mutual Coherence

  • paper_url: http://arxiv.org/abs/2307.01639
  • repo_url: None
  • paper_authors: Gregor Betz, Vera Chekan, Tamara Mchedlidze
  • for: This paper aims to accelerate the computation of mutual coherence, which is a measure of similarity between two opinions, in the context of the Wahl-O-Mat system used in Germany to help voters find candidates that align with their political preferences.
  • methods: The authors model the distribution of confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate the model parameters. They also use the expected value of the distribution to approximate the mutual coherence. Some of the presented algorithms are fully polynomial-time, while others only require solving a small number of instances of the SAT model counting problem.
  • results: The authors’ best algorithm achieves an average squared error of less than 0.0035, which is considered insignificant given the efficiency of the algorithm. The accuracy is precise enough to be used in Wahl-O-Mat-like systems.
    Abstract Mutual coherence is a measure of similarity between two opinions. Although the notion comes from philosophy, it is essential for a wide range of technologies, e.g., the Wahl-O-Mat system. In Germany, this system helps voters to find candidates that are the closest to their political preferences. The exact computation of mutual coherence is highly time-consuming due to the iteration over all subsets of an opinion. Moreover, for every subset, an instance of the SAT model counting problem has to be solved which is known to be a hard problem in computer science. This work is the first study to accelerate this computation. We model the distribution of the so-called confirmation values as a mixture of three Gaussians and present efficient heuristics to estimate its model parameters. The mutual coherence is then approximated with the expected value of the distribution. Some of the presented algorithms are fully polynomial-time, others only require solving a small number of instances of the SAT model counting problem. The average squared error of our best algorithm lies below 0.0035 which is insignificant if the efficiency is taken into account. Furthermore, the accuracy is precise enough to be used in Wahl-O-Mat-like systems.
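A minimal sketch of the modeling idea, assuming a sample of confirmation values is available: fit a three-component Gaussian mixture and approximate mutual coherence by the mixture's expected value. The paper's dedicated heuristics for estimating the mixture parameters are more efficient than the generic EM fit used here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def approx_mutual_coherence(confirmation_samples: np.ndarray) -> float:
    """Fit a 3-component Gaussian mixture to sampled confirmation values
    and return the mixture's expected value as the coherence estimate."""
    gmm = GaussianMixture(n_components=3).fit(confirmation_samples.reshape(-1, 1))
    # E[X] of a mixture is the weight-averaged component mean.
    return float(np.sum(gmm.weights_ * gmm.means_.ravel()))

# Usage with a small sample instead of iterating over all subsets:
values = np.random.beta(2, 5, size=2000)  # stand-in for confirmation values
print(approx_mutual_coherence(values))
```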

Random Walk on Multiple Networks

  • paper_url: http://arxiv.org/abs/2307.01637
  • repo_url: https://github.com/flyingdoog/rwm
  • paper_authors: Dongsheng Luo, Yuchen Bian, Yaowei Yan, Xiong Yu, Jun Huan, Xiao Liu, Xiang Zhang
  • for: This study aims to leverage the rich information in multiple networks to make better inferences about entities, in tasks such as local community detection and network embedding.
  • methods: The paper proposes Random Walk on Multiple networks (RWM), which flexibly handles both multiplex networks and general multiple networks that may form many-to-many node mappings. RWM sends a random walker on each network to compute the local proximity (node visiting probabilities) with respect to the starting nodes, and walkers with similar visiting probabilities reinforce each other.
  • results: The authors analyze the convergence properties of RWM and propose two approximation methods with theoretical performance guarantees for efficient computation. Extensive experiments on synthetic and real-world datasets demonstrate RWM's effectiveness and efficiency on link prediction, network embedding, and local community detection.
    Abstract Random Walk is a basic algorithm to explore the structure of networks, which can be used in many tasks, such as local community detection and network embedding. Existing random walk methods are based on single networks that contain limited information. In contrast, real data often contain entities with different types or/and from different sources, which are comprehensive and can be better modeled by multiple networks. To take advantage of rich information in multiple networks and make better inferences on entities, in this study, we propose random walk on multiple networks, RWM. RWM is flexible and supports both multiplex networks and general multiple networks, which may form many-to-many node mappings between networks. RWM sends a random walker on each network to obtain the local proximity (i.e., node visiting probabilities) w.r.t. the starting nodes. Walkers with similar visiting probabilities reinforce each other. We theoretically analyze the convergence properties of RWM. Two approximation methods with theoretical performance guarantees are proposed for efficient computation. We apply RWM in link prediction, network embedding, and local community detection. Comprehensive experiments conducted on both synthetic and real-world datasets demonstrate the effectiveness and efficiency of RWM.
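The per-network primitive of RWM (local proximity as node-visiting probabilities) resembles a random walk with restart, sketched below for a single network. The cross-network reinforcement between walkers, RWM's key contribution, is omitted here.

```python
import numpy as np

def visiting_probabilities(adj, start, restart=0.15, iters=100):
    """Random walk with restart on one network: returns the stationary
    node-visiting probabilities with respect to the start node, i.e.
    the 'local proximity' that RWM computes per network."""
    P = adj / adj.sum(axis=1, keepdims=True)   # row-stochastic transitions
    e = np.zeros(adj.shape[0])
    e[start] = 1.0
    pi = e.copy()
    for _ in range(iters):
        pi = (1 - restart) * pi @ P + restart * e
    return pi

# Usage: pi = visiting_probabilities(adjacency_matrix, start=0)
```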

SageFormer: Series-Aware Graph-Enhanced Transformers for Multivariate Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.01616
  • repo_url: None
  • paper_authors: Zhenwei Zhang, Xin Wang, Yuantao Gu
  • for: This paper proposes a series-aware forecasting model that effectively captures and models dependencies between series, improving the accuracy of multivariate time series forecasting.
  • methods: It introduces a Series-aware Graph-enhanced Transformer that uses graph structures to represent the diverse temporal patterns of multiple series while mitigating redundant information among them.
  • results: Extensive experiments on real-world and synthetic datasets show that the SageFormer model significantly outperforms previous state-of-the-art approaches.
    Abstract Multivariate time series forecasting plays a critical role in diverse domains. While recent advancements in deep learning methods, especially Transformers, have shown promise, there remains a gap in addressing the significance of inter-series dependencies. This paper introduces SageFormer, a Series-aware Graph-enhanced Transformer model designed to effectively capture and model dependencies between series using graph structures. SageFormer tackles two key challenges: effectively representing diverse temporal patterns across series and mitigating redundant information among series. Importantly, the proposed series-aware framework seamlessly integrates with existing Transformer-based models, augmenting their ability to model inter-series dependencies. Through extensive experiments on real-world and synthetic datasets, we showcase the superior performance of SageFormer compared to previous state-of-the-art approaches.

Overconfidence is a Dangerous Thing: Mitigating Membership Inference Attacks by Enforcing Less Confident Prediction

  • paper_url: http://arxiv.org/abs/2307.01610
  • repo_url: https://github.com/dependablesystemslab/mia_defense_hamp
  • paper_authors: Zitao Chen, Karthik Pattabiraman
  • for: Defending machine learning models against membership inference attacks and protecting the privacy of their training data.
  • methods: The paper proposes a defense, HAMP, consisting of a training framework with high-entropy soft labels and an entropy-based regularizer that makes the model behave similarly on training and testing samples, together with an output modification that uniformly turns all predictions into low-confidence outputs while preserving accuracy.
  • results: Extensive evaluation on five benchmark datasets shows that HAMP retains high accuracy while providing strong membership privacy, achieving a better privacy-utility trade-off than seven state-of-the-art defenses.
    Abstract Machine learning (ML) models are vulnerable to membership inference attacks (MIAs), which determine whether a given input is used for training the target model. While there have been many efforts to mitigate MIAs, they often suffer from limited privacy protection, large accuracy drop, and/or requiring additional data that may be difficult to acquire. This work proposes a defense technique, HAMP that can achieve both strong membership privacy and high accuracy, without requiring extra data. To mitigate MIAs in different forms, we observe that they can be unified as they all exploit the ML model's overconfidence in predicting training samples through different proxies. This motivates our design to enforce less confident prediction by the model, hence forcing the model to behave similarly on the training and testing samples. HAMP consists of a novel training framework with high-entropy soft labels and an entropy-based regularizer to constrain the model's prediction while still achieving high accuracy. To further reduce privacy risk, HAMP uniformly modifies all the prediction outputs to become low-confidence outputs while preserving the accuracy, which effectively obscures the differences between the prediction on members and non-members. We conduct extensive evaluation on five benchmark datasets, and show that HAMP provides consistently high accuracy and strong membership privacy. Our comparison with seven state-of-the-art defenses shows that HAMP achieves a superior privacy-utility trade off than those techniques.
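The two training-time ingredients of HAMP can be sketched as a loss that combines cross-entropy against high-entropy soft labels with an entropy reward on the predictions. The smoothing strength and weighting below are illustrative values, not the paper's settings, and the testing-time output-modification stage is omitted.

```python
import torch
import torch.nn.functional as F

def hamp_style_loss(logits, targets, n_classes, smooth=0.7, alpha=0.5):
    """Cross-entropy against high-entropy soft labels plus an
    entropy-based regularizer that discourages overconfident
    predictions on training samples. `smooth` and `alpha` are
    illustrative, not the paper's values."""
    # High-entropy soft labels: keep (1 - smooth) on the true class,
    # spread the remaining mass uniformly over the other classes.
    soft = torch.full_like(logits, smooth / (n_classes - 1))
    soft.scatter_(1, targets.unsqueeze(1), 1.0 - smooth)
    log_p = F.log_softmax(logits, dim=1)
    ce = -(soft * log_p).sum(dim=1).mean()
    # Regularizer: reward high prediction entropy.
    entropy = -(log_p.exp() * log_p).sum(dim=1).mean()
    return ce - alpha * entropy
```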

Bridge the Performance Gap in Peak-hour Series Forecasting: The Seq2Peak Framework

  • paper_url: http://arxiv.org/abs/2307.01597
  • repo_url: None
  • paper_authors: Zhenwei Zhang, Xin Wang, Jingyuan Xie, Heling Zhang, Yuantao Gu
  • for: Peak-hour series forecasting (PHSF) is a crucial yet underexplored task across many domains. Existing deep learning models excel at regular time series forecasting (TSF) but underperform on PHSF, owing to the high degree of non-stationarity in peak-hour series, which makes direct forecasting harder than standard TSF.
  • methods: The paper proposes Seq2Peak, a framework designed specifically for PHSF. It has two key components: a CyclicNorm pipeline that mitigates the non-stationarity issue, and a simple yet effective trainable-parameter-free peak-hour decoder with a hybrid loss function that uses both the original series and the peak-hour series as supervision signals.
  • results: Extensive experiments on four real-world datasets demonstrate the framework's effectiveness, with an average relative improvement of 37.7% for both transformer- and non-transformer-based TSF models.
    Abstract Peak-Hour Series Forecasting (PHSF) is a crucial yet underexplored task in various domains. While state-of-the-art deep learning models excel in regular Time Series Forecasting (TSF), they struggle to achieve comparable results in PHSF. This can be attributed to the challenges posed by the high degree of non-stationarity in peak-hour series, which makes direct forecasting more difficult than standard TSF. Additionally, manually extracting the maximum value from regular forecasting results leads to suboptimal performance due to models minimizing the mean deficit. To address these issues, this paper presents Seq2Peak, a novel framework designed specifically for PHSF tasks, bridging the performance gap observed in TSF models. Seq2Peak offers two key components: the CyclicNorm pipeline to mitigate the non-stationarity issue, and a simple yet effective trainable-parameter-free peak-hour decoder with a hybrid loss function that utilizes both the original series and peak-hour series as supervised signals. Extensive experimentation on publicly available time series datasets demonstrates the effectiveness of the proposed framework, yielding a remarkable average relative improvement of 37.7\% across four real-world datasets for both transformer- and non-transformer-based TSF models.
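A rough sketch of the hybrid-loss idea, assuming peaks are windowed maxima of the series: the forecast is supervised by both the original series and its peak series. The window length and weighting below are assumptions for illustration, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def hybrid_peak_loss(pred, target, peak_window=24, beta=0.5):
    """Supervise the forecast with both the original series and its
    peak (windowed-max) series, as in a Seq2Peak-style hybrid loss."""
    series_loss = F.mse_loss(pred, target)
    # Max over non-overlapping windows approximates the peak-hour value.
    peak_pred = pred.unfold(-1, peak_window, peak_window).amax(dim=-1)
    peak_true = target.unfold(-1, peak_window, peak_window).amax(dim=-1)
    peak_loss = F.mse_loss(peak_pred, peak_true)
    return beta * series_loss + (1 - beta) * peak_loss
```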

Prompt Tuning Pushes Farther, Contrastive Learning Pulls Closer: A Two-Stage Approach to Mitigate Social Biases

  • paper_url: http://arxiv.org/abs/2307.01595
  • repo_url: https://github.com/liyingji1996/CCPA
  • paper_authors: Yingji Li, Mengnan Du, Xin Wang, Ying Wang
  • for: This work aims to mitigate the social biases that pre-trained language models (PLMs) inherit from unprocessed corpora while preserving their performance.
  • methods: A two-stage approach: the first stage is a data augmentation method based on continuous prompt tuning that pushes apart the representations of different demographic groups, and the second stage uses contrastive learning to pull the augmented sample pairs closer and fine-tune the PLM's parameters toward debiased encoding.
  • results: Experiments show that CCPA outperforms Counterfactual Data Augmentation-based baselines in debiasing performance while retaining the language modeling capability of PLMs on the GLUE benchmark.
    Abstract As the representation capability of Pre-trained Language Models (PLMs) improve, there is growing concern that they will inherit social biases from unprocessed corpora. Most previous debiasing techniques used Counterfactual Data Augmentation (CDA) to balance the training corpus. However, CDA slightly modifies the original corpus, limiting the representation distance between different demographic groups to a narrow range. As a result, the debiasing model easily fits the differences between counterfactual pairs, which affects its debiasing performance with limited text resources. In this paper, we propose an adversarial training-inspired two-stage debiasing model using Contrastive learning with Continuous Prompt Augmentation (named CCPA) to mitigate social biases in PLMs' encoding. In the first stage, we propose a data augmentation method based on continuous prompt tuning to push farther the representation distance between sample pairs along different demographic groups. In the second stage, we utilize contrastive learning to pull closer the representation distance between the augmented sample pairs and then fine-tune PLMs' parameters to get debiased encoding. Our approach guides the model to achieve stronger debiasing performance by adding difficulty to the training process. Extensive experiments show that CCPA outperforms baselines in terms of debiasing performance. Meanwhile, experimental results on the GLUE benchmark show that CCPA retains the language modeling capability of PLMs.

Cross-Element Combinatorial Selection for Multi-Element Creative in Display Advertising

  • paper_url: http://arxiv.org/abs/2307.01593
  • repo_url: None
  • paper_authors: Wei Zhang, Ping Zhang, Jian Dong, Yongkang Wang, Pengye Zhang, Bo Zhang, Xingxing Wang, Dong Wang
  • for: Improving the effectiveness of ad creatives in display advertising.
  • methods: A Cross-Element Combinatorial Selection framework (CECS) that models the interactions among creative elements during selection.
  • results: CECS achieved state-of-the-art scores on offline metrics and, deployed in production, delivered a 6.02% CTR and 10.37% GMV lift.
    Abstract The effectiveness of ad creatives is greatly influenced by their visual appearance. Advertising platforms can generate ad creatives with different appearances by combining creative elements provided by advertisers. However, with the increasing number of ad creative elements, it becomes challenging to select a suitable combination from the countless possibilities. The industry's mainstream approach is to select individual creative elements independently, which often overlooks the importance of interaction between creative elements during the modeling process. In response, this paper proposes a Cross-Element Combinatorial Selection framework for multiple creative elements, termed CECS. In the encoder process, a cross-element interaction is adopted to dynamically adjust the expression of a single creative element based on the current candidate creatives. In the decoder process, the creative combination problem is transformed into a cascade selection problem of multiple creative elements. A pointer mechanism with a cascade design is used to model the associations among candidates. Comprehensive experiments on real-world datasets show that CECS achieved the SOTA score on offline metrics. Moreover, the CECS algorithm has been deployed in our industrial application, resulting in a significant 6.02% CTR and 10.37% GMV lift, which is beneficial to the business.

Transcribing Educational Videos Using Whisper: A preliminary study on using AI for transcribing educational videos

  • paper_url: http://arxiv.org/abs/2307.03200
  • repo_url: None
  • paper_authors: Ashwin Rao
  • for: This paper explores how automatic speech recognition (ASR) systems can be used to transcribe educational videos and enhance the e-learning experience.
  • methods: The study uses Whisper to generate transcripts for 25 educational videos and quantifies the quality of the resulting transcripts.
  • results: The study shows that ASR can alleviate the costs and delays of generating transcripts, and identifies open avenues of research when leveraging ASR for transcribing educational videos.
    Abstract Videos are increasingly being used for e-learning, and transcripts are vital to enhance the learning experience. The costs and delays of generating transcripts can be alleviated by automatic speech recognition (ASR) systems. In this article, we quantify the transcripts generated by whisper for 25 educational videos and identify some open avenues of research when leveraging ASR for transcribing educational videos.
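For reference, transcribing a video with the open-source `whisper` package takes only a few lines; the file name below is hypothetical.

```python
# pip install openai-whisper
import whisper

model = whisper.load_model("base")              # tiny/base/small/medium/large
result = model.transcribe("lecture_video.mp4")  # hypothetical file name
print(result["text"])                           # full transcript

# Time-stamped segments, e.g. for generating subtitle files:
for seg in result["segments"]:
    print(f'{seg["start"]:7.2f}s  {seg["text"]}')
```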

IAdet: Simplest human-in-the-loop object detection

  • paper_url: http://arxiv.org/abs/2307.01582
  • repo_url: https://github.com/franchesoni/iadet
  • paper_authors: Franco Marchesoni-Acland, Gabriele Facciolo
  • for: This paper proposes a human-in-the-loop annotation strategy, Intelligent Annotation (IA), for training models while data is being labeled.
  • methods: The strategy comprises three modules: (1) assisted data annotation, (2) background model training, and (3) active selection of the next datapoints. Under this framework the authors open-source IAdet, a tool specific to single-class object detection.
  • results: On the PASCAL VOC dataset, the IAdet tool reduces database annotation time by 25% while providing a trained model for free. These results come from a deliberately simple IAdet design, so the tool admits many easy improvements, paving the way for powerful human-in-the-loop object detection systems.
    Abstract This work proposes a strategy for training models while annotating data named Intelligent Annotation (IA). IA involves three modules: (1) assisted data annotation, (2) background model training, and (3) active selection of the next datapoints. Under this framework, we open-source the IAdet tool, which is specific for single-class object detection. Additionally, we devise a method for automatically evaluating such a human-in-the-loop system. For the PASCAL VOC dataset, the IAdet tool reduces the database annotation time by $25\%$ while providing a trained model for free. These results are obtained for a deliberately very simple IAdet design. As a consequence, IAdet is susceptible to multiple easy improvements, paving the way for powerful human-in-the-loop object detection systems.

Optimal and Efficient Binary Questioning for Human-in-the-Loop Annotation

  • paper_url: http://arxiv.org/abs/2307.01578
  • repo_url: None
  • paper_authors: Franco Marchesoni-Acland, Jean-Michel Morel, Josselin Kherroubi, Gabriele Facciolo
  • for: The paper aims to solve the problem of fully annotating a binary classification dataset when a predictor is available.
  • methods: The paper uses a series of optimization strategies and lookahead minimization of proxy cost functions to solve the problem.
  • results: On synthetic and real-world datasets, the proposed method achieves significant improvements (23-86%) in annotation efficiency.
    Abstract Even though data annotation is extremely important for interpretability, research and development of artificial intelligence solutions, most research efforts such as active learning or few-shot learning focus on the sample efficiency problem. This paper studies the neglected complementary problem of getting annotated data given a predictor. For the simple binary classification setting, we present the spectrum ranging from optimal general solutions to practical efficient methods. The problem is framed as the full annotation of a binary classification dataset with the minimal number of yes/no questions when a predictor is available. For the case of general binary questions the solution is found in coding theory, where the optimal questioning strategy is given by the Huffman encoding of the possible labelings. However, this approach is computationally intractable even for small dataset sizes. We propose an alternative practical solution based on several heuristics and lookahead minimization of proxy cost functions. The proposed solution is analysed, compared with optimal solutions and evaluated on several synthetic and real-world datasets. On these datasets, the method allows a significant improvement ($23-86\%$) in annotation efficiency.
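The optimal-question baseline can be sketched with the classic Huffman construction: given a probability for every candidate labeling, the expected number of yes/no questions equals the expected Huffman codeword length. The toy probabilities below are made up; with n samples there are 2^n labelings, which is exactly why this is intractable and why the paper turns to heuristics.

```python
import heapq
from itertools import count

def huffman_expected_questions(labeling_probs):
    """Expected number of yes/no questions under the optimal strategy,
    i.e. the expected Huffman codeword length over candidate labelings.
    Each heap merge contributes its merged probability to the expected
    depth."""
    ids = count()                       # tie-breaker for equal probabilities
    heap = [(p, next(ids)) for p in labeling_probs]
    heapq.heapify(heap)
    expected = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        expected += p1 + p2
        heapq.heappush(heap, (p1 + p2, next(ids)))
    return expected

# 3 samples -> 2^3 = 8 candidate labelings, probabilities from a predictor.
probs = [0.4, 0.2, 0.1, 0.1, 0.08, 0.06, 0.04, 0.02]
print(huffman_expected_questions(probs))   # 2.56 questions on average
```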

Conceptual Cognitive Maps Formation with Neural Successor Networks and Word Embeddings

  • paper_url: http://arxiv.org/abs/2307.01577
  • repo_url: None
  • paper_authors: Paul Stoewer, Achim Schilling, Andreas Maier, Patrick Krauss
  • for: This paper explores how the contextualization ability of the human brain, rooted in the entorhinal-hippocampal system, can be leveraged to improve artificial intelligence models.
  • methods: The model combines the successor representation and neural networks with word embedding vectors to construct a cognitive map of three separate concepts.
  • results: The model learns maps at two different scales and situates new information in proximity to related pre-existing representations on the cognitive map.
    Abstract The human brain possesses the extraordinary capability to contextualize the information it receives from our environment. The entorhinal-hippocampal plays a critical role in this function, as it is deeply engaged in memory processing and constructing cognitive maps using place and grid cells. Comprehending and leveraging this ability could significantly augment the field of artificial intelligence. The multi-scale successor representation serves as a good model for the functionality of place and grid cells and has already shown promise in this role. Here, we introduce a model that employs successor representations and neural networks, along with word embedding vectors, to construct a cognitive map of three separate concepts. The network adeptly learns two different scaled maps and situates new information in proximity to related pre-existing representations. The dispersion of information across the cognitive map varies according to its scale - either being heavily concentrated, resulting in the formation of the three concepts, or spread evenly throughout the map. We suggest that our model could potentially improve current AI models by providing multi-modal context information to any input, based on a similarity metric for the input and pre-existing knowledge representations.
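The successor representation underlying the model has a convenient closed form for a known transition matrix; the sketch below computes it for a toy three-state environment. How the paper couples this with neural networks and word embeddings is omitted.

```python
import numpy as np

def successor_representation(T: np.ndarray, gamma: float = 0.9) -> np.ndarray:
    """Closed-form successor representation M = (I - gamma * T)^-1,
    where T is a row-stochastic state-transition matrix. Row i of M
    gives state i's expected discounted future occupancy of every
    state, the place-cell-like code used to build cognitive maps."""
    n = T.shape[0]
    return np.linalg.inv(np.eye(n) - gamma * T)

# Toy 3-state chain; states with similar transitions get similar SR
# rows, so related concepts end up close on the learned map.
T = np.array([[0.1, 0.9, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.9, 0.1]])
print(successor_representation(T).round(2))
```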

Machine Learning-Based Intrusion Detection: Feature Selection versus Feature Extraction

  • paper_url: http://arxiv.org/abs/2307.01570
  • repo_url: None
  • paper_authors: Vu-Duc Ngo, Tuan-Cuong Vuong, Thien Van Luong, Hung Tran
  • for: This study compares the performance of feature selection and feature extraction for network intrusion detection, including their runtime complexity across datasets and classification settings.
  • methods: The study uses the UNSW-NB15 dataset and several performance metrics, namely precision, recall, detection accuracy, and runtime complexity, to compare the two feature reduction methods under both binary and multiclass classification.
  • results: Feature selection generally provides better detection performance together with lower training and inference time, whereas feature extraction is more reliable when the number of reduced features K is very small (e.g., K = 4) and is less sensitive to changes in K.
    Abstract Internet of things (IoT) has been playing an important role in many sectors, such as smart cities, smart agriculture, smart healthcare, and smart manufacturing. However, IoT devices are highly vulnerable to cyber-attacks, which may result in security breaches and data leakages. To effectively prevent these attacks, a variety of machine learning-based network intrusion detection methods for IoT networks have been developed, which often rely on either feature extraction or feature selection techniques for reducing the dimension of input data before being fed into machine learning models. This aims to make the detection complexity low enough for real-time operations, which is particularly vital in any intrusion detection systems. This paper provides a comprehensive comparison between these two feature reduction methods of intrusion detection in terms of various performance metrics, namely, precision rate, recall rate, detection accuracy, as well as runtime complexity, in the presence of the modern UNSW-NB15 dataset as well as both binary and multiclass classification. For example, in general, the feature selection method not only provides better detection performance but also lower training and inference time compared to its feature extraction counterpart, especially when the number of reduced features K increases. However, the feature extraction method is much more reliable than its selection counterpart, particularly when K is very small, such as K = 4. Additionally, feature extraction is less sensitive to changing the number of reduced features K than feature selection, and this holds true for both binary and multiclass classifications. Based on this comparison, we provide a useful guideline for selecting a suitable intrusion detection type for each specific scenario, as detailed in Tab. 14 at the end of Section IV.
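The two reduction routes compared in the paper can be contrasted in a few lines of scikit-learn. The snippet below uses random placeholder data in place of UNSW-NB15 and a generic classifier, so it illustrates the workflow rather than reproducing the paper's pipeline.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score

# Placeholder for the UNSW-NB15 features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 42))
y = rng.integers(0, 2, size=2000)
K = 8  # number of reduced features

# Feature selection: keep K of the original features.
X_sel = SelectKBest(mutual_info_classif, k=K).fit_transform(X, y)
# Feature extraction: project onto K new components.
X_ext = PCA(n_components=K).fit_transform(X)

clf = RandomForestClassifier(n_estimators=100)
print("selection :", cross_val_score(clf, X_sel, y).mean())
print("extraction:", cross_val_score(clf, X_ext, y).mean())
```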

Scalable variable selection for two-view learning tasks with projection operators

  • paper_url: http://arxiv.org/abs/2307.01558
  • repo_url: https://github.com/aalto-ics-kepaco/projse
  • paper_authors: Sandor Szedmak, Riikka Huusari, Tat Hong Duong Le, Juho Rousu
  • for: The paper proposes a novel variable selection method for two-view settings and vector-valued supervised learning problems, able to handle extremely large selection tasks with even millions of data samples.
  • methods: The method iteratively selects variables that are highly correlated with the output variables but uncorrelated with the previously chosen variables. Correlation is measured with projection operators and their algebra, and can also be expressed through kernel functions, allowing nonlinear correlation models.
  • results: Experiments on synthetic and real data validate the method, demonstrating its scalability and the relevance of the selected features.
    Abstract In this paper we propose a novel variable selection method for two-view settings, or for vector-valued supervised learning problems. Our framework is able to handle extremely large scale selection tasks, where number of data samples could be even millions. In a nutshell, our method performs variable selection by iteratively selecting variables that are highly correlated with the output variables, but which are not correlated with the previously chosen variables. To measure the correlation, our method uses the concept of projection operators and their algebra. With the projection operators the relationship, correlation, between sets of input and output variables can also be expressed by kernel functions, thus nonlinear correlation models can be exploited as well. We experimentally validate our approach, showing on both synthetic and real data its scalability and the relevance of the selected features. Keywords: Supervised variable selection, vector-valued learning, projection-valued measure, reproducing kernel Hilbert space
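
As a rough illustration of the iterative idea, the sketch below implements a simplified linear instance: at each step it picks the variable most correlated with the outputs, then deflates (projects out) that variable from both inputs and outputs, a linear stand-in for the paper's projection-operator algebra. The function name and toy data are ours, not the authors' ProjSe implementation (see the repo_url above for that).

```python
import numpy as np

def select_variables(X, Y, k):
    """Greedy selection: pick the column of X most correlated with the
    residualized outputs, then project both X and Y onto the orthogonal
    complement of the chosen column (a linear stand-in for the paper's
    projection-operator algebra)."""
    Xr = X - X.mean(0)
    Yr = Y - Y.mean(0)
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(Xr, axis=0) + 1e-12
        # Squared (unnormalized) correlation of each candidate with all outputs.
        scores = ((Xr.T @ Yr) ** 2).sum(1) / norms ** 2
        scores[chosen] = -np.inf          # never re-pick a column
        j = int(np.argmax(scores))
        chosen.append(j)
        # Deflate: remove the component along column j from X and Y.
        v = Xr[:, [j]] / norms[j]
        Xr = Xr - v @ (v.T @ Xr)
        Yr = Yr - v @ (v.T @ Yr)
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
Y = X[:, :3] @ rng.normal(size=(3, 2))    # outputs depend on 3 variables
print(select_variables(X, Y, 3))          # recovers columns 0..2 (in some order)
```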

Separated RoadTopoFormer

  • paper_url: http://arxiv.org/abs/2307.01557
  • repo_url: None
  • paper_authors: Mingjie Lu, Yuanxian Huang, Ji Liu, Jinzhang Peng, Lu Tian, Ashish Sirasao
  • for: Aims to advance autonomous driving by emphasizing the importance of driving-scene understanding.
  • methods: Proposes Separated RoadTopoFormer, an end-to-end framework that jointly detects traffic elements and lane centerlines in a scene and reasons about the relationships among them.
  • results: The final submission achieves 0.445 OLS, which is competitive in both the sub-task and combined scores.
    Abstract Understanding driving scenarios is crucial to realizing autonomous driving. Previous works such as map learning and BEV lane detection neglect the connection relationships between lane instances, and traffic element detection tasks usually neglect the relationship with lane lines. To address these issues, we tackle a task comprising four sub-tasks: the detection of traffic elements, the detection of lane centerlines, reasoning about connection relationships among lanes, and reasoning about assignment relationships between lanes and traffic elements. We present Separated RoadTopoFormer, an end-to-end framework that detects lane centerlines and traffic elements while reasoning about the relationships among them. We optimize each module separately to prevent interference between them and then aggregate them with little fine-tuning. For the two detection heads, we adopt a DETR-like architecture to detect objects; for the relationship head, we concatenate two instance features from the preceding detectors and feed them to a classifier to obtain a relationship probability. Our final submission achieves 0.445 OLS, which is competitive in both the sub-task and combined scores.
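
The relationship head described in the abstract is simple to sketch. The following is an illustrative PyTorch version (dimensions and names are assumptions, not the authors' code): pairwise concatenation of instance features from the two detection heads, followed by a small classifier that outputs a relationship probability.

```python
import torch
import torch.nn as nn

class RelationshipHead(nn.Module):
    """Predicts the probability that two detected instances (e.g., a lane
    centerline and a traffic element) are related, by concatenating their
    query features and classifying the pair. Dimensions are illustrative."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, 1),
        )

    def forward(self, feats_a, feats_b):
        # feats_a: (Na, dim) lane queries, feats_b: (Nb, dim) element queries.
        Na, Nb = feats_a.size(0), feats_b.size(0)
        pairs = torch.cat(
            [feats_a.unsqueeze(1).expand(Na, Nb, -1),
             feats_b.unsqueeze(0).expand(Na, Nb, -1)], dim=-1)
        return torch.sigmoid(self.mlp(pairs)).squeeze(-1)  # (Na, Nb) probabilities

head = RelationshipHead()
probs = head(torch.randn(8, 256), torch.randn(5, 256))
print(probs.shape)  # torch.Size([8, 5])
```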

Knowledge Graph for NLG in the context of conversational agents

  • paper_url: http://arxiv.org/abs/2307.01548
  • repo_url: None
  • paper_authors: Hussam Ghanem, Massinissa Atmani, Christophe Cruz
  • for: Reviews different architectures for knowledge graph (KG)-to-text generation, including graph neural networks, the Graph Transformer, and seq2seq models.
  • methods: Discusses the advantages and limitations of each architecture and highlights the importance of choosing an architecture that fits the constraints of the task at hand.
  • results: Selects PLM-based seq2seq Transformer models for the KG-to-text generation task, and plans to refine kg-to-text benchmark datasets on PLMs and to explore emotional and multilingual dimensions in future work.
    Abstract The use of knowledge graphs (KGs) enhances the accuracy and comprehensiveness of the responses provided by a conversational agent. Generating answers during conversations consists in generating text from these KGs, which is still regarded as a challenging task and has gained significant attention in recent years. In this document, we provide a review of different architectures used for knowledge graph-to-text generation, including graph neural networks, the Graph Transformer, and linearization with seq2seq models. We discuss the advantages and limitations of each architecture and conclude that the choice of architecture will depend on the specific requirements of the task at hand. We also highlight the importance of considering constraints such as execution time and model validity, particularly in the context of conversational agents. Based on these constraints and the availability of labeled data for the domains of DAVI, we choose to use seq2seq Transformer-based models (PLMs) for the knowledge graph-to-text generation task. We aim to refine benchmark datasets for kg-to-text generation with PLMs and to explore the emotional and multilingual dimensions in our future work. Overall, this review provides insights into the different approaches for knowledge graph-to-text generation and outlines future directions for research in this area.
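
For the linearization route the review settles on, KG triples must first be flattened into a token sequence for the seq2seq PLM. A minimal sketch follows; the <H>/<R>/<T> marker convention is one common choice in the kg-to-text literature and is an assumption here, not necessarily the format used by the authors.

```python
def linearize_kg(triples):
    """Flatten (head, relation, tail) triples into a single string that a
    seq2seq PLM (e.g., a T5-style model) can consume. The <H>/<R>/<T>
    marker convention is one common choice, not the only one."""
    parts = []
    for head, relation, tail in triples:
        parts.append(f"<H> {head} <R> {relation} <T> {tail}")
    return " ".join(parts)

triples = [("Dijon", "country", "France"),
           ("Dijon", "leaderTitle", "mayor")]
source = linearize_kg(triples)
print(source)
# '<H> Dijon <R> country <T> France <H> Dijon <R> leaderTitle <T> mayor'
# The PLM is then fine-tuned to map this string to fluent text, e.g.:
# 'Dijon is a city in France led by a mayor.'
```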

Learning to Prompt in the Classroom to Understand AI Limits: A pilot study

  • paper_url: http://arxiv.org/abs/2307.01540
  • repo_url: None
  • paper_authors: Emily Theophilou, Cansu Koyuturk, Mona Yavari, Sathya Bursic, Gregor Donabauer, Alessia Telari, Alessia Testa, Raffaele Boiano, Davinia Hernandez-Leo, Martin Ruskov, Davide Taibi, Alessandro Gabbiadini, Dimitri Ognibene
  • for: Studies the acceptance and use of AI and how to teach the public to apply it effectively to societal problems while understanding its limits.
  • methods: A pilot educational intervention built around Large Language Models (LLMs) and derived chatbots, such as ChatGPT, teaching currently accepted prompting strategies.
  • results: Encouraging results, including high student appreciation of the activity, improved quality of interaction with ChatGPT, more positive attitudes toward AI, and a better understanding of AI limitations.
    Abstract Artificial intelligence's progress holds great promise in assisting society in addressing pressing societal issues. In particular, Large Language Models (LLMs) and the derived chatbots, like ChatGPT, have greatly improved the natural language processing capabilities of AI systems, allowing them to process an unprecedented amount of unstructured data. The consequent hype has also backfired, raising negative sentiment even after novel AI methods' surprising contributions. One of the causes, but also an important issue per se, is the rising and misleading feeling of being able to access and process any form of knowledge to solve problems in any domain with no effort or prior expertise in AI or the problem domain, disregarding current LLMs' limits, such as hallucinations and reasoning limits. Acknowledging AI fallibility is crucial to address the impact of dogmatic overconfidence in possibly erroneous suggestions generated by LLMs. At the same time, it can reduce fear and other negative attitudes toward AI. AI literacy interventions are necessary to allow the public to understand such LLM limits and learn how to use them more effectively, i.e., learning to "prompt". With this aim, a pilot educational intervention was performed in a high school with 30 students. It involved (i) presenting high-level concepts about intelligence, AI, and LLMs, (ii) an initial naive practice with ChatGPT on a non-trivial task, and finally (iii) applying currently accepted prompting strategies. Encouraging preliminary results have been collected: students reported a) high appreciation of the activity, b) improved quality of interaction with the LLM during the educational activity, c) decreased negative sentiment toward AI, and d) increased understanding of LLM limitations. We aim to study factors that impact AI acceptance and to refine and repeat this activity in more controlled settings.

Anomaly detection in image or latent space of patch-based auto-encoders for industrial image analysis

  • paper_url: http://arxiv.org/abs/2307.02495
  • repo_url: None
  • paper_authors: Nicolas Pinon, Robin Trombetta, Carole Lartizien
  • for: Detecting anomalies in color images.
  • methods: Three methods built on patch-based auto-encoders: the error between the original image and its reconstruction, support estimation of the normal image distribution in the latent space, and the error between the original image and a restored version of the reconstructed image.
  • results: The three methods are evaluated and compared on the MVTecAD industrial image database against two competitive state-of-the-art methods.
    Abstract We study several methods for detecting anomalies in color images, constructed on patch-based auto-encoders. We compare the performance of three types of methods based, first, on the error between the original image and its reconstruction; second, on the support estimation of the normal image distribution in the latent space; and third, on the error between the original image and a restored version of the reconstructed image. These methods are evaluated on the industrial image database MVTecAD and compared to two competitive state-of-the-art methods.
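
A minimal sketch of the first scoring variant (reconstruction error) is given below; the tiny auto-encoder, patch size, and scoring loop are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PatchAE(nn.Module):
    """Tiny convolutional auto-encoder over image patches (illustrative sizes)."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
                                 nn.ConvTranspose2d(16, 3, 2, stride=2))

    def forward(self, x):
        return self.dec(self.enc(x))

def anomaly_map(model, image, patch=32):
    """Score each patch by its reconstruction error (the first method above);
    high error suggests the patch lies outside the learned normal manifold."""
    c, h, w = image.shape
    scores = torch.zeros(h // patch, w // patch)
    with torch.no_grad():
        for i in range(0, h - patch + 1, patch):
            for j in range(0, w - patch + 1, patch):
                p = image[:, i:i + patch, j:j + patch].unsqueeze(0)
                scores[i // patch, j // patch] = ((model(p) - p) ** 2).mean()
    return scores

model = PatchAE()  # assume trained on anomaly-free images
print(anomaly_map(model, torch.rand(3, 128, 128)).shape)  # torch.Size([4, 4])
```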

Analyzing Intentional Behavior in Autonomous Agents under Uncertainty

  • paper_url: http://arxiv.org/abs/2307.01532
  • repo_url: https://github.com/filipcano/intentional-autonomous-agents
  • paper_authors: Filip Cano Córdoba, Samuel Judson, Timos Antonopoulos, Katrine Bjørner, Nicholas Shoemaker, Scott J. Shapiro, Ruzica Piskac, Bettina Könighofer
  • for: Aims to provide a quantitative assessment of accountability for autonomous decision-making systems acting in uncertain environments.
  • methods: Models the uncertain environment as a Markov Decision Process (MDP) and uses probabilistic model checking to compute the agent's ability to influence reaching a certain event, called the "scope of agency". Counterfactual reasoning automatically generates relevant scenarios to increase the confidence of the assessment.
  • results: A case study demonstrates that the method can distinguish between "intentional" and "accidental" traffic collisions.
    Abstract Principled accountability for autonomous decision-making in uncertain environments requires distinguishing intentional outcomes from negligent designs from actual accidents. We propose analyzing the behavior of autonomous agents through a quantitative measure of the evidence of intentional behavior. We model an uncertain environment as a Markov Decision Process (MDP). For a given scenario, we rely on probabilistic model checking to compute the ability of the agent to influence reaching a certain event. We call this the scope of agency. We say that there is evidence of intentional behavior if the scope of agency is high and the decisions of the agent are close to being optimal for reaching the event. Our method applies counterfactual reasoning to automatically generate relevant scenarios that can be analyzed to increase the confidence of our assessment. In a case study, we show how our method can distinguish between 'intentional' and 'accidental' traffic collisions.
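
The "scope of agency" rests on a standard quantity from probabilistic model checking: the maximum probability with which the agent can force reaching a given event in an MDP. The authors use model-checking tooling for this; the sketch below computes the same quantity by value iteration on a toy MDP (the transition matrices are illustrative).

```python
import numpy as np

def max_reach_probability(P, target, iters=1000, tol=1e-10):
    """Value iteration for max reachability probability in an MDP.
    P[a] is an (S, S) transition matrix for action a; `target` is a set of
    goal states. Returns, per state, the highest probability with which the
    agent can force reaching the target -- the core quantity behind the
    'scope of agency'."""
    n_actions, n_states, _ = P.shape
    v = np.zeros(n_states)
    v[list(target)] = 1.0
    for _ in range(iters):
        q = P @ v                      # shape (n_actions, n_states)
        v_new = q.max(axis=0)
        v_new[list(target)] = 1.0      # target states stay absorbing
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    return v

# Toy 3-state MDP with 2 actions; state 2 is the target event.
P = np.array([[[0.9, 0.1, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]],
              [[0.2, 0.8, 0.0], [0.1, 0.1, 0.8], [0.0, 0.0, 1.0]]])
print(max_reach_probability(P, target={2}))  # ~[1., 1., 1.] for this toy chain
```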

Convolutional Transformer for Autonomous Recognition and Grading of Tomatoes Under Various Lighting, Occlusion, and Ripeness Conditions

  • paper_url: http://arxiv.org/abs/2307.01530
  • repo_url: None
  • paper_authors: Asim Khan, Taimur Hassan, Muhammad Shafay, Israa Fahmy, Naoufel Werghi, Lakmal Seneviratne, Irfan Hussain
  • for: Tomatoes are harvested with mobile robots in real-world scenarios, which is challenging due to factors such as occlusion by leaves and branches and the color similarity between tomatoes and the surrounding foliage.
  • methods: The proposed framework uses a convolutional transformer architecture to autonomously recognize and grade tomatoes, regardless of occlusion level, lighting conditions, and ripeness.
  • results: The proposed framework outperforms existing methods by 58.14%, 65.42%, and 66.39% in mean average precision on three different datasets.
    Abstract Harvesting fully ripe tomatoes with mobile robots presents significant challenges in real-world scenarios. These challenges arise from factors such as occlusion caused by leaves and branches, as well as the color similarity between tomatoes and the surrounding foliage during the fruit development stage. The natural environment further compounds these issues with varying light conditions, viewing angles, occlusion factors, and different maturity levels. To overcome these obstacles, this research introduces a novel framework that leverages a convolutional transformer architecture to autonomously recognize and grade tomatoes, irrespective of their occlusion level, lighting conditions, and ripeness. The proposed model is trained and tested using carefully annotated images curated specifically for this purpose. The dataset is prepared under various lighting conditions, viewing perspectives, and employs different mobile camera sensors, distinguishing it from existing datasets such as Laboro Tomato and Rob2Pheno Annotated Tomato. The effectiveness of the proposed framework in handling cluttered and occluded tomato instances was evaluated using two additional public datasets, Laboro Tomato and Rob2Pheno Annotated Tomato, as benchmarks. The evaluation results across these three datasets demonstrate the exceptional performance of our proposed framework, surpassing the state-of-the-art by 58.14%, 65.42%, and 66.39% in terms of mean average precision scores for KUTomaData, Laboro Tomato, and Rob2Pheno Annotated Tomato, respectively. The results underscore the superiority of the proposed model in accurately detecting and delineating tomatoes compared to baseline methods and previous approaches. Specifically, the model achieves an F1-score of 80.14%, a Dice coefficient of 73.26%, and a mean IoU of 66.41% on the KUTomaData image dataset.
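
For reference, the segmentation metrics quoted above (Dice coefficient and mean IoU) are computed from binary masks as follows; this helper is a generic illustration, not the authors' evaluation code.

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Dice coefficient and IoU for binary masks:
    Dice = 2|A∩B| / (|A| + |B|),  IoU = |A∩B| / |A∪B|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    dice = 2 * inter / (pred.sum() + gt.sum() + 1e-9)
    iou = inter / (np.logical_or(pred, gt).sum() + 1e-9)
    return dice, iou

pred = np.zeros((64, 64)); pred[10:40, 10:40] = 1
gt = np.zeros((64, 64)); gt[20:50, 20:50] = 1
print(dice_and_iou(pred, gt))  # 20x20 overlap between two 30x30 boxes
```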

LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack

  • paper_url: http://arxiv.org/abs/2307.01520
  • repo_url: None
  • paper_authors: Joonkyo Shim, Hyunsoo Yoon
  • for: Mitigating the deepfake threat; recent studies use adversarial perturbations to disrupt the outputs of deepfake models.
  • methods: Proposes a simple yet effective attack called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process so that distorted output images are generated regardless of the target attributes.
  • results: The method achieves a higher defense success rate in real-world scenarios than previous approaches.
    Abstract Deepfakes, malicious visual contents created by generative models, pose an increasingly harmful threat to society. To proactively mitigate deepfake damages, recent studies have employed adversarial perturbation to disrupt deepfake model outputs. However, previous approaches primarily focus on generating distorted outputs based on only predetermined target attributes, leading to a lack of robustness in real-world scenarios where target attributes are unknown. Additionally, the transferability of perturbations between two prominent generative models, Generative Adversarial Networks (GANs) and Diffusion Models, remains unexplored. In this paper, we emphasize the importance of target attribute-transferability and model-transferability for achieving robust deepfake disruption. To address this challenge, we propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process. By disrupting the latent encoding process, it generates distorted output images in subsequent generation processes, regardless of the given target attributes. This target attribute-agnostic attack ensures robust disruption even when the target attributes are unknown. Additionally, we introduce a Normalized Gradient Ensemble strategy that effectively aggregates gradients for iterative gradient attacks, enabling simultaneous attacks on various types of deepfake models, involving both GAN-based and Diffusion-based models. Moreover, we demonstrate the insufficiency of evaluating disruption quality solely based on pixel-level differences. As a result, we propose an alternative protocol for comprehensively evaluating the success of defense. Extensive experiments confirm the efficacy of our method in disrupting deepfakes in real-world scenarios, reporting a higher defense success rate compared to previous methods.
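
The Normalized Gradient Ensemble can be sketched as follows: each surrogate model's gradient is normalized before aggregation, so that no single model dominates the perturbation. The step function below is an illustrative, simplified version; the stand-in models, disruption loss, and step sizes are assumptions, not the authors' implementation.

```python
import torch

def normalized_gradient_ensemble_step(x, models, loss_fn, alpha=2/255, eps=8/255, x0=None):
    """One iterative attack step: average the normalized gradients from
    several surrogate deepfake models, then take a signed ascent step on the
    disruption loss (illustrative of the Normalized Gradient Ensemble idea)."""
    x0 = x.detach() if x0 is None else x0
    grad_sum = torch.zeros_like(x)
    for model in models:
        x_adv = x.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x_adv))
        g, = torch.autograd.grad(loss, x_adv)
        grad_sum += g / (g.abs().mean() + 1e-12)   # normalize per model
    x_next = x + alpha * grad_sum.sign()            # maximize disruption loss
    return (x0 + (x_next - x0).clamp(-eps, eps)).clamp(0, 1).detach()

# Toy usage: conv layers stand in for GAN/diffusion generators.
models = [torch.nn.Conv2d(3, 3, 3, padding=1) for _ in range(2)]
ref = torch.rand(1, 3, 32, 32)
loss_fn = lambda out: (out - ref).pow(2).mean()  # push outputs away (illustrative)
x = ref.clone()
for _ in range(5):
    x = normalized_gradient_ensemble_step(x, models, loss_fn, x0=ref)
print((x - ref).abs().max())  # perturbation stays within the eps ball
```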

Deep Attention Q-Network for Personalized Treatment Recommendation

  • paper_url: http://arxiv.org/abs/2307.01519
  • repo_url: https://github.com/stevenmsm/rl-icu-daqn
  • paper_authors: Simin Ma, Junghwan Lee, Nicoleta Serban, Shihao Yang
  • for: Proposes a new method for personalized treatment recommendations to improve healthcare outcomes.
  • methods: A Deep Attention Q-Network that combines the Transformer architecture with a deep reinforcement learning framework to efficiently incorporate all past patient observations.
  • results: Demonstrates superiority over state-of-the-art models on real-world sepsis and acute hypotension cohorts.
    Abstract Tailoring treatment for individual patients is crucial yet challenging in order to achieve optimal healthcare outcomes. Recent advances in reinforcement learning offer promising personalized treatment recommendations; however, they rely solely on current patient observations (vital signs, demographics) as the patient's state, which may not accurately represent the true health status of the patient. This limitation hampers policy learning and evaluation, ultimately limiting treatment effectiveness. In this study, we propose the Deep Attention Q-Network for personalized treatment recommendations, utilizing the Transformer architecture within a deep reinforcement learning framework to efficiently incorporate all past patient observations. We evaluated the model on real-world sepsis and acute hypotension cohorts, demonstrating its superiority to state-of-the-art models. The source code for our model is available at https://github.com/stevenmsm/RL-ICU-DAQN.
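
A minimal sketch of the idea — attending over the whole observation history with a Transformer encoder and reading Q-values from the latest step — is shown below. All sizes and names are illustrative assumptions (the authors' implementation is at the repo_url above), and a causal mask is omitted for brevity.

```python
import torch
import torch.nn as nn

class DeepAttentionQNetwork(nn.Module):
    """Q-network that attends over the full history of patient observations
    with a Transformer encoder, then reads Q-values for each treatment from
    the representation of the latest time step (sizes are illustrative)."""
    def __init__(self, obs_dim=48, n_actions=25, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.q_head = nn.Linear(d_model, n_actions)

    def forward(self, obs_history):
        # obs_history: (batch, time, obs_dim) -- all past vitals/labs so far.
        h = self.encoder(self.embed(obs_history))
        return self.q_head(h[:, -1])   # Q-values from the latest step

net = DeepAttentionQNetwork()
q = net(torch.randn(4, 12, 48))        # 4 patients, 12 time steps each
print(q.shape)                          # torch.Size([4, 25])
```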

All in One: Multi-task Prompting for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.01504
  • repo_url: https://github.com/sheldonresearch/ProG
  • paper_authors: Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, Jihong Guan
  • for: Bridges the gap between pre-trained graph models and diverse downstream graph tasks to improve performance.
  • methods: Proposes a multi-task prompting method that unifies the format of graph prompts and language prompts, reformulates downstream problems as graph-level tasks, and uses meta-learning to learn a better prompt initialization, so that the prompting idea from NLP transfers to the graph domain.
  • results: Extensive experiments demonstrate the superiority of the method across different graph tasks.
    Abstract Recently, ''pre-training and fine-tuning'' has been adopted as a standard workflow for many graph tasks, since it can leverage general graph knowledge to relieve the lack of graph annotations in each application. However, graph tasks at the node, edge, and graph levels are highly diverse, so the pre-training pretext is often incompatible with these multiple tasks. This gap may even cause a ''negative transfer'' to the specific application, leading to poor results. Inspired by prompt learning in natural language processing (NLP), which has shown significant effectiveness in leveraging prior knowledge for various NLP tasks, we study the prompting topic for graphs with the motivation of filling the gap between pre-trained models and various graph tasks. In this paper, we propose a novel multi-task prompting method for graph models. Specifically, we first unify the format of graph prompts and language prompts with the prompt token, token structure, and inserting pattern. In this way, the prompting idea from NLP can be seamlessly introduced to the graph area. Then, to further narrow the gap between various graph tasks and state-of-the-art pre-training strategies, we study the task space of various graph applications and reformulate downstream problems as graph-level tasks. Afterward, we introduce meta-learning to efficiently learn a better initialization for the multi-task prompt of graphs, so that our prompting framework can be more reliable and general for different tasks. We conduct extensive experiments, the results of which demonstrate the superiority of our method.
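
One simple way to picture a learnable graph prompt is as a small set of trainable tokens whose weighted combination is added to node features before a frozen pre-trained GNN. The sketch below captures that inserting pattern in spirit only; the authors' implementation (see the repo_url above) studies richer prompt structures.

```python
import torch
import torch.nn as nn

class GraphPrompt(nn.Module):
    """Learnable prompt tokens for a frozen pre-trained GNN. Each node
    feature is augmented with a weighted combination of prompt tokens
    (one simple 'inserting pattern'; the paper studies richer ones)."""
    def __init__(self, feat_dim, n_tokens=10):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(n_tokens, feat_dim) * 0.01)

    def forward(self, x):
        # x: (num_nodes, feat_dim). Soft attention of each node over tokens.
        w = torch.softmax(x @ self.tokens.t(), dim=-1)   # (num_nodes, n_tokens)
        return x + w @ self.tokens                       # prompted features

prompt = GraphPrompt(feat_dim=32)
x = torch.randn(100, 32)            # node features of one graph
x_prompted = prompt(x)              # fed to the frozen GNN in place of x
print(x_prompted.shape)             # torch.Size([100, 32])
```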

HEDI: First-Time Clinical Application and Results of a Biomechanical Evaluation and Visualisation Tool for Incisional Hernia Repair

  • paper_url: http://arxiv.org/abs/2307.01502
  • repo_url: None
  • paper_authors: Jacob J. Relle, Samuel Voß, Ramesch Raschidi, Regine Nessel, Johannes Görich, Mark O. Wielpütz, Thorsten Löffler, Vincent Heuveline, Friedrich Kallinowski, Philipp D. Lösel
  • for: Treating abdominal wall defects (incisional hernias) to reduce pain, discomfort, and the risk of repeated surgical repairs
  • methods: A biomechanical approach that accounts for muscle activation, intra-abdominal pressure, tissue elasticity, and abdominal wall distention, together with HEDI, a tool that uses dynamic computed tomography with the Valsalva maneuver to automatically detect and assess hernia size, volume, and abdominal wall instability
  • results: In its first clinical application for the preoperative evaluation of 31 patients, HEDI showed significantly improved success rates compared to reported rates, with all patients remaining pain-free and showing no hernia recurrence after three years of follow-up
    Abstract Abdominal wall defects often lead to pain, discomfort, and recurrence of incisional hernias, resulting in significant morbidity and repeated surgical repairs worldwide. Mesh repair for large hernias is usually based on the defect area with a fixed overlap, without considering biomechanical aspects such as muscle activation, intra-abdominal pressure, tissue elasticity, and abdominal wall distention. To address this issue, we present a biomechanical approach to incisional hernia repair that takes into account the unstable abdominal wall. Additionally, we introduce HEDI, a tool that uses dynamic computed tomography with Valsalva maneuver to automatically detect and assess hernia size, volume, and abdominal wall instability. Our first clinical application of HEDI in the preoperative evaluation of 31 patients shows significantly improved success rates compared to reported rates, with all patients remaining pain-free and showing no hernia recurrence after three years of follow-up.

FREEDOM: Target Label & Source Data & Domain Information-Free Multi-Source Domain Adaptation for Unsupervised Personalization

  • paper_url: http://arxiv.org/abs/2307.02493
  • repo_url: None
  • paper_authors: Eunju Yang, Gyusang Cho, Chan-Hyun Youn
  • for: Proposes a practical multi-source domain adaptation (MSDA) scenario for adapting a deployed model to a client's dataset.
  • methods: Introduces a new problem scenario, Three-Free Domain Adaptation (TFDA), in which target labels, the source dataset, and source domain information (domain labels and the number of domains) are all unavailable, and proposes a practical adaptation framework called FREEDOM to solve it.
  • results: FREEDOM achieves state-of-the-art or comparable performance without source domain information, reduces the final model size on the target side, and can be deployed independently of the number of source domains.
    Abstract From a service perspective, Multi-Source Domain Adaptation (MSDA) is a promising scenario for adapting a deployed model to a client's dataset. It can provide adaptation without a target label and supports the case where a source dataset is constructed from multiple domains. However, it is impractical because its training heavily relies on prior domain information about the multi-source dataset -- how many domains exist and the domain label of each data sample. Moreover, MSDA requires both the source and target datasets simultaneously (physically), causing storage limitations on the client device or data privacy issues when transferring client data to a server. For a more practical scenario of model adaptation from a service provider's point of view, we relax these constraints and present a novel problem scenario of Three-Free Domain Adaptation, namely TFDA, where 1) target labels, 2) the source dataset, and even 3) source domain information (domain labels + the number of domains) are unavailable. Under this problem scenario, we propose a practical adaptation framework called FREEDOM. It leverages the power of a generative model, disentangling data into class and style aspects, where the style is defined as the class-independent information from the source data and is designed with a nonparametric Bayesian approach. In the adaptation stage, FREEDOM aims to match the source class distribution with the target's, under the philosophy that the class distribution is consistent even if the style is different; afterwards, only part of the classification model is deployed as a personalized network. As a result, FREEDOM achieves state-of-the-art or comparable performance even without domain information, with a reduced final model size on the target side, independent of the number of source domains.

A Bibliographic Study on Artificial Intelligence Research: Global Panorama and Indian Appearance

  • paper_url: http://arxiv.org/abs/2308.00705
  • repo_url: None
  • paper_authors: Amit Tiwari, Susmita Bardhan, Vikas Kumar
  • for: Uses the science mapping method of bibliometric study to analyze artificial intelligence (AI) research for 2015-2020 and understand its development trends.
  • methods: Collects the required data from the Scopus database and performs essential data transformation manually and with the tool OpenRefine to make the data analysis-ready.
  • results: Commercial journals have a higher CiteScore and publication volume than open access journals over the period; IEEE is the prominent publisher, publishing 84% of the top-cited articles; China and the United States are the major contributors to the AI literature; neural networks and deep learning are the dominant topics; and both public institutions and private bodies are investing in AI research. The study also examines the relative position of Indian researchers in AI research.
    Abstract The present study identifies and assesses the bibliographic trend in Artificial Intelligence (AI) research for the years 2015-2020 using the science mapping method of bibliometric study. The required data has been collected from the Scopus database. To make the collected data analysis-ready, essential data transformation was performed manually and with the help of a tool, viz. OpenRefine. For determining the trend and performing the mapping techniques, the top five open access and commercial journals of AI were chosen based on their CiteScore-driven ranking. The work includes 6880 articles published in the specified period for analysis. The trend is based on country-wise publications, year-wise publications, topical terms in AI, top-cited articles, prominent authors, major institutions, involvement of industries in AI, and the Indian appearance. The results show that, compared to open access journals, commercial journals have a higher CiteScore and number of articles published over the years. Additionally, IEEE is the prominent publisher, publishing 84% of the top-cited publications. Further, China and the United States are the major contributors to literature in the AI domain. The study reveals that neural networks and deep learning are the major topics included in top AI research publications. Recently, not only public institutions but also private bodies have been investing their resources in AI research. The study also investigates the relative position of Indian researchers in terms of AI research. The present work helps in understanding the initial development, current state, and future direction of AI.

Mitigating Bias: Enhancing Image Classification by Improving Model Explanations

  • paper_url: http://arxiv.org/abs/2307.01473
  • repo_url: None
  • paper_authors: Raha Ahmadi, Mohammad Javad Rajabi, Mohammad Khalooie, Mohammad Sabokrou
  • for: Improving image classifiers' understanding and representation of the main concepts within images, rather than background features.
  • methods: Proposes a novel approach that concurrently guides the model's attention toward the foreground during the classification task, so that the model better captures the primary concepts in an image.
  • results: Extensive experiments on benchmark datasets demonstrate the efficacy of the approach in improving classification accuracy.
    Abstract Deep learning models have demonstrated remarkable capabilities in learning complex patterns and concepts from training data. However, recent findings indicate that these models tend to rely heavily on simple and easily discernible features present in the background of images rather than the main concepts or objects they are intended to classify. This phenomenon poses a challenge to image classifiers as the crucial elements of interest in images may be overshadowed. In this paper, we propose a novel approach to address this issue and improve the learning of main concepts by image classifiers. Our central idea revolves around concurrently guiding the model's attention toward the foreground during the classification task. By emphasizing the foreground, which encapsulates the primary objects of interest, we aim to shift the focus of the model away from the dominant influence of the background. To accomplish this, we introduce a mechanism that encourages the model to allocate sufficient attention to the foreground. We investigate various strategies, including modifying the loss function or incorporating additional architectural components, to enable the classifier to effectively capture the primary concept within an image. Additionally, we explore the impact of different foreground attention mechanisms on model performance and provide insights into their effectiveness. Through extensive experimentation on benchmark datasets, we demonstrate the efficacy of our proposed approach in improving the classification accuracy of image classifiers. Our findings highlight the importance of foreground attention in enhancing model understanding and representation of the main concepts within images. The results of this study contribute to advancing the field of image classification and provide valuable insights for developing more robust and accurate deep-learning models.
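
One concrete instance of the modified-loss strategy mentioned above is to penalize the attention (or saliency) mass that falls outside a foreground mask. The sketch below is an illustrative formulation under that assumption, not the authors' exact loss; the tensor shapes and weight are ours.

```python
import torch
import torch.nn.functional as F

def foreground_attention_loss(logits, labels, attn, fg_mask, lam=0.5):
    """Classification loss plus a penalty on attention mass outside the
    foreground. `attn` is a (B, H, W) saliency/attention map produced by
    the model; `fg_mask` is a binary (B, H, W) foreground mask."""
    ce = F.cross_entropy(logits, labels)
    attn = attn / (attn.sum(dim=(1, 2), keepdim=True) + 1e-8)  # normalize to 1
    background_mass = (attn * (1 - fg_mask)).sum(dim=(1, 2)).mean()
    return ce + lam * background_mass

logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
attn = torch.rand(8, 14, 14)
fg_mask = (torch.rand(8, 14, 14) > 0.5).float()
print(foreground_attention_loss(logits, labels, attn, fg_mask))
```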

Beyond Conservatism: Diffusion Policies in Offline Multi-agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.01472
  • repo_url: None
  • paper_authors: Zhuoran Li, Ling Pan, Longbo Huang
  • for: Proposes a novel Diffusion Offline Multi-agent Model (DOM2) for offline multi-agent reinforcement learning (MARL).
  • methods: Integrates a diffusion model into the policy network and proposes a trajectory-based data-augmentation scheme to enhance policy expressiveness and diversity.
  • results: Extensive experiments in multi-agent particle and multi-agent MuJoCo environments show that DOM2 is robust to environment shifts, outperforms state-of-the-art methods, and matches their performance with 20+ times less data.
    Abstract We present a novel Diffusion Offline Multi-agent Model (DOM2) for offline Multi-Agent Reinforcement Learning (MARL). Different from existing algorithms that rely mainly on conservatism in policy design, DOM2 enhances policy expressiveness and diversity based on diffusion. Specifically, we incorporate a diffusion model into the policy network and propose a trajectory-based data-augmentation scheme in training. These key ingredients make our algorithm more robust to environment changes and achieve significant improvements in performance, generalization and data-efficiency. Our extensive experimental results demonstrate that DOM2 outperforms existing state-of-the-art methods in multi-agent particle and multi-agent MuJoCo environments, and generalizes significantly better in shifted environments thanks to its high expressiveness and diversity. Furthermore, DOM2 shows superior data efficiency and can achieve state-of-the-art performance with $20+$ times less data compared to existing algorithms.

A multilevel framework for AI governance

  • paper_url: http://arxiv.org/abs/2307.03198
  • repo_url: None
  • paper_authors: Hyesun Choung, Prabu David, John S. Seberger
  • for: Develops a framework of AI governance grounded in ethics and fundamental human values, to realize the potential benefits of AI and mitigate its risks.
  • methods: Proposes a multilevel governance approach involving three groups of interdependent stakeholders -- governments, corporations, and citizens -- and examines their interrelationships through dimensions of trust, such as competence, integrity, and benevolence.
  • results: Provides practical insights that can be used to enhance user experiences and inform public policy related to AI.
    Abstract To realize the potential benefits and mitigate potential risks of AI, it is necessary to develop a framework of governance that conforms to ethics and fundamental human values. Although several organizations have issued guidelines and ethical frameworks for trustworthy AI, without a mediating governance structure, these ethical principles will not translate into practice. In this paper, we propose a multilevel governance approach that involves three groups of interdependent stakeholders: governments, corporations, and citizens. We examine their interrelationships through dimensions of trust, such as competence, integrity, and benevolence. The levels of governance combined with the dimensions of trust in AI provide practical insights that can be used to further enhance user experiences and inform public policy related to AI.

Causal Reinforcement Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.01452
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Zhihong Deng, Jing Jiang, Guodong Long, Chengqi Zhang
  • for: Introduces causal reinforcement learning, a subfield that enhances existing reinforcement learning algorithms by incorporating causal relationships into the learning process for more effective knowledge transfer.
  • methods: First introduces the basic concepts of causality and reinforcement learning, then explains how causality can address core challenges in non-causal reinforcement learning, and finally categorizes and systematically reviews existing causal reinforcement learning approaches according to their target problems and methodologies.
  • results: Provides a comprehensive review of the literature on causal reinforcement learning, covering existing methods and techniques, and outlines open issues and future directions for this emerging field.
    Abstract Reinforcement learning is an essential paradigm for solving sequential decision problems under uncertainty. Despite many remarkable achievements in recent decades, applying reinforcement learning methods in the real world remains challenging. One of the main obstacles is that reinforcement learning agents lack a fundamental understanding of the world and must therefore learn from scratch through numerous trial-and-error interactions. They may also face challenges in providing explanations for their decisions and generalizing the acquired knowledge. Causality, however, offers a notable advantage as it can formalize knowledge in a systematic manner and leverage invariance for effective knowledge transfer. This has led to the emergence of causal reinforcement learning, a subfield of reinforcement learning that seeks to enhance existing algorithms by incorporating causal relationships into the learning process. In this survey, we comprehensively review the literature on causal reinforcement learning. We first introduce the basic concepts of causality and reinforcement learning, and then explain how causality can address core challenges in non-causal reinforcement learning. We categorize and systematically review existing causal reinforcement learning approaches based on their target problems and methodologies. Finally, we outline open issues and future directions in this emerging field.

A Double Machine Learning Approach to Combining Experimental and Observational Data

  • paper_url: http://arxiv.org/abs/2307.01449
  • repo_url: None
  • paper_authors: Marco Morucci, Vittorio Orlandi, Harsh Parikh, Sudeepa Roy, Cynthia Rudin, Alexander Volfovsky
  • for: Combines experimental and observational studies to test for assumption violations and to estimate treatment effects consistently.
  • methods: A double machine learning approach that tests for violations of external validity and ignorability under milder assumptions and provides semi-parametrically efficient treatment effect estimators when only one assumption is violated.
  • results: Demonstrated in three real-world case studies, showing its relevance for practical settings.
    Abstract Experimental and observational studies often lack validity due to untestable assumptions. We propose a double machine learning approach to combine experimental and observational studies, allowing practitioners to test for assumption violations and estimate treatment effects consistently. Our framework tests for violations of external validity and ignorability under milder assumptions. When only one assumption is violated, we provide semi-parametrically efficient treatment effect estimators. However, our no-free-lunch theorem highlights the necessity of accurately identifying the violated assumption for consistent treatment effect estimation. We demonstrate the applicability of our approach in three real-world case studies, highlighting its relevance for practical settings.
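
The double machine learning backbone can be sketched with a cross-fitted AIPW (doubly robust) estimator of the average treatment effect; the paper's contribution is the surrounding tests for external validity and ignorability, which this minimal example omits. The nuisance learners and synthetic data are illustrative choices.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor
from sklearn.model_selection import KFold

def dml_ate(X, T, Y, n_splits=5, seed=0):
    """Cross-fitted AIPW (doubly robust) estimate of the average treatment
    effect. Outcome and propensity nuisance models are fit on held-out
    folds, as in double machine learning."""
    psi = np.zeros(len(Y))
    for train, test in KFold(n_splits, shuffle=True, random_state=seed).split(X):
        e = GradientBoostingClassifier().fit(X[train], T[train])
        e_hat = np.clip(e.predict_proba(X[test])[:, 1], 0.01, 0.99)
        mu1 = GradientBoostingRegressor().fit(X[train][T[train] == 1], Y[train][T[train] == 1])
        mu0 = GradientBoostingRegressor().fit(X[train][T[train] == 0], Y[train][T[train] == 0])
        m1, m0 = mu1.predict(X[test]), mu0.predict(X[test])
        psi[test] = (m1 - m0
                     + T[test] * (Y[test] - m1) / e_hat
                     - (1 - T[test]) * (Y[test] - m0) / (1 - e_hat))
    return psi.mean(), psi.std() / np.sqrt(len(psi))  # estimate, std. error

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
T = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * T + X[:, 0] + rng.normal(size=2000)   # true ATE = 2
print(dml_ate(X, T, Y))
```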

TablEye: Seeing small Tables through the Lens of Images

  • paper_url: http://arxiv.org/abs/2307.02491
  • repo_url: None
  • paper_authors: Seung-eon Lee, Sang-Chul Lee
  • for: Explores few-shot tabular learning, particularly when labeled tabular data is scarce.
  • methods: Proposes TablEye, a framework that overcomes the difficulty of forming prior knowledge for tabular data through domain transformation, generating tabular images and applying proven few-shot learning algorithms and embedding functions.
  • results: TablEye outperforms TabLLM by up to 0.11 AUC in a 4-shot task and leads STUNT by 3.17% accuracy on average in a 1-shot setting.
    Abstract The exploration of few-shot tabular learning becomes imperative. Tabular data is a versatile representation that captures diverse information, yet it is not exempt from limitations related to data properties and model size. Labeling extensive tabular data can be challenging, and it may not be feasible to capture every important feature. Few-shot tabular learning, however, remains relatively unexplored, primarily due to the scarcity of shared information among independent datasets and the inherent ambiguity in defining boundaries within tabular data. To the best of our knowledge, no meaningful and unrestricted few-shot tabular learning techniques have been developed without imposing constraints on the dataset. In this paper, we propose an innovative framework called TablEye, which aims to overcome the limits of forming prior knowledge for tabular data by adopting domain transformation. It facilitates domain transformation by generating tabular images, which effectively conserve the intrinsic semantics of the original tabular data. This approach harnesses rigorously tested few-shot learning algorithms and embedding functions to acquire and apply prior knowledge. Leveraging shared data domains allows us to utilize this prior knowledge, originally learned from the image domain. Specifically, TablEye demonstrated superior performance, outstripping TabLLM by up to 0.11 AUC in a 4-shot task and STUNT in a 1-shot setting, where it led by 3.17% accuracy on average.

Garbage in, garbage out: Zero-shot detection of crime using Large Language Models

  • paper_url: http://arxiv.org/abs/2307.06844
  • repo_url: https://github.com/anjsimmo/zero-shot-crime-detection
  • paper_authors: Anj Simmons, Rajesh Vasa
  • for: Exploits the common-sense knowledge learned by large language models to perform zero-shot reasoning about crimes given textual descriptions of surveillance videos.
  • methods: Uses large language models for zero-shot crime detection and classification, which requires the video to be (manually) converted into high-quality textual descriptions.
  • results: When videos are manually converted into high-quality textual descriptions, large language models detect and classify crimes with state-of-the-art performance using only zero-shot reasoning; however, existing automated video-to-text methods cannot generate descriptions of sufficient quality, so feeding their output to the language model yields garbage results.
    Abstract This paper proposes exploiting the common sense knowledge learned by large language models to perform zero-shot reasoning about crimes given textual descriptions of surveillance videos. We show that when video is (manually) converted to high quality textual descriptions, large language models are capable of detecting and classifying crimes with state-of-the-art performance using only zero-shot reasoning. However, existing automated video-to-text approaches are unable to generate video descriptions of sufficient quality to support reasoning (garbage video descriptions into the large language model, garbage out).

Unsupervised Feature Learning with Emergent Data-Driven Prototypicality

  • paper_url: http://arxiv.org/abs/2307.01421
  • repo_url: None
  • paper_authors: Yunhui Guo, Youren Zhang, Yubei Chen, Stella X. Yu
  • for: Given an unlabeled image set, trains a model that maps each image to a point in a feature space such that proximity indicates visual similarity and the location of the point directly encodes how prototypical the image is within the dataset.
  • methods: Performs unsupervised feature learning in hyperbolic rather than Euclidean space: distances between points still reflect image similarity, but the extra capacity of hyperbolic space lets the distance to the origin encode prototypicality, with more prototypical images lying closer to the origin.
  • results: Proposes HACK, an unsupervised feature learning algorithm with sphere packing in hyperbolic space, which generates uniformly packed particles in the Poincaré ball and assigns each image to a particle; images move closer to the origin with congealing, confirming unsupervised prototypicality discovery, and the resulting data-driven prototypicality enables instance selection that reduces sample complexity and improves generalization and robustness.
    Abstract Given an image set without any labels, our goal is to train a model that maps each image to a point in a feature space such that, not only proximity indicates visual similarity, but where it is located directly encodes how prototypical the image is according to the dataset. Our key insight is to perform unsupervised feature learning in hyperbolic instead of Euclidean space, where the distance between points still reflect image similarity, and yet we gain additional capacity for representing prototypicality with the location of the point: The closer it is to the origin, the more prototypical it is. The latter property is simply emergent from optimizing the usual metric learning objective: The image similar to many training instances is best placed at the center of corresponding points in Euclidean space, but closer to the origin in hyperbolic space. We propose an unsupervised feature learning algorithm in Hyperbolic space with sphere pACKing. HACK first generates uniformly packed particles in the Poincar\'e ball of hyperbolic space and then assigns each image uniquely to each particle. Images after congealing are regarded more typical of the dataset it belongs to. With our feature mapper simply trained to spread out training instances in hyperbolic space, we observe that images move closer to the origin with congealing, validating our idea of unsupervised prototypicality discovery. We demonstrate that our data-driven prototypicality provides an easy and superior unsupervised instance selection to reduce sample complexity, increase model generalization with atypical instances and robustness with typical ones.
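
The geometric ingredient is the Poincaré-ball distance, under which distances blow up near the boundary, so a point's hyperbolic norm (its distance from the origin) directly encodes prototypicality. A minimal illustration with toy points follows; this is the standard formula, not the authors' code.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance in the Poincaré ball:
    d(u, v) = arccosh(1 + 2*||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))."""
    nu, nv = np.dot(u, u), np.dot(v, v)
    delta = np.dot(u - v, u - v)
    return np.arccosh(1 + 2 * delta / ((1 - nu) * (1 - nv)))

origin = np.zeros(2)
typical = np.array([0.1, 0.0])    # near the origin -> read as prototypical
atypical = np.array([0.9, 0.0])   # near the boundary -> read as atypical
print(poincare_distance(origin, typical))   # ~0.20 (small hyperbolic norm)
print(poincare_distance(origin, atypical))  # ~2.94 (large hyperbolic norm)
```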

Analyzing the vulnerabilities in SplitFed Learning: Assessing the robustness against Data Poisoning Attacks

  • paper_url: http://arxiv.org/abs/2307.03197
  • repo_url: None
  • paper_authors: Aysha Thahsin Zahir Ismail, Raj Mani Shukla
  • for: Studies the impact of data poisoning attacks in distributed collaborative machine learning (DCML), specifically in SplitFed Learning (SFL), the hybrid of Split Learning (SL) and Federated Learning (FL).
  • methods: Proposes three novel attack strategies -- untargeted, targeted, and distance-based attacks -- all of which aim to degrade the performance of the DCML-based classifier.
  • results: Across two case studies (electrocardiogram signal classification and automatic handwritten digit recognition), untargeted and distance-based poisoning attacks have a greater impact on evading classifier outcomes than targeted attacks.
    Abstract Distributed Collaborative Machine Learning (DCML) is a potential alternative to address the privacy concerns associated with centralized machine learning. Split Learning (SL) and Federated Learning (FL) are the two effective learning approaches in DCML. Recently there has been increased interest in the hybrid of FL and SL known as SplitFed Learning (SFL). This research is the earliest attempt to study, analyze and present the impact of data poisoning attacks in SFL. We propose three kinds of novel attack strategies, namely untargeted, targeted and distance-based attacks, for SFL. All the attack strategies aim to degrade the performance of the DCML-based classifier. We test the proposed attack strategies on two case studies: electrocardiogram signal classification and automatic handwritten digit recognition. A series of attack experiments was conducted by varying the percentage of malicious clients and the choice of the model split layer between the clients and the server. The comprehensive analysis of the attack strategies clearly conveys that untargeted and distance-based poisoning attacks have a greater impact in evading the classifier outcomes compared to targeted attacks in SFL.
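
As a minimal illustration of the untargeted variant, a malicious client can simply flip a fraction of its local labels to random other classes before training; the sketch below shows this under illustrative parameters, not the authors' exact protocol.

```python
import numpy as np

def untargeted_label_flip(y, flip_fraction, n_classes, seed=0):
    """Untargeted poisoning: replace the labels of a random fraction of a
    malicious client's samples with random *different* classes."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    idx = rng.choice(len(y), size=int(flip_fraction * len(y)), replace=False)
    # Shift each selected label by a random non-zero offset modulo n_classes,
    # which guarantees the new label differs from the original.
    y[idx] = (y[idx] + rng.integers(1, n_classes, size=len(idx))) % n_classes
    return y

y_clean = np.random.randint(0, 10, size=1000)    # e.g., digit labels
y_poisoned = untargeted_label_flip(y_clean, flip_fraction=0.3, n_classes=10)
print((y_clean != y_poisoned).mean())             # ~0.3 of labels flipped
```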

Learning to Communicate using Contrastive Learning

  • paper_url: http://arxiv.org/abs/2307.01403
  • repo_url: https://github.com/SonamSangpoLama/Music-Genre-Classification
  • paper_authors: Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch
  • for: Proposes a contrastive-learning-based approach to communication for effective coordination in multi-agent RL.
  • methods: Treats communicative messages sent between agents as different incomplete views of the environment state and learns to communicate by maximizing the mutual information between messages of a given trajectory via contrastive learning.
  • results: In communication-essential environments, the method outperforms previous work in both performance and learning speed, induces more symmetric communication, and captures global state information from the environment.
    Abstract Communication is a powerful tool for coordination in multi-agent RL. But inducing an effective, common language is a difficult challenge, particularly in the decentralized setting. In this work, we introduce an alternative perspective where communicative messages sent between agents are considered as different incomplete views of the environment state. By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory. In communication-essential environments, our method outperforms previous work in both performance and learning speed. Using qualitative metrics and representation probing, we show that our method induces more symmetric communication and captures global state information from the environment. Overall, we show the power of contrastive learning and the importance of leveraging messages as encodings for effective communication.
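
The mutual-information objective can be instantiated with a standard InfoNCE loss: messages from nearby steps of the same trajectory form positive pairs, while messages from other trajectories in the batch act as negatives. A minimal sketch, with sizes and pairing scheme as illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def message_infonce(msgs_t, msgs_tp1, temperature=0.1):
    """InfoNCE between messages at step t and step t+1 of the same
    trajectory (positives on the diagonal); messages from the other
    trajectories in the batch act as negatives. Minimizing this loss
    maximizes a lower bound on the mutual information between messages
    of a trajectory."""
    a = F.normalize(msgs_t, dim=-1)        # (batch, msg_dim)
    b = F.normalize(msgs_tp1, dim=-1)
    logits = a @ b.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(len(a))         # positive pair index per row
    return F.cross_entropy(logits, targets)

msgs_t = torch.randn(32, 16)               # messages from 32 trajectories
msgs_tp1 = msgs_t + 0.1 * torch.randn(32, 16)
print(message_infonce(msgs_t, msgs_tp1))
```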

In-depth Analysis On Parallel Processing Patterns for High-Performance Dataframes

  • paper_url: http://arxiv.org/abs/2307.01394
  • repo_url: None
  • paper_authors: Niranda Perera, Arup Kumar Sarker, Mills Staylor, Gregor von Laszewski, Kaiying Shan, Supun Kamburugamuve, Chathura Widanage, Vibhatha Abeykoon, Thejaka Amila Kanewela, Geoffrey Fox
  • for: The paper aims to improve the performance of data processing pipelines by optimizing the use of Dataframes, which are widely used in data engineering applications.
  • methods: The authors propose a cost model for evaluating parallel processing patterns for distributed Dataframe operators and evaluate their reference runtime implementation, Cylon, on the ORNL Summit supercomputer.
  • results: The evaluation on the ORNL Summit supercomputer demonstrates the potential of the proposed approach for improving the performance of data processing pipelines.
    Abstract The Data Science domain has expanded monumentally in both research and industry communities during the past decade, predominantly owing to the Big Data revolution. Artificial Intelligence (AI) and Machine Learning (ML) are bringing more complexities to data engineering applications, which are now integrated into data processing pipelines to process terabytes of data. Typically, a significant amount of time is spent on data preprocessing in these pipelines, and hence improving its efficiency directly impacts the overall pipeline performance. The community has recently embraced the concept of Dataframes as the de-facto data structure for data representation and manipulation. However, the most widely used serial Dataframes today (R, pandas) experience performance limitations while working on even moderately large data sets. We believe that there is plenty of room for improvement by taking a look at this problem from a high-performance computing point of view. In a prior publication, we presented a set of parallel processing patterns for distributed dataframe operators and the reference runtime implementation, Cylon [1]. In this paper, we are expanding on the initial concept by introducing a cost model for evaluating the said patterns. Furthermore, we evaluate the performance of Cylon on the ORNL Summit supercomputer.

Depth video data-enabled predictions of longitudinal dairy cow body weight using thresholding and Mask R-CNN algorithms

  • paper_url: http://arxiv.org/abs/2307.01383
  • repo_url: https://github.com/yebigithub/BW_dairy
  • paper_authors: Ye Bi, Leticia M. Campos, Jin Wang, Haipeng Yu, Mark D. Hanigan, Gota Morota
  • for: 这个研究的目的是预测牛体重,并使用视频数据来预测。
  • methods: 研究使用了深度学习segmentation方法,包括单resholding、自适应阈值和Mask R-CNN等三种方法来 segment牛体从背景。
  • results: 研究发现,使用Mask R-CNN方法和线性混合模型可以获得最佳预测系数和平均绝对误差,即0.98和2.03%。此外,这种方法也在留三头牛掉cross-validation中表现最佳。
    Abstract Monitoring cow body weight is crucial to support farm management decisions due to its direct relationship with the growth, nutritional status, and health of dairy cows. Cow body weight is a repeated trait; however, the majority of previous body weight prediction research used only data collected at a single point in time. Furthermore, the utility of deep learning-based segmentation for body weight prediction from videos remains unanswered. Therefore, the objectives of this study were to predict cow body weight from repeatedly measured video data, to compare the performance of the thresholding and Mask R-CNN deep learning approaches, to evaluate the predictive ability of body weight regression models, and to promote open science in the animal science community by releasing the source code for video-based body weight prediction. A total of 40,405 depth images and depth map files were obtained from 10 lactating Holstein cows and 2 non-lactating Jersey cows. Three approaches were investigated to segment the cow's body from the background: single thresholding, adaptive thresholding, and Mask R-CNN. Four image-derived biometric features, namely dorsal length, abdominal width, height, and volume, were estimated from the segmented images. On average, the Mask R-CNN approach combined with a linear mixed model resulted in the best prediction coefficient of determination and mean absolute percentage error of 0.98 and 2.03%, respectively, in the forecasting cross-validation. The Mask R-CNN approach was also the best in the leave-three-cows-out cross-validation, with a prediction coefficient of determination of 0.90 and a mean absolute percentage error of 4.70%. Our results suggest that deep learning-based segmentation improves the prediction performance of cow body weight from longitudinal depth video data.

Shifting Attention to Relevance: Towards the Uncertainty Estimation of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.01379
  • repo_url: https://github.com/jinhaoduan/shifting-attention-to-relevance
  • paper_authors: Jinhao Duan, Hao Cheng, Shiqi Wang, Chenan Wang, Alex Zavalny, Renjing Xu, Bhavya Kailkhura, Kaidi Xu
  • for: Addressing uncertainty estimation for Large Language Models (LLMs), i.e., characterizing when users can trust model outputs.
  • methods: Analyzes auto-regressive LLMs and shows how generative inequalities, the unequal relevance of tokens to the meaning of a generation, bias uncertainty estimation.
  • results: Jointly Shifting Attention to more Relevant (SAR) components at both the token and sentence level corrects the biases introduced by generative inequalities.
    Abstract Although Large Language Models (LLMs) have shown great potential in Natural Language Generation, it is still challenging to characterize the uncertainty of model generations, i.e., when users could trust model outputs. Our research is derived from the heuristic facts that tokens are created unequally in reflecting the meaning of generations by auto-regressive LLMs, i.e., some tokens are more relevant (or representative) than others, yet all the tokens are equally valued when estimating uncertainty. It is because of the linguistic redundancy where mostly a few keywords are sufficient to convey the meaning of a long sentence. We name these inequalities as generative inequalities and investigate how they affect uncertainty estimation. Our results reveal that considerable tokens and sentences containing limited semantics are weighted equally or even heavily when estimating uncertainty. To tackle these biases posed by generative inequalities, we propose to jointly Shifting Attention to more Relevant (SAR) components from both the token level and the sentence level while estimating uncertainty. We conduct experiments over popular "off-the-shelf" LLMs (e.g., OPT, LLaMA) with model sizes up to 30B and powerful commercial LLMs (e.g., Davinci from OpenAI), across various free-form question-answering tasks. Experimental results and detailed demographic analysis indicate the superior performance of SAR. Code is available at https://github.com/jinhaoduan/shifting-attention-to-relevance.
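
A toy sketch of the token-level intuition (not the released SAR implementation): reweight per-token negative log-likelihoods by a relevance score before aggregating into a sentence-level uncertainty. Both inputs are assumed precomputed.

```python
# Toy illustration of token-level relevance reweighting for uncertainty
# estimation. `token_nll` are per-token negative log-likelihoods from an
# auto-regressive LM; `relevance` scores how much each token contributes to
# the sentence meaning.
import numpy as np

def sar_token_uncertainty(token_nll: np.ndarray, relevance: np.ndarray) -> float:
    weights = relevance / relevance.sum()          # shift attention to relevance
    return float(np.sum(weights * token_nll))      # relevance-weighted NLL

# Uniform weighting (the baseline SAR argues against) vs. relevance weighting:
nll = np.array([0.2, 3.1, 0.3, 0.4])        # e.g. one informative keyword
rel = np.array([0.1, 1.0, 0.1, 0.1])
print(np.mean(nll), sar_token_uncertainty(nll, rel))
```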

A CNN regression model to estimate buildings height maps using Sentinel-1 SAR and Sentinel-2 MSI time series

  • paper_url: http://arxiv.org/abs/2307.01378
  • repo_url: None
  • paper_authors: Ritu Yadav, Andrea Nascetti, Yifang Ban
  • For: Urban planning, infrastructure management, and environmental analysis, with a focus on accurately estimating building heights using satellite time series data.
  • Methods: A supervised Multimodal Building Height Regression Network (MBHR-Net) that uses Sentinel-1 (S1) and Sentinel-2 (S2) satellite time series to estimate building heights at 10 m spatial resolution, extracting features from both S1 and S2 images to learn complex spatio-temporal relationships between image patterns and building heights.
  • Results: Preliminary results (RMSE of 3.73 m, IoU of 0.95, R-squared of 0.61) demonstrate accurate building height estimation, showing the model's potential for urban planning, environmental impact analysis, and related applications.
    Abstract Accurate estimation of building heights is essential for urban planning, infrastructure management, and environmental analysis. In this study, we propose a supervised Multimodal Building Height Regression Network (MBHR-Net) for estimating building heights at 10m spatial resolution using Sentinel-1 (S1) and Sentinel-2 (S2) satellite time series. S1 provides Synthetic Aperture Radar (SAR) data that offers valuable information on building structures, while S2 provides multispectral data that is sensitive to different land cover types, vegetation phenology, and building shadows. Our MBHR-Net aims to extract meaningful features from the S1 and S2 images to learn complex spatio-temporal relationships between image patterns and building heights. The model is trained and tested in 10 cities in the Netherlands. Root Mean Squared Error (RMSE), Intersection over Union (IOU), and R-squared (R2) score metrics are used to evaluate the performance of the model. The preliminary results (3.73m RMSE, 0.95 IoU, 0.61 R2) demonstrate the effectiveness of our deep learning model in accurately estimating building heights, showcasing its potential for urban planning, environmental impact analysis, and other related applications.
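
A minimal sketch of a two-branch multimodal regression network in the spirit of MBHR-Net; the channel counts, layer sizes, and fusion-by-concatenation scheme are assumptions, not the published architecture.

```python
# Two-branch multimodal height regression sketch: one branch per sensor,
# features concatenated, per-pixel regression head.
import torch
import torch.nn as nn

class TinyMBHR(nn.Module):
    def __init__(self, s1_channels=2, s2_channels=10):
        super().__init__()
        def branch(c):
            return nn.Sequential(
                nn.Conv2d(c, 32, 3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.s1 = branch(s1_channels)    # SAR branch (Sentinel-1)
        self.s2 = branch(s2_channels)    # multispectral branch (Sentinel-2)
        self.head = nn.Conv2d(128, 1, 1) # per-pixel height regression

    def forward(self, s1, s2):
        fused = torch.cat([self.s1(s1), self.s2(s2)], dim=1)
        return self.head(fused).squeeze(1)  # height map in metres

model = TinyMBHR()
h = model(torch.randn(1, 2, 64, 64), torch.randn(1, 10, 64, 64))
print(h.shape)  # torch.Size([1, 64, 64])
```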

Efficient Determination of Safety Requirements for Perception Systems

  • paper_url: http://arxiv.org/abs/2307.01371
  • repo_url: None
  • paper_authors: Sydney M. Katz, Anthony L. Corso, Esen Yel, Mykel J. Kochenderfer
  • for: Distilling high-level safety requirements into component-level requirements on perception systems.
  • methods: Combines common black-box estimation techniques, Gaussian processes and threshold bandits, into a new estimation method called smoothing bandits.
  • results: On a vision-based aircraft collision avoidance problem, smoothing bandits improve both accuracy and efficiency over Gaussian process and threshold bandit baselines.
    Abstract Perception systems operate as a subcomponent of the general autonomy stack, and perception system designers often need to optimize performance characteristics while maintaining safety with respect to the overall closed-loop system. For this reason, it is useful to distill high-level safety requirements into component-level requirements on the perception system. In this work, we focus on efficiently determining sets of safe perception system performance characteristics given a black-box simulator of the fully-integrated, closed-loop system. We combine the advantages of common black-box estimation techniques such as Gaussian processes and threshold bandits to develop a new estimation method, which we call smoothing bandits. We demonstrate our method on a vision-based aircraft collision avoidance problem and show improvements in terms of both accuracy and efficiency over the Gaussian process and threshold bandit baselines.
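
A sketch of the Gaussian-process baseline the method builds on (the smoothing bandit itself is not specified in the abstract): fit a surrogate to black-box closed-loop safety estimates and threshold the posterior mean to get a candidate safe set. The stand-in simulator and the 5% requirement are assumptions.

```python
# Gaussian-process baseline for estimating which perception performance
# characteristics keep the closed-loop system safe.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def simulate_collision_rate(noise_std: float) -> float:
    """Stand-in black-box simulator: safety degrades with perception noise."""
    return 1 - np.exp(-3 * noise_std) + 0.01 * np.random.randn()

X = np.random.uniform(0, 1, size=(40, 1))   # sampled perception noise levels
y = np.array([simulate_collision_rate(x[0]) for x in X])

gp = GaussianProcessRegressor().fit(X, y)
grid = np.linspace(0, 1, 200).reshape(-1, 1)
mean = gp.predict(grid)
safe_set = grid[mean < 0.05]                # requirement: <5% failure (assumed)
print("max tolerable noise ~", safe_set.max() if safe_set.size else None)
```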

Minimizing Age of Information for Mobile Edge Computing Systems: A Nested Index Approach

  • paper_url: http://arxiv.org/abs/2307.01366
  • repo_url: None
  • paper_authors: Shuo Chen, Ning Yang, Meng Zhang, Jun Wang
  • for: Meeting the freshness requirements of real-time applications by minimizing the Age-of-Information (AoI).
  • methods: Mobile edge computing (MEC) offloads tasks from mobile devices to heterogeneous edge servers; scheduling is reformulated as a Restless Multi-Arm-Bandit (RMAB) problem with a hierarchical Markov Decision Process.
  • results: A nested index framework and policy with provable asymptotic optimality; the closed-form index trades computational complexity against accuracy, reduces the optimality gap by up to 40% over benchmarks, and asymptotically approaches the lower bound as the system scales.
    Abstract Exploiting the computational heterogeneity of mobile devices and edge nodes, mobile edge computation (MEC) provides an efficient approach to achieving real-time applications that are sensitive to information freshness, by offloading tasks from mobile devices to edge nodes. We use the metric Age-of-Information (AoI) to evaluate information freshness. An efficient solution to minimize the AoI for the MEC system with multiple users is non-trivial to obtain due to the random computing time. In this paper, we consider multiple users offloading tasks to heterogeneous edge servers in a MEC system. We first reformulate the problem as a Restless Multi-Arm-Bandit (RMAB) problem and establish a hierarchical Markov Decision Process (MDP) to characterize the updating of AoI for the MEC system. Based on the hierarchical MDP, we propose a nested index framework and design a nested index policy with provably asymptotic optimality. Finally, the closed form of the nested index is obtained, which enables the performance tradeoffs between computation complexity and accuracy. Our algorithm leads to an optimality gap reduction of up to 40%, compared to benchmarks. Our algorithm asymptotically approximates the lower bound as the system scalar gets large enough.
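
A toy sketch of the generic shape of an index policy for AoI scheduling: each user gets a scalar index and the scheduler serves the user with the largest one. The index function below is a placeholder, not the paper's closed-form nested index.

```python
# Generic index-policy scheduler for Age-of-Information.
import random

def toy_index(age: int, weight: float) -> float:
    return weight * age  # placeholder; the paper derives a nested index

ages = [0, 0, 0]                 # AoI of three users
weights = [1.0, 2.0, 0.5]        # per-user importance (assumed)
for t in range(5):
    served = max(range(len(ages)), key=lambda i: toy_index(ages[i], weights[i]))
    # Edge computing time is random; completion resets the served user's AoI.
    done = random.random() < 0.7
    ages = [0 if (i == served and done) else a + 1 for i, a in enumerate(ages)]
    print(t, served, ages)
```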

Towards Safe Autonomous Driving Policies using a Neuro-Symbolic Deep Reinforcement Learning Approach

  • paper_url: http://arxiv.org/abs/2307.01316
  • repo_url: https://github.com/cav-research-lab/safe-reinforcement-learning-using-symbolic-logical-programming-for-autonomous-highway-driving
  • paper_authors: Iman Sharifi, Mustafa Yildirim, Saber Fallah
  • for: Developing a neuro-symbolic deep reinforcement learning method (DRLSL) that learns autonomous driving policies safely in real environments.
  • methods: Combines deep reinforcement learning (learning from experience) with symbolic first-order logic (knowledge-driven reasoning).
  • results: On the highD dataset, DRLSL avoids unsafe actions during both training and testing, converges faster, and generalizes better to new driving scenarios than traditional DRL methods.
    Abstract The dynamic nature of driving environments and the presence of diverse road users pose significant challenges for decision-making in autonomous driving. Deep reinforcement learning (DRL) has emerged as a popular approach to tackle this problem. However, the application of existing DRL solutions is mainly confined to simulated environments due to safety concerns, impeding their deployment in real-world. To overcome this limitation, this paper introduces a novel neuro-symbolic model-free DRL approach, called DRL with Symbolic Logics (DRLSL) that combines the strengths of DRL (learning from experience) and symbolic first-order logics (knowledge-driven reasoning) to enable safe learning in real-time interactions of autonomous driving within real environments. This innovative approach provides a means to learn autonomous driving policies by actively engaging with the physical environment while ensuring safety. We have implemented the DRLSL framework in autonomous driving using the highD dataset and demonstrated that our method successfully avoids unsafe actions during both the training and testing phases. Furthermore, our results indicate that DRLSL achieves faster convergence during training and exhibits better generalizability to new driving scenarios compared to traditional DRL methods.
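
A sketch of the neuro-symbolic idea: the learned policy proposes actions, and symbolic safety rules veto unsafe ones before execution. The rules, state fields, and action set are illustrative assumptions, not the paper's logic program.

```python
# Learned policy + symbolic safety filter sketch for highway driving.
import random

ACTIONS = ["keep_lane", "change_left", "change_right", "brake"]

def symbolically_safe(state: dict, action: str) -> bool:
    # Example rules: never change into an occupied lane; brake if gap too small.
    if action == "change_left" and state["left_occupied"]:
        return False
    if action == "change_right" and state["right_occupied"]:
        return False
    if action == "keep_lane" and state["front_gap_m"] < 10:
        return False
    return True

def safe_action(state: dict, q_values: dict) -> str:
    allowed = [a for a in ACTIONS if symbolically_safe(state, a)]
    return max(allowed or ["brake"], key=lambda a: q_values[a])

state = {"left_occupied": True, "right_occupied": False, "front_gap_m": 8.0}
q = {a: random.random() for a in ACTIONS}
print(safe_action(state, q))
```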

Self-Tuning PID Control via a Hybrid Actor-Critic-Based Neural Structure for Quadcopter Control

  • paper_url: http://arxiv.org/abs/2307.01312
  • repo_url: None
  • paper_authors: Iman Sharifi, Aria Alasty
  • for: Designing and training a real-time self-tuning PID controller for robust attitude and altitude control of a quadrotor.
  • methods: A reinforcement-learning-based neural network, a model-free actor-critic hybrid structure, tunes the dynamic gains of an incremental PID controller; a two-hidden-layer network with sigmoid activations is trained with the ADAM optimizer and back-propagation.
  • results: The method is robust to mass uncertainty and wind gust disturbances, trains quickly online, and outperforms a PID controller with constant gains.
    Abstract Proportional-Integrator-Derivative (PID) controller is used in a wide range of industrial and experimental processes. There are a couple of offline methods for tuning PID gains. However, due to the uncertainty of model parameters and external disturbances, real systems such as Quadrotors need more robust and reliable PID controllers. In this research, a self-tuning PID controller using a Reinforcement-Learning-based Neural Network for attitude and altitude control of a Quadrotor has been investigated. An Incremental PID, which contains static and dynamic gains, has been considered and only the variable gains have been tuned. To tune dynamic gains, a model-free actor-critic-based hybrid neural structure was used that was able to properly tune PID gains, and also has done the best as an identifier. In both tuning and identification tasks, a Neural Network with two hidden layers and sigmoid activation functions has been learned using Adaptive Momentum (ADAM) optimizer and Back-Propagation (BP) algorithm. This method is online, able to tackle disturbance, and fast in training. In addition to robustness to mass uncertainty and wind gust disturbance, results showed that the proposed method had a better performance when compared to a PID controller with constant gains.
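
For concreteness, the discrete form of an incremental PID (the structure whose variable gains the actor-critic network tunes) is standard; in the sketch below the gains are fixed placeholders rather than network outputs.

```python
# Discrete incremental PID: u_k = u_{k-1} + Kp*(e_k - e_{k-1})
#                                 + Ki*e_k + Kd*(e_k - 2*e_{k-1} + e_{k-2}).
class IncrementalPID:
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e1 = self.e2 = 0.0   # e_{k-1}, e_{k-2}
        self.u = 0.0

    def step(self, error: float) -> float:
        du = (self.kp * (error - self.e1)
              + self.ki * error
              + self.kd * (error - 2 * self.e1 + self.e2))
        self.u += du
        self.e2, self.e1 = self.e1, error
        return self.u

pid = IncrementalPID(kp=1.2, ki=0.05, kd=0.3)   # gains are assumptions
for e in [1.0, 0.6, 0.3, 0.1]:
    print(round(pid.step(e), 3))
```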

Reliable AI: Does the Next Generation Require Quantum Computing?

  • paper_url: http://arxiv.org/abs/2307.01301
  • repo_url: None
  • paper_authors: Aras Bacho, Holger Boche, Gitta Kutyniok
  • for: The paper explores the question of whether quantum computing is necessary for the next generation of artificial intelligence.
  • methods: The paper uses various computational models, including digital and analog computing models, to evaluate the limitations of current artificial intelligence systems and the potential benefits of quantum computing.
  • results: The paper finds that current digital computing models are limited in their ability to solve certain problems, such as optimization and deep learning, and that analog computing models may offer a way to overcome these limitations. However, even when using quantum computing models, some limitations persist.
    Abstract In this survey, we aim to explore the fundamental question of whether the next generation of artificial intelligence requires quantum computing. Artificial intelligence is increasingly playing a crucial role in many aspects of our daily lives and is central to the fourth industrial revolution. It is therefore imperative that artificial intelligence is reliable and trustworthy. However, there are still many issues with reliability of artificial intelligence, such as privacy, responsibility, safety, and security, in areas such as autonomous driving, healthcare, robotics, and others. These problems can have various causes, including insufficient data, biases, and robustness problems, as well as fundamental issues such as computability problems on digital hardware. The cause of these computability problems is rooted in the fact that digital hardware is based on the computing model of the Turing machine, which is inherently discrete. Notably, our findings demonstrate that digital hardware is inherently constrained in solving problems about optimization, deep learning, or differential equations. Therefore, these limitations carry substantial implications for the field of artificial intelligence, in particular for machine learning. Furthermore, although it is well known that the quantum computer shows a quantum advantage for certain classes of problems, our findings establish that some of these limitations persist when employing quantum computing models based on the quantum circuit or the quantum Turing machine paradigm. In contrast, analog computing models, such as the Blum-Shub-Smale machine, exhibit the potential to surmount these limitations.

Pareto-Secure Machine Learning (PSML): Fingerprinting and Securing Inference Serving Systems

  • paper_url: http://arxiv.org/abs/2307.01292
  • repo_url: None
  • paper_authors: Debopam Sanyal, Jui-Tse Hung, Manav Agrawal, Prahlad Jasti, Shahab Nikkhoo, Somesh Jha, Tianhao Wang, Sibin Mohan, Alexey Tumanov
  • for: The security of model-serving systems, specifically robustness against model extraction attacks.
  • methods: A query-efficient fingerprinting algorithm that lets an attacker consistently trigger a desired model, and a noise-based defense that perturbs the specified performance metrics to thwart fingerprinting.
  • results: Fingerprinting brings extraction fidelity and accuracy to within 1% of attacking a single, explicitly specified model; the defense reduces attack accuracy and fidelity by up to 9.8% and 4.8% while keeping system goodput above 80%.
    Abstract Model-serving systems have become increasingly popular, especially in real-time web applications. In such systems, users send queries to the server and specify the desired performance metrics (e.g., desired accuracy, latency). The server maintains a set of models (model zoo) in the back-end and serves the queries based on the specified metrics. This paper examines the security, specifically robustness against model extraction attacks, of such systems. Existing black-box attacks assume a single model can be repeatedly selected for serving inference requests. Modern inference serving systems break this assumption. Thus, they cannot be directly applied to extract a victim model, as models are hidden behind a layer of abstraction exposed by the serving system. An attacker can no longer identify which model she is interacting with. To this end, we first propose a query-efficient fingerprinting algorithm to enable the attacker to trigger any desired model consistently. We show that by using our fingerprinting algorithm, model extraction can have fidelity and accuracy scores within $1\%$ of the scores obtained when attacking a single, explicitly specified model, as well as up to $14.6\%$ gain in accuracy and up to $7.7\%$ gain in fidelity compared to the naive attack. Second, we counter the proposed attack with a noise-based defense mechanism that thwarts fingerprinting by adding noise to the specified performance metrics. The proposed defense strategy reduces the attack's accuracy and fidelity by up to $9.8\%$ and $4.8\%$, respectively (on medium-sized model extraction). Third, we show that the proposed defense induces a fundamental trade-off between the level of protection and system goodput, achieving configurable and significant victim model extraction protection while maintaining acceptable goodput ($>80\%$). We implement the proposed defense in a real system with plans to open source.
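
A toy sketch of the noise-based defense described in the abstract: perturb the performance metrics the serving system works with so that fingerprinting queries become unreliable. The noise level is an assumption, not the paper's calibrated value.

```python
# Noise-based defense sketch: jitter the metrics before they influence routing.
import random

def noised_metrics(accuracy: float, latency_ms: float, sigma: float = 0.02):
    return (accuracy + random.gauss(0, sigma),
            latency_ms * (1 + random.gauss(0, sigma)))

print(noised_metrics(0.91, 38.0))
```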

Fighting the disagreement in Explainable Machine Learning with consensus

  • paper_url: http://arxiv.org/abs/2307.01288
  • repo_url: None
  • paper_authors: Antonio Jesus Banegas-Luna, Carlos Martınez-Cortes, Horacio Perez-Sanchez
  • for: Understanding how machine learning models work internally.
  • methods: Models are explained with interpretability algorithms, and the disagreeing explanations are reconciled with consensus functions.
  • results: The proposed consensus function is fairer than the alternatives and provides more consistent and accurate explanations across five machine learning models.
    Abstract Machine learning (ML) models are often valued by the accuracy of their predictions. However, in some areas of science, the inner workings of models are as relevant as their accuracy. To understand how ML models work internally, the use of interpretability algorithms is the preferred option. Unfortunately, despite the diversity of algorithms available, they often disagree in explaining a model, leading to contradictory explanations. To cope with this issue, consensus functions can be applied once the models have been explained. Nevertheless, the problem is not completely solved because the final result will depend on the selected consensus function and other factors. In this paper, six consensus functions have been evaluated for the explanation of five ML models. The models were previously trained on four synthetic datasets whose internal rules were known in advance. The models were then explained with model-agnostic local and global interpretability algorithms. Finally, consensus was calculated with six different functions, including one developed by the authors. The results demonstrated that the proposed function is fairer than the others and provides more consistent and accurate explanations.
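
A minimal sketch of consensus over disagreeing explanations: several explainers score the same features and a consensus function aggregates the scores. Mean and rank aggregation are shown as common baselines; the authors' own consensus function is not reproduced here.

```python
# Aggregating disagreeing feature attributions with two simple consensus
# functions. The explainer names and scores are illustrative.
import numpy as np

attributions = {                       # per-explainer importance scores
    "shap_like": np.array([0.5, 0.3, 0.2]),
    "permutation_like": np.array([0.4, 0.4, 0.2]),
    "gradient_like": np.array([0.7, 0.1, 0.2]),
}
stack = np.vstack(list(attributions.values()))

mean_consensus = stack.mean(axis=0)
# Rank aggregation: average each feature's rank across explainers (0 = top).
ranks = np.argsort(np.argsort(-stack, axis=1), axis=1)
rank_consensus = ranks.mean(axis=0)

print(mean_consensus, rank_consensus)
```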

Using BOLD-fMRI to Compute the Respiration Volume per Time (RTV) and Respiration Variation (RV) with Convolutional Neural Networks (CNN) in the Human Connectome Development Cohort

  • paper_url: http://arxiv.org/abs/2307.05426
  • repo_url: None
  • paper_authors: Abdoljalil Addeh, Fernando Vega, Rebecca J Williams, Ali Golestani, G. Bruce Pike, M. Ethan MacDonald
  • for: Lowering the cost of fMRI studies, simplifying experimental setups, and reducing participant burden by removing the need for a respiratory bellows.
  • methods: A one-dimensional convolutional neural network reconstructs respiration volume per time (RVT) and respiration variation (RV) time series from resting-state BOLD signals.
  • results: The CNN captures informative features from resting BOLD signals and reconstructs realistic RV and RVT time series.
    Abstract In many fMRI studies, respiratory signals are unavailable or do not have acceptable quality. Consequently, the direct removal of low-frequency respiratory variations from BOLD signals is not possible. This study proposes a one-dimensional CNN model for reconstruction of two respiratory measures, RV and RVT. Results show that a CNN can capture informative features from resting BOLD signals and reconstruct realistic RV and RVT timeseries. It is expected that application of the proposed method will lower the cost of fMRI studies, reduce complexity, and decrease the burden on participants as they will not be required to wear a respiratory bellows.
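
A minimal sketch of a 1D CNN mapping a window of BOLD timepoints to the two respiratory targets; the window length, channel counts, and kernel sizes are assumptions.

```python
# 1D CNN sketch: a window of BOLD timepoints -> (RV, RVT) estimates.
import torch
import torch.nn as nn

class RespCNN(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1))
        self.head = nn.Linear(32, 2)   # outputs: (RV, RVT) for the window

    def forward(self, x):              # x: (batch, channels, timepoints)
        return self.head(self.features(x).squeeze(-1))

model = RespCNN()
print(model(torch.randn(8, 1, 64)).shape)  # torch.Size([8, 2])
```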

NeuBTF: Neural fields for BTF encoding and transfer

  • paper_url: http://arxiv.org/abs/2307.01199
  • repo_url: None
  • paper_authors: Carlos Rodriguez-Pardo, Konstantinos Kazatzis, Jorge Lopez-Moreno, Elena Garces
  • for: A novel neural material representation that removes the immutability of existing neural BTFs, jointly tackling compression, tiling, and extrapolation.
  • methods: A guidance image conditions the neural BTF on the structural features of the input; the conditioned BTF is then queried like a regular BTF with UVs, camera, and light vectors.
  • results: Competitive compression rates on a variety of synthetic and captured materials, with the capacity to represent many optical properties.
    Abstract Neural material representations are becoming a popular way to represent materials for rendering. They are more expressive than analytic models and occupy less memory than tabulated BTFs. However, existing neural materials are immutable, meaning that their output for a certain query of UVs, camera, and light vector is fixed once they are trained. While this is practical when there is no need to edit the material, it can become very limiting when the fragment of the material used for training is too small or not tileable, which frequently happens when the material has been captured with a gonioreflectometer. In this paper, we propose a novel neural material representation which jointly tackles the problems of BTF compression, tiling, and extrapolation. At test time, our method uses a guidance image as input to condition the neural BTF to the structural features of this input image. Then, the neural BTF can be queried as a regular BTF using UVs, camera, and light vectors. Every component in our framework is purposefully designed to maximize BTF encoding quality at minimal parameter count and computational complexity, achieving competitive compression rates compared with previous work. We demonstrate the results of our method on a variety of synthetic and captured materials, showing its generality and capacity to learn to represent many optical properties.
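
A sketch of how a neural BTF is queried once conditioned: a small MLP maps (UV, camera vector, light vector) to RGB reflectance. The guidance-image conditioning is omitted and all sizes are assumptions.

```python
# Neural BTF query sketch: (uv, view, light) -> RGB.
import torch
import torch.nn as nn

class TinyNeuralBTF(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        # inputs: uv (2) + camera vector (3) + light vector (3) = 8 dims
        self.mlp = nn.Sequential(
            nn.Linear(8, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3))      # RGB reflectance

    def forward(self, uv, cam, light):
        return self.mlp(torch.cat([uv, cam, light], dim=-1))

btf = TinyNeuralBTF()
rgb = btf(torch.rand(1024, 2), torch.rand(1024, 3), torch.rand(1024, 3))
print(rgb.shape)  # torch.Size([1024, 3])
```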

Squeezing Large-Scale Diffusion Models for Mobile

  • paper_url: http://arxiv.org/abs/2307.01193
  • repo_url: None
  • paper_authors: Jiwoong Choi, Minkyu Kim, Daehyun Ahn, Taesu Kim, Yulhwa Kim, Dongwon Jo, Hyesung Jeon, Jae-Joon Kim, Hyungjun Kim
  • for: Deploying Stable Diffusion on mobile devices to enable high-fidelity on-device image generation.
  • methods: The TensorFlow Lite framework, which supports both iOS and Android, is used to deploy the model despite limited on-device computational and memory resources.
  • results: Mobile Stable Diffusion achieves an inference latency below 7 seconds for 512x512 image generation on Android devices with mobile GPUs.
    Abstract The emergence of diffusion models has greatly broadened the scope of high-fidelity image synthesis, resulting in notable advancements in both practical implementation and academic research. With the active adoption of the model in various real-world applications, the need for on-device deployment has grown considerably. However, deploying large diffusion models such as Stable Diffusion with more than one billion parameters to mobile devices poses distinctive challenges due to the limited computational and memory resources, which may vary according to the device. In this paper, we present the challenges and solutions for deploying Stable Diffusion on mobile devices with TensorFlow Lite framework, which supports both iOS and Android devices. The resulting Mobile Stable Diffusion achieves the inference latency of smaller than 7 seconds for a 512x512 image generation on Android devices with mobile GPUs.
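
A minimal sketch of the TensorFlow Lite conversion step such a deployment requires; the model path is a placeholder, and the paper's actual pipeline (operator support, memory management) is more involved.

```python
# Convert a TensorFlow SavedModel to TFLite with default optimizations.
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("unet_savedmodel")  # hypothetical path
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # e.g. weight quantization
tflite_model = converter.convert()

with open("unet.tflite", "wb") as f:
    f.write(tflite_model)
```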

SAMAug: Point Prompt Augmentation for Segment Anything Model

  • paper_url: http://arxiv.org/abs/2307.01187
  • repo_url: None
  • paper_authors: Haixing Dai, Chong Ma, Zhengliang Liu, Yiwei Li, Peng Shu, Xiaozheng Wei, Lin Zhao, Zihao Wu, Dajiang Zhu, Wei Liu, Quanzheng Li, Tianming Liu, Xiang Li
  • for: Improving the interactive segmentation performance of the Segment Anything Model (SAM).
  • methods: SAMAug, a novel visual point augmentation method that generates augmented point prompts to provide SAM with more information than the initial prompt alone.
  • results: On the COCO, Fundus, and Chest X-ray datasets, SAMAug boosts SAM's segmentation results, especially with the maximum-distance and saliency-model point selection methods, demonstrating the value of visual prompt engineering.
    Abstract This paper introduces SAMAug, a novel visual point augmentation method for the Segment Anything Model (SAM) that enhances interactive image segmentation performance. SAMAug generates augmented point prompts to provide more information to SAM. From the initial point prompt, SAM produces the initial mask, which is then fed into our proposed SAMAug to generate augmented point prompts. By incorporating these extra points, SAM can generate augmented segmentation masks based on the augmented point prompts and the initial prompt, resulting in improved segmentation performance. We evaluate four point augmentation techniques: random selection, maximum difference entropy, maximum distance, and a saliency model. Experiments on the COCO, Fundus, and Chest X-ray datasets demonstrate that SAMAug can boost SAM's segmentation results, especially using the maximum distance and saliency model methods. SAMAug underscores the potential of visual prompt engineering to advance interactive computer vision models.
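
A sketch of the maximum-distance augmentation strategy named in the results: inside the initial mask, pick the point farthest from the initial prompt and feed it back to SAM as an extra point prompt. Array shapes and coordinates are assumptions.

```python
# Max-distance point augmentation: choose the mask pixel farthest from the
# user's initial click as an additional prompt for a second SAM pass.
import numpy as np

def max_distance_point(mask: np.ndarray, initial_point: tuple) -> tuple:
    ys, xs = np.nonzero(mask)                       # pixels of the initial mask
    d2 = (ys - initial_point[0]) ** 2 + (xs - initial_point[1]) ** 2
    i = int(np.argmax(d2))
    return int(ys[i]), int(xs[i])

mask = np.zeros((64, 64), dtype=bool)
mask[20:50, 10:40] = True
print(max_distance_point(mask, (22, 12)))   # augmented point prompt
```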

PlanE: Representation Learning over Planar Graphs

  • paper_url: http://arxiv.org/abs/2307.01180
  • repo_url: https://github.com/zzysonny/plane
  • paper_authors: Radoslav Dimitrov, Zeyang Zhao, Ralph Abboud, İsmail İlkan Ceylan
  • for: Designing representation learning architectures that compute complete graph isomorphism invariants over planar graphs.
  • methods: PlanE, a framework inspired by the classical Hopcroft-Tarjan planar graph isomorphism algorithm, includes architectures that learn complete invariants over planar graphs while remaining practically scalable.
  • results: The resulting models achieve multiple state-of-the-art results on well-known planar graph benchmarks.
    Abstract Graph neural networks are prominent models for representation learning over graphs, where the idea is to iteratively compute representations of nodes of an input graph through a series of transformations in such a way that the learned graph function is isomorphism invariant on graphs, which makes the learned representations graph invariants. On the other hand, it is well-known that graph invariants learned by these class of models are incomplete: there are pairs of non-isomorphic graphs which cannot be distinguished by standard graph neural networks. This is unsurprising given the computational difficulty of graph isomorphism testing on general graphs, but the situation begs to differ for special graph classes, for which efficient graph isomorphism testing algorithms are known, such as planar graphs. The goal of this work is to design architectures for efficiently learning complete invariants of planar graphs. Inspired by the classical planar graph isomorphism algorithm of Hopcroft and Tarjan, we propose PlanE as a framework for planar representation learning. PlanE includes architectures which can learn complete invariants over planar graphs while remaining practically scalable. We empirically validate the strong performance of the resulting model architectures on well-known planar graph benchmarks, achieving multiple state-of-the-art results.

Don’t freeze: Finetune encoders for better Self-Supervised HAR

  • paper_url: http://arxiv.org/abs/2307.01168
  • repo_url: None
  • paper_authors: Vitor Fortes Rey, Dominique Nshimyimana, Paul Lukowicz
  • for: solves the labelled data availability problem in human activity recognition
  • methods: uses pretext tasks such as reconstruction or contrastive predictive coding to learn useful representations
  • results: substantial performance gains across pretext tasks, with the improvement inversely proportional to the amount of labelled data.
    Abstract Recently self-supervised learning has been proposed in the field of human activity recognition as a solution to the labelled data availability problem. The idea being that by using pretext tasks such as reconstruction or contrastive predictive coding, useful representations can be learned that then can be used for classification. Those approaches follow the pretrain, freeze and fine-tune procedure. In this paper we will show how a simple change - not freezing the representation - leads to substantial performance gains across pretext tasks. The improvement was found in all four investigated datasets and across all four pretext tasks and is inversely proportional to amount of labelled data. Moreover the effect is present whether the pretext task is carried on the Capture24 dataset or directly in unlabelled data of the target dataset.
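
The paper's change amounts to keeping encoder parameters trainable during fine-tuning; a one-function PyTorch sketch, where `encoder` and `classifier` are assumed submodule names:

```python
# Freeze vs. finetune toggle for a pretrained encoder. The paper's finding
# corresponds to calling this with freeze_encoder=False.
import torch.nn as nn

def configure_finetuning(model: nn.Module, freeze_encoder: bool) -> None:
    for p in model.encoder.parameters():
        p.requires_grad = not freeze_encoder

model = nn.Module()
model.encoder = nn.Linear(32, 16)      # stand-in pretrained encoder
model.classifier = nn.Linear(16, 4)    # downstream HAR head
configure_finetuning(model, freeze_encoder=False)
print(all(p.requires_grad for p in model.encoder.parameters()))  # True
```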

Human in the AI loop via xAI and Active Learning for Visual Inspection

  • paper_url: http://arxiv.org/abs/2307.05508
  • repo_url: None
  • paper_authors: Jože M. Rožanec, Elias Montini, Vincenzo Cutrona, Dimitrios Papamartzivanos, Timotej Klemenčič, Blaž Fortuna, Dunja Mladenić, Entso Veliou, Thanassis Giannetsos, Christos Emmanouilidis
  • for: Human-machine collaboration in Industry 5.0, with a focus on quality inspection.
  • methods: Two subfields of artificial intelligence, active learning and explainable AI, are used to realize and enhance human-machine collaboration in visual inspection.
  • results: Results from the EU H2020 STAR project on visual inspection are shared, covering artificial intelligence, human digital twins, and cybersecurity.
    Abstract Industrial revolutions have historically disrupted manufacturing by introducing automation into production. Increasing automation reshapes the role of the human worker. Advances in robotics and artificial intelligence open new frontiers of human-machine collaboration. Such collaboration can be realized considering two sub-fields of artificial intelligence: active learning and explainable artificial intelligence. Active learning aims to devise strategies that help obtain data that allows machine learning algorithms to learn better. On the other hand, explainable artificial intelligence aims to make the machine learning models intelligible to the human person. The present work first describes Industry 5.0, human-machine collaboration, and state-of-the-art regarding quality inspection, emphasizing visual inspection. Then it outlines how human-machine collaboration could be realized and enhanced in visual inspection. Finally, some of the results obtained in the EU H2020 STAR project regarding visual inspection are shared, considering artificial intelligence, human digital twins, and cybersecurity.

Soft Gripping: Specifying for Trustworthiness

  • paper_url: http://arxiv.org/abs/2307.01159
  • repo_url: None
  • paper_authors: Dhaminda B. Abeywickrama, Nguyen Hao Le, Greg Chance, Peter D. Winter, Arianna Manzini, Alix J. Partridge, Jonathan Ives, John Downer, Graham Deacon, Jonathan Rossiter, Kerstin Eder, Shane Windsor
  • for: Advancing the wide adoption of soft robotics by ensuring the trustworthiness of soft robots.
  • methods: Formulating specifications for soft robotic systems that cover functional and non-functional requirements such as reliability, safety, adaptability, predictability, ethics, and regulations.
  • results: An extensive example specification for a soft gripper for pick-and-place tasks on grocery items, with verifiability promoted as a first-class design objective.
    Abstract Soft robotics is an emerging technology in which engineers create flexible devices for use in a variety of applications. In order to advance the wide adoption of soft robots, ensuring their trustworthiness is essential; if soft robots are not trusted, they will not be used to their full potential. In order to demonstrate trustworthiness, a specification needs to be formulated to define what is trustworthy. However, even for soft robotic grippers, which is one of the most mature areas in soft robotics, the soft robotics community has so far given very little attention to formulating specifications. In this work, we discuss the importance of developing specifications during development of soft robotic systems, and present an extensive example specification for a soft gripper for pick-and-place tasks for grocery items. The proposed specification covers both functional and non-functional requirements, such as reliability, safety, adaptability, predictability, ethics, and regulations. We also highlight the need to promote verifiability as a first-class objective in the design of a soft gripper.

Theory of Mind as Intrinsic Motivation for Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.01158
  • repo_url: None
  • paper_authors: Ini Oguntola, Joseph Campbell, Simon Stepputtis, Katia Sycara
  • for: Giving artificial agents in multi-agent settings the social intelligence to model the mental states of others.
  • methods: Grounding semantically meaningful, human-interpretable beliefs within policies modeled by deep networks, and using each agent's prediction of other agents' beliefs as an intrinsic reward for multi-agent reinforcement learning.
  • results: Preliminary empirical results in a mixed cooperative-competitive environment.
    Abstract The ability to model the mental states of others is crucial to human social intelligence, and can offer similar benefits to artificial agents with respect to the social dynamics induced in multi-agent settings. We present a method of grounding semantically meaningful, human-interpretable beliefs within policies modeled by deep networks. We then consider the task of 2nd-order belief prediction. We propose that ability of each agent to predict the beliefs of the other agents can be used as an intrinsic reward signal for multi-agent reinforcement learning. Finally, we present preliminary empirical results in a mixed cooperative-competitive environment.
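
A toy sketch of belief prediction as an intrinsic reward signal: an agent earns reward for accurately predicting another agent's belief vector. The reward scale and belief representation are assumptions.

```python
# Intrinsic reward from 2nd-order belief prediction accuracy.
import numpy as np

def intrinsic_reward(pred_belief: np.ndarray, other_belief: np.ndarray,
                     scale: float = 0.1) -> float:
    # Negative squared error: better belief predictions earn more reward.
    return -scale * float(np.mean((pred_belief - other_belief) ** 2))

r_total = 1.0 + intrinsic_reward(np.array([0.7, 0.3]), np.array([0.6, 0.4]))
print(r_total)  # extrinsic reward plus the intrinsic term
```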

SCITUNE: Aligning Large Language Models with Scientific Multimodal Instructions

  • paper_url: http://arxiv.org/abs/2307.01139
  • repo_url: https://github.com/lupantech/ScienceQA
  • paper_authors: Sameera Horawalavithana, Sai Munikoti, Ian Stewart, Henry Kvinge
  • for: Aligning large language models (LLMs) with scientific disciplines, concepts, and goals.
  • methods: SciTune, a tuning framework that improves the ability of LLMs to follow scientific multimodal instructions; LLaMA-SciTune connects a vision encoder and an LLM for science-focused visual and language understanding.
  • results: LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark, outperforming models finetuned with machine-generated data only.
    Abstract Instruction finetuning is a popular paradigm to align large language models (LLM) with human intent. Despite its popularity, this idea is less explored in improving the LLMs to align existing foundation models with scientific disciplines, concepts and goals. In this work, we present SciTune as a tuning framework to improve the ability of LLMs to follow scientific multimodal instructions. To test our methodology, we use a human-generated scientific instruction tuning dataset and train a large multimodal model LLaMA-SciTune that connects a vision encoder and LLM for science-focused visual and language understanding. In comparison to the models that are finetuned with machine generated data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark.

Exploring the In-context Learning Ability of Large Language Model for Biomedical Concept Linking

  • paper_url: http://arxiv.org/abs/2307.01137
  • repo_url: None
  • paper_authors: Qinyong Wang, Zhenxiang Gao, Rong Xu
  • for: This research aims to explore the effectiveness of large language models (LLMs) in biomedical concept mapping, specifically in the task of biomedical concept linking.
  • methods: The proposed approach uses a two-stage retrieve-and-rank framework that leverages in-context learning (ICL) capabilities of LLMs. The approach first embeds biomedical concepts using language models, and then uses embedding similarity to retrieve the top candidates. The contextual information of these candidates is incorporated into the prompt and processed by a large language model to re-rank the concepts.
  • results: The approach achieved an accuracy of 90% in BC5CDR disease entity normalization and 94.7% in chemical entity normalization, demonstrating competitive performance relative to supervised learning methods. Additionally, it showed a significant improvement (over 20-point absolute increase in F1 score) on an oncology matching dataset.
    Abstract The biomedical field relies heavily on concept linking in various areas such as literature mining, graph alignment, information retrieval, question-answering, data, and knowledge integration. Although large language models (LLMs) have made significant strides in many natural language processing tasks, their effectiveness in biomedical concept mapping is yet to be fully explored. This research investigates a method that exploits the in-context learning (ICL) capabilities of large models for biomedical concept linking. The proposed approach adopts a two-stage retrieve-and-rank framework. Initially, biomedical concepts are embedded using language models, and then embedding similarity is utilized to retrieve the top candidates. These candidates' contextual information is subsequently incorporated into the prompt and processed by a large language model to re-rank the concepts. This approach achieved an accuracy of 90% in BC5CDR disease entity normalization and 94.7% in chemical entity normalization, exhibiting a competitive performance relative to supervised learning methods. Further, it showed a significant improvement, with an over 20-point absolute increase in F1 score on an oncology matching dataset. Extensive qualitative assessments were conducted, and the benefits and potential shortcomings of using large language models within the biomedical domain were discussed.
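
A minimal sketch of the two-stage retrieve-and-rank flow: embed concepts, retrieve top candidates by cosine similarity, then hand the candidates to an LLM for re-ranking. The embeddings here are random stand-ins and the re-ranking call is only indicated.

```python
# Two-stage retrieve-and-rank sketch for biomedical concept linking.
import numpy as np

rng = np.random.default_rng(0)
ontology = ["neoplasm", "lung neoplasm", "hypertension"]
emb = {c: rng.normal(size=128) for c in ontology}   # stand-in embeddings
mention_vec = emb["lung neoplasm"] + 0.1 * rng.normal(size=128)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

candidates = sorted(ontology, key=lambda c: cosine(mention_vec, emb[c]),
                    reverse=True)[:2]               # stage 1: retrieve

prompt = ("Mention: 'lung cancer'. Which concept matches best?\n"
          + "\n".join(f"- {c}" for c in candidates))
# stage 2: a large language model would re-rank; shown here as a placeholder.
print(prompt)
```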

ChatGPT vs. Google: A Comparative Study of Search Performance and User Experience

  • paper_url: http://arxiv.org/abs/2307.01135
  • repo_url: None
  • paper_authors: Ruiyun Xu, Yue Feng, Hailiang Chen
  • for: investigate the differences in user behavior when employing search engines and chatbot tools for information-seeking tasks
  • methods: randomized online experiment, ChatGPT-like tool and Google Search-like tool
  • results: The ChatGPT group consistently spends less time on all tasks; ChatGPT levels user search performance across different education levels; perceived information quality and user experience are better with ChatGPT, though it may also encourage overreliance and generate or replicate misinformation.
    Abstract The advent of ChatGPT, a large language model-powered chatbot, has prompted questions about its potential implications for traditional search engines. In this study, we investigate the differences in user behavior when employing search engines and chatbot tools for information-seeking tasks. We carry out a randomized online experiment, dividing participants into two groups: one using a ChatGPT-like tool and the other using a Google Search-like tool. Our findings reveal that the ChatGPT group consistently spends less time on all tasks, with no significant difference in overall task performance between the groups. Notably, ChatGPT levels user search performance across different education levels and excels in answering straightforward questions and providing general solutions but falls short in fact-checking tasks. Users perceive ChatGPT's responses as having higher information quality compared to Google Search, despite displaying a similar level of trust in both tools. Furthermore, participants using ChatGPT report significantly better user experiences in terms of usefulness, enjoyment, and satisfaction, while perceived ease of use remains comparable between the two tools. However, ChatGPT may also lead to overreliance and generate or replicate misinformation, yielding inconsistent results. Our study offers valuable insights for search engine management and highlights opportunities for integrating chatbot technologies into search engine designs.

Iterative Zero-Shot LLM Prompting for Knowledge Graph Construction

  • paper_url: http://arxiv.org/abs/2307.01128
  • repo_url: None
  • paper_authors: Salvatore Carta, Alessandro Giuliani, Leonardo Piano, Alessandro Sebastian Podda, Livio Pompianu, Sandro Gabriele Tiddia
  • for: A scalable and flexible approach to knowledge graph generation that addresses the bottlenecks of existing techniques.
  • methods: A pipeline built on a recent generative large language model (GPT-3.5) that combines iterative prompting with zero-shot, external-knowledge-agnostic strategies in the main stages of the generation process.
  • results: Experiments on a domain-specific dataset show that the approach effectively generates high-quality knowledge graphs and can be applied to different and novel contexts.
    Abstract In the current digitalization era, capturing and effectively representing knowledge is crucial in most real-world scenarios. In this context, knowledge graphs represent a potent tool for retrieving and organizing a vast amount of information in a properly interconnected and interpretable structure. However, their generation is still challenging and often requires considerable human effort and domain expertise, hampering the scalability and flexibility across different application fields. This paper proposes an innovative knowledge graph generation approach that leverages the potential of the latest generative large language models, such as GPT-3.5, that can address all the main critical issues in knowledge graph building. The approach is conveyed in a pipeline that comprises novel iterative zero-shot and external knowledge-agnostic strategies in the main stages of the generation process. Our unique manifold approach may encompass significant benefits to the scientific community. In particular, the main contribution can be summarized by: (i) an innovative strategy for iteratively prompting large language models to extract relevant components of the final graph; (ii) a zero-shot strategy for each prompt, meaning that there is no need for providing examples for "guiding" the prompt result; (iii) a scalable solution, as the adoption of LLMs avoids the need for any external resources or human expertise. To assess the effectiveness of our proposed model, we performed experiments on a dataset that covered a specific domain. We claim that our proposal is a suitable solution for scalable and versatile knowledge graph construction and may be applied to different and novel contexts.
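
A sketch of an iterative zero-shot prompting loop for triple extraction; `call_llm` is a hypothetical placeholder for a GPT-3.5 API call, and the actual pipeline stages in the paper are richer.

```python
# Iterative zero-shot prompting loop for knowledge graph triple extraction.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM API client here")  # placeholder

def extract_triples(text: str, max_rounds: int = 3) -> list[str]:
    triples: list[str] = []
    for _ in range(max_rounds):
        prompt = (
            "Extract new (subject, relation, object) triples from the text.\n"
            f"Text: {text}\nAlready extracted: {triples}\n"
            "Return one triple per line, or 'DONE' if none remain.")
        reply = call_llm(prompt)            # zero-shot: no in-context examples
        if reply.strip() == "DONE":
            break
        triples += [line for line in reply.splitlines() if line.strip()]
    return triples
```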