cs.AI - 2023-08-03

The Capability of Large Language Models to Measure Psychiatric Functioning

  • paper_url: http://arxiv.org/abs/2308.01834
  • repo_url: None
  • paper_authors: Isaac R. Galatzer-Levy, Daniel McDuff, Vivek Natarajan, Alan Karthikesalingam, Matteo Malgaroli
  • for: This paper aims to investigate the ability of Large language models (LLMs) to predict psychiatric functioning from patient interviews and clinical descriptions without explicit training.
  • methods: The study uses Med-PaLM 2, a large language model explicitly trained on a large corpus of medical knowledge, to predict psychiatric functioning based on patient interviews and clinical descriptions.
  • results: The study finds that Med-PaLM 2 is capable of assessing psychiatric functioning across a range of psychiatric conditions, with the strongest performance in predicting depression scores based on standardized assessments (accuracy range 0.80 - 0.84), statistically indistinguishable from human clinical raters (t(1,144) = 1.20; p = 0.23). The results show the potential for general clinical language models to flexibly predict psychiatric risk based on free descriptions of functioning from both patients and clinicians.
    Abstract The current work investigates the capability of Large language models (LLMs) that are explicitly trained on large corpuses of medical knowledge (Med-PaLM 2) to predict psychiatric functioning from patient interviews and clinical descriptions without being trained to do so. To assess this, n = 145 depression and n =115 PTSD assessments and n = 46 clinical case studies across high prevalence/high comorbidity disorders (Depressive, Anxiety, Psychotic, trauma and stress, Addictive disorders) were analyzed using prompts to extract estimated clinical scores and diagnoses. Results demonstrate that Med-PaLM 2 is capable of assessing psychiatric functioning across a range of psychiatric conditions with the strongest performance being the prediction of depression scores based on standardized assessments (Accuracy range= 0.80 - 0.84) which were statistically indistinguishable from human clinical raters t(1,144) = 1.20; p = 0.23. Results show the potential for general clinical language models to flexibly predict psychiatric risk based on free descriptions of functioning from both patients and clinicians.
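The comparison with human raters reported in the abstract is a t-test over paired ratings. The sketch below shows the shape of that analysis only; the score arrays are synthetic, made-up data (loosely PHQ-9-like), not the study's.

```python
# Hypothetical sketch of comparing model-predicted depression scores
# with human clinical ratings via a paired t-test. All data is synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
human_scores = rng.integers(0, 27, size=145).astype(float)  # PHQ-9-like totals
model_scores = human_scores + rng.normal(0, 2, size=145)    # model tracks humans with noise

# Paired t-test: each interview is rated by both the model and a human rater.
t_stat, p_value = stats.ttest_rel(model_scores, human_scores)
print(f"t({len(human_scores) - 1}) = {t_stat:.2f}, p = {p_value:.2f}")
```

A non-significant p-value, as in the paper, would indicate the two sets of ratings are statistically indistinguishable.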

Learning beyond sensations: how dreams organize neuronal representations

  • paper_url: http://arxiv.org/abs/2308.01830
  • repo_url: None
  • paper_authors: Nicolas Deperrois, Mihai A. Petrovici, Walter Senn, Jakob Jordan
  • for: This paper examines how semantic representations in higher sensory cortices are formed and maintained, and how these representations shape behavior.
  • methods: The paper draws on predictive learning theory and the generation of virtual experiences to explain the formation and maintenance of cortical representations.
  • results: The paper proposes two complementary learning principles, "adversarial dreaming" and "contrastive dreaming", which can explain cortical learning beyond the classical predictive learning paradigm.
    Abstract Semantic representations in higher sensory cortices form the basis for robust, yet flexible behavior. These representations are acquired over the course of development in an unsupervised fashion and continuously maintained over an organism's lifespan. Predictive learning theories propose that these representations emerge from predicting or reconstructing sensory inputs. However, brains are known to generate virtual experiences, such as during imagination and dreaming, that go beyond previously experienced inputs. Here, we suggest that virtual experiences may be just as relevant as actual sensory inputs in shaping cortical representations. In particular, we discuss two complementary learning principles that organize representations through the generation of virtual experiences. First, "adversarial dreaming" proposes that creative dreams support a cortical implementation of adversarial learning in which feedback and feedforward pathways engage in a productive game of trying to fool each other. Second, "contrastive dreaming" proposes that the invariance of neuronal representations to irrelevant factors of variation is acquired by trying to map similar virtual experiences together via a contrastive learning process. These principles are compatible with known cortical structure and dynamics and the phenomenology of sleep thus providing promising directions to explain cortical learning beyond the classical predictive learning paradigm.

Hard Adversarial Example Mining for Improving Robust Fairness

  • paper_url: http://arxiv.org/abs/2308.01823
  • repo_url: None
  • paper_authors: Chenhao Lin, Xiang Ji, Yulong Yang, Qian Li, Chao Shen, Run Wang, Liming Fang
  • for: This work aims to improve the robustness of deep neural networks (DNNs) against adversarial examples (AEs) while addressing the unfairness problems of adversarially trained models.
  • methods: The authors propose a simple yet effective framework, adaptive Hard Adversarial example Mining (HAM), which improves adversarial training (AT) by adaptively mining hard AEs.
  • results: Experiments on three benchmarks (CIFAR-10, SVHN, and Imagenette) show that HAM achieves significant improvements in robust fairness while reducing computational cost.
    Abstract Adversarial training (AT) is widely considered the state-of-the-art technique for improving the robustness of deep neural networks (DNNs) against adversarial examples (AE). Nevertheless, recent studies have revealed that adversarially trained models are prone to unfairness problems, restricting their applicability. In this paper, we empirically observe that this limitation may be attributed to serious adversarial confidence overfitting, i.e., certain adversarial examples with overconfidence. To alleviate this problem, we propose HAM, a straightforward yet effective framework via adaptive Hard Adversarial example Mining. HAM concentrates on mining hard adversarial examples while discarding the easy ones in an adaptive fashion. Specifically, HAM identifies hard AEs in terms of their step sizes needed to cross the decision boundary when calculating loss value. Besides, an early-dropping mechanism is incorporated to discard the easy examples at the initial stages of AE generation, resulting in efficient AT. Extensive experimental results on CIFAR-10, SVHN, and Imagenette demonstrate that HAM achieves significant improvement in robust fairness while reducing computational cost compared to several state-of-the-art adversarial training methods. The code will be made publicly available.
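A toy, hypothetical sketch of the early-dropping idea described in the abstract (not the authors' code): treat the number of attack steps an example needs to cross the decision boundary as its hardness, and drop examples that cross within the first few steps as easy.

```python
# Simplified numeric sketch of hard adversarial example mining with
# early dropping, on a toy 1-D classifier with decision boundary at 0.
import numpy as np

def mine_hard_examples(x, labels, boundary=0.0, step=0.1, n_steps=10, drop_after=3):
    """Toy 1-D 'attack': push each point toward the boundary and record how
    many steps it needs to cross. Points that cross within `drop_after`
    steps are discarded as easy; the rest are kept as hard examples."""
    keep = []
    for xi, yi in zip(x, labels):
        direction = -1.0 if yi == 1 else 1.0   # move toward the boundary
        crossed_at = None
        for t in range(1, n_steps + 1):
            xi = xi + direction * step
            sign = 1 if xi > boundary else 0
            if sign != yi:
                crossed_at = t
                break
        if crossed_at is None or crossed_at > drop_after:
            keep.append((xi, yi))              # hard: survived the early stage
    return keep

x = np.array([0.05, 0.2, 1.5, -0.1, -2.0])
labels = (x > 0).astype(int)
hard = mine_hard_examples(x, labels)
print(f"kept {len(hard)} of {len(x)} examples as hard")
```

In the real method the step count comes from the AE generation loop of adversarial training; here it is reduced to a one-dimensional caricature.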

Deep Neural Networks Fused with Textures for Image Classification

  • paper_url: http://arxiv.org/abs/2308.01813
  • repo_url: None
  • paper_authors: Asish Bera, Debotosh Bhattacharjee, Mita Nasipuri
  • for: Addresses fine-grained image classification (FGIC) by combining global texture with local patch-based information.
  • methods: Deep features are extracted from fixed-size non-overlapping patches and encoded sequentially with a long short-term memory (LSTM) network, while a second path computes image-level texture features at multiple scales using local binary patterns (LBP); the two streams are fused into an efficient feature vector.
  • results: Tested on eight datasets (human faces, skin lesions, food dishes, marine life, etc.) with four standard backbone CNNs, the method attains better classification accuracy than existing methods.
    Abstract Fine-grained image classification (FGIC) is a challenging task in computer vision due to small visual differences among sub-categories but large intra-class variations. Deep learning methods have achieved remarkable success in solving FGIC. In this paper, we propose a fusion approach to address FGIC by combining global texture with local patch-based information. The first pipeline extracts deep features from various fixed-size non-overlapping patches and encodes features by sequential modelling using the long short-term memory (LSTM). Another path computes image-level textures at multiple scales using the local binary patterns (LBP). The advantages of both streams are integrated to represent an efficient feature vector for image classification. The method is tested on eight datasets representing human faces, skin lesions, food dishes, marine life, etc. using four standard backbone CNNs. Our method has attained better classification accuracy over existing methods with notable margins.
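The LBP texture stream mentioned above can be sketched with plain NumPy. This is a minimal 8-neighbour LBP code histogram, not the paper's multi-scale implementation.

```python
# Minimal sketch of the local binary pattern (LBP) texture descriptor:
# each interior pixel is compared with its 8 neighbours, and neighbours
# >= centre contribute a 1-bit at a fixed position in an 8-bit code.
import numpy as np

def lbp_histogram(image):
    """Return a normalised 256-bin LBP code histogram for a 2-D image."""
    img = image.astype(np.int32)
    center = img[1:-1, 1:-1]
    # 8 neighbours, clockwise from top-left, each weighted by a power of 2.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy : img.shape[0] - 1 + dy, 1 + dx : img.shape[1] - 1 + dx]
        codes |= (neigh >= center).astype(np.int32) << bit
    hist, _ = np.histogram(codes, bins=256, range=(0, 256))
    return hist / hist.sum()   # normalised texture histogram

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(32, 32))
h = lbp_histogram(img)
print(h.shape)
```

The paper computes such histograms at multiple scales and concatenates them with the LSTM-encoded patch features.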

Job Shop Scheduling via Deep Reinforcement Learning: a Sequence to Sequence approach

  • paper_url: http://arxiv.org/abs/2308.01797
  • repo_url: https://github.com/dawoz/JSP-DeepRL-Seq2Seq
  • paper_authors: Giovanni Bonetta, Davide Zago, Rossella Cancelliere, Andrea Grosso
  • for: This paper proposes a deep-learning-based approach to job scheduling that automatically learns dispatching rules.
  • methods: The method is based on natural language encoder-decoder models and, to the best of the authors' knowledge, has never before been used for scheduling purposes.
  • results: Experiments show that the approach outperforms many classical priority dispatching rules and is competitive with state-of-the-art deep reinforcement learning methods.
    Abstract Job scheduling is a well-known Combinatorial Optimization problem with endless applications. Well planned schedules bring many benefits in the context of automated systems: among others, they limit production costs and waste. Nevertheless, the NP-hardness of this problem makes it essential to use heuristics whose design is difficult, requires specialized knowledge and often produces methods tailored to the specific task. This paper presents an original end-to-end Deep Reinforcement Learning approach to scheduling that automatically learns dispatching rules. Our technique is inspired by natural language encoder-decoder models for sequence processing and has never been used, to the best of our knowledge, for scheduling purposes. We applied and tested our method in particular to some benchmark instances of Job Shop Problem, but this technique is general enough to be potentially used to tackle other different optimal job scheduling tasks with minimal intervention. Results demonstrate that we outperform many classical approaches exploiting priority dispatching rules and show competitive results on state-of-the-art Deep Reinforcement Learning ones.
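For contrast with the learned dispatcher, here is a minimal sketch of one classical priority dispatching rule of the kind the paper benchmarks against: the Shortest Processing Time (SPT) rule for the job shop problem. The greedy scheduler and the tiny instance are illustrative simplifications.

```python
# SPT dispatching baseline for the job shop problem. Jobs are lists of
# (machine, duration) operations; the scheduler repeatedly dispatches
# the ready operation with the smallest processing time.
def spt_schedule(jobs):
    n_machines = 1 + max(m for job in jobs for m, _ in job)
    machine_free = [0] * n_machines        # when each machine becomes idle
    job_free = [0] * len(jobs)             # when each job's last op finishes
    next_op = [0] * len(jobs)              # index of each job's next operation
    makespan = 0
    while any(next_op[j] < len(jobs[j]) for j in range(len(jobs))):
        # Pick the unscheduled next-operation with the shortest duration.
        ready = [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]
        j = min(ready, key=lambda j: jobs[j][next_op[j]][1])
        machine, dur = jobs[j][next_op[j]]
        start = max(machine_free[machine], job_free[j])
        end = start + dur
        machine_free[machine] = job_free[j] = end
        makespan = max(makespan, end)
        next_op[j] += 1
    return makespan

# Tiny 2-job, 2-machine instance of (machine, duration) operations:
jobs = [[(0, 3), (1, 2)], [(1, 4), (0, 1)]]
print("SPT makespan:", spt_schedule(jobs))
```

The paper's point is that such hand-designed rules require specialized knowledge; the proposed encoder-decoder policy learns the dispatching behaviour end to end instead.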

Guided Distillation for Semi-Supervised Instance Segmentation

  • paper_url: http://arxiv.org/abs/2308.02668
  • repo_url: None
  • paper_authors: Tariq Berrada, Camille Couprie, Karteek Alahari, Jakob Verbeek
  • for: Improving the performance of instance segmentation models while reducing reliance on fully-annotated training images.
  • methods: A semi-supervised approach that leverages unlabeled data as an additional training signal to limit overfitting to the labeled samples.
  • results: The improved teacher-student distillation model raises mask-AP from 23.7 to 33.9 on Cityscapes and from 18.3 to 34.1 on COCO, compared with the previous state of the art.
    Abstract Although instance segmentation methods have improved considerably, the dominant paradigm is to rely on fully-annotated training images, which are tedious to obtain. To alleviate this reliance, and boost results, semi-supervised approaches leverage unlabeled data as an additional training signal that limits overfitting to the labeled samples. In this context, we present novel design choices to significantly improve teacher-student distillation models. In particular, we (i) improve the distillation approach by introducing a novel "guided burn-in" stage, and (ii) evaluate different instance segmentation architectures, as well as backbone networks and pre-training strategies. Contrary to previous work which uses only supervised data for the burn-in period of the student model, we also use guidance of the teacher model to exploit unlabeled data in the burn-in period. Our improved distillation approach leads to substantial improvements over previous state-of-the-art results. For example, on the Cityscapes dataset we improve mask-AP from 23.7 to 33.9 when using labels for 10\% of images, and on the COCO dataset we improve mask-AP from 18.3 to 34.1 when using labels for only 1\% of the training data.
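The teacher's role on unlabeled data can be sketched with generic confidence-thresholded pseudo-labeling. This is a simplified stand-in, not the authors' guided burn-in implementation, and the probabilities below are made up.

```python
# Generic teacher-student signal used in semi-supervised distillation:
# the teacher's confident predictions on unlabeled data become
# pseudo-labels for the student's loss.
import numpy as np

def pseudo_labels(teacher_probs, threshold=0.8):
    """Keep only predictions whose max class probability clears the
    threshold; return (kept indices, pseudo-labels) for the student."""
    teacher_probs = np.asarray(teacher_probs)
    conf = teacher_probs.max(axis=1)
    keep = np.flatnonzero(conf >= threshold)
    return keep, teacher_probs[keep].argmax(axis=1)

probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]   # hypothetical teacher outputs
idx, labels = pseudo_labels(probs)
print(idx.tolist(), labels.tolist())
```

The paper's contribution is to apply this kind of teacher guidance already during the student's burn-in period, rather than restricting burn-in to supervised data.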

MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction

  • paper_url: http://arxiv.org/abs/2308.01737
  • repo_url: https://github.com/chiangel/map-code
  • paper_authors: Jianghao Lin, Yanru Qu, Wei Guo, Xinyi Dai, Ruiming Tang, Yong Yu, Weinan Zhang
  • for: This paper addresses click-through rate (CTR) prediction for personalized online services, where existing neural models fail to fully exploit the massive volume of user click logs.
  • methods: The paper adopts a self-supervised learning paradigm and proposes two practical algorithms, masked feature prediction (MFP) and replaced feature detection (RFD), to exploit large user click logs and improve CTR prediction.
  • results: Experiments on two real-world large-scale datasets (Avazu and Criteo) show that MFP and RFD achieve new state-of-the-art performance with several strong backbones.
    Abstract With the widespread application of personalized online services, click-through rate (CTR) prediction has received more and more attention and research. The most prominent features of CTR prediction are its multi-field categorical data format, and vast and daily-growing data volume. The large capacity of neural models helps digest such massive amounts of data under the supervised learning paradigm, yet they fail to utilize the substantial data to its full potential, since the 1-bit click signal is not sufficient to guide the model to learn capable representations of features and instances. The self-supervised learning paradigm provides a more promising pretrain-finetune solution to better exploit the large amount of user click logs, and learn more generalized and effective representations. However, self-supervised learning for CTR prediction is still an open question, since current works on this line are only preliminary and rudimentary. To this end, we propose a Model-agnostic pretraining (MAP) framework that applies feature corruption and recovery on multi-field categorical data, and more specifically, we derive two practical algorithms: masked feature prediction (MFP) and replaced feature detection (RFD). MFP digs into feature interactions within each instance through masking and predicting a small portion of input features, and introduces noise contrastive estimation (NCE) to handle large feature spaces. RFD further turns MFP into a binary classification mode through replacing and detecting changes in input features, making it even simpler and more effective for CTR pretraining. Our extensive experiments on two real-world large-scale datasets (i.e., Avazu, Criteo) demonstrate the advantages of these two methods on several strong backbones (e.g., DCNv2, DeepFM), and achieve new state-of-the-art performance in terms of both effectiveness and efficiency for CTR prediction.
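The corruption step behind RFD, as described in the abstract, can be sketched as follows. The field vocabularies and replacement rate are hypothetical, and the real method applies this inside a neural pretraining loop rather than as a standalone function.

```python
# Sketch of replaced feature detection (RFD) data corruption: randomly
# replace some categorical fields of an instance and emit binary labels
# marking replaced positions, turning pretraining into per-field
# binary classification.
import random

def corrupt_instance(instance, vocabs, replace_rate=0.3, rng=random):
    corrupted, labels = [], []
    for field_idx, value in enumerate(instance):
        if rng.random() < replace_rate:
            # Draw a different value from this field's vocabulary.
            candidates = [v for v in vocabs[field_idx] if v != value]
            corrupted.append(rng.choice(candidates))
            labels.append(1)       # 1 = replaced, to be detected
        else:
            corrupted.append(value)
            labels.append(0)       # 0 = original
    return corrupted, labels

# Hypothetical multi-field categorical instance (country, OS, ad format):
vocabs = [["US", "UK", "CN"], ["ios", "android"], ["banner", "video", "native"]]
instance = ["US", "android", "video"]
rng = random.Random(42)
x_corrupt, y = corrupt_instance(instance, vocabs, rng=rng)
print(x_corrupt, y)
```

MFP would instead mask a small portion of fields and predict their original values, which is why the paper pairs it with noise contrastive estimation over the large feature space.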

Towards Self-organizing Personal Knowledge Assistants in Evolving Corporate Memories

  • paper_url: http://arxiv.org/abs/2308.01732
  • repo_url: None
  • paper_authors: Christian Jilek, Markus Schröder, Heiko Maus, Sven Schwarz, Andreas Dengel
  • for: This paper presents a retrospective overview of a decade of the authors' departmental research towards self-organizing personal knowledge assistants in evolving corporate memories.
  • methods: The research is typically inspired by real-world problems and conducted in interdisciplinary collaborations with research and industry partners. Topics include various approaches to knowledge graph construction in corporate and personal settings, as well as Managed Forgetting and (Self-organizing) Context Spaces as a novel approach to Personal Information Management (PIM) and knowledge work support.
  • results: Past experiments and results cover topics such as knowledge graph construction, Managed Forgetting, and Context Spaces, complemented by an overview of related work and some previously unpublished findings. The paper closes with a detailed look at CoMem, a corporate memory based on the presented research that is already in productive use and raises challenges for further research.
    Abstract This paper presents a retrospective overview of a decade of research in our department towards self-organizing personal knowledge assistants in evolving corporate memories. Our research is typically inspired by real-world problems and often conducted in interdisciplinary collaborations with research and industry partners. We summarize past experiments and results comprising topics like various ways of knowledge graph construction in corporate and personal settings, Managed Forgetting and (Self-organizing) Context Spaces as a novel approach to Personal Information Management (PIM) and knowledge work support. Past results are complemented by an overview of related work and some of our latest findings not published so far. Last, we give an overview of our related industry use cases including a detailed look into CoMem, a Corporate Memory based on our presented research already in productive use and providing challenges for further research. Many contributions are only first steps in new directions with still a lot of untapped potential, especially with regard to further increasing the automation in PIM and knowledge work support.

Is GPT-4 a reliable rater? Evaluating Consistency in GPT-4 Text Ratings

  • paper_url: http://arxiv.org/abs/2308.02575
  • repo_url: None
  • paper_authors: Veronika Hackl, Alexandra Elena Müller, Michael Granitzer, Maximilian Sailer
  • for: This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4 across multiple iterations, time spans, and stylistic variations.
  • methods: GPT-4 rated responses to tasks in the Higher Education (HE) subject domain of macroeconomics in terms of content and style. Statistical analysis examined interrater reliability, the consistency of ratings across iterations, and the correlation between content and style ratings.
  • results: GPT-4 showed high interrater reliability across time spans (ICC scores between 0.94 and 0.99), indicating that the model generates consistent ratings under repeated prompting. Content and style ratings correlated highly (0.87). When a non-adequate style was applied, content ratings remained constant while style ratings decreased, indicating that the LLM effectively distinguishes these two criteria during evaluation.
    Abstract This study investigates the consistency of feedback ratings generated by OpenAI's GPT-4, a state-of-the-art artificial intelligence language model, across multiple iterations, time spans and stylistic variations. The model rated responses to tasks within the Higher Education (HE) subject domain of macroeconomics in terms of their content and style. Statistical analysis was conducted in order to learn more about the interrater reliability, consistency of the ratings across iterations and the correlation between ratings in terms of content and style. The results revealed a high interrater reliability with ICC scores ranging between 0.94 and 0.99 for different timespans, suggesting that GPT-4 is capable of generating consistent ratings across repetitions with a clear prompt. Style and content ratings show a high correlation of 0.87. When applying a non-adequate style the average content ratings remained constant, while style ratings decreased, which indicates that the large language model (LLM) effectively distinguishes between these two criteria during evaluation. The prompt used in this study is furthermore presented and explained. Further research is necessary to assess the robustness and reliability of AI models in various use cases.
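The ICC scores reported above can be reproduced in form (not in value) with a small NumPy implementation of the two-way random, single-rater intraclass correlation ICC(2,1); the rating matrix below is made-up illustrative data, not the study's.

```python
# ICC(2,1) for an (n targets x k raters) rating matrix, computed from the
# two-way ANOVA mean squares.
import numpy as np

def icc2_1(ratings):
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand = ratings.mean()
    row_means = ratings.mean(axis=1)
    col_means = ratings.mean(axis=0)
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((ratings - grand) ** 2).sum() - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-targets mean square
    msc = ss_cols / (k - 1)                 # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))      # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Two hypothetical rating runs in near-perfect agreement over five answers:
ratings = [[4, 4], [2, 2], [5, 5], [3, 3], [4, 5]]
print(round(icc2_1(ratings), 3))
```

High values (the paper reports 0.94-0.99) mean repeated rating runs agree closely on the same responses.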

Local Large Language Models for Complex Structured Medical Tasks

  • paper_url: http://arxiv.org/abs/2308.01727
  • repo_url: https://github.com/innovationcore/LocalLLMStructured
  • paper_authors: V. K. Cody Bumgardner, Aaron Mullen, Sam Armstrong, Caylin Hickey, Jeff Talbert
  • for: This paper aims to tackle complex, domain-specific tasks by combining the language reasoning capabilities of large language models (LLMs) with the benefits of local training.
  • methods: The proposed approach utilizes local LLMs, which can be fine-tuned to respond to specific generative instructions and provide structured outputs. The authors used a dataset of over 150k uncurated surgical pathology reports to train and evaluate different model architectures, including LLaMA, BERT, and LongFormer.
  • results: The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics, especially with large datasets. The LLaMA models demonstrated their ability to handle complex, multi-label tasks, making them a promising approach for utilizing LLMs to perform domain-specific tasks using accessible hardware.
    Abstract This paper introduces an approach that combines the language reasoning capabilities of large language models (LLMs) with the benefits of local training to tackle complex, domain-specific tasks. Specifically, the authors demonstrate their approach by extracting structured condition codes from pathology reports. The proposed approach utilizes local LLMs, which can be fine-tuned to respond to specific generative instructions and provide structured outputs. The authors collected a dataset of over 150k uncurated surgical pathology reports, containing gross descriptions, final diagnoses, and condition codes. They trained different model architectures, including LLaMA, BERT and LongFormer and evaluated their performance. The results show that the LLaMA-based models significantly outperform BERT-style models across all evaluated metrics, even with extremely reduced precision. The LLaMA models performed especially well with large datasets, demonstrating their ability to handle complex, multi-label tasks. Overall, this work presents an effective approach for utilizing LLMs to perform domain-specific tasks using accessible hardware, with potential applications in the medical domain, where complex data extraction and classification are required.

Bees Local Phase Quantization Feature Selection for RGB-D Facial Expressions Recognition

  • paper_url: http://arxiv.org/abs/2308.01700
  • repo_url: None
  • paper_authors: Seyed Muhammad Hossein Mousavi, Atiye Ilanloo
  • for: This work proposes a bio-inspired feature selection method and applies it to facial expression recognition.
  • methods: The Bees Algorithm (BA) is combined with Local Phase Quantization (LPQ) for feature selection. LPQ is a frequency-domain feature with excellent performance on depth images, which helps improve facial expression recognition accuracy.
  • results: The proposed Bees LPQ method reaches 99% accuracy on the facial expression recognition task, a decent performance in comparison with other methods.
    Abstract Feature selection could be defined as an optimization problem and solved by bio-inspired algorithms. Bees Algorithm (BA) shows decent performance in feature selection optimization tasks. On the other hand, Local Phase Quantization (LPQ) is a frequency domain feature which has excellent performance on Depth images. Here, after extracting LPQ features out of RGB (colour) and Depth images from the Iranian Kinect Face Database (IKFDB), the Bees feature selection algorithm applies to select the desired number of features for final classification tasks. IKFDB is recorded with Kinect sensor V.2 and contains colour and depth images for facial and facial micro-expressions recognition purposes. Here five facial expressions of Anger, Joy, Surprise, Disgust and Fear are used for final validation. The proposed Bees LPQ method is compared with Particle Swarm Optimization (PSO) LPQ, PCA LPQ, Lasso LPQ, and just LPQ features for classification tasks with Support Vector Machines (SVM), K-Nearest Neighbourhood (KNN), Shallow Neural Network and Ensemble Subspace KNN. Returned results, show a decent performance of the proposed algorithm (99 % accuracy) in comparison with others.
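A simplified, hypothetical sketch of Bees-Algorithm-style feature selection: scout bees sample random feature subsets, the best "sites" are refined by local neighbourhood search (swapping one feature), and the fittest subset wins. The additive relevance-score fitness below is a toy stand-in for the paper's classifier-based evaluation.

```python
# Toy Bees-Algorithm-style selection of k features out of len(scores).
import random

def bees_select(scores, k, n_scouts=20, n_best=3, n_local=5, n_iters=30, seed=0):
    rng = random.Random(seed)
    features = list(range(len(scores)))
    fitness = lambda subset: sum(scores[f] for f in subset)
    # Scout phase: random k-sized subsets.
    sites = [set(rng.sample(features, k)) for _ in range(n_scouts)]
    for _ in range(n_iters):
        sites.sort(key=fitness, reverse=True)
        new_sites = []
        for site in sites[:n_best]:
            best = site
            for _ in range(n_local):   # neighbourhood search: swap one feature
                neigh = set(best)
                neigh.remove(rng.choice(sorted(neigh)))
                neigh.add(rng.choice([f for f in features if f not in neigh]))
                if fitness(neigh) > fitness(best):
                    best = neigh
            new_sites.append(best)
        # Remaining bees scout fresh random sites.
        while len(new_sites) < n_scouts:
            new_sites.append(set(rng.sample(features, k)))
        sites = new_sites
    return max(sites, key=fitness)

scores = [0.1, 0.9, 0.3, 0.8, 0.05, 0.7]   # toy per-feature relevance
selected = bees_select(scores, k=3)
print(sorted(selected))
```

In the paper the fitness of a subset would instead be the validation accuracy of a classifier (SVM, KNN, etc.) trained on the selected LPQ features.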

LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment

  • paper_url: http://arxiv.org/abs/2308.01686
  • repo_url: https://github.com/zhangzw12319/lcps
  • paper_authors: Zhiwei Zhang, Zhizhong Zhang, Qian Yu, Ran Yi, Yuan Xie, Lizhuang Ma
  • for: This paper is written for the task of 3D panoptic segmentation, which aims to simultaneously perform semantic segmentation and instance segmentation in a scene using both LiDAR and camera data.
  • methods: The proposed method, called LCPS, uses a three-stage fusion approach that includes an asynchronous compensation pixel alignment module, a semantic-aware region alignment module, and a point-to-voxel feature propagation module to fuse LiDAR and camera data.
  • results: The proposed method achieves an improvement of about 6.9% in PQ performance over the LiDAR-only baseline on the NuScenes dataset, demonstrating the effectiveness of the proposed fusion strategy.
    Abstract 3D panoptic segmentation is a challenging perception task that requires both semantic segmentation and instance segmentation. In this task, we notice that images could provide rich texture, color, and discriminative information, which can complement LiDAR data for evident performance improvement, but their fusion remains a challenging problem. To this end, we propose LCPS, the first LiDAR-Camera Panoptic Segmentation network. In our approach, we conduct LiDAR-Camera fusion in three stages: 1) an Asynchronous Compensation Pixel Alignment (ACPA) module that calibrates the coordinate misalignment caused by asynchronous problems between sensors; 2) a Semantic-Aware Region Alignment (SARA) module that extends the one-to-one point-pixel mapping to one-to-many semantic relations; 3) a Point-to-Voxel feature Propagation (PVP) module that integrates both geometric and semantic fusion information for the entire point cloud. Our fusion strategy improves about 6.9% PQ performance over the LiDAR-only baseline on NuScenes dataset. Extensive quantitative and qualitative experiments further demonstrate the effectiveness of our novel framework. The code will be released at https://github.com/zhangzw12319/lcps.git.
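The one-to-one point-pixel mapping that LiDAR-camera fusion starts from can be sketched as a pinhole projection. The intrinsic matrix and points below are made up, and the paper's ACPA and SARA modules go well beyond this basic projection (compensating sensor asynchrony and extending to one-to-many semantic relations).

```python
# Minimal pinhole projection of 3-D points (already in the camera frame)
# onto the image plane with a hypothetical intrinsic matrix K.
import numpy as np

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])            # hypothetical camera intrinsics

points_cam = np.array([[1.0, 0.5, 5.0],    # (x, y, z) in the camera frame
                       [-0.5, 0.2, 2.0]])

uvw = (K @ points_cam.T).T                 # homogeneous pixel coordinates
pixels = uvw[:, :2] / uvw[:, 2:3]          # perspective divide
print(pixels)
```

Each projected pixel would then look up image features for the corresponding LiDAR point before the fused features are propagated back to voxels.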

Evaluating Link Prediction Explanations for Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2308.01682
  • repo_url: https://github.com/cborile/eval_lp_xai
  • paper_authors: Claudio Borile, Alan Perotti, André Panisson
  • for: This paper provides quantitative metrics for assessing the quality of link prediction explanations, in order to foster the adoption of link prediction models.
  • methods: State-of-the-art explainability methods for Graph Neural Networks are evaluated using the proposed metrics.
  • results: The study finds that task-specific choices, such as the distance between node embeddings, can influence the quality of link prediction explanations.
    Abstract Graph Machine Learning (GML) has numerous applications, such as node/graph classification and link prediction, in real-world domains. Providing human-understandable explanations for GML models is a challenging yet fundamental task to foster their adoption, but validating explanations for link prediction models has received little attention. In this paper, we provide quantitative metrics to assess the quality of link prediction explanations, with or without ground-truth. State-of-the-art explainability methods for Graph Neural Networks are evaluated using these metrics. We discuss how underlying assumptions and technical details specific to the link prediction task, such as the choice of distance between node embeddings, can influence the quality of the explanations.
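The paper's point that the choice of distance between node embeddings matters can be illustrated with a toy example: the same embeddings rank candidate links differently under dot product versus negative Euclidean distance, so explanations tied to one scoring function need not transfer to the other. All numbers below are made up.

```python
# Toy link scoring under two different distance choices over the same
# node embeddings: the top-ranked candidate link disagrees.
import numpy as np

emb = np.array([[1.0, 0.0],    # node 0
                [3.0, 0.0],    # node 1: far but large norm
                [0.9, 0.1]])   # node 2: close but small norm
pairs = [(0, 1), (0, 2)]

dot = {p: float(emb[p[0]] @ emb[p[1]]) for p in pairs}
neg_dist = {p: -float(np.linalg.norm(emb[p[0]] - emb[p[1]])) for p in pairs}

print(max(dot, key=dot.get), max(neg_dist, key=neg_dist.get))
```

Dot product favours the high-norm neighbour (0, 1), while negative Euclidean distance favours the nearby one (0, 2); an explanation metric must be evaluated relative to the scoring function the model actually uses.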

NBIAS: A Natural Language Processing Framework for Bias Identification in Text

  • paper_url: http://arxiv.org/abs/2308.01681
  • repo_url: None
  • paper_authors: Shaina Raza, Muskan Garg, Deepak John Reji, Syed Raza Bashir, Chen Ding
  • for: This paper examines bias in textual data and develops a framework to detect and remove such biases.
  • methods: The framework uses a transformer-based token classification model that identifies bias words/phrases through a unique named entity.
  • results: The proposed approach achieves accuracy improvements of 1% to 8% over baselines on bias detection.
    Abstract Bias in textual data can lead to skewed interpretations and outcomes when the data is used. These biases could perpetuate stereotypes, discrimination, or other forms of unfair treatment. An algorithm trained on biased data ends up making decisions that disproportionately impact a certain group of people. Therefore, it is crucial to detect and remove these biases to ensure the fair and ethical use of data. To this end, we develop a comprehensive and robust framework \textsc{Nbias} that consists of a data layer, corpus construction, model development layer and an evaluation layer. The dataset is constructed by collecting diverse data from various fields, including social media, healthcare, and job hiring portals. As such, we applied a transformer-based token classification model that is able to identify bias words/ phrases through a unique named entity. In the assessment procedure, we incorporate a blend of quantitative and qualitative evaluations to gauge the effectiveness of our models. We achieve accuracy improvements ranging from 1% to 8% compared to baselines. We are also able to generate a robust understanding of the model functioning, capturing not only numerical data but also the quality and intricacies of its performance. The proposed approach is applicable to a variety of biases and contributes to the fair and ethical use of textual data.
    摘要 文本数据中的偏见可能导致歪曲的解释和结果,这些偏见可能固化刻板印象、歧视或其他形式的不公平对待。基于偏见数据训练的算法会做出对某些人群产生不成比例影响的决策。因此,检测和消除偏见是必要的,以确保数据的公平和道德使用。为此,我们开发了一个全面且可靠的框架\textsc{Nbias},它包括数据层、语料构建层、模型开发层和评估层。我们收集了来自社交媒体、医疗和招聘门户等不同领域的多样化数据,并应用了基于Transformer的词元分类模型,通过独特的命名实体来识别偏见词语/短语。在评估过程中,我们结合了量化和质量评估来衡量模型的效果。我们实现了相比基线1%至8%的准确率提升,并获得了对模型性能的数值与质量层面的深入理解。这种方法可以应用于多种偏见,并为文本数据的公平和道德使用做出贡献。
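NBIAS's token classifier emits per-token bias labels; the snippet below is a minimal sketch (not the paper's code) of how BIO-style `B-BIAS`/`I-BIAS` tags could be collected into bias phrases. The tag names and the example sentence are illustrative assumptions.

```python
# Toy sketch: collecting token-level bias tags into phrases, in the style
# a transformer token classifier (as in NBIAS) might require downstream.

def extract_bias_spans(tokens, tags):
    """Collect contiguous B-BIAS/I-BIAS token runs into phrases."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-BIAS":                # a new biased phrase starts
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-BIAS" and current:  # phrase continues
            current.append(token)
        else:                              # "O" tag closes any open phrase
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = ["Young", "people", "are", "always", "reckless", "drivers"]
tags   = ["O", "O", "O", "B-BIAS", "I-BIAS", "O"]
print(extract_bias_spans(tokens, tags))  # ['always reckless']
```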

Learning Implicit Entity-object Relations by Bidirectional Generative Alignment for Multimodal NER

  • paper_url: http://arxiv.org/abs/2308.02570
  • repo_url: None
  • paper_authors: Feng Chen, Jiajia Liu, Kaixiang Ji, Wang Ren, Jian Wang, Jingdong Wang
  • for: 提高多模态命名实体识别(MNER)的性能,重点在于弥合文本与图像之间的语义鸿沟,并将实体与图像中相关的对象进行匹配。
  • methods: 提出了一种双向生成对齐方法(BGA-MNER),包括针对两种模态中实体显著内容的图像到文本和文本到图像生成,并通过联合优化双向重建目标,使隐式的实体-对象关系得到对齐。
  • results: 在两个基准上的大量实验表明,该方法在推理时无需图像输入即可达到最佳性能。
    Abstract The challenge posed by multimodal named entity recognition (MNER) is mainly two-fold: (1) bridging the semantic gap between text and image and (2) matching the entity with its associated object in image. Existing methods fail to capture the implicit entity-object relations, due to the lack of corresponding annotation. In this paper, we propose a bidirectional generative alignment method named BGA-MNER to tackle these issues. Our BGA-MNER consists of \texttt{image2text} and \texttt{text2image} generation with respect to entity-salient content in two modalities. It jointly optimizes the bidirectional reconstruction objectives, leading to aligning the implicit entity-object relations under such direct and powerful constraints. Furthermore, image-text pairs usually contain unmatched components which are noisy for generation. A stage-refined context sampler is proposed to extract the matched cross-modal content for generation. Extensive experiments on two benchmarks demonstrate that our method achieves state-of-the-art performance without image input during inference.
    摘要 多模态命名实体识别(MNER)的挑战主要有两方面:(1)弥合文本与图像之间的语义鸿沟;(2)将实体与图像中的相关对象进行匹配。由于缺乏相应标注,现有方法无法捕捉隐式的实体-对象关系。在本文中,我们提出了一种名为BGA-MNER的双向生成对齐方法来解决这些问题。BGA-MNER包括针对两种模态中实体显著内容的\texttt{image2text}和\texttt{text2image}生成,并联合优化双向重建目标,在这种直接而有力的约束下对齐隐式的实体-对象关系。此外,图文对通常包含不匹配的成分,会给生成带来噪声,因此我们提出了一种分阶段精炼的上下文采样器,用于提取匹配的跨模态内容进行生成。在两个基准上的大量实验表明,我们的方法在推理时无需图像输入即可达到最佳性能。

MARLIM: Multi-Agent Reinforcement Learning for Inventory Management

  • paper_url: http://arxiv.org/abs/2308.01649
  • repo_url: None
  • paper_authors: Rémi Leluc, Elie Kadoche, Antoine Bertoncello, Sébastien Gourvénec
  • for: 通过优化补货决策维持供应链中产品的供需平衡,提高供应链的效率和可靠性。
  • methods: 开发了一种名为MARLIM的基于强化学习的库存管理框架,用于处理具有随机需求和交付周期的单级多产品库存管理问题,采用单代理或多代理协作方式。
  • results: 基于真实数据的数值实验表明,强化学习方法优于传统基线方法,能够更好地做出供应链中的补货决策。
    Abstract Maintaining a balance between the supply and demand of products by optimizing replenishment decisions is one of the most important challenges in the supply chain industry. This paper presents a novel reinforcement learning framework called MARLIM, to address the inventory management problem for a single-echelon multi-products supply chain with stochastic demands and lead-times. Within this context, controllers are developed through single or multiple agents in a cooperative setting. Numerical experiments on real data demonstrate the benefits of reinforcement learning methods over traditional baselines.
    摘要 通过优化补货决策来维持产品供需平衡,是供应链行业最重要的挑战之一。本文提出了一个名为MARLIM的新型强化学习框架,用于解决具有随机需求和交付周期的单级多产品供应链的库存管理问题。在此背景下,控制器通过单个或多个代理人在合作环境中开发。基于真实数据的数值实验表明,强化学习方法优于传统基线。
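As a rough illustration of the setting MARLIM targets, the following is a toy single-product inventory environment sketch. All parameters (capacity, cost weights, demand range, zero lead-time) are made-up assumptions; the actual framework handles multiple products, stochastic lead-times and multi-agent control.

```python
import random

class SingleEchelonInventory:
    """Minimal single-product inventory environment sketch.

    Parameters are illustrative, not taken from the MARLIM paper.
    """
    def __init__(self, capacity=100, holding_cost=0.1, stockout_cost=1.0, seed=0):
        self.capacity = capacity
        self.holding_cost = holding_cost
        self.stockout_cost = stockout_cost
        self.stock = capacity // 2
        self.rng = random.Random(seed)

    def step(self, replenish_qty):
        # Replenishment arrives first (zero lead-time simplification),
        # then stochastic demand is served from stock.
        self.stock = min(self.capacity, self.stock + replenish_qty)
        demand = self.rng.randint(0, 20)
        served = min(self.stock, demand)
        lost = demand - served
        self.stock -= served
        # Reward penalizes both holding inventory and lost sales.
        reward = -(self.holding_cost * self.stock + self.stockout_cost * lost)
        return self.stock, reward

env = SingleEchelonInventory()
state, reward = env.step(replenish_qty=10)
print(state, reward)
```

An RL agent would choose `replenish_qty` each step to maximize the cumulative reward, trading holding costs against stockouts.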

Improving Wind Resistance Performance of Cascaded PID Controlled Quadcopters using Residual Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.01648
  • repo_url: None
  • paper_authors: Yu Ishihara, Yuichi Hazama, Kousuke Suzuki, Jerry Jun Yokono, Kohtaro Sabe, Kenta Kawamoto
  • for: 控制四旋翼机器人在风干扰下维持位置的稳定性。
  • methods: 使用残差强化学习方法构建四旋翼机的抗风控制器,只学习补偿干扰的残差,从而可以继续使用传统的级联PID控制器作为基础控制器,同时提高抗风性能。
  • results: 通过多种实验(包括风速大于13 m/s的户外场景)表明,该控制器可将四旋翼机的位置偏差减少约50%;且训练后的控制器具有鲁棒性,在四旋翼机质量和螺旋桨升力系数变化到原值的50%至150%时仍能保持性能。
    Abstract Wind resistance control is an essential feature for quadcopters to maintain their position to avoid deviation from target position and prevent collisions with obstacles. Conventionally, cascaded PID controller is used for the control of quadcopters for its simplicity and ease of tuning its parameters. However, it is weak against wind disturbances and the quadcopter can easily deviate from target position. In this work, we propose a residual reinforcement learning based approach to build a wind resistance controller of a quadcopter. By learning only the residual that compensates the disturbance, we can continue using the cascaded PID controller as the base controller of the quadcopter but improve its performance against wind disturbances. To avoid unexpected crashes and destructions of quadcopters, our method does not require real hardware for data collection and training. The controller is trained only on a simulator and directly applied to the target hardware without extra finetuning process. We demonstrate the effectiveness of our approach through various experiments including an experiment in an outdoor scene with wind speed greater than 13 m/s. Despite its simplicity, our controller reduces the position deviation by approximately 50% compared to the quadcopter controlled with the conventional cascaded PID controller. Furthermore, trained controller is robust and preserves its performance even though the quadcopter's mass and propeller's lift coefficient is changed between 50% to 150% from original training time.
    摘要 抗风控制是四旋翼机维持目标位置、避免偏离并防止与障碍物碰撞的关键功能。传统上,由于级联PID控制器简单且参数易于调节,常被用于四旋翼机的控制。然而,它对风干扰的抵抗能力较弱,四旋翼机容易偏离目标位置。在这项工作中,我们提出了一种基于残差强化学习的方法来构建四旋翼机的抗风控制器。通过只学习补偿干扰的残差,我们可以继续使用级联PID控制器作为四旋翼机的基础控制器,同时提高其抗风性能。为避免四旋翼机意外坠毁和损坏,我们的方法不需要真实硬件进行数据采集和训练:控制器仅在模拟器上训练,并直接应用于目标硬件,无需额外的微调过程。我们通过多种实验(包括风速大于13 m/s的户外场景)证明了方法的有效性。尽管方法简单,与使用传统级联PID控制器的四旋翼机相比,我们的控制器将位置偏差降低了约50%。此外,训练后的控制器具有鲁棒性,即使四旋翼机的质量和螺旋桨升力系数相对训练时变化到50%至150%,其性能仍能保持。
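The residual idea can be sketched in a few lines: the learned policy's output is simply added to the cascaded PID command, so the PID controller stays in place as the base. The gains and the stand-in policy below are illustrative, not the paper's values.

```python
class PID:
    """Textbook PID controller; gains here are illustrative."""
    def __init__(self, kp, ki, kd, dt=0.01):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def __call__(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def residual_control(pid, policy, error, obs):
    # The base PID command is kept; the learned policy only adds a
    # correction term that compensates the wind disturbance.
    return pid(error) + policy(obs)

# Stand-in for a trained residual policy (a constant, for illustration).
policy = lambda obs: 0.05
pid = PID(kp=1.0, ki=0.1, kd=0.01)
u = residual_control(pid, policy, error=0.2, obs=None)
print(round(u, 4))
```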

Interleaving GANs with knowledge graphs to support design creativity for book covers

  • paper_url: http://arxiv.org/abs/2308.01626
  • repo_url: https://github.com/alexmotogna/generatorapi
  • paper_authors: Alexandru Motogna, Adrian Groza
  • for: 本研究将生成对抗网络(GANs)应用于书籍封面领域,采用不同的训练方法以获得更好的生成图像。
  • methods: 本研究将GANs与知识图交织训练,通过对标题进行修改为任一给定标题生成多个可能的选项,再将其作为增强输入提供给生成器,最后使用判别器选择最佳的生成图像。
  • results: 本方法在生成书籍封面方面优于以往的尝试,且与单独使用GANs相比,知识图能为书籍作者或编辑提供更好的选项。
    Abstract An attractive book cover is important for the success of a book. In this paper, we apply Generative Adversarial Networks (GANs) to the book covers domain, using different methods for training in order to obtain better generated images. We interleave GANs with knowledge graphs to alter the input title to obtain multiple possible options for any given title, which are then used as an augmented input to the generator. Finally, we use the discriminator obtained during the training phase to select the best images generated with new titles. Our method performed better at generating book covers than previous attempts, and the knowledge graph gives better options to the book author or editor compared to using GANs alone.
    摘要 一个吸引人的书籍封面对书的成功非常重要。在这篇论文中,我们将生成对抗网络(GANs)应用于书籍封面领域,使用不同的训练方法以获得更好的生成图像。我们将GANs与知识图交织,对输入标题进行修改以获得多个可能的选项,然后将这些选项作为增强输入提供给生成器。最后,我们使用训练阶段获得的判别器来选择以新标题生成的最佳图像。我们的方法在生成书籍封面方面优于以往的尝试,且与单独使用GANs相比,知识图能为书籍作者或编辑提供更好的选项。
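The final selection step described above reduces to scoring the candidates generated from knowledge-graph title variants with the trained discriminator and keeping the best one. Below is a minimal sketch with a stand-in scoring function and made-up candidates; none of this is the authors' code.

```python
# Sketch of discriminator-based selection: the discriminator obtained
# during GAN training scores each candidate cover, and the
# highest-scoring one is kept. The scoring function is a stand-in.

def select_best_cover(candidates, discriminator):
    return max(candidates, key=discriminator)

# Hypothetical candidates: (title_variant, precomputed_score) pairs.
candidates = [
    ("The Silent Forest", 0.62),
    ("The Quiet Woods", 0.81),
    ("A Hushed Grove", 0.74),
]
discriminator = lambda c: c[1]  # stand-in: read the precomputed score
best = select_best_cover(candidates, discriminator)
print(best[0])  # The Quiet Woods
```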

Multimodal Indoor Localisation in Parkinson’s Disease for Detecting Medication Use: Observational Pilot Study in a Free-Living Setting

  • paper_url: http://arxiv.org/abs/2308.02419
  • repo_url: https://github.com/ferdianjovan/Multihead-Dual-Convolutional-Self-Attention
  • paper_authors: Ferdian Jovan, Catherine Morgan, Ryan McConville, Emma L. Tonkin, Ian Craddock, Alan Whone
  • for: 这个研究的目的是提高现有室内定位方法的效果,并利用接收信号强度指示(RSSI)和加速度计数据这两种感知模态来评估帕金森病(PD)患者的运动波动。
  • methods: 这个研究使用了一种基于transformer的方法,利用RSSI和加速度计数据提供运动的互补视角。
  • results: 研究表明,该方法可以高效地进行室内定位,并且能够准确捕捉PD患者的运动波动。具体而言,将精确的房间级定位预测转化为居家步速特征后,可以准确预测PD患者是否正在服用左旋多巴(levodopa)药物。
    Abstract Parkinson's disease (PD) is a slowly progressive, debilitating neurodegenerative disease which causes motor symptoms including gait dysfunction. Motor fluctuations are alterations between periods with a positive response to levodopa therapy ("on") and periods marked by re-emergency of PD symptoms ("off") as the response to medication wears off. These fluctuations often affect gait speed and they increase in their disabling impact as PD progresses. To improve the effectiveness of current indoor localisation methods, a transformer-based approach utilising dual modalities which provide complementary views of movement, Received Signal Strength Indicator (RSSI) and accelerometer data from wearable devices, is proposed. A sub-objective aims to evaluate whether indoor localisation, including its in-home gait speed features (i.e. the time taken to walk between rooms), could be used to evaluate motor fluctuations by detecting whether the person with PD is taking levodopa medications or withholding them. To properly evaluate our proposed method, we use a free-living dataset where the movements and mobility are greatly varied and unstructured as expected in real-world conditions. 24 participants lived in pairs (consisting of one person with PD, one control) for five days in a smart home with various sensors. Our evaluation on the resulting dataset demonstrates that our proposed network outperforms other methods for indoor localisation. The sub-objective evaluation shows that precise room-level localisation predictions, transformed into in-home gait speed features, produce accurate predictions on whether the PD participant is taking or withholding their medications.
    摘要 帕金森病(PD)是一种缓慢进展、使人衰弱的神经退行性疾病,会引起包括步态功能障碍在内的运动症状。运动波动是指对左旋多巴治疗有积极反应的时期("on")与随着药效消退PD症状重新出现的时期("off")之间的交替。这些波动通常影响步行速度,且其致残影响随PD的进展而加剧。为了提高现有室内定位方法的有效性,本文提出了一种基于transformer的方法,利用可穿戴设备的接收信号强度指示(RSSI)和加速度计数据这两种提供运动互补视角的模态。一个子目标是评估室内定位(包括居家步速特征,即在房间之间行走所需的时间)能否通过检测PD患者是正在服用还是停用左旋多巴药物来评估运动波动。为了正确评估所提方法,我们使用了一个自由生活数据集,其中的运动和移动如真实世界条件下那样变化大且无结构。24名参与者两两一组(一名PD患者、一名对照)在装有多种传感器的智能家居中生活五天。在该数据集上的评估表明,我们提出的网络在室内定位方面优于其他方法。子目标评估表明,将精确的房间级定位预测转化为居家步速特征后,可以准确预测PD参与者是否服用药物。
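The in-home gait-speed feature (time taken to walk between rooms) can be derived from room-level localisation output roughly as below. The event schema is an assumption for illustration, not the paper's actual data format.

```python
from datetime import datetime

def room_transition_durations(events):
    """Compute the seconds taken to move between rooms from a stream of
    (timestamp, room) localisation predictions: the duration is measured
    from the last observation in the old room to the first in the new one."""
    durations = []
    prev_time, prev_room = None, None
    for ts, room in events:
        t = datetime.fromisoformat(ts)
        if prev_room is not None and room != prev_room:
            durations.append((prev_room, room, (t - prev_time).total_seconds()))
        prev_time, prev_room = t, room
    return durations

events = [
    ("2023-08-03T10:00:00", "kitchen"),
    ("2023-08-03T10:00:07", "kitchen"),
    ("2023-08-03T10:00:12", "hallway"),
    ("2023-08-03T10:00:19", "lounge"),
]
print(room_transition_durations(events))
```

Longer kitchen-to-hallway times during "off" periods are the kind of signal the medication-state classifier would pick up.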

ReIDTrack: Multi-Object Track and Segmentation Without Motion

  • paper_url: http://arxiv.org/abs/2308.01622
  • repo_url: None
  • paper_authors: Kaer Huang, Bingchuan Sun, Feng Chen, Tao Zhang, Jun Xie, Jian Li, Christopher Walter Twombly, Zhepeng Wang
  • For: The paper explores achieving state-of-the-art (SOTA) performance in multi-object tracking and segmentation (MOTS) using only high-performance detection and appearance models, without relying on motion information or IoU mapping during association.
  • Methods: The proposed method uses CBNetV2 as the detection model and MoCo-v2 as a self-supervised appearance model, and removes motion information and IoU mapping during the association process.
  • Results: The method achieved 1st place on the MOTS track and 2nd place on the MOT track at the CVPR2023 WAD workshop, demonstrating its effectiveness and simplicity.
  • for: 本研究旨在仅通过高性能的检测和外观模型实现多目标跟踪与分割(MOTS)的最先进(SOTA)性能,而无需在关联过程中使用运动信息和IoU映射。
  • methods: 所提方法使用CBNetV2作为检测模型、MoCo-v2作为自监督外观模型,并在关联过程中去除了运动信息和IoU映射。
  • results: 该方法在CVPR2023 WAD研讨会的MOTS赛道上获得第一名,在MOT赛道上获得第二名,证明了其简洁有效。
    Abstract In recent years, dominant Multi-object tracking (MOT) and segmentation (MOTS) methods mainly follow the tracking-by-detection paradigm. Transformer-based end-to-end (E2E) solutions bring some ideas to MOT and MOTS, but they cannot achieve a new state-of-the-art (SOTA) performance in major MOT and MOTS benchmarks. Detection and association are two main modules of the tracking-by-detection paradigm. Association techniques mainly depend on the combination of motion and appearance information. As deep learning has been recently developed, the performance of the detection and appearance model is rapidly improved. These trends made us consider whether we can achieve SOTA based on only high-performance detection and appearance model. Our paper mainly focuses on exploring this direction based on CBNetV2 with Swin-B as a detection model and MoCo-v2 as a self-supervised appearance model. Motion information and IoU mapping were removed during the association. Our method wins 1st place on the MOTS track and wins 2nd on the MOT track in the CVPR2023 WAD workshop. We hope our simple and effective method can give some insights to the MOT and MOTS research community. Source code will be released under this git repository
    摘要 近年来,主流的多目标跟踪(MOT)和多目标分割(MOTS)方法主要遵循检测跟踪(tracking-by-detection)范式。基于transformer的端到端(E2E)方案为MOT和MOTS带来了一些新思路,但它们在主要的MOT和MOTS基准上无法达到新的最先进(SOTA)性能。检测和关联是检测跟踪范式的两个主要模块,其中关联技术主要依赖运动信息和外观信息的组合。随着深度学习的发展,检测和外观模型的性能得到了迅速提升。这些趋势促使我们思考:能否仅基于高性能的检测和外观模型达到SOTA?我们的论文主要关注探索这一方向,使用以Swin-B为骨干的CBNetV2作为检测模型、MoCo-v2作为自监督外观模型,并在关联过程中去除了运动信息和IoU映射。我们的方法在CVPR2023 WAD研讨会的MOTS赛道上获得第一名,在MOT赛道上获得第二名。我们希望这一简单而有效的方法能为MOT和MOTS研究社区提供一些启发。源代码将在该Git仓库中发布。
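Association without motion or IoU reduces to matching appearance embeddings between existing tracks and new detections. The sketch below uses cosine similarity with greedy assignment as a simplification (the paper does not specify this exact assignment rule), on made-up 2-d features.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def associate(track_feats, det_feats, threshold=0.5):
    """Greedy appearance-only association: no motion model, no IoU.
    Returns {track_id: detection_id}."""
    pairs = sorted(
        ((cosine(t, d), ti, di)
         for ti, t in track_feats.items()
         for di, d in det_feats.items()),
        reverse=True)
    matches, used_t, used_d = {}, set(), set()
    for sim, ti, di in pairs:
        if sim >= threshold and ti not in used_t and di not in used_d:
            matches[ti] = di
            used_t.add(ti)
            used_d.add(di)
    return matches

tracks = {0: [1.0, 0.0], 1: [0.0, 1.0]}
dets   = {"a": [0.9, 0.1], "b": [0.1, 0.9]}
print(associate(tracks, dets) == {0: "a", 1: "b"})  # True
```

A production tracker would typically use an optimal assignment (Hungarian algorithm) instead of the greedy loop.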

Assessing Systematic Weaknesses of DNNs using Counterfactuals

  • paper_url: http://arxiv.org/abs/2308.01614
  • repo_url: None
  • paper_authors: Sujan Sai Gannamaneni, Michael Mock, Maram Akila
  • for: 这篇论文主要是为了探讨深度神经网络(DNN)在安全敏感应用中的测试方法,以及寻找和识别这些模型在特定输入空间中的系统性弱点。
  • methods: 这篇论文提出了一种受反事实解释启发、高效且计算开销低的算法,用于验证已识别子集的语义归因,即检查所识别的属性是否确实导致了模型性能的下降。
  • results: 论文以自动驾驶领域的语义分割模型为例,发现不同行人资产之间存在性能差异,但只有在部分情况下,资产类型本身才是性能下降的原因。
    Abstract With the advancement of DNNs into safety-critical applications, testing approaches for such models have gained more attention. A current direction is the search for and identification of systematic weaknesses that put safety assumptions based on average performance values at risk. Such weaknesses can take on the form of (semantically coherent) subsets or areas in the input space where a DNN performs systematically worse than its expected average. However, it is non-trivial to attribute the reason for such observed low performances to the specific semantic features that describe the subset. For instance, inhomogeneities within the data w.r.t. other (non-considered) attributes might distort results. However, taking into account all (available) attributes and their interaction is often computationally highly expensive. Inspired by counterfactual explanations, we propose an effective and computationally cheap algorithm to validate the semantic attribution of existing subsets, i.e., to check whether the identified attribute is likely to have caused the degraded performance. We demonstrate this approach on an example from the autonomous driving domain using highly annotated simulated data, where we show for a semantic segmentation model that (i) performance differences among the different pedestrian assets exist, but (ii) only in some cases is the asset type itself the reason for this reduction in the performance.
    摘要 随着深度神经网络(DNN)进入安全关键应用,针对此类模型的测试方法受到越来越多的关注。当前的一个方向是寻找并识别系统性弱点,这些弱点会使基于平均性能值的安全假设面临风险。这类弱点可能表现为输入空间中(语义上连贯的)子集或区域,DNN在其上的表现系统性地低于预期平均水平。然而,将观察到的低性能归因于描述该子集的特定语义特征并非易事:例如,数据在其他(未考虑的)属性上的不均匀可能会扭曲结果,而考虑所有(可用)属性及其交互往往计算代价极高。受反事实解释的启发,我们提出了一种有效且计算开销低的算法,用于验证现有子集的语义归因,即检查所识别的属性是否确实导致了性能下降。我们使用高度标注的模拟数据,在自动驾驶领域的一个语义分割模型上演示了该方法:(i)不同行人资产之间确实存在性能差异,但(ii)只有在部分情况下,资产类型本身才是性能下降的原因。
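One crude way to probe whether an attribute (e.g. asset type) or a confounder (e.g. distance) explains a performance gap is to recompute the gap with the confounder held fixed. The sketch below illustrates only this intuition with hypothetical scores; it is not the paper's counterfactual algorithm.

```python
from statistics import mean

def attribute_gap(records, attr, value, control_attr=None, control_value=None):
    """Mean-score gap between samples with attr == value and the rest,
    optionally restricted to one fixed confounder value (a crude stand-in
    for counterfactual-based validation of the attribution)."""
    pool = [r for r in records
            if control_attr is None or r[control_attr] == control_value]
    inside  = [r["score"] for r in pool if r[attr] == value]
    outside = [r["score"] for r in pool if r[attr] != value]
    return mean(outside) - mean(inside)

# Hypothetical per-pedestrian segmentation scores with two attributes.
records = [
    {"asset": "child", "distance": "far",  "score": 0.60},
    {"asset": "child", "distance": "near", "score": 0.85},
    {"asset": "adult", "distance": "far",  "score": 0.65},
    {"asset": "adult", "distance": "near", "score": 0.90},
]
raw_gap = attribute_gap(records, "asset", "child")
controlled = attribute_gap(records, "asset", "child", "distance", "near")
print(raw_gap, controlled)
```

Here the gap survives when distance is held fixed, so (in this toy data) the asset type itself plausibly drives it; if the controlled gap vanished, distance would be the better explanation.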

Discriminative Graph-level Anomaly Detection via Dual-students-teacher Model

  • paper_url: http://arxiv.org/abs/2308.01947
  • repo_url: https://github.com/whb605/gladst
  • paper_authors: Fu Lin, Xuexiong Luo, Jia Wu, Jian Yang, Shan Xue, Zitong Wang, Haonan Gong
  • for: 本文的目标是在图集中检测与其他图显著不同的异常图,而非传统节点级异常检测任务中的异常节点。
  • methods: 本文提出了一种新的图级异常检测方法:首先定义图集中的异常图信息(包括节点属性异常和图属性异常),并分别采用节点级和图级信息差异来识别它们;然后使用两个学生模型和一个教师模型分别学习正常图和异常图的表示。
  • results: 广泛的实验分析表明,该方法在真实世界图数据集上能够有效检测异常图。
    Abstract Different from the current node-level anomaly detection task, the goal of graph-level anomaly detection is to find abnormal graphs that significantly differ from others in a graph set. Due to the scarcity of research on the work of graph-level anomaly detection, the detailed description of graph-level anomaly is insufficient. Furthermore, existing works focus on capturing anomalous graph information to learn better graph representations, but they ignore the importance of an effective anomaly score function for evaluating abnormal graphs. Thus, in this work, we first define anomalous graph information including node and graph property anomalies in a graph set and adopt node-level and graph-level information differences to identify them, respectively. Then, we introduce a discriminative graph-level anomaly detection framework with dual-students-teacher model, where the teacher model with a heuristic loss are trained to make graph representations more divergent. Then, two competing student models trained by normal and abnormal graphs respectively fit graph representations of the teacher model in terms of node-level and graph-level representation perspectives. Finally, we combine representation errors between two student models to discriminatively distinguish anomalous graphs. Extensive experiment analysis demonstrates that our method is effective for the graph-level anomaly detection task on graph datasets in the real world.
    摘要 与当前的节点级异常检测任务不同,图级异常检测的目标是在一个图集中找到与其他图显著不同的异常图。由于图级异常检测的相关研究稀缺,对图级异常的详细刻画并不充分。此外,现有工作侧重于捕捉异常图信息以学习更好的图表示,却忽略了用于评估异常图的有效异常评分函数的重要性。因此,在本工作中,我们首先定义图集中的异常图信息(包括节点属性异常和图属性异常),并分别采用节点级和图级信息差异来识别它们。然后,我们提出了一种基于双学生-教师模型的判别式图级异常检测框架:教师模型通过启发式损失进行训练,使图表示更加发散;随后,分别由正常图和异常图训练的两个相互竞争的学生模型,从节点级和图级表示的角度拟合教师模型的图表示。最后,我们结合两个学生模型的表示误差来判别异常图。大量实验分析表明,我们的方法在真实世界图数据集上的图级异常检测任务中是有效的。
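The scoring idea can be sketched as comparing the two students' representation errors against the teacher: a graph that the normal-trained student reproduces poorly while the abnormal-trained student reproduces well looks anomalous. The combination rule and the toy vectors below are illustrative assumptions, not the paper's exact score.

```python
import math

def rep_error(a, b):
    """Euclidean distance between two representation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def anomaly_score(teacher_rep, normal_student_rep, abnormal_student_rep):
    """Positive when the abnormal-trained student fits the teacher better
    than the normal-trained one (suggesting an anomalous graph)."""
    e_normal = rep_error(teacher_rep, normal_student_rep)
    e_abnormal = rep_error(teacher_rep, abnormal_student_rep)
    return e_normal - e_abnormal

# Hypothetical 3-d graph representations.
score = anomaly_score([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.2, 0.5, 0.3])
print(score < 0)  # True: the normal student fits better, likely a normal graph
```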

DOLCE: A Descriptive Ontology for Linguistic and Cognitive Engineering

  • paper_url: http://arxiv.org/abs/2308.01597
  • repo_url: None
  • paper_authors: Stefano Borgo, Roberta Ferrario, Aldo Gangemi, Nicola Guarino, Claudio Masolo, Daniele Porello, Emilio M. Sanfilippo, Laure Vieu
  • for: DOLCE是一个基础本体(foundational ontology),用于提供一致的世界观,并将各领域知识整合在一起。
  • methods: DOLCE基于认知和语言学的考虑,遵循哲学原则和已确立的本体论方法(例如OntoClean),并有丰富的形式化。
  • results: DOLCE在过去二十年中保持稳定,启发了大多数现有的顶层本体,并被用于开发或改进标准和公共领域资源(例如CIDOC CRM、DBpedia和WordNet)。
    Abstract DOLCE, the first top-level (foundational) ontology to be axiomatized, has remained stable for twenty years and today is broadly used in a variety of domains. DOLCE is inspired by cognitive and linguistic considerations and aims to model a commonsense view of reality, like the one human beings exploit in everyday life in areas as diverse as socio-technical systems, manufacturing, financial transactions and cultural heritage. DOLCE clearly lists the ontological choices it is based upon, relies on philosophical principles, is richly formalized, and is built according to well-established ontological methodologies, e.g. OntoClean. Because of these features, it has inspired most of the existing top-level ontologies and has been used to develop or improve standards and public domain resources (e.g. CIDOC CRM, DBpedia and WordNet). Being a foundational ontology, DOLCE is not directly concerned with domain knowledge. Its purpose is to provide the general categories and relations needed to give a coherent view of reality, to integrate domain knowledge, and to mediate across domains. In these 20 years DOLCE has shown that applied ontologies can be stable and that interoperability across reference and domain ontologies is a reality. This paper briefly introduces the ontology and shows how to use it on a few modeling cases.
    摘要 DOLCE是第一个被公理化的顶层(基础)本体,二十年来保持稳定,如今被广泛用于多个领域。DOLCE受认知和语言学考虑的启发,旨在建模一种常识性的现实观,即人类在社会技术系统、制造、金融交易和文化遗产等各种日常生活领域所运用的那种视角。DOLCE明确列出了其所依据的本体论选择,依靠哲学原则,形式化程度高,并按照OntoClean等成熟的本体论方法构建。凭借这些特点,它启发了大多数现有的顶层本体,并被用于开发或改进标准和公共领域资源(如CIDOC CRM、DBpedia和WordNet)。作为基础本体,DOLCE并不直接关注领域知识,其目的是提供给出一致现实观所需的一般范畴和关系,整合领域知识,并在各领域之间起中介作用。二十年来,DOLCE表明应用本体可以保持稳定,参考本体与领域本体之间的互操作也已成为现实。本文简要介绍该本体,并通过几个建模案例展示其用法。

Holy Grail 2.0: From Natural Language to Constraint Models

  • paper_url: http://arxiv.org/abs/2308.01589
  • repo_url: None
  • paper_authors: Dimos Tsouros, Hélène Verhaeghe, Serdar Kadıoğlu, Tias Guns
  • for: 本研究旨在探讨利用预训练大语言模型从文本问题描述中提取约束模型的可能性,以促进约束编程(CP)的更广泛应用。
  • methods: 本研究使用了一种基于分解的提示方法,通过与GPT模型进行交互来提取模型。
  • results: 初步结果表明,这种基于分解的提示方法在从问题描述中提取模型方面具有潜力,有望降低CP用户所需的专业知识门槛。
    Abstract Twenty-seven years ago, E. Freuder highlighted that "Constraint programming represents one of the closest approaches computer science has yet made to the Holy Grail of programming: the user states the problem, the computer solves it". Nowadays, CP users have great modeling tools available (like Minizinc and CPMpy), allowing them to formulate the problem and then let a solver do the rest of the job, getting closer to the stated goal. However, this still requires the CP user to know the formalism and respect it. Another significant challenge lies in the expertise required to effectively model combinatorial problems. All this limits the wider adoption of CP. In this position paper, we investigate a possible approach to leverage pre-trained Large Language Models to extract models from textual problem descriptions. More specifically, we take inspiration from the Natural Language Processing for Optimization (NL4OPT) challenge and present early results with a decomposition-based prompting approach to GPT Models.
    摘要 二十七年前,E. Freuder指出:"约束编程是计算机科学迄今为止最接近编程圣杯的尝试之一:用户描述问题,计算机解决问题"。如今,CP用户拥有优秀的建模工具(如Minizinc和CPMpy),可以将问题形式化后交由求解器完成其余工作,从而更接近上述目标。然而,这仍要求CP用户了解并遵循相应的形式语言;另一大挑战在于有效建模组合问题所需的专业知识。这些都限制了CP的更广泛应用。在这篇立场论文中,我们研究一种可能的方法:利用预训练的大语言模型从文本问题描述中提取模型。更具体地,我们从面向优化的自然语言处理(NL4OPT)挑战赛中获得灵感,并展示了将基于分解的提示方法应用于GPT模型的早期结果。
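A decomposition-based prompting setup can be sketched as a set of sub-prompts (variables, domains, constraints, objective) whose answers are later assembled into a constraint model, rather than one monolithic prompt. The prompt wording below is an illustrative assumption, not the paper's templates.

```python
# Sketch: split a modeling request into focused sub-prompts for an LLM.

STEPS = {
    "variables":   "List the decision variables in this problem:\n{problem}",
    "domains":     "Give the domain of each variable:\n{problem}",
    "constraints": "State each constraint as a separate expression:\n{problem}",
    "objective":   "What quantity is minimized or maximized?\n{problem}",
}

def build_prompts(problem_text):
    """One prompt per modeling sub-task; each LLM answer would later be
    parsed and assembled into a CP model (e.g. for CPMpy)."""
    return {name: tmpl.format(problem=problem_text)
            for name, tmpl in STEPS.items()}

prompts = build_prompts("Assign 5 workers to 5 tasks minimizing total cost.")
print(sorted(prompts))  # the four decomposition steps
print(prompts["objective"].splitlines()[0])
```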

SoK: Assessing the State of Applied Federated Machine Learning

  • paper_url: http://arxiv.org/abs/2308.02454
  • repo_url: None
  • paper_authors: Tobias Müller, Maximilian Stäbler, Hugo Gascón, Frank Köster, Florian Matthes
  • for: This paper aims to explore the current state of applied Federated Machine Learning (FedML) and identify the challenges hindering its practical adoption.
  • methods: The paper uses a comprehensive systematic literature review to assess 74 relevant papers and analyze the real-world applicability of FedML, including its characteristics and emerging trends, motivational drivers, and application domains.
  • results: The paper identifies the challenges encountered in integrating FedML into real-life settings, providing insights that contribute to the further development and implementation of FedML in privacy-critical scenarios.
    Abstract Machine Learning (ML) has shown significant potential in various applications; however, its adoption in privacy-critical domains has been limited due to concerns about data privacy. A promising solution to this issue is Federated Machine Learning (FedML), a model-to-data approach that prioritizes data privacy. By enabling ML algorithms to be applied directly to distributed data sources without sharing raw data, FedML offers enhanced privacy protections, making it suitable for privacy-critical environments. Despite its theoretical benefits, FedML has not seen widespread practical implementation. This study aims to explore the current state of applied FedML and identify the challenges hindering its practical adoption. Through a comprehensive systematic literature review, we assess 74 relevant papers to analyze the real-world applicability of FedML. Our analysis focuses on the characteristics and emerging trends of FedML implementations, as well as the motivational drivers and application domains. We also discuss the encountered challenges in integrating FedML into real-life settings. By shedding light on the existing landscape and potential obstacles, this research contributes to the further development and implementation of FedML in privacy-critical scenarios.
    摘要 机器学习(ML)在各种应用中展现出巨大潜力,但由于数据隐私方面的顾虑,其在隐私关键领域的采用一直受限。联邦机器学习(FedML)是解决这一问题的一种有前景的方案:它采用"模型到数据"的方式,优先保护数据隐私,使ML算法能够直接应用于分布式数据源而无需共享原始数据,从而提供更强的隐私保护,适用于隐私关键环境。尽管具有理论上的优势,FedML尚未得到广泛的实际部署。本研究旨在探索应用FedML的现状,并识别阻碍其实际采用的挑战。通过一项全面的系统性文献综述,我们评估了74篇相关论文,分析FedML在现实世界中的适用性。我们的分析关注FedML实现的特征与新兴趋势、动机驱动因素和应用领域,并讨论了将FedML融入现实场景时遇到的挑战。通过揭示现有格局和潜在障碍,本研究有助于FedML在隐私关键场景中的进一步发展和落地。
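The model-to-data idea surveyed here is most commonly realized with FedAvg-style aggregation, where only model weights (never raw data) leave the clients. Below is a minimal sketch of that aggregation step, not tied to any specific system in the review.

```python
# Minimal FedAvg aggregation sketch: each client trains locally, and the
# server averages parameter vectors weighted by local dataset size.

def fedavg(client_weights, client_sizes):
    """Weighted average of clients' parameter vectors."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    avg = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for i in range(dim):
            avg[i] += w[i] * n / total
    return avg

# Two hypothetical clients with 2-parameter models.
w = fedavg([[1.0, 2.0], [3.0, 4.0]], client_sizes=[10, 30])
print(w)  # [2.5, 3.5]
```

The client with more data (30 vs 10 samples) pulls the average toward its weights, which is the standard FedAvg weighting.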

Unsupervised Representation Learning for Time Series: A Review

  • paper_url: http://arxiv.org/abs/2308.01578
  • repo_url: https://github.com/mqwfrog/ults
  • paper_authors: Qianwen Meng, Hangwei Qian, Yong Liu, Yonghui Xu, Zhiqi Shen, Lizhen Cui
  • for: 本研究旨在系统地分析无监督表示学习方法,尤其是在时序数据上。
  • methods: 本文使用了多种无监督表示学习方法,包括对比学习、自适应 represencing 等。
  • results: 经验证明,对比学习方法在9个真实世界数据集上的表现很出色,而且可以快速实现和统一评估不同模型。
    Abstract Unsupervised representation learning approaches aim to learn discriminative feature representations from unlabeled data, without the requirement of annotating every sample. Enabling unsupervised representation learning is extremely crucial for time series data, due to its unique annotation bottleneck caused by its complex characteristics and lack of visual cues compared with other data modalities. In recent years, unsupervised representation learning techniques have advanced rapidly in various domains. However, there is a lack of systematic analysis of unsupervised representation learning approaches for time series. To fill the gap, we conduct a comprehensive literature review of existing rapidly evolving unsupervised representation learning approaches for time series. Moreover, we also develop a unified and standardized library, named ULTS (i.e., Unsupervised Learning for Time Series), to facilitate fast implementations and unified evaluations on various models. With ULTS, we empirically evaluate state-of-the-art approaches, especially the rapidly evolving contrastive learning methods, on 9 diverse real-world datasets. We further discuss practical considerations as well as open research challenges on unsupervised representation learning for time series to facilitate future research in this field.
    摘要 无监督表示学习方法旨在从无标注数据中学习有判别力的特征表示,而无需对每个样本进行标注。由于时序数据特性复杂、且与其他数据模态相比缺乏视觉线索,其标注瓶颈尤为突出,因此实现时序数据的无监督表示学习至关重要。近年来,无监督表示学习技术在各个领域迅速发展,但针对时序数据的无监督表示学习方法仍缺乏系统性分析。为填补这一空白,我们对现有的快速演进的时序无监督表示学习方法进行了全面的文献综述。此外,我们还开发了一个统一、标准化的代码库ULTS(即Unsupervised Learning for Time Series),以便快速实现并统一评估各种模型。借助ULTS,我们在9个多样化的真实世界数据集上对最先进的方法(特别是快速发展的对比学习方法)进行了实证评估。我们进一步讨论了时序无监督表示学习的实践考虑和开放研究挑战,以促进该领域的未来研究。
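The contrastive methods highlighted in the review typically optimize an InfoNCE-style objective: an anchor series should be closer to its augmented positive view than to negatives. A pure-Python sketch on toy, already-encoded representations (temperature and vectors are illustrative):

```python
import math

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss on (already encoded) time-series representations,
    computed with a numerically stable log-sum-exp."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    logits = [dot(anchor, positive) / temperature] + \
             [dot(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)  # -log softmax of the positive pair

anchor    = [1.0, 0.0]
positive  = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce(anchor, positive, negatives)
print(loss > 0)  # True; the loss shrinks as the positive pair aligns
```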

SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning

  • paper_url: http://arxiv.org/abs/2308.02565
  • repo_url: https://github.com/vermouthdky/simteg
  • paper_authors: Keyu Duan, Qian Liu, Tat-Seng Chua, Shuicheng Yan, Wei Tsang Ooi, Qizhe Xie, Junxian He
  • for: 这个论文主要是为了提高文本图学习的效果,特别是在文本图 Representation learning 阶段。
  • methods: 这篇论文使用一种简单的方法:先在预训练语言模型(LM)上针对下游任务进行参数高效微调(PEFT),然后利用其最后一层隐藏状态生成节点嵌入。
  • results: 实验结果表明,该方法可以显著提升多种图神经网络(GNN)在多个图基准上的表现。
    Abstract Textual graphs (TGs) are graphs whose nodes correspond to text (sentences or documents), which are widely prevalent. The representation learning of TGs involves two stages: (i) unsupervised feature extraction and (ii) supervised graph representation learning. In recent years, extensive efforts have been devoted to the latter stage, where Graph Neural Networks (GNNs) have dominated. However, the former stage for most existing graph benchmarks still relies on traditional feature engineering techniques. More recently, with the rapid development of language models (LMs), researchers have focused on leveraging LMs to facilitate the learning of TGs, either by jointly training them in a computationally intensive framework (merging the two stages), or designing complex self-supervised training tasks for feature extraction (enhancing the first stage). In this work, we present SimTeG, a frustratingly Simple approach for Textual Graph learning that does not innovate in frameworks, models, and tasks. Instead, we first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task, such as node classification. We then generate node embeddings using the last hidden states of finetuned LM. These derived features can be further utilized by any GNN for training on the same task. We evaluate our approach on two fundamental graph representation learning tasks: node classification and link prediction. Through extensive experiments, we show that our approach significantly improves the performance of various GNNs on multiple graph benchmarks.
    摘要 文本图(TG)是节点对应文本(句子或文档)的图,广泛存在于各类场景。TG的表示学习包括两个阶段:(i)无监督特征提取和(ii)监督图表示学习。近年来,大量精力被投入后一阶段,其中图神经网络(GNN)占据主导地位;然而,大多数现有图基准的前一阶段仍依赖传统的特征工程技术。最近,随着语言模型(LM)的快速发展,研究者开始利用LM促进TG的学习:要么在计算代价高昂的框架中将两个阶段合并联合训练,要么为特征提取设计复杂的自监督训练任务以增强第一阶段。在本工作中,我们提出SimTeG,一种极其简单的文本图学习方法,不在框架、模型或任务上做创新。我们先在预训练LM上针对下游任务(如节点分类)进行监督的参数高效微调(PEFT),然后利用微调后LM的最后一层隐藏状态生成节点嵌入。这些派生特征可进一步被任意GNN用于同一任务的训练。我们在节点分类和链接预测这两个基本的图表示学习任务上评估了该方法。大量实验表明,我们的方法显著提升了多种GNN在多个图基准上的表现。
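SimTeG's feature step (encode each node's text with the finetuned LM, pool the last hidden states into a node embedding for the GNN) can be sketched as follows. The `encode` function is a stand-in for a real finetuned LM, not the paper's code.

```python
# Sketch: pool per-token "hidden states" into one embedding per node.

def mean_pool(hidden_states):
    """Average a list of equal-length token vectors into one vector."""
    dim = len(hidden_states[0])
    n = len(hidden_states)
    return [sum(h[i] for h in hidden_states) / n for i in range(dim)]

def node_embeddings(node_texts, encode):
    """Map {node_id: text} to {node_id: pooled embedding}."""
    return {node: mean_pool(encode(text)) for node, text in node_texts.items()}

# Stand-in encoder: one 2-d "hidden state" per whitespace token.
encode = lambda text: [[float(len(tok)), 1.0] for tok in text.split()]
embs = node_embeddings({0: "graph neural nets", 1: "language models"}, encode)
print(embs[0])  # [5.0, 1.0]
```

In the real pipeline, `encode` would run the PEFT-finetuned LM and return its last-layer hidden states; the resulting embedding table is then fed to any GNN as node features.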

Motion Planning Diffusion: Learning and Planning of Robot Motions with Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.01557
  • repo_url: None
  • paper_authors: Joao Carvalho, An T. Le, Mark Baierl, Dorothea Koert, Jan Peters
  • for: 学习轨迹分布先验,以加速机器人运动规划优化
  • methods: 使用扩散模型作为先验,借助其逆去噪过程直接从以任务目标为条件的后验运动分布中采样,并利用扩散模型编码运动数据的多模态性
  • results: 在模拟的平面机器人和7自由度机械臂操作环境中与多个基线方法比较,表明扩散模型是编码高维机器人运动轨迹分布的强大先验。
    Abstract Learning priors on trajectory distributions can help accelerate robot motion planning optimization. Given previously successful plans, learning trajectory generative models as priors for a new planning problem is highly desirable. Prior works propose several ways on utilizing this prior to bootstrapping the motion planning problem. Either sampling the prior for initializations or using the prior distribution in a maximum-a-posterior formulation for trajectory optimization. In this work, we propose learning diffusion models as priors. We then can sample directly from the posterior trajectory distribution conditioned on task goals, by leveraging the inverse denoising process of diffusion models. Furthermore, diffusion has been recently shown to effectively encode data multimodality in high-dimensional settings, which is particularly well-suited for large trajectory dataset. To demonstrate our method efficacy, we compare our proposed method - Motion Planning Diffusion - against several baselines in simulated planar robot and 7-dof robot arm manipulator environments. To assess the generalization capabilities of our method, we test it in environments with previously unseen obstacles. Our experiments show that diffusion models are strong priors to encode high-dimensional trajectory distributions of robot motions.
    摘要 学习轨迹分布先验可以帮助加速机器人运动规划优化。给定以往成功的规划结果,为新的规划问题学习轨迹生成模型作为先验是非常有价值的。先前的工作提出了多种利用这种先验来引导运动规划问题的方式:要么从先验中采样作为初始化,要么在最大后验(maximum-a-posteriori)形式的轨迹优化中使用先验分布。在本工作中,我们提出学习扩散模型作为先验。借助扩散模型的逆去噪过程,我们可以直接从以任务目标为条件的后验轨迹分布中采样。此外,扩散模型最近被证明能在高维设定下有效编码数据的多模态性,这尤其适合大规模轨迹数据集。为证明方法的有效性,我们将所提出的方法Motion Planning Diffusion与多个基线在模拟的平面机器人和7自由度机械臂操作环境中进行比较。为评估方法的泛化能力,我们还在含有未见过障碍物的环境中进行了测试。实验表明,扩散模型是编码机器人运动高维轨迹分布的强大先验。
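Sampling from a diffusion prior amounts to iteratively denoising a trajectory that starts as Gaussian noise. The toy sketch below uses a hand-written stand-in denoiser instead of a trained model, and the straight-line target trajectory is purely illustrative.

```python
import random

def sample_trajectory(denoise, steps=50, dim=4, seed=0):
    """Toy reverse-diffusion sampler: start from Gaussian noise and let a
    denoiser pull the sample toward the data distribution step by step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # pure noise
    for t in range(steps, 0, -1):
        x = denoise(x, t)
    return x

# Stand-in denoiser: nudges every waypoint toward a straight-line
# trajectory [0, 1, 2, 3] (a made-up target, in place of a trained prior).
target = [0.0, 1.0, 2.0, 3.0]
denoise = lambda x, t: [xi + 0.2 * (ti - xi) for xi, ti in zip(x, target)]
traj = sample_trajectory(denoise)
print([round(v, 3) for v in traj])
```

A trained diffusion prior would replace the lambda with a network conditioned on the diffusion step `t` (and, as in the paper, on task goals), so that samples land on plausible, goal-reaching trajectories.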

A Global Transport Capacity Risk Prediction Method for Rail Transit Based on Gaussian Bayesian Network

  • paper_url: http://arxiv.org/abs/2308.01556
  • repo_url: None
  • paper_authors: Zhang Zhengyang, Dong Wei, Liu jun, Sun Xinya, Ji Yindong
  • for: Predicting transport capacity risk in rail transit networks, i.e., the mismatch between the network's carrying capacity and passenger flow demand.
  • methods: An explainable prediction model based on a linear Gaussian Bayesian network; training data come from a three-layer simulation model of the rail transit system (rail network, train flow, and passenger flow), the Bayesian network structure is constructed from the rail network topology, and parameters are learned by maximum likelihood estimation (MLE).
  • results: The effectiveness of the proposed method is verified through simulation examples.
    Abstract Aiming at the prediction problem of transport capacity risk caused by the mismatch between the carrying capacity of rail transit network and passenger flow demand, this paper proposes an explainable prediction method of rail transit network transport capacity risk based on linear Gaussian Bayesian network. This method obtains the training data of the prediction model based on the simulation model of the rail transit system with a three-layer structure including rail transit network, train flow and passenger flow. A Bayesian network structure construction method based on the topology of the rail transit network is proposed, and the MLE (Maximum Likelihood Estimation) method is used to realize the parameter learning of the Bayesian network. Finally, the effectiveness of the proposed method is verified by simulation examples.
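MLE parameter learning has a closed form for a linear Gaussian Bayesian network: each node's conditional distribution is fit by ordinary least squares on its parents, with the residual variance as the Gaussian noise estimate. A one-parent sketch, on illustrative toy data:

```python
def fit_linear_gaussian_cpd(parent_vals, child_vals):
    """MLE for a linear-Gaussian CPD  child ~ N(a*parent + b, sigma^2):
    ordinary least squares for (a, b), residual variance for sigma^2."""
    n = len(parent_vals)
    mx = sum(parent_vals) / n
    my = sum(child_vals) / n
    sxx = sum((x - mx) ** 2 for x in parent_vals)
    sxy = sum((x - mx) * (y - my) for x, y in zip(parent_vals, child_vals))
    a = sxy / sxx
    b = my - a * mx
    resid = [y - (a * x + b) for x, y in zip(parent_vals, child_vals)]
    sigma2 = sum(r * r for r in resid) / n
    return a, b, sigma2

# Toy data: downstream-station load roughly doubles upstream load.
parent = [1.0, 2.0, 3.0, 4.0, 5.0]
child = [2.1, 3.9, 6.2, 8.0, 9.8]
a, b, s2 = fit_linear_gaussian_cpd(parent, child)
print(round(a, 2))  # → 1.95
```

A multi-parent node generalizes this to multivariate least squares; the network topology (here, the rail network) decides which stations are parents of which.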

InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent

  • paper_url: http://arxiv.org/abs/2308.01552
  • repo_url: None
  • paper_authors: Po-Lin Chen, Cheng-Shang Chang
  • for: Investigates integrating OpenAI's ChatGPT into embodied agent systems and evaluates its influence on an interactive decision-making benchmark.
  • methods: Introduces InterAct, which feeds ChatGPT varied prompts, assigns it multiple roles such as checker and sorter, and integrates these with the original language model.
  • results: Achieves a 98% success rate in AlfWorld, which comprises 6 different tasks in a simulated household environment, demonstrating ChatGPT's ability to handle intricate tasks in realistic settings and paving the way for advances in task planning.
    Abstract This research paper delves into the integration of OpenAI's ChatGPT into embodied agent systems, evaluating its influence on interactive decision-making benchmarks. Drawing a parallel to the concept of people assuming roles according to their unique strengths, we introduce InterAct. In this approach, we feed ChatGPT varied prompts, assigning it numerous roles such as a checker and a sorter, then integrating them with the original language model. Our research shows a remarkable success rate of 98% in AlfWorld, which consists of 6 different tasks in a simulated household environment, emphasizing the significance of proficient prompt engineering. The results highlight ChatGPT's competence in comprehending and performing intricate tasks effectively in real-world settings, thus paving the way for further advancements in task planning.
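The role-assignment idea behind InterAct can be illustrated with plain prompt composition, one LLM call per role. The role texts and helper below are hypothetical stand-ins, not the paper's actual prompts:

```python
# Hypothetical role prompts illustrating InterAct's role assignment;
# the paper's actual prompt texts differ.
ROLE_PROMPTS = {
    "checker": "You are a checker. Verify that the proposed action is "
               "consistent with the current observation before it runs.",
    "sorter": "You are a sorter. Rank the candidate actions from most to "
              "least likely to advance the task goal.",
}

def build_prompt(role, observation, proposal):
    """Compose a role-conditioned prompt: each role gets its own system
    instruction (helper name is ours, not the paper's)."""
    return (f"{ROLE_PROMPTS[role]}\n"
            f"Observation: {observation}\n"
            f"Proposal: {proposal}")

p = build_prompt("checker", "You see a mug on the desk.", "take mug from desk")
print(p.splitlines()[1])  # → Observation: You see a mug on the desk.
```

The responses of the role-conditioned calls are then merged with the base language model's action choice.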

Avoidance Navigation Based on Offline Pre-Training Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2308.01551
  • repo_url: None
  • paper_authors: Yang Wenkai Ji Ruihang Zhang Yuxiang Lei Hao, Zhao Zijie
  • for: Applies deep reinforcement learning (DRL) to map-free avoidance navigation for mobile robots.
  • methods: Maps raw sensor data directly to control variables for navigation in unknown environments; proposes an efficient offline pre-training strategy to speed up the inefficient random exploration of the early training stage; and collects a universal dataset including expert experience that can be reused by other navigation training work.
  • results: Pre-training with prioritized expert experience reduces training time by 80% and doubles the DRL reward; the trained model achieves collision-free navigation in real environments and generalizes across different environments.
    Abstract This paper presents pre-training deep reinforcement learning (DRL) for map-free avoidance navigation of mobile robots, which maps raw sensor data to control variables to navigate in an unknown environment. An efficient offline training strategy is proposed to speed up the inefficient random exploration of the early stage, and we also collect a universal dataset including expert experience for offline training, which is of some significance for other navigation training work. Pre-training and prioritized expert experience are proposed to reduce training time by 80% and have been verified to double the DRL reward. An advanced Gazebo simulation with realistic physical modelling and dynamic equations reduces the sim-to-real gap. We train our model in a corridor environment and evaluate it in different environments, obtaining the same effect. Compared to traditional navigation methods, the trained model can be applied directly to different scenarios and achieves collision-free navigation, demonstrating that our DRL model has general capacity across environments.
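One plausible reading of the prioritized expert experience described above is a replay buffer that over-samples expert transitions during offline pre-training. The weighting scheme below is an assumption for illustration, not the paper's exact mechanism:

```python
import random

class PrioritizedReplay:
    """Replay buffer that over-samples expert transitions, in the spirit
    of prioritized expert experience (weighting details are assumed)."""
    def __init__(self, expert_weight=3.0):
        self.items, self.weights = [], []
        self.expert_weight = expert_weight

    def add(self, transition, is_expert=False):
        self.items.append(transition)
        self.weights.append(self.expert_weight if is_expert else 1.0)

    def sample(self, k):
        # Sampling probability proportional to priority weight.
        return random.choices(self.items, weights=self.weights, k=k)

random.seed(1)
buf = PrioritizedReplay()
for i in range(90):
    buf.add(("random", i))
for i in range(10):
    buf.add(("expert", i), is_expert=True)

batch = buf.sample(1000)
frac_expert = sum(1 for tag, _ in batch if tag == "expert") / len(batch)
print(frac_expert > 0.15)  # → True (expert share boosted above its 10% base rate)
```

With weight 3.0, the 10% expert transitions receive 30/120 = 25% of the sampling mass, letting early training lean on demonstrations instead of pure random exploration.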

MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies

  • paper_url: http://arxiv.org/abs/2308.01546
  • repo_url: None
  • paper_authors: Ke Chen, Yusong Wu, Haohe Liu, Marianna Nezhurina, Taylor Berg-Kirkpatrick, Shlomo Dubnov
  • for: Proposes a text-to-music generation model built on stable-diffusion-style architectures, addressing the limited availability of music data and copyright concerns in music generation.
  • methods: Adapts the Stable Diffusion and AudioLDM architectures to the music domain by retraining the CLAP contrastive language-audio pretraining model and the Hifi-GAN vocoder on music data, and proposes two beat-synchronous mixup strategies for data augmentation under limited training data.
  • results: Experiments show that MusicLDM and the mixup strategies improve both the quality and novelty of generated music while preserving the correspondence between input text and generated music; several new CLAP-score-based evaluation metrics are designed to demonstrate this.
    Abstract Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to limited availability of music data and sensitive issues related to copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text-to-music model, MusicLDM, that adapts Stable Diffusion and AudioLDM architectures to the music domain. We achieve this by retraining the contrastive language-audio pretraining model (CLAP) and the Hifi-GAN vocoder, as components of MusicLDM, on a collection of music data samples. Then, to address the limitations of training data and to avoid plagiarism, we leverage a beat tracking model and propose two different mixup strategies for data augmentation: beat-synchronous audio mixup and beat-synchronous latent mixup, which recombine training audio directly or via a latent embeddings space, respectively. Such mixup strategies encourage the model to interpolate between musical training samples and generate new music within the convex hull of the training data, making the generated music more diverse while still staying faithful to the corresponding style. In addition to popular evaluation metrics, we design several new evaluation metrics based on CLAP score to demonstrate that our proposed MusicLDM and beat-synchronous mixup strategies improve both the quality and novelty of generated music, as well as the correspondence between input text and generated music.
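Beat-synchronous audio mixup, as the abstract describes it, recombines training audio after aligning beats so the mixed clip stays rhythmically coherent. A minimal sketch, assuming beat positions are already available (MusicLDM obtains them from a beat-tracking model):

```python
def beat_sync_mixup(x, beats_x, y, beats_y, lam=0.5):
    """Beat-synchronous mixup sketch: shift clip y so its first tracked
    beat lines up with x's first beat, then mix the waveforms linearly.
    Beat indices are sample positions supplied by a beat tracker."""
    shift = beats_x[0] - beats_y[0]
    y_aligned = [y[i - shift] if 0 <= i - shift < len(y) else 0.0
                 for i in range(len(x))]
    return [lam * a + (1.0 - lam) * b for a, b in zip(x, y_aligned)]

x = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0]   # beats at samples 2 and 6
y = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]   # beats at samples 0 and 4
mix = beat_sync_mixup(x, [2, 6], y, [0, 4])
print(mix[2])  # → 1.0 (both clips peak together after alignment)
```

The paper's latent variant applies the same recombination in the diffusion model's latent embedding space rather than directly on waveforms.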

Lode Enhancer: Level Co-creation Through Scaling

  • paper_url: http://arxiv.org/abs/2308.01543
  • repo_url: None
  • paper_authors: Debosmita Bhaumik, Julian Togelius, Georgios N. Yannakakis, Ahmed Khalifa
  • for: Provides an AI-powered design-assistance tool for creating 2D game levels.
  • methods: Deep neural networks upscale artificially downscaled patches of levels from the puzzle platformer Lode Runner; the trained networks are integrated into a web-based editor where users create and edit levels at three resolutions (4x4, 8x8, and 16x16), with an edit at any resolution instantly transferring to the others. The network architecture learns upscaling while giving higher priority to less frequent tiles.
  • results: In a qualitative study with 3 designers, participants enjoyed co-designing with the tool, liked its underlying concept, and provided feedback for further improvement.
    Abstract We explore AI-powered upscaling as a design assistance tool in the context of creating 2D game levels. Deep neural networks are used to upscale artificially downscaled patches of levels from the puzzle platformer game Lode Runner. The trained networks are incorporated into a web-based editor, where the user can create and edit levels at three different levels of resolution: 4x4, 8x8, and 16x16. An edit at any resolution instantly transfers to the other resolutions. As upscaling requires inventing features that might not be present at lower resolutions, we train neural networks to reproduce these features. We introduce a neural network architecture that is capable of not only learning upscaling but also giving higher priority to less frequent tiles. To investigate the potential of this tool and guide further development, we conduct a qualitative study with 3 designers to understand how they use it. Designers enjoyed co-designing with the tool, liked its underlying concept, and provided feedback for further improvement.
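Prioritizing less frequent tiles can be realized with inverse-frequency loss weights, so the upscaler is penalized more for missing rare but gameplay-critical tiles. This is a sketch of the idea, not the paper's exact weighting:

```python
from collections import Counter

def tile_weights(level_tiles):
    """Inverse-frequency class weights: rare tiles (e.g. ladders, gold)
    get a larger weight in the upscaler's loss, common tiles a smaller one.
    Weights are normalized so a uniform distribution gives weight 1.0."""
    counts = Counter(level_tiles)
    total = len(level_tiles)
    return {t: total / (len(counts) * c) for t, c in counts.items()}

w = tile_weights(["empty"] * 90 + ["ladder"] * 8 + ["gold"] * 2)
print(w["gold"] > w["ladder"] > w["empty"])  # → True (rarer tile, larger weight)
```

Such weights would multiply the per-tile cross-entropy terms during training, which matches the paper's stated goal of not letting the dominant "empty" tile drown out rare features.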

Non-equilibrium physics: from spin glasses to machine and neural learning

  • paper_url: http://arxiv.org/abs/2308.01538
  • repo_url: None
  • paper_authors: Weishun Zhong
  • for: Characterizes emergent intelligence in disordered many-body systems, to better understand how such systems work and how they can be applied to AI.
  • methods: Uses statistical physics to describe emergent intelligent behavior in disordered systems, organized along two axes: learning mechanisms (long-term memory vs. working memory) and learning dynamics (artificial vs. natural).
  • results: Uncovers relationships between learning mechanisms and physical dynamics that could serve as guiding principles for designing intelligent systems, potentially expanding our understanding of intelligence beyond neural systems and revealing a wider range of computational substrates suitable for AI applications.
    Abstract Disordered many-body systems exhibit a wide range of emergent phenomena across different scales. These complex behaviors can be utilized for various information processing tasks such as error correction, learning, and optimization. Despite the empirical success of utilizing these systems for intelligent tasks, the underlying principles that govern their emergent intelligent behaviors remain largely unknown. In this thesis, we aim to characterize such emergent intelligence in disordered systems through statistical physics. We chart a roadmap for our efforts in this thesis based on two axes: learning mechanisms (long-term memory vs. working memory) and learning dynamics (artificial vs. natural). Throughout our journey, we uncover relationships between learning mechanisms and physical dynamics that could serve as guiding principles for designing intelligent systems. We hope that our investigation into the emergent intelligence of seemingly disparate learning systems can expand our current understanding of intelligence beyond neural systems and uncover a wider range of computational substrates suitable for AI applications.

Food Classification using Joint Representation of Visual and Textual Data

  • paper_url: http://arxiv.org/abs/2308.02562
  • repo_url: None
  • paper_authors: Prateek Mittal, Puneet Goyal, Joohi Chauhan
  • for: Proposes a multimodal classification framework for food classification in health care.
  • methods: A modified EfficientNet with the Mish activation function for image classification, and a traditional BERT transformer-based network for text classification.
  • results: Evaluated on the large open-source UPMC Food-101 dataset, the proposed network outperforms other methods, with accuracy gains of 11.57% (image) and 6.34% (text) over the second-best performing method.
    Abstract Food classification is an important task in health care. In this work, we propose a multimodal classification framework that uses the modified version of EfficientNet with the Mish activation function for image classification, and the traditional BERT transformer-based network is used for text classification. The proposed network and the other state-of-the-art methods are evaluated on a large open-source dataset, UPMC Food-101. The experimental results show that the proposed network outperforms the other methods, a significant difference of 11.57% and 6.34% in accuracy is observed for image and text classification, respectively, when compared with the second-best performing method. We also compared the performance in terms of accuracy, precision, and recall for text classification using both machine learning and deep learning-based models. The comparative analysis from the prediction results of both images and text demonstrated the efficiency and robustness of the proposed approach.
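The framework combines an image branch and a text branch; a common way to fuse such branches is to average their class probabilities at inference. The rule below is a generic late-fusion sketch, since the paper's exact fusion scheme is not given here:

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def late_fusion(img_logits, txt_logits, alpha=0.5):
    """Generic late fusion for joint image+text classification: convert
    each branch's logits to probabilities and take a weighted average.
    alpha and the averaging rule are illustrative assumptions."""
    p_img, p_txt = softmax(img_logits), softmax(txt_logits)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(p_img, p_txt)]

# Image branch is unsure between classes 0 and 1; text clearly favors class 1.
p = late_fusion([2.0, 2.0, 0.0], [0.0, 4.0, 0.0])
print(p.index(max(p)))  # → 1
```

The example shows why a confident text branch (e.g. a recipe title) can break ties the image branch cannot resolve, which is the motivation for joint visual-textual representation.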

Digital twin brain: a bridge between biological intelligence and artificial intelligence

  • paper_url: http://arxiv.org/abs/2308.01941
  • repo_url: None
  • paper_authors: Hui Xiong, Congying Chu, Lingzhong Fan, Ming Song, Jiaqi Zhang, Yawei Ma, Ruonan Zheng, Junyang Zhang, Zhengyi Yang, Tianzi Jiang
  • for: Proposes the Digital Twin Brain (DTB), a platform that bridges biological and artificial intelligence to better explore the brain's complexity.
  • methods: The platform consists of three core elements: the brain structure fundamental to the twinning process, bottom-layer models that generate brain functions, and a wide spectrum of applications; brain atlases provide a vital constraint that preserves the brain's network organization within the DTB.
  • results: The DTB can offer unprecedented insights into the emergence of intelligence and neurological disorders, with promise for advancing artificial general intelligence and precision mental healthcare.
    Abstract In recent years, advances in neuroscience and artificial intelligence have paved the way for unprecedented opportunities for understanding the complexity of the brain and its emulation by computational systems. Cutting-edge advancements in neuroscience research have revealed the intricate relationship between brain structure and function, while the success of artificial neural networks highlights the importance of network architecture. Now is the time to bring them together to better unravel how intelligence emerges from the brain's multiscale repositories. In this review, we propose the Digital Twin Brain (DTB) as a transformative platform that bridges the gap between biological and artificial intelligence. It consists of three core elements: the brain structure that is fundamental to the twinning process, bottom-layer models to generate brain functions, and its wide spectrum of applications. Crucially, brain atlases provide a vital constraint, preserving the brain's network organization within the DTB. Furthermore, we highlight open questions that invite joint efforts from interdisciplinary fields and emphasize the far-reaching implications of the DTB. The DTB can offer unprecedented insights into the emergence of intelligence and neurological disorders, which holds tremendous promise for advancing our understanding of both biological and artificial intelligence, and ultimately propelling the development of artificial general intelligence and facilitating precision mental healthcare.

Quantum Multi-Agent Reinforcement Learning for Autonomous Mobility Cooperation

  • paper_url: http://arxiv.org/abs/2308.01519
  • repo_url: None
  • paper_authors: Soohyun Park, Jae Pyoung Kim, Chanyoung Park, Soyi Jung, Joongheon Kim
  • for: Proposes a quantum multi-agent reinforcement learning (QMARL) algorithm to address the parameter-utilization and convergence difficulties of MARL with many agents.
  • methods: QMARL based on an actor-critic network, made more scalable and faster to converge by a proposed projection value measure (PVM).
  • results: The proposed QMARL copes with the limitations of the noisy intermediate-scale quantum (NISQ) era, converges faster, and uses parameters more efficiently than the baselines; with PVM, which reduces the action dimension to a logarithmic scale, it achieves the highest reward.
    Abstract For Industry 4.0 Revolution, cooperative autonomous mobility systems are widely used based on multi-agent reinforcement learning (MARL). However, the MARL-based algorithms suffer from huge parameter utilization and convergence difficulties with many agents. To tackle these problems, a quantum MARL (QMARL) algorithm based on the concept of actor-critic network is proposed, which is beneficial in terms of scalability, to deal with the limitations in the noisy intermediate-scale quantum (NISQ) era. Additionally, our QMARL is also beneficial in terms of efficient parameter utilization and fast convergence due to quantum supremacy. Note that the reward in our QMARL is defined as task precision over computation time in multiple agents, thus, multi-agent cooperation can be realized. For further improvement, an additional technique for scalability is proposed, which is called projection value measure (PVM). Based on PVM, our proposed QMARL can achieve the highest reward, by reducing the action dimension into a logarithmic-scale. Finally, we can conclude that our proposed QMARL with PVM outperforms the other algorithms in terms of efficient parameter utilization, fast convergence, and scalability.
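One way to read the projection value measure (PVM) is as a readout that compresses the 2^n basis-state probabilities of n qubits into n per-qubit expectation values, shrinking the action readout from exponential to logarithmic size. The mapping below is an illustrative sketch of that idea; the paper's exact PVM construction is assumed:

```python
import math

def projection_values(probs):
    """Compress 2^n basis-state probabilities into n per-qubit <Z>
    expectation values: exponential readout -> logarithmic readout.
    (A sketch of the projection-value-measure idea, details assumed.)"""
    n = int(math.log2(len(probs)))
    z = []
    for q in range(n):
        exp = 0.0
        for state, p in enumerate(probs):
            bit = (state >> q) & 1
            exp += p * (1.0 if bit == 0 else -1.0)
        z.append(exp)
    return z

z = projection_values([0.5, 0.0, 0.0, 0.5])  # Bell-state probabilities
print(z)  # → [0.0, 0.0]
```

With 2 qubits the 4-dimensional measurement distribution collapses to 2 values, which is the kind of dimension reduction that makes the action space tractable for the actor network.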

Large-scale Generative Simulation Artificial Intelligence: the Next Hotspot in Generative AI

  • paper_url: http://arxiv.org/abs/2308.02561
  • repo_url: None
  • paper_authors: Qi Wang, Yanghe Feng, Jincai Huang, Yiqin Lv, Zheng Xie, Xiaoshan Gao
  • for: Nominates large-scale generative simulation artificial intelligence (LS-GenAI) as the next hotspot for GenAI, in response to practical challenges such as limited learning resources and over-reliance on empiricism in scientific discovery.
  • methods: A position piece surveying GenAI's breakthroughs in natural language processing and computer vision and the practical challenges that motivate LS-GenAI.
  • results: Argues for LS-GenAI as the next point of connection for GenAI and outlines its application prospects.
    Abstract The concept of GenAI has been in development for decades. Recently, it has impressed us with substantial breakthroughs in natural language processing and computer vision, actively engaging in industrial scenarios. Noticing the practical challenges, e.g., limited learning resources and over-reliance on empiricism in scientific discovery, we nominate large-scale generative simulation artificial intelligence (LS-GenAI) as the next hotspot for GenAI to connect.

Implicit Occupancy Flow Fields for Perception and Prediction in Self-Driving

  • paper_url: http://arxiv.org/abs/2308.01471
  • repo_url: None
  • paper_authors: Ben Agro, Quinlan Sykora, Sergio Casas, Raquel Urtasun
  • for: Improving a self-driving vehicle's (SDV's) perception of its surroundings and prediction of other traffic participants' future behavior.
  • methods: A unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network; the representation can be queried directly by the motion planner at continuous spatio-temporal locations, avoiding wasted computation and missed object detections, and an efficient global attention mechanism overcomes the limited receptive field of prior explicit occupancy-prediction methods.
  • results: Extensive experiments in both urban and highway settings show the implicit model outperforms the state of the art. More information: https://waabi.ai/research/implicito.
    Abstract A self-driving vehicle (SDV) must be able to perceive its surroundings and predict the future behavior of other traffic participants. Existing works either perform object detection followed by trajectory forecasting of the detected objects, or predict dense occupancy and flow grids for the whole scene. The former poses a safety concern as the number of detections needs to be kept low for efficiency reasons, sacrificing object recall. The latter is computationally expensive due to the high-dimensionality of the output grid, and suffers from the limited receptive field inherent to fully convolutional networks. Furthermore, both approaches employ many computational resources predicting areas or objects that might never be queried by the motion planner. This motivates our unified approach to perception and future prediction that implicitly represents occupancy and flow over time with a single neural network. Our method avoids unnecessary computation, as it can be directly queried by the motion planner at continuous spatio-temporal locations. Moreover, we design an architecture that overcomes the limited receptive field of previous explicit occupancy prediction methods by adding an efficient yet effective global attention mechanism. Through extensive experiments in both urban and highway settings, we demonstrate that our implicit model outperforms the current state-of-the-art. For more information, visit the project website: https://waabi.ai/research/implicito.
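The key interface difference from dense occupancy grids is that an implicit field is evaluated only at the continuous spatio-temporal points the planner actually cares about. A toy sketch, with a hand-written stub standing in for the trained implicit network:

```python
def query_field(model, points):
    """Query occupancy and flow at arbitrary continuous (x, y, t) points,
    instead of decoding a dense grid; `model` stands in for the trained
    implicit network."""
    return [model(x, y, t) for (x, y, t) in points]

# Stub field: a unit disc of occupancy drifting right at 1 m/s,
# returning (occupancy, flow) like the implicit network would.
stub = lambda x, y, t: (1.0 if (x - t) ** 2 + y ** 2 < 1.0 else 0.0,
                        (1.0, 0.0))

occ, flow = query_field(stub, [(0.5, 0.0, 0.0)])[0]
print(occ)  # → 1.0
```

A planner evaluating a candidate trajectory only queries the handful of points along that trajectory, which is why the implicit formulation avoids the cost of predicting whole-scene grids.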

Training Data Protection with Compositional Diffusion Models

  • paper_url: http://arxiv.org/abs/2308.01937
  • repo_url: None
  • paper_authors: Aditya Golatkar, Alessandro Achille, Ashwin Swaminathan, Stefano Soatto
  • for: Describes Compartmentalized Diffusion Models (CDM), which train separate diffusion models (or prompts) on distinct data sources and compose them arbitrarily at inference time, achieving performance comparable to a paragon model trained on all data simultaneously.
  • methods: Individual models are trained in isolation, at different times, and on different distributions and domains, then composed at inference; each model contains information only about the data subset it was exposed to during training, enabling several forms of training-data protection.
  • results: CDMs are the first method to enable both selective forgetting and continual learning for large-scale diffusion models, allow serving customized models based on a user's access rights, and can determine the importance of a data subset in generating particular samples.
    Abstract We introduce Compartmentalized Diffusion Models (CDM), a method to train different diffusion models (or prompts) on distinct data sources and arbitrarily compose them at inference time. The individual models can be trained in isolation, at different times, and on different distributions and domains and can be later composed to achieve performance comparable to a paragon model trained on all data simultaneously. Furthermore, each model only contains information about the subset of the data it was exposed to during training, enabling several forms of training data protection. In particular, CDMs are the first method to enable both selective forgetting and continual learning for large-scale diffusion models, as well as allowing serving customized models based on the user's access rights. CDMs also allow determining the importance of a subset of the data in generating particular samples.
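Composition at inference time can be pictured as combining the per-shard models' noise predictions at each denoising step; the plain average below is an assumption for illustration, not the paper's exact composition rule. Dropping a model from the active set "forgets" its shard:

```python
def composed_eps(models, x, t, active):
    """Compartmentalized-diffusion sketch: each model in `models` was
    trained on one data shard; at inference the noise predictions of the
    `active` subset are combined (a plain average here). Removing an
    index from `active` removes that shard's influence entirely."""
    preds = [models[i](x, t) for i in active]
    k = len(preds)
    return [sum(vals) / k for vals in zip(*preds)]

# Stub per-shard noise predictors standing in for trained models.
m_a = lambda x, t: [1.0] * len(x)   # shard A
m_b = lambda x, t: [3.0] * len(x)   # shard B

both = composed_eps([m_a, m_b], [0.0, 0.0], 5, active=[0, 1])
only_b = composed_eps([m_a, m_b], [0.0, 0.0], 5, active=[1])
print(both, only_b)  # → [2.0, 2.0] [3.0, 3.0]
```

Because no model ever saw another shard's data, serving a user only the models they are entitled to enforces access rights by construction.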

Dual Governance: The intersection of centralized regulation and crowdsourced safety mechanisms for Generative AI

  • paper_url: http://arxiv.org/abs/2308.04448
  • repo_url: None
  • paper_authors: Avijit Ghosh, Dhanya Lakshmi
  • for: Proposes a framework called Dual Governance to ensure the safe and ethical deployment of generative AI.
  • methods: Analyzes existing centralized government regulation and crowdsourced community safety mechanisms, identifies how they complement each other, and proposes a cooperative synergy between the two in a U.S.-specific context.
  • results: Argues that implementing the Dual Governance framework can promote innovation and creativity while ensuring the safe and ethical deployment of generative AI.
    Abstract Generative Artificial Intelligence (AI) has seen mainstream adoption lately, especially in the form of consumer-facing, open-ended, text and image generating models. However, the use of such systems raises significant ethical and safety concerns, including privacy violations, misinformation and intellectual property theft. The potential for generative AI to displace human creativity and livelihoods has also been under intense scrutiny. To mitigate these risks, there is an urgent need for policies and regulations that ensure responsible and ethical development in the field of generative AI. Existing and proposed centralized regulations by governments to rein in AI face criticisms such as insufficient clarity or uniformity, lack of interoperability across jurisdictions, restriction of innovation, and hindrance of free market competition. Decentralized protections via crowdsourced safety tools and mechanisms are a potential alternative. However, they have clear deficiencies in terms of inadequate oversight and difficulty of enforcing ethical and safety standards, and are thus not sufficient by themselves as a regulation mechanism. We propose a marriage of these two strategies via a framework we call Dual Governance. This framework proposes a cooperative synergy between centralized government regulations, in a U.S.-specific context, and safety mechanisms developed by the community to protect stakeholders from the harms of generative AI. By implementing the Dual Governance framework, we posit that innovation and creativity can be promoted while ensuring safe and ethical deployment of generative AI.

  • paper_url: http://arxiv.org/abs/2308.01469
  • repo_url: None
  • paper_authors: Ruyi Ding, Shijin Duan, Xiaolin Xu, Yunsi Fei
  • for: Attacks graph neural networks (GNNs) trained on graph-structured data in order to steal graph link (connectivity) information.
  • methods: Proposes VertexSerum, a novel graph poisoning attack that amplifies link-connectivity leakage, together with an attention mechanism that can be embedded in the link-detection network to infer node adjacency more accurately.
  • results: VertexSerum significantly outperforms the state-of-the-art (SOTA) link inference attack, improving AUC scores by an average of 9.8% across four real-world datasets and three GNN structures, and remains effective in both black-box and online-learning settings, validating its applicability to real-world scenarios.
    Abstract Graph neural networks (GNNs) have brought superb performance to various applications utilizing graph structural data, such as social analysis and fraud detection. The graph links, e.g., social relationships and transaction history, are sensitive and valuable information, which raises privacy concerns when using GNNs. To exploit these vulnerabilities, we propose VertexSerum, a novel graph poisoning attack that increases the effectiveness of graph link stealing by amplifying the link connectivity leakage. To infer node adjacency more accurately, we propose an attention mechanism that can be embedded into the link detection network. Our experiments demonstrate that VertexSerum significantly outperforms the SOTA link inference attack, improving the AUC scores by an average of $9.8\%$ across four real-world datasets and three different GNN structures. Furthermore, our experiments reveal the effectiveness of VertexSerum in both black-box and online learning settings, further validating its applicability in real-world scenarios.
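The attention mechanism embedded in the link-detection network is not specified in detail above; standard scaled dot-product attention, shown standalone below, conveys the basic ingredient of letting the detector weight which node posteriors matter for an adjacency decision:

```python
import math

def dot_attention(query, keys, values):
    """Scaled dot-product attention over a set of key/value vectors.
    (A standalone illustration; VertexSerum's exact attention layout
    inside the link-detection network is not reproduced here.)"""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    w = [e / z for e in exps]  # softmax weights
    return [sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))]

out = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0], [0.0]])
print(round(out[0], 2))  # → 0.67 (weight tilts toward the matching key)
```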

Novel Physics-Based Machine-Learning Models for Indoor Air Quality Approximations

  • paper_url: http://arxiv.org/abs/2308.01438
  • repo_url: None
  • paper_authors: Ahmad Mohammadshirazi, Aida Nadafian, Amin Karimi Monsefi, Mohammad H. Rafiei, Rajiv Ramnath
  • for: Proposes six novel physics-based machine-learning models for accurate indoor pollutant-concentration approximation.
  • methods: Combines state-space concepts from physics, Gated Recurrent Units, and decomposition techniques into lightweight, computationally efficient models.
  • results: Illustrated on data from five offices in a commercial building in California, the proposed models are less complex, more computationally efficient, and more accurate than comparable state-of-the-art transformer-based models, capturing the highly nonlinear patterns in sensor-collected indoor air-quality data.
    Abstract Cost-effective sensors are capable of real-time capturing a variety of air quality-related modalities from different pollutant concentrations to indoor/outdoor humidity and temperature. Machine learning (ML) models are capable of performing air-quality "ahead-of-time" approximations. Undoubtedly, accurate indoor air quality approximation significantly helps provide a healthy indoor environment, optimize associated energy consumption, and offer human comfort. However, it is crucial to design an ML architecture to capture the domain knowledge, so-called problem physics. In this study, we propose six novel physics-based ML models for accurate indoor pollutant concentration approximations. The proposed models include an adroit combination of state-space concepts in physics, Gated Recurrent Units, and Decomposition techniques. The proposed models were illustrated using data collected from five offices in a commercial building in California. The proposed models are shown to be less complex, computationally more efficient, and more accurate than similar state-of-the-art transformer-based models. The superiority of the proposed models is due to their relatively light architecture (computational efficiency) and, more importantly, their ability to capture the underlying highly nonlinear patterns embedded in the often contaminated sensor-collected indoor air quality temporal data.
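The state-space concept the methods bullet mentions can be illustrated with a one-dimensional discrete linear state-space model of pollutant concentration, the kind of physics prior the learned models build on. The coefficients below are illustrative, not fitted:

```python
def state_space_step(c, inflow, decay=0.9, gain=0.1):
    """One step of a discrete linear state-space model for an indoor
    pollutant: next concentration = decayed current level + scaled source
    inflow. Coefficients are illustrative stand-ins, not fitted values."""
    return decay * c + gain * inflow

c = 400.0  # e.g. baseline CO2 in ppm
for inflow in [800, 800, 0, 0]:  # source on for 2 steps, then off
    c = state_space_step(c, inflow)
print(round(c, 1))  # → 385.6
```

In the paper's models this linear backbone is augmented with GRUs and decomposition terms to capture the nonlinear residual dynamics the pure physics model misses.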

Why Do We Need Neuro-symbolic AI to Model Pragmatic Analogies?

  • paper_url: http://arxiv.org/abs/2308.01936
  • repo_url: None
  • paper_authors: Thilini Wijesiriwardene, Amit Sheth, Valerie L. Shalin, Amitava Das
  • for: Examines the performance of large language models (LLMs) on analogies of progressively increasing complexity expressed in unstructured text.
  • methods: Discusses analogies at four levels of complexity (lexical, syntactic, semantic, and pragmatic) and proposes neuro-symbolic AI techniques, combining statistical and symbolic AI, to highlight and augment relevant content in the text representation, provide abstraction, and guide the mapping process.
  • results: As analogies grow more complex they require increasingly extensive, diverse knowledge beyond the textual content, unlikely to be found in the lexical co-occurrence statistics that power LLMs; the knowledge-informed neuro-symbolic approach maintains the efficiency of LLMs while preserving the ability to explain analogies for pedagogical applications.
    Abstract A hallmark of intelligence is the ability to use a familiar domain to make inferences about a less familiar domain, known as analogical reasoning. In this article, we delve into the performance of Large Language Models (LLMs) in dealing with progressively complex analogies expressed in unstructured text. We discuss analogies at four distinct levels of complexity: lexical analogies, syntactic analogies, semantic analogies, and pragmatic analogies. As the analogies become more complex, they require increasingly extensive, diverse knowledge beyond the textual content, unlikely to be found in the lexical co-occurrence statistics that power LLMs. To address this, we discuss the necessity of employing Neuro-symbolic AI techniques that combine statistical and symbolic AI, informing the representation of unstructured text to highlight and augment relevant content, provide abstraction and guide the mapping process. Our knowledge-informed approach maintains the efficiency of LLMs while preserving the ability to explain analogies for pedagogical applications.

Unlocking the Potential of Similarity Matching: Scalability, Supervision and Pre-training

  • paper_url: http://arxiv.org/abs/2308.02427
  • repo_url: None
  • paper_authors: Yanis Bahroun, Shagesh Sridharan, Atithi Acharya, Dmitri B. Chklovskii, Anirvan M. Sengupta
  • for: This study develops biologically plausible learning algorithms based on local learning rules, as alternatives to the backpropagation (BP) algorithm, which has limitations in biological plausibility, computational cost, and suitability for online learning.
  • methods: The authors propose a PyTorch implementation of Convolutional Nonnegative Similarity Matching (SM) to scale SM to large datasets, introduce a localized supervised SM objective reminiscent of canonical correlation analysis that allows SM layers to be stacked, and use the implementation to pre-train architectures such as LeNet.
  • results: The PyTorch-based SM algorithms combine biological plausibility with computational efficiency and scale to large datasets; features learned through SM pre-training are evaluated against those of BP-trained models.
    Abstract While effective, the backpropagation (BP) algorithm exhibits limitations in terms of biological plausibility, computational cost, and suitability for online learning. As a result, there has been a growing interest in developing alternative biologically plausible learning approaches that rely on local learning rules. This study focuses on the primarily unsupervised similarity matching (SM) framework, which aligns with observed mechanisms in biological systems and offers online, localized, and biologically plausible algorithms. i) To scale SM to large datasets, we propose an implementation of Convolutional Nonnegative SM using PyTorch. ii) We introduce a localized supervised SM objective reminiscent of canonical correlation analysis, facilitating stacking SM layers. iii) We leverage the PyTorch implementation for pre-training architectures such as LeNet and compare the evaluation of features against BP-trained models. This work combines biologically plausible algorithms with computational efficiency opening multiple avenues for further explorations.
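The local learning rules behind similarity matching can be illustrated on toy data. The sketch below is illustrative only (not the paper's convolutional nonnegative PyTorch implementation, and all dimensions and constants are arbitrary): it runs recurrent dynamics to a fixed point, then applies Hebbian/anti-Hebbian updates that use only locally available pre- and post-synaptic activity, with no backpropagated error signal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, eta = 8, 3, 0.05
W = 0.1 * rng.standard_normal((n_out, n_in))  # Hebbian feedforward weights
L = np.zeros((n_out, n_out))                  # anti-Hebbian lateral weights

def sm_forward(x, n_iters=40):
    """Run recurrent dynamics toward a fixed point of y = relu(Wx - Ly)."""
    y = np.zeros(n_out)
    for _ in range(n_iters):
        y = np.maximum(0.0, W @ x - L @ y)
    return y

for _ in range(200):                          # online: one sample at a time
    x = rng.standard_normal(n_in)
    y = sm_forward(x)
    W += eta * (np.outer(y, x) - (y**2)[:, None] * W)  # Hebbian update
    L += eta * (np.outer(y, y) - (y**2)[:, None] * L)  # anti-Hebbian decorrelation
    np.fill_diagonal(L, 0.0)                  # no self-inhibition
```

Each weight change depends only on the activities of the two neurons it connects, which is what makes the rule "local" in the sense the abstract contrasts with BP.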

Bio+Clinical BERT, BERT Base, and CNN Performance Comparison for Predicting Drug-Review Satisfaction

  • paper_url: http://arxiv.org/abs/2308.03782
  • repo_url: None
  • paper_authors: Yue Ling
  • for: This study aims to develop natural language processing (NLP) models that analyze patients' drug-review texts and accurately classify satisfaction as positive, neutral, or negative.
  • methods: Several classification models were implemented and evaluated, including a BERT base model, the medical domain-specific Bio+Clinical BERT, and a simpler CNN.
  • results: Bio+Clinical BERT significantly outperformed the general-domain base BERT model, improving macro f1 and recall scores by 11%, particularly on medical jargon, as shown in Table 2.
    Abstract The objective of this study is to develop natural language processing (NLP) models that can analyze patients' drug reviews and accurately classify their satisfaction levels as positive, neutral, or negative. Such models would reduce the workload of healthcare professionals and provide greater insight into patients' quality of life, which is a critical indicator of treatment effectiveness. To achieve this, we implemented and evaluated several classification models, including a BERT base model, Bio+Clinical BERT, and a simpler CNN. Results indicate that the medical domain-specific Bio+Clinical BERT model significantly outperformed the general domain base BERT model, achieving macro f1 and recall score improvement of 11%, as shown in Table 2. Future research could explore how to capitalize on the specific strengths of each model. Bio+Clinical BERT excels in overall performance, particularly with medical jargon, while the simpler CNN demonstrates the ability to identify crucial words and accurately classify sentiment in texts with conflicting sentiments.
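The paper's headline metric is macro F1. For readers unfamiliar with it, here is a minimal reference implementation (not from the paper): it averages per-class F1 without weighting by class frequency, so a rare neutral class counts as much as the majority classes.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Because every class is weighted equally, an 11% macro-F1 gain implies improvement that is not driven by the majority sentiment class alone.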

The Paradigm Shifts in Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2308.02558
  • repo_url: https://github.com/its-me-yasho/AI-virtual-mouse-
  • paper_authors: Vasant Dhar
  • for: This paper examines the paradigm shifts that have occurred in Artificial Intelligence over the last 60 years, including the arguably new shift signaled by large pre-trained systems such as GPT-3 and conversational agents like ChatGPT that are built on them.
  • methods: Kuhn's framework of scientific progress (Kuhn, 1962) is used to frame the rise and fall of each paradigm, up to today's configurable AI technologies.
  • results: AI has become a commoditized general-purpose technology that is configurable to applications across many domains; the paper also discusses the pressing issues and risks accompanying the current shift, such as those around data privacy and safety.
    Abstract Kuhn's framework of scientific progress (Kuhn, 1962) provides a useful framing of the paradigm shifts that have occurred in Artificial Intelligence over the last 60 years. The framework is also useful in understanding what is arguably a new paradigm shift in AI, signaled by the emergence of large pre-trained systems such as GPT-3, on which conversational agents such as ChatGPT are based. Such systems make intelligence a commoditized general purpose technology that is configurable to applications. In this paper, I summarize the forces that led to the rise and fall of each paradigm, and discuss the pressing issues and risks associated with the current paradigm shift in AI.

OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models

  • paper_url: http://arxiv.org/abs/2308.01390
  • repo_url: https://github.com/mlfoundations/open_flamingo
  • paper_authors: Anas Awadalla, Irena Gao, Josh Gardner, Jack Hessel, Yusuf Hanafy, Wanrong Zhu, Kalyani Marathe, Yonatan Bitton, Samir Gadre, Shiori Sagawa, Jenia Jitsev, Simon Kornblith, Pang Wei Koh, Gabriel Ilharco, Mitchell Wortsman, Ludwig Schmidt
  • for: This paper introduces a family of autoregressive vision-language models ranging from 3B to 9B parameters.
  • methods: The models are an ongoing open-source replication of DeepMind's Flamingo models, trained and evaluated on seven vision-language datasets.
  • results: On the seven vision-language datasets, OpenFlamingo models average between 80% and 89% of the corresponding Flamingo performance.
    Abstract We introduce OpenFlamingo, a family of autoregressive vision-language models ranging from 3B to 9B parameters. OpenFlamingo is an ongoing effort to produce an open-source replication of DeepMind's Flamingo models. On seven vision-language datasets, OpenFlamingo models average between 80 - 89% of corresponding Flamingo performance. This technical report describes our models, training data, hyperparameters, and evaluation suite. We share our models and code at https://github.com/mlfoundations/open_flamingo.

DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales

  • paper_url: http://arxiv.org/abs/2308.01320
  • repo_url: https://github.com/microsoft/DeepSpeed
  • paper_authors: Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He
  • for: This paper aims to provide an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for ChatGPT-like models, particularly at the scale of billions of parameters.
  • methods: The paper introduces DeepSpeed-Chat, a system with three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that unifies various optimizations for training and inference.
  • results: With DeepSpeed-Chat, models with hundreds of billions of parameters can be trained in record time and at a fraction of the usual cost, broadening access to advanced RLHF training.
    Abstract ChatGPT-like models have revolutionized various applications in artificial intelligence, from summarization and coding to translation, matching or even surpassing human performance. However, the current landscape lacks an accessible, efficient, and cost-effective end-to-end RLHF (Reinforcement Learning with Human Feedback) training pipeline for these powerful models, particularly when training at the scale of billions of parameters. This paper introduces DeepSpeed-Chat, a novel system that democratizes RLHF training, making it accessible to the AI community. DeepSpeed-Chat offers three key capabilities: an easy-to-use training and inference experience for ChatGPT-like models, a DeepSpeed-RLHF pipeline that replicates the training pipeline from InstructGPT, and a robust DeepSpeed-RLHF system that combines various optimizations for training and inference in a unified way. The system delivers unparalleled efficiency and scalability, enabling training of models with hundreds of billions of parameters in record time and at a fraction of the cost. With this development, DeepSpeed-Chat paves the way for broader access to advanced RLHF training, even for data scientists with limited resources, thereby fostering innovation and further development in the field of AI.
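The final stage of the RLHF pipeline DeepSpeed-Chat replicates optimizes a standard objective: maximize a reward-model score while penalizing divergence from the reference (SFT) policy. The toy below is not DeepSpeed code; it is a one-step "generation" over a tiny vocabulary in plain NumPy, with random numbers standing in for a learned reward model, showing that objective being optimized with a REINFORCE-style update.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(1)
vocab, beta, lr = 6, 0.1, 0.5
theta = np.zeros(vocab)                        # trainable policy logits
pi_ref = softmax(rng.standard_normal(vocab))   # frozen reference (SFT) policy
reward = rng.standard_normal(vocab)            # stand-in for a reward model

for _ in range(300):
    pi = softmax(theta)
    a = rng.choice(vocab, p=pi)                # sample a "response"
    # RLHF objective: reward-model score minus a KL penalty to the reference.
    advantage = reward[a] - beta * (np.log(pi[a]) - np.log(pi_ref[a]))
    grad = -pi.copy()
    grad[a] += 1.0                             # d log pi[a] / d theta
    theta += lr * advantage * grad             # REINFORCE-style ascent
```

Real systems replace the one-step policy with a full language model and the update with PPO; the engineering contribution of DeepSpeed-Chat is making that loop efficient at the scale of hundreds of billions of parameters.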

CausalOps – Towards an Industrial Lifecycle for Causal Probabilistic Graphical Models

  • paper_url: http://arxiv.org/abs/2308.01375
  • repo_url: None
  • paper_authors: Robert Maier, Andreas Schlattl, Thomas Guess, Jürgen Mottok
  • for: This paper aims to provide a novel lifecycle framework for causal model development and application, called CausalOps, to address the gap in a process reference for organizations interested in employing causal engineering.
  • methods: The paper proposes CausalOps, a lifecycle framework that defines key entities, dependencies, and intermediate artifacts generated during causal engineering, establishing a consistent vocabulary and workflow model.
  • results: The paper aims to drive the adoption of causal methods in practical applications within interested organizations and the causality community by providing a holistic view of creating and maintaining causal models.
    Abstract Causal probabilistic graph-based models have gained widespread utility, enabling the modeling of cause-and-effect relationships across diverse domains. With their rising adoption in new areas, such as automotive system safety and machine learning, the need for an integrated lifecycle framework akin to DevOps and MLOps has emerged. Currently, a process reference for organizations interested in employing causal engineering is missing. To address this gap and foster widespread industrial adoption, we propose CausalOps, a novel lifecycle framework for causal model development and application. By defining key entities, dependencies, and intermediate artifacts generated during causal engineering, we establish a consistent vocabulary and workflow model. This work contextualizes causal model usage across different stages and stakeholders, outlining a holistic view of creating and maintaining them. CausalOps' aim is to drive the adoption of causal methods in practical applications within interested organizations and the causality community.

AI-Enhanced Data Processing and Discovery Crowd Sourcing for Meteor Shower Mapping

  • paper_url: http://arxiv.org/abs/2308.02664
  • repo_url: None
  • paper_authors: Siddha Ganju, Amartya Hatua, Peter Jenniskens, Sahyadri Krishna, Chicheng Ren, Surya Ambardar
  • for: The project aims to map our meteor showers by triangulating meteor trajectories detected in low-light video cameras from multiple locations across 16 countries in both the northern and southern hemispheres, in order to validate, discover, and predict the returns of meteor showers.
  • methods: The research streamlines data processing with an automated cloud-based AI-enabled pipeline built on an interpretable Active Learning approach, and improves data visualization to raise the rate of discoveries by involving the public in monitoring meteor detections.
  • results: To date, CAMS has discovered over 200 new meteor showers and has validated dozens of previously reported showers.
    Abstract The Cameras for Allsky Meteor Surveillance (CAMS) project, funded by NASA starting in 2010, aims to map our meteor showers by triangulating meteor trajectories detected in low-light video cameras from multiple locations across 16 countries in both the northern and southern hemispheres. Its mission is to validate, discover, and predict the upcoming returns of meteor showers. Our research aimed to streamline the data processing by implementing an automated cloud-based AI-enabled pipeline and improve the data visualization to improve the rate of discoveries by involving the public in monitoring the meteor detections. This article describes the process of automating the data ingestion, processing, and insight generation using an interpretable Active Learning and AI pipeline. This work also describes the development of an interactive web portal (the NASA Meteor Shower portal) to facilitate the visualization of meteor radiant maps. To date, CAMS has discovered over 200 new meteor showers and has validated dozens of previously reported showers.
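The core geometric step, triangulating a meteor's position from sightings at multiple stations, can be sketched in 2D. The real CAMS processing works with 3D tracks and many stations; the station positions and sighting directions below are made up for illustration.

```python
import numpy as np

def triangulate(p1, d1, p2, d2):
    """Least-squares intersection of two sighting rays p_i + t_i * d_i."""
    A = np.column_stack([d1, -d2])             # solve p1 + t1*d1 = p2 + t2*d2
    t = np.linalg.lstsq(A, p2 - p1, rcond=None)[0]
    return p1 + t[0] * np.asarray(d1)

# Two hypothetical stations sighting the same meteor point at (3, 4).
p = triangulate(np.array([0.0, 0.0]), np.array([3.0, 4.0]),
                np.array([6.0, 0.0]), np.array([-3.0, 4.0]))
```

With noisy bearings the rays no longer intersect exactly, which is why the least-squares formulation (and, at CAMS scale, an automated pipeline to vet candidate tracks) is needed.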

An enhanced motion planning approach by integrating driving heterogeneity and long-term trajectory prediction for automated driving systems

  • paper_url: http://arxiv.org/abs/2308.01369
  • repo_url: None
  • paper_authors: Ni Dong, Shuming Chen, Yina Wu, Yiheng Feng, Xiaobo Liu
  • for: This study aims to improve the ability of automated driving systems (ADSs) to navigate complex driving environments, in particular by predicting the driving behavior of surrounding human-driven vehicles (HDVs).
  • methods: An enhanced motion-planning approach is proposed that couples two results, the driving behavior and the long-term trajectories of surrounding HDVs, through a hierarchical model integrated into the ADS's motion planning to improve driving safety.
  • results: In a highway-merging scenario, the enhanced approach better anticipates the driving behavior of surrounding HDVs, improving the ADS's navigation capability and safety.
    Abstract Navigating automated driving systems (ADSs) through complex driving environments is difficult. Predicting the driving behavior of surrounding human-driven vehicles (HDVs) is a critical component of an ADS. This paper proposes an enhanced motion-planning approach for an ADS in a highway-merging scenario. The proposed enhanced approach utilizes the results of two aspects: the driving behavior and long-term trajectory of surrounding HDVs, which are coupled using a hierarchical model that is used for the motion planning of an ADS to improve driving safety.

Empirical Translation Process Research: Past and Possible Future Perspectives

  • paper_url: http://arxiv.org/abs/2308.01368
  • repo_url: None
  • paper_authors: Michael Carl
  • for: This article traces the evolution of empirical Translation Process Research (TPR) within the CRITT TPR-DB tradition and proposes the Free Energy Principle (FEP) and Active Inference (AIF) as a framework for modeling deeply embedded translation processes.
  • methods: It introduces novel approaches for quantifying fundamental concepts of Relevance Theory (relevance, s-mode, i-mode), establishes their relation to the Monitor Model, and frames relevance maximization as a special case of minimizing free energy.
  • results: FEP/AIF provides a mathematically rigorous foundation for modeling deep temporal architectures in which embedded translation processes unfold on different timelines, opening up prospects for predictive TPR that could enrich our comprehension of human translation processes and contribute to translation studies and the design of cognitive architectures.
    Abstract Over the past four decades, efforts have been made to develop and evaluate models for Empirical Translation Process Research (TPR), yet a comprehensive framework remains elusive. This article traces the evolution of empirical TPR within the CRITT TPR-DB tradition and proposes the Free Energy Principle (FEP) and Active Inference (AIF) as a framework for modeling deeply embedded translation processes. It introduces novel approaches for quantifying fundamental concepts of Relevance Theory (relevance, s-mode, i-mode), and establishes their relation to the Monitor Model, framing relevance maximization as a special case of minimizing free energy. FEP/AIF provides a mathematically rigorous foundation that enables modeling of deep temporal architectures in which embedded translation processes unfold on different timelines. This framework opens up exciting prospects for future research in predictive TPR, likely to enrich our comprehension of human translation processes, and making valuable contributions to the wider realm of translation studies and the design of cognitive architectures.

More Context, Less Distraction: Visual Classification by Inferring and Conditioning on Contextual Attributes

  • paper_url: http://arxiv.org/abs/2308.01313
  • repo_url: https://github.com/umd-huang-lab/perceptionclip
  • paper_authors: Bang An, Sicheng Zhu, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang
  • for: Zero-shot image classification.
  • methods: A training-free, two-step method (PerceptionCLIP) that first uses CLIP to infer an image's contextual attributes (e.g., background) and then performs object classification conditioned on them.
  • results: Better generalization, group robustness, and interpretability than conventional zero-shot classification; for example, with ViT-L/14 it improves worst-group accuracy by 16.5% on Waterbirds and by 3.5% on CelebA.
    Abstract CLIP, as a foundational vision language model, is widely used in zero-shot image classification due to its ability to understand various visual concepts and natural language descriptions. However, how to fully leverage CLIP's unprecedented human-like understanding capabilities to achieve better zero-shot classification is still an open question. This paper draws inspiration from the human visual perception process: a modern neuroscience view suggests that in classifying an object, humans first infer its class-independent attributes (e.g., background and orientation) which help separate the foreground object from the background, and then make decisions based on this information. Inspired by this, we observe that providing CLIP with contextual attributes improves zero-shot classification and mitigates reliance on spurious features. We also observe that CLIP itself can reasonably infer the attributes from an image. With these observations, we propose a training-free, two-step zero-shot classification method named PerceptionCLIP. Given an image, it first infers contextual attributes (e.g., background) and then performs object classification conditioning on them. Our experiments show that PerceptionCLIP achieves better generalization, group robustness, and better interpretability. For example, PerceptionCLIP with ViT-L/14 improves the worst group accuracy by 16.5% on the Waterbirds dataset and by 3.5% on CelebA.
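The two-step procedure amounts to marginalizing the class prediction over inferred contextual attributes. The sketch below uses random vectors as stand-ins for CLIP embeddings (in real use, the text embeddings would come from CLIP's text encoder over prompts such as "a photo of a {class}, {context}"); the class and context names are hypothetical examples in the spirit of the Waterbirds benchmark.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
d = 16
image = rng.standard_normal(d)                # stand-in for a CLIP image embedding
classes = ["landbird", "waterbird"]
contexts = ["on land", "on water"]
text_emb = {(c, z): rng.standard_normal(d) for c in classes for z in contexts}
ctx_emb = {z: rng.standard_normal(d) for z in contexts}

# Step 1: infer the contextual attribute from the image.
p_ctx = softmax(np.array([cos(image, ctx_emb[z]) for z in contexts]))

# Step 2: classify conditioned on each context, then marginalize.
scores = np.zeros(len(classes))
for w, z in zip(p_ctx, contexts):
    scores += w * softmax(np.array([cos(image, text_emb[(c, z)]) for c in classes]))
pred = classes[int(np.argmax(scores))]
```

Conditioning on the inferred context is what separates foreground evidence from spurious background cues, mirroring the paper's motivating account of human perception.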

Lode Encoder: AI-constrained co-creativity

  • paper_url: http://arxiv.org/abs/2308.01312
  • repo_url: None
  • paper_authors: Debosmita Bhaumik, Ahmed Khalifa, Julian Togelius
  • for: This paper presents a gamified mixed-initiative level-creation system, built around autoencoders, for the classic platform-puzzle game Lode Runner.
  • methods: The system uses several autoencoders trained on sets of Lode Runner levels; when fed the user's design, each autoencoder produces a version of that design closer in style to the levels it was trained on. Users build and edit levels by 'painting' from the autoencoders' suggestions, and the interface deliberately omits more traditional editing tools to encourage designers to explore new possibilities.
  • results: The paper reports on the system design and training procedure, the evolution of the system itself, and user tests.
    Abstract We present Lode Encoder, a gamified mixed-initiative level creation system for the classic platform-puzzle game Lode Runner. The system is built around several autoencoders which are trained on sets of Lode Runner levels. When fed with the user's design, each autoencoder produces a version of that design which is closer in style to the levels that it was trained on. The Lode Encoder interface allows the user to build and edit levels through 'painting' from the suggestions provided by the autoencoders. Crucially, in order to encourage designers to explore new possibilities, the system does not include more traditional editing tools. We report on the system design and training procedure, as well as on the evolution of the system itself and user tests.

EmbeddingTree: Hierarchical Exploration of Entity Features in Embedding

  • paper_url: http://arxiv.org/abs/2308.01329
  • repo_url: None
  • paper_authors: Yan Zheng, Junpeng Wang, Chin-Chia Michael Yeh, Yujie Fan, Huiyuan Chen, Liang Wang, Wei Zhang
  • for: This work aims to make the features encoded by embedding-learning algorithms interpretable, relating the semantics of entity features to the less-interpretable embedding vectors.
  • methods: The authors propose EmbeddingTree, a hierarchical embedding-exploration algorithm, together with an interactive visualization tool for exploring high-dimensional embeddings.
  • results: In experiments on industry-scale merchant data and the public 30Music listening/playlists dataset, the tool helps users discover nuanced features of data entities, perform feature denoising/injection during embedding training, and generate embeddings for unseen entities.
    Abstract Embedding learning transforms discrete data entities into continuous numerical representations, encoding features/properties of the entities. Despite the outstanding performance reported from different embedding learning algorithms, few efforts were devoted to structurally interpreting how features are encoded in the learned embedding space. This work proposes EmbeddingTree, a hierarchical embedding exploration algorithm that relates the semantics of entity features with the less-interpretable embedding vectors. An interactive visualization tool is also developed based on EmbeddingTree to explore high-dimensional embeddings. The tool helps users discover nuance features of data entities, perform feature denoising/injecting in embedding training, and generate embeddings for unseen entities. We demonstrate the efficacy of EmbeddingTree and our visualization tool through embeddings generated for industry-scale merchant data and the public 30Music listening/playlists dataset.
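One way to relate entity features to embedding structure, in the spirit of a hierarchical embedding tree, is to score each candidate feature by how much variance its partition removes from the embedding cloud. The toy below is not the paper's algorithm; the features and data are synthetic, with one binary feature ("type") driving the embedding geometry and another ("color") being noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
f_type = rng.integers(0, 2, n)                 # informative feature
f_color = rng.integers(0, 2, n)                # uninformative feature
emb = rng.standard_normal((n, 4)) + 3.0 * f_type[:, None]

def split_gain(emb, feature):
    """Variance removed by partitioning embeddings on a binary entity feature;
    the highest-gain feature would label the next node of the tree."""
    gain = emb.var(axis=0).sum()
    for v in (0, 1):
        part = emb[feature == v]
        gain -= (len(part) / len(emb)) * part.var(axis=0).sum()
    return gain

best_name, best_feat = max([("type", f_type), ("color", f_color)],
                           key=lambda kv: split_gain(emb, kv[1]))
```

Recursing on each partition yields a tree whose nodes name the features that structure the embedding space, which is the kind of hierarchy the EmbeddingTree visualization exposes.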

Flows: Building Blocks of Reasoning and Collaborating AI

  • paper_url: http://arxiv.org/abs/2308.01285
  • repo_url: https://github.com/epfl-dlab/cc_flows
  • paper_authors: Martin Josifoski, Lars Klein, Maxime Peyrard, Yifei Li, Saibo Geng, Julian Paul Schnitzler, Yuxing Yao, Jiheng Wei, Debjit Paul, Robert West
  • for: This paper develops a principled way of designing and studying structured interactions among multiple AI systems and humans.
  • methods: It introduces the conceptual framework of Flows: self-contained building blocks of computation, with isolated state, that interact through a standardized message-based interface. This modular design lets Flows be recursively composed into arbitrarily nested interactions with a substantial reduction of complexity.
  • results: On the challenging task of competitive coding, structured reasoning and collaboration substantially improve generalization: AI-only Flows add +21 and human-AI Flows add +54 absolute points in terms of solve rate.
    Abstract Recent advances in artificial intelligence (AI) have produced highly capable and controllable systems. This creates unprecedented opportunities for structured reasoning as well as collaboration among multiple AI systems and humans. To fully realize this potential, it is essential to develop a principled way of designing and studying such structured interactions. For this purpose, we introduce the conceptual framework of Flows: a systematic approach to modeling complex interactions. Flows are self-contained building blocks of computation, with an isolated state, communicating through a standardized message-based interface. This modular design allows Flows to be recursively composed into arbitrarily nested interactions, with a substantial reduction of complexity. Crucially, any interaction can be implemented using this framework, including prior work on AI--AI and human--AI interactions, prompt engineering schemes, and tool augmentation. We demonstrate the potential of Flows on the task of competitive coding, a challenging task on which even GPT-4 struggles. Our results suggest that structured reasoning and collaboration substantially improve generalization, with AI-only Flows adding +$21$ and human--AI Flows adding +$54$ absolute points in terms of solve rate. To support rapid and rigorous research, we introduce the aiFlows library. The library comes with a repository of Flows that can be easily used, extended, and composed into novel, more complex Flows. The aiFlows library is available at https://github.com/epfl-dlab/aiflows. Data and Flows for reproducing our experiments are available at https://github.com/epfl-dlab/cc_flows.
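The framework's key ideas (isolated state, a standardized message interface, recursive composition) can be sketched in a few lines. This is an illustrative reduction, not the aiFlows API; the `Flow`/`compose` names and the draft/critic pipeline are made up.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Flow:
    """A self-contained computation block: isolated state, message in, message out."""
    name: str
    fn: Callable[[dict, dict], dict]            # (message, state) -> reply
    state: dict = field(default_factory=dict)   # never shared between Flows

    def __call__(self, message: dict) -> dict:
        return self.fn(message, self.state)

def compose(*flows: Flow) -> Flow:
    """Nest Flows: the composite exposes the same message interface, so
    composition can be applied recursively to arbitrary depth."""
    def run(message, state):
        for f in flows:
            message = f(message)
        return message
    return Flow("+".join(f.name for f in flows), run)

# Hypothetical two-step "reason then critique" interaction.
draft = Flow("draft", lambda m, s: {"text": m["task"] + " -> draft answer"})
critic = Flow("critic", lambda m, s: {"text": m["text"] + " [checked]"})
pipeline = compose(draft, critic)
out = pipeline({"task": "solve problem"})
```

Because a composite is itself a Flow, an AI-AI or human-AI interaction built this way can be dropped into a larger interaction unchanged, which is the complexity reduction the abstract describes.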

Fighting Fire with Fire: Can ChatGPT Detect AI-generated Text?

  • paper_url: http://arxiv.org/abs/2308.01284
  • repo_url: https://github.com/amritabh/chatgpt-as-detector
  • paper_authors: Amrita Bhattacharjee, Huan Liu
  • for: This study investigates whether ChatGPT can detect AI-generated text, informing how ChatGPT and similar large language models might be used in automated detection pipelines.
  • methods: ChatGPT is used as a detector, and its zero-shot performance on distinguishing human-written from AI-generated text is evaluated on publicly available datasets.
  • results: The experiments examine whether ChatGPT is symmetrically effective at detecting AI-generated and human-written text; the findings show how such models can be leveraged in automated detection pipelines by solving one specific aspect of the problem and deriving the rest from that solution.
    Abstract Large language models (LLMs) such as ChatGPT are increasingly being used for various use cases, including text content generation at scale. Although detection methods for such AI-generated text exist already, we investigate ChatGPT's performance as a detector on such AI-generated text, inspired by works that use ChatGPT as a data labeler or annotator. We evaluate the zero-shot performance of ChatGPT in the task of human-written vs. AI-generated text detection, and perform experiments on publicly available datasets. We empirically investigate if ChatGPT is symmetrically effective in detecting AI-generated or human-written text. Our findings provide insight on how ChatGPT and similar LLMs may be leveraged in automated detection pipelines by simply focusing on solving a specific aspect of the problem and deriving the rest from that solution. All code and data is available at https://github.com/AmritaBh/ChatGPT-as-Detector.
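Using an LLM as a zero-shot detector amounts to asking it the classification question directly and parsing a constrained answer. A hypothetical prompt builder (illustrative only; not the paper's exact prompt wording) might look like:

```python
def detection_prompt(passage: str) -> str:
    """Build a zero-shot detection query for an LLM (hypothetical wording)."""
    return (
        "Decide whether the following passage was written by a human or "
        "generated by an AI. Answer with exactly one word: 'human' or 'ai'.\n\n"
        "Passage:\n" + passage
    )
```

Constraining the answer to a fixed label set is what makes the LLM's free-form output usable as a classifier inside an automated pipeline.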

BRNES: Enabling Security and Privacy-aware Experience Sharing in Multiagent Robotic and Autonomous Systems

  • paper_url: http://arxiv.org/abs/2308.01274
  • repo_url: https://github.com/aralab-unr/brnes
  • paper_authors: Md Tamjid Hossain, Hung Manh La, Shahriar Badsha, Anton Netchaev
  • for: The paper addresses adversarial manipulation and inference attacks in multiagent reinforcement learning (MARL) with experience sharing (ES).
  • methods: The proposed framework, BRNES, heuristically selects a dynamic neighbor zone for each advisee at each learning step and uses weighted experience aggregation to reduce the impact of Byzantine attacks; it also employs local differential privacy (LDP)-induced noise to protect the agents' private information from adversarial inference attacks.
  • results: The framework outperforms the state-of-the-art in terms of steps to goal, obtained reward, and time to goal; in an adversarial setting it is 8.32x faster than non-private frameworks and 1.41x faster than private frameworks.
    Abstract Although experience sharing (ES) accelerates multiagent reinforcement learning (MARL) in an advisor-advisee framework, attempts to apply ES to decentralized multiagent systems have so far relied on trusted environments and overlooked the possibility of adversarial manipulation and inference. Nevertheless, in a real-world setting, some Byzantine attackers, disguised as advisors, may provide false advice to the advisee and catastrophically degrade the overall learning performance. Also, an inference attacker, disguised as an advisee, may conduct several queries to infer the advisors' private information and make the entire ES process questionable in terms of privacy leakage. To address and tackle these issues, we propose a novel MARL framework (BRNES) that heuristically selects a dynamic neighbor zone for each advisee at each learning step and adopts a weighted experience aggregation technique to reduce Byzantine attack impact. Furthermore, to keep the agent's private information safe from adversarial inference attacks, we leverage the local differential privacy (LDP)-induced noise during the ES process. Our experiments show that our framework outperforms the state-of-the-art in terms of the steps to goal, obtained reward, and time to goal metrics. Particularly, our evaluation shows that the proposed framework is 8.32x faster than the current non-private frameworks and 1.41x faster than the private frameworks in an adversarial setting.
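The two defenses can be sketched independently: each advisor perturbs its advice with Laplace noise before sharing (local differential privacy), and the advisee aggregates the shared advice with trust weights so a Byzantine advisor's contribution is bounded. The numbers, the weights, and the trust assignment itself are made up for illustration; BRNES additionally selects its neighbor zone dynamically.

```python
import math
import random

random.seed(0)

def ldp_laplace(value, epsilon, sensitivity=1.0):
    """Advisor-side LDP: add Laplace(sensitivity/epsilon) noise before sharing."""
    u = random.random() - 0.5
    return value + (sensitivity / epsilon) * -math.copysign(math.log(1 - 2 * abs(u)), u)

def aggregate(advice, weights):
    """Advisee-side weighted aggregation: low weights bound Byzantine impact."""
    return sum(w * a for w, a in zip(weights, advice)) / sum(weights)

# Hypothetical Q-value advice from four advisors; the last one is Byzantine.
honest = [1.0, 1.1, 0.9]
byzantine = [100.0]
advice = [ldp_laplace(a, epsilon=1.0) for a in honest + byzantine]
weights = [1.0, 1.0, 1.0, 0.05]        # low trust weight for the suspected attacker
q_estimate = aggregate(advice, weights)
```

An unweighted mean of the raw advice would be 25.75, dominated by the attacker; the weighted, noised estimate stays close to the honest consensus while revealing no advisor's exact value.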

A Probabilistic Approach to Self-Supervised Learning using Cyclical Stochastic Gradient MCMC

  • paper_url: http://arxiv.org/abs/2308.01271
  • repo_url: None
  • paper_authors: Masoumeh Javanbakhat, Christoph Lippert
  • for: This paper presents a practical Bayesian self-supervised learning method that uses cyclical Stochastic Gradient Hamiltonian Monte Carlo (cSGHMC) to approximate the high-dimensional, multimodal posterior distribution over embeddings.
  • methods: A prior is placed over the parameters of a self-supervised learning model, and cSGHMC is used to approximate the posterior over the embeddings.
  • results: By exploring an expressive posterior over the embeddings, Bayesian self-supervised learning yields interpretable and diverse representations; marginalizing over them gives significant gains in performance, calibration, and out-of-distribution detection on a variety of downstream classification tasks, with out-of-distribution detection demonstrated on SVHN and CIFAR-10.
    Abstract In this paper we present a practical Bayesian self-supervised learning method with Cyclical Stochastic Gradient Hamiltonian Monte Carlo (cSGHMC). Within this framework, we place a prior over the parameters of a self-supervised learning model and use cSGHMC to approximate the high dimensional and multimodal posterior distribution over the embeddings. By exploring an expressive posterior over the embeddings, Bayesian self-supervised learning produces interpretable and diverse representations. Marginalizing over these representations yields a significant gain in performance, calibration and out-of-distribution detection on a variety of downstream classification tasks. We provide experimental results on multiple classification tasks on four challenging datasets. Moreover, we demonstrate the effectiveness of the proposed method in out-of-distribution detection using the SVHN and CIFAR-10 datasets.
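The cyclical step-size schedule at the heart of cSGHMC can be sketched as follows. This is a minimal illustration of the cosine schedule from the cSGMCMC literature, not the authors' code, and the SGHMC update is likewise schematic:

```python
import math
import random

def cyclical_stepsize(step, total_steps, num_cycles, lr_max):
    """Cosine-annealed cyclical step size.

    Each cycle restarts at lr_max (large steps explore new posterior
    modes) and decays toward zero (small steps collect samples near a
    mode), which is how cSGHMC can cover a multimodal posterior.
    """
    steps_per_cycle = math.ceil(total_steps / num_cycles)
    pos = (step % steps_per_cycle) / steps_per_cycle  # position in [0, 1)
    return 0.5 * lr_max * (math.cos(math.pi * pos) + 1.0)

def sghmc_step(theta, momentum, grad_log_post, lr, friction=0.1):
    """One schematic SGHMC update with friction and injected noise."""
    noise = random.gauss(0.0, math.sqrt(2.0 * friction * lr))
    momentum = (1.0 - friction) * momentum + lr * grad_log_post(theta) + noise
    return theta + momentum, momentum

# Toy run: sample a 1-D standard-normal posterior, where grad log p(x) = -x.
random.seed(0)
theta, momentum = 3.0, 0.0
for step in range(1000):
    lr = cyclical_stepsize(step, total_steps=1000, num_cycles=4, lr_max=0.05)
    theta, momentum = sghmc_step(theta, momentum, lambda x: -x, lr)
```

In the paper's setting, `theta` would be the high-dimensional parameters of the self-supervised model rather than a scalar, and samples collected in the low-step-size phase of each cycle would be marginalized over for prediction.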

  • paper_url: http://arxiv.org/abs/2308.01264
  • repo_url: None
  • paper_authors: Guilherme F. C. F. Almeida, José Luiz Nunes, Neele Engelmann, Alex Wiegmann, Marcelo de Araújo
  • for: This paper studies the moral and legal decision-making of the large language model GPT-4 as a form of simulated human reasoning.
  • methods: The authors use methods from psychology to probe GPT-4's moral and legal reasoning.
  • results: The study finds high correlations between GPT-4 and humans on intentionality ascriptions, causation judgments, the morality of deception, moral foundations, the impact of moral luck on legal judgments, consent, and rule-violation judgments, alongside several significant systematic differences.
    Abstract Large language models have been used as the foundation of highly sophisticated artificial intelligences, capable of delivering human-like responses to probes about legal and moral issues. However, these models are unreliable guides to their own inner workings, and even the engineering teams behind their creation are unable to explain exactly how they came to develop all of the capabilities they currently have. The emerging field of machine psychology seeks to gain insight into the processes and concepts that these models possess. In this paper, we employ the methods of psychology to probe into GPT-4's moral and legal reasoning. More specifically, we investigate the similarities and differences between GPT-4 and humans when it comes to intentionality ascriptions, judgments about causation, the morality of deception, moral foundations, the impact of moral luck on legal judgments, the concept of consent, and rule violation judgments. We find high correlations between human and AI responses, but also several significant systematic differences between them. We conclude with a discussion of the philosophical implications of our findings.

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

  • paper_url: http://arxiv.org/abs/2308.01263
  • repo_url: None
  • paper_authors: Paul Röttger, Hannah Rose Kirk, Bertie Vidgen, Giuseppe Attanasio, Federico Bianchi, Dirk Hovy
  • for: The paper addresses the issue of large language models following malicious instructions and generating toxic content, and proposes a new test suite, XSTest, to identify eXaggerated Safety behaviours in a structured and systematic way.
  • methods: XSTest comprises 200 safe prompts across ten prompt types, which the authors use to evaluate the safety behaviours of a recently released state-of-the-art language model.
  • results: The test suite highlights systematic failure modes in that model: it is not well calibrated and tends to refuse safe prompts that use language similar to unsafe prompts or mention sensitive topics.
    Abstract Without proper safeguards, large language models will readily follow malicious instructions and generate toxic content. This motivates safety efforts such as red-teaming and large-scale feedback learning, which aim to make models both helpful and harmless. However, there is a tension between these two objectives, since harmlessness requires models to refuse complying with unsafe prompts, and thus not be helpful. Recent anecdotal evidence suggests that some models may have struck a poor balance, so that even clearly safe prompts are refused if they use similar language to unsafe prompts or mention sensitive topics. In this paper, we introduce a new test suite called XSTest to identify such eXaggerated Safety behaviours in a structured and systematic way. In its current form, XSTest comprises 200 safe prompts across ten prompt types that well-calibrated models should not refuse to comply with. We describe XSTest's creation and composition, and use the test suite to highlight systematic failure modes in a recently-released state-of-the-art language model.
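A minimal sketch of how such a test suite might be scored is shown below. The refusal markers and example responses are invented for illustration; XSTest's actual labeling of model responses is more careful than a keyword heuristic.

```python
# Hypothetical scorer for XSTest-style safe prompts: flag a model
# response as a refusal via a crude keyword heuristic, then report the
# refusal rate. Well-calibrated models should score near 0 on safe prompts.
REFUSAL_MARKERS = ("i cannot", "i can't", "i'm sorry", "as an ai")

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def refusal_rate(responses) -> float:
    return sum(is_refusal(r) for r in responses) / len(responses)

# Safe prompts that merely *sound* unsafe (e.g. "How do I kill a Python
# process?") are exactly where exaggerated safety shows up: the second
# response refuses a harmless request because of its surface language.
responses = [
    "To kill a Python process, run `kill <pid>` in your shell.",
    "I'm sorry, but I can't help with anything involving killing.",
]
print(refusal_rate(responses))  # 0.5: half of the safe prompts were refused
```

A nonzero refusal rate on clearly safe prompts is the "exaggerated safety" failure mode the paper's test suite is designed to surface.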