cs.AI - 2023-07-20

Approximate Computing Survey, Part II: Application-Specific & Architectural Approximation Techniques and Applications

  • paper_url: http://arxiv.org/abs/2307.11128
  • repo_url: None
  • paper_authors: Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, Dimitrios Soudris
  • for: This paper surveys the applications and techniques of Approximate Computing, aiming to improve the energy efficiency and performance of computing systems.
  • methods: The paper reviews a wide range of application-specific and architectural approximation techniques, covering both hardware and software approximation across the computing stack.
  • results: The paper provides a comprehensive review and analysis of the application spectrum of Approximate Computing and identifies open challenges and directions for future research.
    Abstract The challenging deployment of compute-intensive applications from domains such as Artificial Intelligence (AI) and Digital Signal Processing (DSP) forces the community of computing systems to explore new design approaches. Approximate Computing appears as an emerging solution, allowing designers to tune the quality of results in the design of a system in order to improve the energy efficiency and/or performance. This radical paradigm shift has attracted interest from both academia and industry, resulting in significant research on approximation techniques and methodologies at different design layers (from system down to integrated circuits). Motivated by the wide appeal of Approximate Computing over the last 10 years, we conduct a two-part survey to cover key aspects (e.g., terminology and applications) and review the state-of-the-art approximation techniques from all layers of the traditional computing stack. In Part II of our survey, we classify and present the technical details of application-specific and architectural approximation techniques, which both target the design of resource-efficient processors/accelerators & systems. Moreover, we present a detailed analysis of the application spectrum of Approximate Computing and discuss open challenges and future directions.
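To make the paradigm concrete: a canonical software-level approximation covered by such surveys is loop perforation, which skips a fraction of loop iterations to trade result quality for work. The sketch below is purely illustrative and is not taken from the surveyed papers.

```python
# Loop perforation: process only every `stride`-th element, trading accuracy
# for a roughly stride-fold reduction in work. Illustrative sketch only.
def exact_mean(values):
    return sum(values) / len(values)

def perforated_mean(values, stride=4):
    sampled = values[::stride]           # skip (stride - 1) out of every stride items
    return sum(sampled) / len(sampled)   # approximate result

data = [float(i % 97) for i in range(1_000_000)]
print(exact_mean(data), perforated_mean(data))  # small quality loss, ~4x less work
```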

PE-YOLO: Pyramid Enhancement Network for Dark Object Detection

  • paper_url: http://arxiv.org/abs/2307.10953
  • repo_url: https://github.com/xiangchenyin/pe-yolo
  • paper_authors: Xiangchen Yin, Zhenda Yu, Zetao Fei, Wenjun Lv, Xin Gao
  • for: Improving the accuracy and efficiency of object detection in dark environments.
  • methods: Decomposes the image with a Laplacian pyramid, and proposes a detail processing module (DPM) and a low-frequency enhancement filter (LEF) to enhance image details and low-frequency semantics; the framework is trained end-to-end with only the normal detection loss.
  • results: On the ExDark dataset, PE-YOLO reaches 78.0% mAP at 53.6 FPS for object detection in dark conditions, outperforming other dark detectors and low-light enhancement models.
    Abstract Current object detection models have achieved good results on many benchmark datasets, but detecting objects in dark conditions remains a major challenge. To address this issue, we propose a pyramid enhanced network (PENet) and combine it with YOLOv3 to build a dark object detection framework named PE-YOLO. Firstly, PENet decomposes the image into four components of different resolutions using the Laplacian pyramid. Specifically, we propose a detail processing module (DPM) to enhance the detail of images, which consists of a context branch and an edge branch. In addition, we propose a low-frequency enhancement filter (LEF) to capture low-frequency semantics and suppress high-frequency noise. PE-YOLO adopts an end-to-end joint training approach and uses only the normal detection loss to simplify the training process. We conduct experiments on the low-light object detection dataset ExDark to demonstrate the effectiveness of our method. The results indicate that, compared with other dark detectors and low-light enhancement models, PE-YOLO achieves superior results, reaching 78.0% mAP at 53.6 FPS, and can adapt to object detection under different low-light conditions. The code is available at https://github.com/XiangchenYin/PE-YOLO.
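As a concrete reference for the decomposition step, here is a minimal sketch of a four-level Laplacian pyramid using OpenCV. It mirrors the split described in the abstract but is an illustrative reconstruction, not the authors' implementation.

```python
# Minimal Laplacian-pyramid decomposition (illustrative, not PE-YOLO code).
import cv2
import numpy as np

def laplacian_pyramid(image: np.ndarray, levels: int = 4):
    """Split an image into `levels - 1` band-pass detail maps plus a
    low-frequency residual, as PENet's four-component decomposition does."""
    pyramid, current = [], image.astype(np.float32)
    for _ in range(levels - 1):
        down = cv2.pyrDown(current)                          # blur + downsample
        up = cv2.pyrUp(down, dstsize=current.shape[1::-1])   # back to current size
        pyramid.append(current - up)                         # high-frequency detail
        current = down
    pyramid.append(current)                                  # low-frequency residual
    return pyramid
```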

Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery

  • paper_url: http://arxiv.org/abs/2307.10943
  • repo_url: None
  • paper_authors: Hyungmin Kim, Sungho Suh, Daehwan Kim, Daun Jeong, Hansang Cho, Junmo Kim
  • for: Proposes an unsupervised class-incremental learning method for accurately discovering novel categories without any prior knowledge.
  • methods: First fine-tunes the feature extractor and proxy anchors on the labeled set, then splits samples into old and novel categories and clusters them on the unlabeled set; proxy-anchor-based exemplars are used to mitigate catastrophic forgetting.
  • results: Experiments show the proposed method outperforms state-of-the-art approaches on fine-grained datasets under real-world scenarios.
    Abstract Recent advances in deep learning have significantly improved the performance of various computer vision applications. However, discovering novel categories in an incremental learning scenario remains a challenging problem due to the lack of prior knowledge about the number and nature of new categories. Existing methods for novel category discovery are limited by their reliance on labeled datasets and prior knowledge about the number of novel categories and the proportion of novel samples in the batch. To address the limitations and more accurately reflect real-world scenarios, in this paper, we propose a novel unsupervised class incremental learning approach for discovering novel categories on unlabeled sets without prior knowledge. The proposed method fine-tunes the feature extractor and proxy anchors on labeled sets, then splits samples into old and novel categories and clusters on the unlabeled dataset. Furthermore, the proxy anchors-based exemplar generates representative category vectors to mitigate catastrophic forgetting. Experimental results demonstrate that our proposed approach outperforms the state-of-the-art methods on fine-grained datasets under real-world scenarios.
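For context, the fine-tuning stage uses proxy anchors; below is a sketch of the standard Proxy-Anchor loss (Kim et al., CVPR 2020) that such methods build on. The hyperparameters `delta` and `alpha` are the common defaults, not values taken from this paper.

```python
import torch
import torch.nn.functional as F

def proxy_anchor_loss(embeddings, labels, proxies, delta=0.1, alpha=32.0):
    """Proxy-Anchor loss. embeddings: (N, D), labels: (N,) class indices,
    proxies: (C, D) learnable per-class anchors."""
    sim = F.normalize(embeddings) @ F.normalize(proxies).T     # (N, C) cosine sims
    pos = F.one_hot(labels, proxies.size(0)).float()           # (N, C) membership
    with_pos = pos.sum(dim=0) > 0                              # proxies with positives

    pos_exp = (torch.exp(-alpha * (sim - delta)) * pos).sum(dim=0)
    neg_exp = (torch.exp(alpha * (sim + delta)) * (1 - pos)).sum(dim=0)
    loss_pos = torch.log1p(pos_exp[with_pos]).sum() / with_pos.sum()
    loss_neg = torch.log1p(neg_exp).sum() / proxies.size(0)
    return loss_pos + loss_neg
```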

PASTA: Pretrained Action-State Transformer Agents

  • paper_url: http://arxiv.org/abs/2307.10936
  • repo_url: None
  • paper_authors: Raphael Boige, Yannis Flet-Berliac, Arthur Flajolet, Guillaume Richard, Thomas Pierrot
  • for: Studies self-supervised pre-training for RL and introduces PASTA, a family of transformer models pre-trained on trajectories for solving a variety of downstream tasks.
  • methods: Uses a unified methodology with fundamental pre-training objectives such as next-token prediction, and trains models across multiple domains simultaneously.
  • results: Shows that PASTA models achieve strong performance on a wide range of downstream tasks and support parameter-efficient fine-tuning across domains.
    Abstract Self-supervised learning has brought about a revolutionary paradigm shift in various computing domains, including NLP, vision, and biology. Recent approaches involve pre-training transformer models on vast amounts of unlabeled data, serving as a starting point for efficiently solving downstream tasks. In the realm of reinforcement learning, researchers have recently adapted these approaches by developing models pre-trained on expert trajectories, enabling them to address a wide range of tasks, from robotics to recommendation systems. However, existing methods mostly rely on intricate pre-training objectives tailored to specific downstream applications. This paper presents a comprehensive investigation of models we refer to as Pretrained Action-State Transformer Agents (PASTA). Our study uses a unified methodology and covers an extensive set of general downstream tasks including behavioral cloning, offline RL, sensor failure robustness, and dynamics change adaptation. Our goal is to systematically compare various design choices and provide valuable insights to practitioners for building robust models. Key highlights of our study include tokenization at the action and state component level, using fundamental pre-training objectives like next token prediction, training models across diverse domains simultaneously, and using parameter efficient fine-tuning (PEFT). The developed models in our study contain fewer than 10 million parameters and the application of PEFT enables fine-tuning of fewer than 10,000 parameters during downstream adaptation, allowing a broad community to use these models and reproduce our experiments. We hope that this study will encourage further research into the use of transformers with first-principles design choices to represent RL trajectories and contribute to robust policy learning.
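A central design choice is tokenization at the level of individual state and action components. The sketch below shows one plausible way to turn a trajectory into such a token stream via uniform per-component binning; the bin count and layout are assumptions, not the paper's exact scheme.

```python
# Hypothetical component-level trajectory tokenizer (uniform binning).
import numpy as np

def tokenize_trajectory(states, actions, low, high, n_bins=256):
    """states: (T, S), actions: (T, A); low/high: per-component value ranges.
    Returns a flat token sequence of length T * (S + A), suitable as a
    next-token-prediction target."""
    traj = np.concatenate([states, actions], axis=1)        # (T, S + A)
    scaled = (traj - low) / (high - low)                    # map to [0, 1)
    tokens = np.clip((scaled * n_bins).astype(int), 0, n_bins - 1)
    return tokens.reshape(-1)
```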

Identical and Fraternal Twins: Fine-Grained Semantic Contrastive Learning of Sentence Representations

  • paper_url: http://arxiv.org/abs/2307.10932
  • repo_url: None
  • paper_authors: Qingfa Xiao, Shuangyin Li, Lei Chen
  • for: Improving the effectiveness of unsupervised sentence representation learning.
  • methods: Proposes a novel Identical and Fraternal Twins contrastive learning framework (IFTCL) that can simultaneously adapt to positive pairs generated by different augmentation techniques.
  • results: IFTCL excels on nine semantic textual similarity tasks, proving more efficient and effective than previous methods.
    Abstract The enhancement of unsupervised learning of sentence representations has been significantly achieved by the utility of contrastive learning. This approach clusters the augmented positive instance with the anchor instance to create a desired embedding space. However, relying solely on the contrastive objective can result in sub-optimal outcomes due to its inability to differentiate subtle semantic variations between positive pairs. Specifically, common data augmentation techniques frequently introduce semantic distortion, leading to a semantic margin between the positive pair. Meanwhile, the InfoNCE loss function overlooks this semantic margin and prioritizes similarity maximization between positive pairs during training, leaving the trained model insensitive to subtle semantic distinctions. In this paper, we introduce a novel Identical and Fraternal Twins of Contrastive Learning (IFTCL) framework, capable of simultaneously adapting to various positive pairs generated by different augmentation techniques. We propose a Twins Loss to preserve the innate margin during training and promote the potential of data enhancement in order to overcome the sub-optimal issue. We also present proof-of-concept experiments combined with the contrastive objective to prove the validity of the proposed Twins Loss. Furthermore, we propose a hippocampus queue mechanism to restore and reuse the negative instances without additional calculation, which further enhances the efficiency and performance of IFTCL. We verify the IFTCL framework on nine semantic textual similarity tasks with both English and Chinese datasets, and the experimental results show that IFTCL outperforms state-of-the-art methods.
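For reference, the baseline objective the Twins Loss augments is the standard InfoNCE loss over in-batch negatives, sketched below; the temperature is a typical default, and the Twins Loss itself (which additionally preserves the augmentation-induced margin between positives) is not reproduced here.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.05):
    """anchor, positive: (N, D) sentence embeddings of augmented pairs.
    Diagonal entries are positives; all other rows act as negatives."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.T / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)
```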

MediaGPT : A Large Language Model For Chinese Media

  • paper_url: http://arxiv.org/abs/2307.10930
  • repo_url: None
  • paper_authors: Zhonghao Wang, Zijia Lu, Bo Jin, Haiying Deng
  • for: Focuses on large language models (LLMs) for the Chinese media domain, aiming to develop MediaGPT, a language model designed specifically for this domain.
  • methods: Designs diverse task instruction types and domain-defined prompt types, and trains the MediaGPT model on top of them.
  • results: Based on human expert evaluation and strong-model evaluation, shows that MediaGPT excels on various Chinese media-domain tasks and confirms the importance of domain data and domain-defined prompt types for building an effective domain-specific LLM.
    Abstract Large language models (LLMs) have shown remarkable capabilities in generating high-quality text and making predictions based on large amounts of data, including in the media domain. However, in practical applications, the differences between the media's use cases and the general-purpose applications of LLMs have become increasingly apparent, especially for Chinese. This paper examines the unique characteristics of media-domain-specific LLMs compared to general LLMs, designs a diverse set of task instruction types to cater to the specific requirements of the domain, and constructs unique datasets that are tailored to the media domain. Based on these, we propose MediaGPT, a domain-specific LLM for the Chinese media domain, trained on domain-specific data and expert SFT data. Through human expert evaluation and strong-model evaluation on a validation set, this paper demonstrates that MediaGPT outperforms mainstream models on various Chinese media domain tasks and verifies the importance of domain data and domain-defined prompt types for building an effective domain-specific LLM.

FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets

  • paper_url: http://arxiv.org/abs/2307.10928
  • repo_url: https://github.com/kaistai/flask
  • paper_authors: Seonghyeon Ye, Doyoung Kim, Sungdong Kim, Hyeonbin Hwang, Seungone Kim, Yongrae Jo, James Thorne, Juho Kim, Minjoon Seo
  • for: Provides a fine-grained method for evaluating language models (LLMs) that better assesses performance on instructions requiring compositions of multiple skills.
  • methods: Defines 12 fine-grained skills and allocates a skill set to each instance according to the target domain and difficulty level of the user instruction; supports both model-based and human-based evaluation.
  • results: The FLASK protocol enables a more precise analysis of model performance by skill, domain, and difficulty, and reveals highly correlated findings between model-based and human-based evaluations across multiple open-source and proprietary LLMs.
    Abstract Evaluation of Large Language Models (LLMs) is challenging because aligning to human values requires the composition of multiple skills and the required set of skills varies depending on the instruction. Recent studies have evaluated the performance of LLMs in two ways, (1) automatic evaluation on several independent benchmarks and (2) human- or machine-based evaluation giving an overall score to the response. However, both settings are coarse-grained evaluations, not considering the nature of user instructions that require instance-wise skill composition, which limits the interpretation of the true capabilities of LLMs. In this paper, we introduce FLASK (Fine-grained Language Model Evaluation based on Alignment SKill Sets), a fine-grained evaluation protocol that can be used for both model-based and human-based evaluation which decomposes coarse-level scoring to an instance-wise skill set-level. Specifically, we define 12 fine-grained skills needed for LLMs to follow open-ended user instructions and construct an evaluation set by allocating a set of skills for each instance. Additionally, by annotating the target domains and difficulty level for each instance, FLASK provides a holistic view with a comprehensive analysis of a model's performance depending on skill, domain, and difficulty. Through using FLASK, we compare multiple open-sourced and proprietary LLMs and observe highly-correlated findings between model-based and human-based evaluations. FLASK enables developers to more accurately measure the model performance and how it can be improved by analyzing factors that make LLMs proficient in particular skills. For practitioners, FLASK can be used to recommend suitable models for particular situations through comprehensive comparison among various LLMs. We release the evaluation data and code implementation at https://github.com/kaistAI/FLASK.
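Once each instance carries a skill set, a domain, and a difficulty level, fine-grained scores are simple aggregations over those annotations. The sketch below illustrates the idea; the record field names are assumptions about the released data, not its actual schema.

```python
# Hypothetical per-skill aggregation of FLASK-style instance annotations.
from collections import defaultdict

def per_skill_scores(records):
    """records: iterable of dicts like
    {"skills": ["logical robustness", ...], "domain": ..., "score": 1-5}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        for skill in r["skills"]:
            totals[skill] += r["score"]
            counts[skill] += 1
    return {skill: totals[skill] / counts[skill] for skill in totals}
```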

Syntactic vs Semantic Linear Abstraction and Refinement of Neural Networks

  • paper_url: http://arxiv.org/abs/2307.10891
  • repo_url: https://github.com/cxlvinchau/linna
  • paper_authors: Calvin Chau, Jan Křetínský, Stefanie Mohr
  • for: Improving the scalability of neural network verification.
  • methods: Uses abstraction techniques to reduce neural networks, replacing a neuron with a linear combination of other neurons.
  • results: Provides a more flexible abstraction framework, with a refinement method, and demonstrates its effectiveness experimentally.
    Abstract Abstraction is a key verification technique to improve scalability. However, its use for neural networks is so far extremely limited. Previous approaches for abstracting classification networks replace several neurons with one of them that is similar enough. We can classify the similarity as defined either syntactically (using quantities on the connections between neurons) or semantically (on the activation values of neurons for various inputs). Unfortunately, the previous approaches only achieve moderate reductions, when implemented at all. In this work, we provide a more flexible framework where a neuron can be replaced with a linear combination of other neurons, improving the reduction. We apply this approach both on syntactic and semantic abstractions, and implement and evaluate them experimentally. Further, we introduce a refinement method for our abstractions, allowing for finding a better balance between reduction and precision.
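The core operation, replacing a neuron with a linear combination of others, can be illustrated with the semantic variant: fit the target neuron's activations from its peers over sampled inputs, then fold the coefficients into the next layer. This is a conceptual sketch, not the LiNNA implementation.

```python
# Semantic linear abstraction, sketched: approximate neuron `target` as a
# linear combination of the other neurons in the same layer.
import numpy as np

def linear_replacement(acts: np.ndarray, target: int):
    """acts: (n_samples, n_neurons) activations recorded on sample inputs.
    Returns least-squares coefficients to be folded into the next layer."""
    others = np.delete(acts, target, axis=1)
    coeffs, residuals, *_ = np.linalg.lstsq(others, acts[:, target], rcond=None)
    return coeffs, residuals   # small residual -> safe to remove `target`
```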

Divide & Bind Your Attention for Improved Generative Semantic Nursing

  • paper_url: http://arxiv.org/abs/2307.10864
  • repo_url: None
  • paper_authors: Yumeng Li, Margret Keuper, Dan Zhang, Anna Khoreva
  • for: Improving the faithfulness and accuracy of text-to-image generative models, especially on complex prompts.
  • methods: Proposes two novel loss objectives, an attendance loss and a binding loss, to improve object presence and attribute binding in the generated images.
  • results: Outperforms existing methods across multiple evaluation benchmarks, faithfully synthesizing the desired objects with improved attribute alignment.
    Abstract Emerging large-scale text-to-image generative models, e.g., Stable Diffusion (SD), have exhibited overwhelming results with high fidelity. Despite the magnificent progress, current state-of-the-art models still struggle to generate images fully adhering to the input prompt. Prior work, Attend & Excite, has introduced the concept of Generative Semantic Nursing (GSN), aiming to optimize cross-attention during inference time to better incorporate the semantics. It demonstrates promising results in generating simple prompts, e.g., ``a cat and a dog''. However, its efficacy declines when dealing with more complex prompts, and it does not explicitly address the problem of improper attribute binding. To address the challenges posed by complex prompts or scenarios involving multiple entities and to achieve improved attribute binding, we propose Divide & Bind. We introduce two novel loss objectives for GSN: a novel attendance loss and a binding loss. Our approach stands out in its ability to faithfully synthesize desired objects with improved attribute alignment from complex prompts and exhibits superior performance across multiple evaluation benchmarks. More videos and updates can be found on the project page \url{https://sites.google.com/view/divide-and-bind}.
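To give a feel for a loss defined on cross-attention maps, the sketch below scores each target token's map by its total variation, encouraging strong, spatially spread responses rather than collapse onto a single blob. It is one plausible instantiation of an attendance-style objective, not the paper's exact formula.

```python
import torch

def attendance_style_loss(attn):
    """attn: (n_tokens, H, W) cross-attention maps of the prompt tokens of
    interest. The loss decreases as even the worst-attended token develops a
    pronounced (high total-variation) attention map."""
    tv_h = (attn[:, 1:, :] - attn[:, :-1, :]).abs().flatten(1).sum(dim=1)
    tv_w = (attn[:, :, 1:] - attn[:, :, :-1]).abs().flatten(1).sum(dim=1)
    per_token = tv_h + tv_w
    return -per_token.min()    # lift the weakest token first
```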

Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning

  • paper_url: http://arxiv.org/abs/2307.10846
  • repo_url: None
  • paper_authors: Zhifeng Qian, Mingyu You, Hongjun Zhou, Xuanhui Xu, Bin He
  • for: Tackling distant goals in high-dimensional state spaces for temporally extended tasks, improving the efficiency and performance of goal-conditioned reinforcement learning (GCRL).
  • methods: Combines GCRL with Disentanglement-based Reachability Planning (REPlan), comprising a Disentangled Representation Module (DRM) and a temporal REachability discrimination Module (REM) to handle distant goals in high-dimensional state spaces.
  • results: On three vision-based simulation tasks and one real-world task, REPlan significantly outperforms prior state-of-the-art methods.
    Abstract Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to spontaneously set diverse goals to learn a set of skills. Despite the excellent works proposed in various fields, reaching distant goals in temporally extended tasks remains a challenge for GCRL. Current works tackled this problem by leveraging planning algorithms to plan intermediate subgoals to augment GCRL. Their methods need two crucial requirements: (i) a state representation space to search valid subgoals, and (ii) a distance function to measure the reachability of subgoals. However, they struggle to scale to high-dimensional state space due to their non-compact representations. Moreover, they cannot collect high-quality training data through standard GC policies, which results in an inaccurate distance function. Both affect the efficiency and performance of planning and policy learning. In the paper, we propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks. In REPlan, a Disentangled Representation Module (DRM) is proposed to learn compact representations which disentangle robot poses and object positions from high-dimensional observations in a self-supervised manner. A simple REachability discrimination Module (REM) is also designed to determine the temporal distance of subgoals. Moreover, REM computes intrinsic bonuses to encourage the collection of novel states for training. We evaluate our REPlan in three vision-based simulation tasks and one real-world task. The experiments demonstrate that our REPlan significantly outperforms the prior state-of-the-art methods in solving temporally extended tasks.

Modifications of the Miller definition of contrastive (counterfactual) explanations

  • paper_url: http://arxiv.org/abs/2307.10832
  • repo_url: None
  • paper_authors: Kevin McAreavey, Weiru Liu
  • for: Examines contrastive (counterfactual) explanations as defined by Miller, which build on the Halpern-Pearl (HP) definitions of causes and (non-contrastive) explanations.
  • methods: Analyzes the Miller definition (based on the original HP definition) against the modified HP definition and Borner's definition, and proposes two improved variants.
  • results: Shows that the Miller definition inherits the problems of the original HP definition, and that the two proposed improved variants resolve these issues while retaining the spirit of the Miller definition.
    Abstract Miller recently proposed a definition of contrastive (counterfactual) explanations based on the well-known Halpern-Pearl (HP) definitions of causes and (non-contrastive) explanations. Crucially, the Miller definition was based on the original HP definition of explanations, but this has since been modified by Halpern; presumably because the original yields counterintuitive results in many standard examples. More recently Borner has proposed a third definition, observing that this modified HP definition may also yield counterintuitive results. In this paper we show that the Miller definition inherits issues found in the original HP definition. We address these issues by proposing two improved variants based on the more robust modified HP and Borner definitions. We analyse our new definitions and show that they retain the spirit of the Miller definition where all three variants satisfy an alternative unified definition that is modular with respect to an underlying definition of non-contrastive explanations. To the best of our knowledge this paper also provides the first explicit comparison between the original and modified HP definitions.
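For readers without the background, the modified HP definition of actual causation (Halpern, 2015) that the discussion revolves around can be stated as follows; this is a standard paraphrase rather than a quotation from the paper.

```latex
% Modified HP definition (Halpern 2015), paraphrased:
% $\vec{X} = \vec{x}$ is an actual cause of $\varphi$ in $(M, \vec{u})$ iff
\begin{description}
  \item[AC1.] $(M, \vec{u}) \models (\vec{X} = \vec{x}) \wedge \varphi$.
  \item[AC2.] There exist a set $\vec{W}$ of variables and a setting $\vec{x}'$
    of $\vec{X}$ such that, writing $\vec{w}^{*}$ for the actual values of
    $\vec{W}$ (i.e., $(M, \vec{u}) \models \vec{W} = \vec{w}^{*}$),
    $(M, \vec{u}) \models [\vec{X} \leftarrow \vec{x}',\,
                           \vec{W} \leftarrow \vec{w}^{*}]\, \neg\varphi$.
  \item[AC3.] $\vec{X}$ is minimal: no strict subset of $\vec{X}$ satisfies
    AC1 and AC2.
\end{description}
```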

What, Indeed, is an Achievable Provable Guarantee for Learning-Enabled Safety Critical Systems

  • paper_url: http://arxiv.org/abs/2307.11784
  • repo_url: None
  • paper_authors: Saddek Bensalem, Chih-Hong Cheng, Wei Huang, Xiaowei Huang, Changshun Wu, Xingyu Zhao
  • for: Addresses the challenge of confidently deploying machine-learning-enabled components in safety-critical domains.
  • methods: Proposes a two-step verification method for achieving provable statistical guarantees.
  • results: Observes that existing approaches cannot actually achieve provable guarantees and motivates the proposed two-step verification method as a way to attain them.
    Abstract Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among the challenges, it is known that a rigorous, yet practical, way of achieving safety guarantees is one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for the ultimate achievement of provable statistical guarantees.

On Combining Expert Demonstrations in Imitation Learning via Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.10810
  • repo_url: https://github.com/ilanasebag/Sliced-MMOT-Imitation-Learning
  • paper_authors: Ilana Sebag, Samuel Cohen, Marc Peter Deisenroth
  • for: Addresses how to combine multiple expert demonstrations in imitation learning, so that an agent can absorb the knowledge of several experts and learn from diverse state trajectories.
  • methods: Uses a multi-marginal optimal transport distance to combine multiple expert demonstrations, providing a more sensible geometric average of the state trajectories.
  • results: Shows that the multi-marginal optimal transport distance combines multiple and diverse expert demonstrations better than simple concatenation, with efficiency analyzed on OpenAI Gym control environments.
    Abstract Imitation learning (IL) seeks to teach agents specific tasks through expert demonstrations. One of the key approaches to IL is to define a distance between agent and expert and to find an agent policy that minimizes that distance. Optimal transport methods have been widely used in imitation learning as they provide ways to measure meaningful distances between agent and expert trajectories. However, the problem of how to optimally combine multiple expert demonstrations has not been widely studied. The standard method is to simply concatenate state (-action) trajectories, which is problematic when trajectories are multi-modal. We propose an alternative method that uses a multi-marginal optimal transport distance and enables the combination of multiple and diverse state-trajectories in the OT sense, providing a more sensible geometric average of the demonstrations. Our approach enables an agent to learn from several experts, and its efficiency is analyzed on OpenAI Gym control environments and demonstrates that the standard method is not always optimal.
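The repository name points to a sliced formulation; as background, the basic OT primitive between two trajectories can be sketched as a sliced Wasserstein distance. The multi-marginal extension combines several expert trajectories at once; the sketch below shows only the two-marginal, equal-length case and is not the authors' code.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=64, seed=0):
    """x, y: (n, d) state trajectories of equal length n (resample otherwise).
    Projects onto random directions and matches sorted 1-D projections,
    which is the optimal coupling in one dimension."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    px, py = x @ theta.T, y @ theta.T     # (n, n_projections) projections
    px.sort(axis=0)
    py.sort(axis=0)
    return np.abs(px - py).mean()
```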

Communication-Efficient Split Learning via Adaptive Feature-Wise Compression

  • paper_url: http://arxiv.org/abs/2307.10805
  • repo_url: None
  • paper_authors: Yongjeong Oh, Jaeho Lee, Christopher G. Brinton, Yo-Seb Jeon
  • for: Improving communication efficiency in split learning by reducing the overhead of transmitting intermediate feature and gradient vectors during training.
  • methods: Proposes SplitFC, a communication-efficient framework that exploits the different dispersion degrees exhibited in the columns of the intermediate matrices. It combines two compression strategies: adaptive feature-wise dropout and adaptive feature-wise quantization; by the chain rule, the gradient vectors associated with dropped feature vectors are dropped as well.
  • results: On the MNIST, CIFAR-10, and CelebA datasets, SplitFC provides more than a 5.6% increase in classification accuracy over state-of-the-art SL frameworks while requiring 320 times less communication overhead than the vanilla SL framework without compression.
    Abstract This paper proposes a novel communication-efficient split learning (SL) framework, named SplitFC, which reduces the communication overhead required for transmitting intermediate feature and gradient vectors during the SL training process. The key idea of SplitFC is to leverage different dispersion degrees exhibited in the columns of the matrices. SplitFC incorporates two compression strategies: (i) adaptive feature-wise dropout and (ii) adaptive feature-wise quantization. In the first strategy, the intermediate feature vectors are dropped with adaptive dropout probabilities determined based on the standard deviation of these vectors. Then, by the chain rule, the intermediate gradient vectors associated with the dropped feature vectors are also dropped. In the second strategy, the non-dropped intermediate feature and gradient vectors are quantized using adaptive quantization levels determined based on the ranges of the vectors. To minimize the quantization error, the optimal quantization levels of this strategy are derived in a closed-form expression. Simulation results on the MNIST, CIFAR-10, and CelebA datasets demonstrate that SplitFC provides more than a 5.6% increase in classification accuracy compared to state-of-the-art SL frameworks, while they require 320 times less communication overhead compared to the vanilla SL framework without compression.
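The first strategy can be sketched in a few lines: keep-probabilities scale with each feature column's standard deviation, so low-dispersion columns are dropped first and the surviving activations are rescaled as in inverted dropout. The std-to-probability mapping below is an assumption, not the paper's exact rule.

```python
import torch

def adaptive_feature_dropout(features: torch.Tensor, target_keep: float = 0.5):
    """features: (batch, n_features) intermediate activations at the split
    point. Returns compressed features plus the keep mask, which is reused
    (via the chain rule) to drop the matching gradient entries."""
    std = features.std(dim=0)
    keep_prob = std / std.sum().clamp(min=1e-8) * target_keep * features.size(1)
    keep_prob = keep_prob.clamp(max=1.0)              # valid probabilities
    mask = torch.bernoulli(keep_prob).bool()          # per-feature keep mask
    kept = features[:, mask] / keep_prob[mask]        # inverted-dropout scaling
    return kept, mask
```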

Meta-Transformer: A Unified Framework for Multimodal Learning

  • paper_url: http://arxiv.org/abs/2307.10802
  • repo_url: https://github.com/invictus717/MetaTransformer
  • paper_authors: Yiyuan Zhang, Kaixiong Gong, Kaipeng Zhang, Hongsheng Li, Yu Qiao, Wanli Ouyang, Xiangyu Yue
  • for: Proposes a framework that can process data from many modalities without any paired multimodal training data.
  • methods: Builds multimodal perception around a frozen encoder: raw inputs from different modalities are mapped into a shared token space, and the frozen encoder then extracts high-level semantic features.
  • results: Experiments show Meta-Transformer handles a wide range of tasks, including fundamental perception (text, image, point cloud, audio, video), practical applications (X-ray, infrared, hyperspectral, IMU), and data mining (graph, tabular, and time-series data).
    Abstract Multimodal learning aims to build models that can process and relate information from multiple modalities. Despite years of development in this field, it still remains challenging to design a unified network for processing various modalities ($\textit{e.g.}$ natural language, 2D images, 3D point clouds, audio, video, time series, tabular data) due to the inherent gaps among them. In this work, we propose a framework, named Meta-Transformer, that leverages a $\textbf{frozen}$ encoder to perform multimodal perception without any paired multimodal training data. In Meta-Transformer, the raw input data from various modalities are mapped into a shared token space, allowing a subsequent encoder with frozen parameters to extract high-level semantic features of the input data. Composed of three main components: a unified data tokenizer, a modality-shared encoder, and task-specific heads for downstream tasks, Meta-Transformer is the first framework to perform unified learning across 12 modalities with unpaired data. Experiments on different benchmarks reveal that Meta-Transformer can handle a wide range of tasks including fundamental perception (text, image, point cloud, audio, video), practical application (X-Ray, infrared, hyperspectral, and IMU), and data mining (graph, tabular, and time-series). Meta-Transformer indicates a promising future for developing unified multimodal intelligence with transformers. Code will be available at https://github.com/invictus717/MetaTransformer
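The pipeline reduces to three pieces: a modality-specific tokenizer into the shared space, a frozen shared encoder, and a small trainable task head. The sketch below captures that wiring with placeholder modules; names and shapes are assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class MetaTransformerSketch(nn.Module):
    """Conceptual wiring: trainable tokenizer + frozen shared encoder +
    trainable task head. `encoder` is assumed to map (B, N, D) -> (B, N, D),
    e.g. a batch-first nn.TransformerEncoder."""
    def __init__(self, encoder: nn.Module, dim: int, n_classes: int):
        super().__init__()
        self.tokenizer = nn.LazyLinear(dim)       # stand-in for a modality tokenizer
        self.encoder = encoder
        for p in self.encoder.parameters():       # shared encoder stays frozen
            p.requires_grad_(False)
        self.head = nn.Linear(dim, n_classes)     # task-specific head is trained

    def forward(self, patches):                   # patches: (B, n_tokens, raw_dim)
        tokens = self.tokenizer(patches)
        feats = self.encoder(tokens).mean(dim=1)  # pooled semantic features
        return self.head(feats)
```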

Optimizing PatchCore for Few/many-shot Anomaly Detection

  • paper_url: http://arxiv.org/abs/2307.10792
  • repo_url: https://github.com/scortexio/patchcore-few-shot
  • paper_authors: João Santos, Triet Tran, Oliver Rippel
  • for: Investigates the emerging trend of few-shot anomaly detection (AD) and evaluates how well existing full-shot AD algorithms perform in the few-shot setting.
  • methods: Studies PatchCore, the current state-of-the-art full-shot AD/anomaly segmentation (AS) algorithm, optimizing its various hyperparameters and transferring techniques known to improve few-shot supervised learning to the AD domain.
  • results: Experiments show that optimizing hyperparameters such as the underlying feature extractor yields significant performance improvements, while image-level augmentations can, but are not guaranteed to, improve performance. Based on these findings, the study achieves a new state of the art in few-shot AD on VisA and identifies feature extractors with a strong inductive bias as a promising direction for AD/AS research.
    Abstract Few-shot anomaly detection (AD) is an emerging sub-field of general AD, and tries to distinguish between normal and anomalous data using only few selected samples. While newly proposed few-shot AD methods do compare against pre-existing algorithms developed for the full-shot domain as baselines, they do not dedicatedly optimize them for the few-shot setting. It thus remains unclear if the performance of such pre-existing algorithms can be further improved. We address said question in this work. Specifically, we present a study on the AD/anomaly segmentation (AS) performance of PatchCore, the current state-of-the-art full-shot AD/AS algorithm, in both the few-shot and the many-shot settings. We hypothesize that further performance improvements can be realized by (I) optimizing its various hyperparameters, and by (II) transferring techniques known to improve few-shot supervised learning to the AD domain. Exhaustive experiments on the public VisA and MVTec AD datasets reveal that (I) significant performance improvements can be realized by optimizing hyperparameters such as the underlying feature extractor, and that (II) image-level augmentations can, but are not guaranteed, to improve performance. Based on these findings, we achieve a new state of the art in few-shot AD on VisA, further demonstrating the merit of adapting pre-existing AD/AS methods to the few-shot setting. Last, we identify the investigation of feature extractors with a strong inductive bias as a potential future research direction for (few-shot) AD/AS.
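For orientation, PatchCore's scoring step, which the study tunes rather than replaces, amounts to nearest-neighbour distances against a (coreset-subsampled) memory bank of nominal patch features; a minimal sketch follows.

```python
import numpy as np

def patch_anomaly_scores(memory_bank: np.ndarray, test_feats: np.ndarray):
    """memory_bank: (M, d) nominal patch features after coreset subsampling;
    test_feats: (N, d) patch features of a test image. The per-patch score is
    the distance to the nearest nominal patch; an image-level score is
    typically the maximum over its patches."""
    d2 = ((test_feats[:, None, :] - memory_bank[None, :, :]) ** 2).sum(axis=-1)
    return np.sqrt(d2.min(axis=1))
```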

Decoding the Enigma: Benchmarking Humans and AIs on the Many Facets of Working Memory

  • paper_url: http://arxiv.org/abs/2307.10768
  • repo_url: https://github.com/zhanglab-deepneurocoglab/worm
  • paper_authors: Ankur Sikarwar, Mengmi Zhang
  • for: Develops a comprehensive working memory (WM) benchmark dataset for evaluating AI WM models.
  • methods: The benchmark comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM; human behavioral benchmarks are included as an upper bound for comparison.
  • results: AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. The experiments also reveal limitations in existing models' ability to approximate human behavior. The dataset offers the cognitive psychology, neuroscience, and AI communities a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities.
    Abstract Working memory (WM), a fundamental cognitive process facilitating the temporary storage, integration, manipulation, and retrieval of information, plays a vital role in reasoning and decision-making tasks. Robust benchmark datasets that capture the multifaceted nature of WM are crucial for the effective development and evaluation of AI WM models. Here, we introduce a comprehensive Working Memory (WorM) benchmark dataset for this purpose. WorM comprises 10 tasks and a total of 1 million trials, assessing 4 functionalities, 3 domains, and 11 behavioral and neural characteristics of WM. We jointly trained and tested state-of-the-art recurrent neural networks and transformers on all these tasks. We also include human behavioral benchmarks as an upper bound for comparison. Our results suggest that AI models replicate some characteristics of WM in the brain, most notably primacy and recency effects, and neural clusters and correlates specialized for different domains and functionalities of WM. In the experiments, we also reveal some limitations in existing models to approximate human behavior. This dataset serves as a valuable resource for communities in cognitive psychology, neuroscience, and AI, offering a standardized framework to compare and enhance WM models, investigate WM's neural underpinnings, and develop WM models with human-like capabilities. Our source code and data are available at https://github.com/ZhangLab-DeepNeuroCogLab/WorM.
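As a flavour of the behavioural probes involved, the snippet below generates a classic n-back trial, a standard working-memory test; it is purely illustrative and does not assert that WorM uses this exact task or format.

```python
import random

def n_back_trial(length=20, n=2, vocab="ABCDEFGH", match_rate=0.3):
    """Generate a stimulus sequence plus per-step 'match?' targets: the model
    must report whether the current item equals the one n steps back."""
    seq, targets = [], []
    for t in range(length):
        if t >= n and random.random() < match_rate:
            seq.append(seq[t - n])                 # planted match
        else:
            seq.append(random.choice(vocab))
        targets.append(t >= n and seq[t] == seq[t - n])
    return seq, targets
```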

Actor-agnostic Multi-label Action Recognition with Multi-modal Query

  • paper_url: http://arxiv.org/abs/2307.10763
  • repo_url: https://github.com/mondalanindya/msqnet
  • paper_authors: Anindya Mondal, Sauradip Nag, Joaquin M Prada, Xiatian Zhu, Anjan Dutta
  • for: Improving the flexibility and performance of action recognition across different actor types, including humans and animals.
  • methods: Proposes a new actor-agnostic multi-modal multi-label action recognition approach based on a transformer object detection framework and textual features; it requires no actor pose estimation and leverages both visual and textual modalities to better represent action classes.
  • results: In extensive experiments on five public datasets, MSQNet outperforms prior actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%.
    Abstract Existing action recognition methods are typically actor-specific due to the intrinsic topological and apparent differences among the actors. This requires actor-specific pose estimation (e.g., humans vs. animals), leading to cumbersome model design complexity and high maintenance costs. Moreover, they often focus on learning the visual modality alone and single-label classification whilst neglecting other available information sources (e.g., class name text) and the concurrent occurrence of multiple actions. To overcome these limitations, we propose a new approach called 'actor-agnostic multi-modal multi-label action recognition,' which offers a unified solution for various types of actors, including humans and animals. We further formulate a novel Multi-modal Semantic Query Network (MSQNet) model in a transformer-based object detection framework (e.g., DETR), characterized by leveraging visual and textual modalities to represent the action classes better. The elimination of actor-specific model designs is a key advantage, as it removes the need for actor pose estimation altogether. Extensive experiments on five publicly available benchmarks show that our MSQNet consistently outperforms the prior arts of actor-specific alternatives on human and animal single- and multi-label action recognition tasks by up to 50%. Code will be released at https://github.com/mondalanindya/MSQNet.
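The "multi-modal query" idea can be sketched as building one decoder query per action class from a text embedding of its name, which is what makes the detector actor-agnostic. The projection below is an assumption about the fusion, not the released MSQNet code.

```python
import torch
import torch.nn as nn

class LabelQueries(nn.Module):
    """One DETR-style decoder query per action class, derived from class-name
    text embeddings (e.g., from a pretrained text encoder)."""
    def __init__(self, text_embeds: torch.Tensor, dim: int):
        super().__init__()
        self.register_buffer("text", text_embeds)        # (n_classes, d_text)
        self.proj = nn.Linear(text_embeds.size(1), dim)  # into decoder width

    def forward(self) -> torch.Tensor:
        return self.proj(self.text)                      # (n_classes, dim) queries
```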

Exploring Perspectives on the Impact of Artificial Intelligence on the Creativity of Knowledge Work: Beyond Mechanised Plagiarism and Stochastic Parrots

  • paper_url: http://arxiv.org/abs/2307.10751
  • repo_url: None
  • paper_authors: Advait Sarkar
  • for: Explores how artificial intelligence (AI) affects creativity and the attribution of credit in knowledge work.
  • methods: Uses examples from literary criticism, the history of art, and copyright law to show that creativity and originality resist definition as a notatable or information-theoretic property of an object, and can instead be seen as the property of a process, an author, or a viewer.
  • results: Argues that AI shifts knowledge production toward critical integration, better recognizing the creative and curatorial voice of users; the paper aims to begin a conversation around a more nuanced treatment of creativity and credit assignment when using AI.
    Abstract Artificial Intelligence (AI), and in particular generative models, are transformative tools for knowledge work. They problematise notions of creativity, originality, plagiarism, the attribution of credit, and copyright ownership. Critics of generative models emphasise the reliance on large amounts of training data, and view the output of these models as no more than randomised plagiarism, remix, or collage of the source data. On these grounds, many have argued for stronger regulations on the deployment, use, and attribution of the output of these models. However, these issues are not new or unique to artificial intelligence. In this position paper, using examples from literary criticism, the history of art, and copyright law, I show how creativity and originality resist definition as a notatable or information-theoretic property of an object, and instead can be seen as the property of a process, an author, or a viewer. Further alternative views hold that all creative work is essentially reuse (mostly without attribution), or that randomness itself can be creative. I suggest that creativity is ultimately defined by communities of creators and receivers, and the deemed sources of creativity in a workflow often depend on which parts of the workflow can be automated. Using examples from recent studies of AI in creative knowledge work, I suggest that AI shifts knowledge work from material production to critical integration. This position paper aims to begin a conversation around a more nuanced approach to the problems of creativity and credit assignment for generative models, one which more fully recognises the importance of the creative and curatorial voice of the users of these models and moves away from simpler notational or information-theoretic views.

Fairness-Aware Client Selection for Federated Learning

  • paper_url: http://arxiv.org/abs/2307.10738
  • repo_url: None
  • paper_authors: Yuxin Shi, Zelei Liu, Zhuan Shi, Han Yu
  • for: Addresses the client selection problem in federated learning (FL), balancing performance and fairness.
  • methods: Based on Lyapunov optimization, dynamically adjusts clients' selection probabilities by jointly considering their reputations, times of participation in FL tasks, and contributions to the resulting model performance. It avoids threshold-based reputation filtering, giving clients opportunities to redeem their reputations after a perceived poor performance.
  • results: Extensive experiments on real-world multimedia datasets show that FairFedCS achieves 19.6% higher fairness and 0.73% higher test accuracy on average than the best-performing state-of-the-art approach.
    Abstract Federated learning (FL) has enabled multiple data owners (a.k.a. FL clients) to train machine learning models collaboratively without revealing private data. Since the FL server can only engage a limited number of clients in each training round, FL client selection has become an important research problem. Existing approaches generally focus on either enhancing FL model performance or enhancing the fair treatment of FL clients. The problem of balancing performance and fairness considerations when selecting FL clients remains open. To address this problem, we propose the Fairness-aware Federated Client Selection (FairFedCS) approach. Based on Lyapunov optimization, it dynamically adjusts FL clients' selection probabilities by jointly considering their reputations, times of participation in FL tasks and contributions to the resulting model performance. By not using threshold-based reputation filtering, it provides FL clients with opportunities to redeem their reputations after a perceived poor performance, thereby further enhancing fair client treatment. Extensive experiments based on real-world multimedia datasets show that FairFedCS achieves 19.6% higher fairness and 0.73% higher test accuracy on average than the best-performing state-of-the-art approach.
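One way to picture Lyapunov-style fair selection is with a virtual queue per client that grows while the client waits and resets on selection, so long-unselected clients gain probability. The sketch below follows that intuition; the drift-plus-penalty weighting is an assumption, not the paper's exact update.

```python
import numpy as np

def select_clients(reputation, queues, k, tradeoff=1.0, rng=None):
    """reputation, queues: (n_clients,) arrays; k: clients per round.
    Selection probability mixes utility (reputation) with fairness pressure
    (virtual queue length)."""
    rng = np.random.default_rng() if rng is None else rng
    weight = reputation + tradeoff * queues
    prob = weight / weight.sum()
    chosen = rng.choice(len(prob), size=k, replace=False, p=prob)
    queues += 1.0           # every client waits one more round...
    queues[chosen] = 0.0    # ...except those just selected
    return chosen
```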

LLM Censorship: A Machine Learning Challenge or a Computer Security Problem?

  • paper_url: http://arxiv.org/abs/2307.10719
  • repo_url: None
  • paper_authors: David Glukhov, Ilia Shumailov, Yarin Gal, Nicolas Papernot, Vardan Papyan
  • for: Examines the effectiveness of existing defence mechanisms against the potential harms of large language models (LLMs).
  • methods: Presents a theoretical analysis showing the limitations of existing semantic censorship approaches, and demonstrates that attackers can reconstruct impermissible outputs from collections of permissible ones.
  • results: The analysis indicates that semantic censorship can be perceived as an undecidable problem given LLMs' programmatic and instruction-following capabilities, and that knowledgeable attackers can still reconstruct impermissible outputs, undermining existing defences; censorship should therefore be treated as a security problem.
    Abstract Large language models (LLMs) have exhibited impressive capabilities in comprehending complex instructions. However, their blind adherence to provided instructions has led to concerns regarding risks of malicious use. Existing defence mechanisms, such as model fine-tuning or output censorship using LLMs, have proven to be fallible, as LLMs can still generate problematic responses. Commonly employed censorship approaches treat the issue as a machine learning problem and rely on another LM to detect undesirable content in LLM outputs. In this paper, we present the theoretical limitations of such semantic censorship approaches. Specifically, we demonstrate that semantic censorship can be perceived as an undecidable problem, highlighting the inherent challenges in censorship that arise due to LLMs' programmatic and instruction-following capabilities. Furthermore, we argue that the challenges extend beyond semantic censorship, as knowledgeable attackers can reconstruct impermissible outputs from a collection of permissible ones. As a result, we propose that the problem of censorship needs to be reevaluated; it should be treated as a security problem which warrants the adaptation of security-based approaches to mitigate potential risks.

Introducing Risk Shadowing For Decisive and Comfortable Behavior Planning

  • paper_url: http://arxiv.org/abs/2307.10714
  • repo_url: None
  • paper_authors: Tim Puphal, Julian Eggert
  • for: Addresses group interactions in urban driving: existing behavior planners for self-driving cars typically evaluate each single agent-to-agent interaction separately in a cost function to find an optimal ego behavior, such as not colliding with other agents.
  • methods: Proposes risk shadowing, a situation-understanding method that goes beyond single interactions by analyzing group interactions between three agents. Concretely, it identifies a first other agent that need not be considered in the ego agent's behavior planner, because a second other agent obstructs that agent's way to the ego agent.
  • results: In experiments, using risk shadowing as an upstream filter module for a behavior planner yields more decisive and comfortable driving strategies than the state of the art, with safety ensured. The usability of the approach is demonstrated on different intersection scenarios and longitudinal driving.
    Abstract We consider the problem of group interactions in urban driving. State-of-the-art behavior planners for self-driving cars mostly consider each single agent-to-agent interaction separately in a cost function in order to find an optimal behavior for the ego agent, such as not colliding with any of the other agents. In this paper, we develop risk shadowing, a situation understanding method that allows us to go beyond single interactions by analyzing group interactions between three agents. Concretely, the presented method can find out which first other agent does not need to be considered in the behavior planner of an ego agent, because this first other agent cannot reach the ego agent due to a second other agent obstructing its way. In experiments, we show that using risk shadowing as an upstream filter module for a behavior planner allows to plan more decisive and comfortable driving strategies than state of the art, given that safety is ensured in these cases. The usability of the approach is demonstrated for different intersection scenarios and longitudinal driving.

Kick Back & Relax: Learning to Reconstruct the World by Watching SlowTV

  • paper_url: http://arxiv.org/abs/2307.10713
  • repo_url: https://github.com/jspenmar/slowtv_monodepth
  • paper_authors: Jaime Spencer, Chris Russell, Simon Hadfield, Richard Bowden
  • for: Scaling self-supervised monocular depth estimation (SS-MDE) to vast quantities of data, and addressing the limitation of existing approaches to the automotive domain.
  • methods: Proposes the large-scale SlowTV dataset curated from YouTube, containing 1.7M images from diverse environments (worldwide seasonal hiking, scenic driving, and scuba diving), and trains an SS-MDE model on it that provides zero-shot generalization to a large collection of indoor/outdoor datasets; also introduces a collection of best practices to further maximize performance and zero-shot generalization.
  • results: The resulting model outperforms all existing SSL approaches and closes the gap on supervised SoTA, despite using a more efficient architecture.
    Abstract Self-supervised monocular depth estimation (SS-MDE) has the potential to scale to vast quantities of data. Unfortunately, existing approaches limit themselves to the automotive domain, resulting in models incapable of generalizing to complex environments such as natural or indoor settings. To address this, we propose a large-scale SlowTV dataset curated from YouTube, containing an order of magnitude more data than existing automotive datasets. SlowTV contains 1.7M images from a rich diversity of environments, such as worldwide seasonal hiking, scenic driving and scuba diving. Using this dataset, we train an SS-MDE model that provides zero-shot generalization to a large collection of indoor/outdoor datasets. The resulting model outperforms all existing SSL approaches and closes the gap on supervised SoTA, despite using a more efficient architecture. We additionally introduce a collection of best-practices to further maximize performance and zero-shot generalization. This includes 1) aspect ratio augmentation, 2) camera intrinsic estimation, 3) support frame randomization and 4) flexible motion estimation. Code is available at https://github.com/jspenmar/slowtv_monodepth.
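Of the listed best practices, aspect-ratio augmentation is easy to picture: randomly crop training frames to varied aspect ratios so the network does not overfit a single camera geometry. The sketch below is one simple realisation, not the repository's implementation.

```python
import random
from PIL import Image

def random_aspect_crop(img: Image.Image, ratios=(4/3, 16/9, 3/2, 1.0)):
    """Crop `img` to a randomly chosen width/height ratio."""
    r = random.choice(ratios)
    w, h = img.size
    if w / h > r:                                  # too wide: crop width
        new_w = int(h * r)
        x0 = random.randint(0, w - new_w)
        return img.crop((x0, 0, x0 + new_w, h))
    new_h = int(w / r)                             # too tall: crop height
    y0 = random.randint(0, h - new_h)
    return img.crop((0, y0, w, y0 + new_h))
```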

AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models

  • paper_url: http://arxiv.org/abs/2307.10711
  • repo_url: None
  • paper_authors: Jiachun Pan, Jun Hao Liew, Vincent Y. F. Tan, Jiashi Feng, Hanshu Yan
  • for: Addresses customization of diffusion probabilistic models (DPMs) when the only available supervision is a user-provided differentiable metric defined on the generated contents.
  • methods: Proposes AdjointDPM, which first generates new samples by solving the corresponding probability-flow ODEs, then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the model parameters (conditioning signals, network weights, and initial noises) by solving another augmented ODE.
  • results: Demonstrates AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, fine-tuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.
    Abstract Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, naïve gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.
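The memory argument can be seen with torchdiffeq, whose `odeint_adjoint` recovers gradients by solving an augmented ODE backwards instead of storing every solver step; the toy velocity field below stands in for the denoising UNet of a real DPM.

```python
import torch
from torchdiffeq import odeint_adjoint as odeint

class VelocityField(torch.nn.Module):
    """Placeholder for the probability-flow drift defined by a DPM's UNet."""
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(2, 2)

    def forward(self, t, x):
        return self.net(x)

f = VelocityField()
x0 = torch.randn(8, 2, requires_grad=True)   # initial noise (optimizable)
t = torch.linspace(0.0, 1.0, 2)
x1 = odeint(f, x0, t)[-1]                    # "generated" sample
loss = x1.pow(2).mean()                      # user-defined differentiable metric
loss.backward()                              # O(1)-memory gradients via adjoint
```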

Towards an architectural framework for intelligent virtual agents using probabilistic programming

  • paper_url: http://arxiv.org/abs/2307.10693
  • repo_url: None
  • paper_authors: Anton Andreev, Grégoire Cattan
  • for: embodied conversational agents (ECAs)
  • methods: probabilistic programming, Bayesian networks, and distributions
  • results: more natural behavior, adaptation to user preferences, and internal states that evolve over time
    Abstract We present a new framework called KorraAI for conceiving and building embodied conversational agents (ECAs). Our framework models ECAs' behavior considering contextual information, for example, about environment and interaction time, and uncertain information provided by the human interaction partner. Moreover, agents built with KorraAI can show proactive behavior, as they can initiate interactions with human partners. For these purposes, KorraAI exploits probabilistic programming. Probabilistic models in KorraAI are used to model its behavior and interactions with the user. They enable adaptation to the user's preferences and a certain degree of indeterminism in the ECAs to achieve more natural behavior. Human-like internal states, such as moods, preferences, and emotions (e.g., surprise), can be modeled in KorraAI with distributions and Bayesian networks. These models can evolve over time, even without interaction with the user. ECA models are implemented as plugins and share a common interface. This enables ECA designers to focus more on the character they are modeling and less on the technical details, as well as to store and exchange ECA models. Several applications of KorraAI ECAs are possible, such as virtual sales agents, customer service agents, virtual companions, entertainers, or tutors.
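To give a flavor of how such internal states can be modeled, here is a minimal sketch (invented mood labels, likelihoods, and drift rate, not KorraAI's code) of a categorical mood belief that updates on user evidence via Bayes' rule and keeps evolving without interaction.

```python
# Toy Bayesian "mood" model: all names and numbers are illustrative assumptions.
import numpy as np

MOODS = ["cheerful", "neutral", "grumpy"]
mood = np.array([0.5, 0.4, 0.1])                  # prior belief over the mood

p_positive = np.array([0.8, 0.5, 0.2])            # P(user reacts positively | mood)

def observe(mood, reacted_positively: bool):
    """Bayes update of the mood belief after one interaction."""
    lik = p_positive if reacted_positively else 1.0 - p_positive
    post = mood * lik
    return post / post.sum()

def drift(mood, rate=0.05):
    """Slow drift toward uniform, so the state evolves even without interaction."""
    return (1 - rate) * mood + rate * np.ones(len(mood)) / len(mood)

mood = observe(mood, reacted_positively=False)    # the user seemed annoyed
for _ in range(10):                               # time passes with no input
    mood = drift(mood)
print(dict(zip(MOODS, mood.round(3))))
```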

Bounded Combinatorial Reconfiguration with Answer Set Programming

  • paper_url: http://arxiv.org/abs/2307.10688
  • repo_url: None
  • paper_authors: Yuya Yamada, Mutsunori Banbara, Katsumi Inoue, Torsten Schaub
  • for: solving combinatorial reconfiguration problems
  • methods: develops a bounded combinatorial reconfiguration approach based on Answer Set Programming (ASP)
  • results: the resulting recongo solver covers all metrics of the CoRe Challenge 2022 solver track, ranked first in the shortest metric of the single-engine solvers track, and is evaluated empirically on all CoRe Challenge 2022 instances
    Abstract We develop an approach called bounded combinatorial reconfiguration for solving combinatorial reconfiguration problems based on Answer Set Programming (ASP). The general task is to study the solution spaces of source combinatorial problems and to decide whether or not there are sequences of feasible solutions that have special properties. The resulting recongo solver covers all metrics of the solver track in the most recent international competition on combinatorial reconfiguration (CoRe Challenge 2022). recongo ranked first in the shortest metric of the single-engine solvers track. In this paper, we present the design and implementation of bounded combinatorial reconfiguration, and present an ASP encoding of the independent set reconfiguration problem that is one of the most studied combinatorial reconfiguration problems. Finally, we present empirical analysis considering all instances of CoRe Challenge 2022.
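The paper's encoding is in ASP; for readers unfamiliar with reconfiguration, the plain-Python sketch below (toy 4-cycle graph; the token-jumping rule is one standard variant, assumed here) brute-forces the independent set reconfiguration question by BFS over the solution space. On this graph the two maximum independent sets cannot be reconfigured, so the search returns None — a minimal infeasible instance.

```python
# Brute-force illustration of independent set reconfiguration (token jumping),
# not the paper's ASP encoding. The graph is a toy 4-cycle.
from itertools import combinations
from collections import deque

edges = {(0, 1), (1, 2), (2, 3), (3, 0)}          # C4
n = 4

def independent(s):
    return all((a, b) not in edges and (b, a) not in edges
               for a, b in combinations(s, 2))

def neighbors(s):
    """Token jumping: move one vertex anywhere, staying independent."""
    for v in s:
        for u in range(n):
            if u not in s:
                t = frozenset((s - {v}) | {u})
                if independent(t):
                    yield t

def reconfigure(start, goal):
    start, goal = frozenset(start), frozenset(goal)
    parent, q = {start: None}, deque([start])
    while q:
        s = q.popleft()
        if s == goal:                             # walk parents back to the start
            path = []
            while s is not None:
                path.append(sorted(s)); s = parent[s]
            return path[::-1]
        for t in neighbors(s):
            if t not in parent:
                parent[t] = s; q.append(t)
    return None

print(reconfigure({0, 2}, {1, 3}))                # None: not reconfigurable on C4
```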

A Personalized Recommender System Based-on Knowledge Graph Embeddings

  • paper_url: http://arxiv.org/abs/2307.10680
  • repo_url: https://github.com/nil9/Master-Thesis
  • paper_authors: Ngoc Luyen Le, Marie-Hélène Abel, Philippe Gouspillou
  • for: This paper studies the construction of a personalized recommender system that uses knowledge graph embeddings to better capture the implicit connections between users and items and provide more accurate recommendations.
  • methods: The proposed system embeds users and items into a knowledge graph, better capturing user preferences and needs for more accurate recommendations.
  • results: Experimental results show the approach provides relevant recommendations that are consistent with individual users' preferences and needs.
    Abstract Knowledge graphs have proven to be effective for modeling entities and their relationships through the use of ontologies. The recent surge of interest in using knowledge graphs as a form of information modeling has led to their increased adoption in recommender systems. By incorporating users and items into the knowledge graph, these systems can better capture the implicit connections between them and provide more accurate recommendations. In this paper, we investigate and propose the construction of a personalized recommender system via knowledge graph embeddings applied to the vehicle purchase/sale domain. The results of our experimentation demonstrate the efficacy of the proposed method in providing relevant recommendations that are consistent with individual users.
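The abstract does not fix the embedding model; as one common choice, the TransE-style sketch below (assumed purely for illustration, with random toy vectors) shows how embedded users, relations, and items yield a ranked recommendation list.

```python
# Illustrative TransE-style scoring for KG-embedding recommendation.
# The relation name and item set are invented; vectors are untrained toy data.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
user = {u: rng.normal(size=dim) for u in ["u1"]}
item = {i: rng.normal(size=dim) for i in ["sedan", "suv", "coupe"]}
rel_likes = rng.normal(size=dim)                  # a "likes"/"purchased" relation

def score(u, i):
    """TransE: plausibility of (user, likes, item) is -||u + r - i||."""
    return -np.linalg.norm(user[u] + rel_likes - item[i])

ranked = sorted(item, key=lambda i: score("u1", i), reverse=True)
print(ranked)                                     # items ranked for user u1
```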

SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10635
  • repo_url: https://github.com/mandyyyyii/scibench
  • paper_authors: Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, Wei Wang
  • for: This paper examines the reasoning capabilities of large language models (LLMs) on scientific problem solving.
  • methods: It uses two carefully curated datasets: collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and problems from undergraduate-level exams in computer science and mathematics. Several prompting strategies are used to evaluate the LLMs.
  • results: Current LLMs perform unsatisfactorily, with an overall score of only 35.80%. The errors are categorized into ten problem-solving abilities; no single prompting strategy significantly outperforms the others, and some strategies that improve certain abilities degrade others.
    Abstract Recent advances in large language models (LLMs) have demonstrated notable progress on many mathematical benchmarks. However, most of these benchmarks only feature problems grounded in junior and senior high school subjects, contain only multiple-choice questions, and are confined to a limited scope of elementary arithmetic operations. To address these issues, this paper introduces an expansive benchmark suite SciBench that aims to systematically examine the reasoning capabilities required for complex scientific problem solving. SciBench contains two carefully curated datasets: an open set featuring a range of collegiate-level scientific problems drawn from mathematics, chemistry, and physics textbooks, and a closed set comprising problems from undergraduate-level exams in computer science and mathematics. Based on the two datasets, we conduct an in-depth benchmark study of two representative LLMs with various prompting strategies. The results reveal that current LLMs fall short of delivering satisfactory performance, with an overall score of merely 35.80%. Furthermore, through a detailed user study, we categorize the errors made by LLMs into ten problem-solving abilities. Our analysis indicates that no single prompting strategy significantly outperforms others and some strategies that demonstrate improvements in certain problem-solving skills result in declines in other skills. We envision that SciBench will catalyze further developments in the reasoning abilities of LLMs, thereby ultimately contributing to scientific research and discovery.

Pluvio: Assembly Clone Search for Out-of-domain Architectures and Libraries through Transfer Learning and Conditional Variational Information Bottleneck

  • paper_url: http://arxiv.org/abs/2307.10631
  • repo_url: None
  • paper_authors: Zhiwei Fu, Steven H. H. Ding, Furkan Alaca, Benjamin C. M. Fung, Philippe Charland
  • for: This work addresses the limitations of existing assembly clone search methods when facing unseen architectures and libraries.
  • methods: It augments existing learning-based approaches with large-scale pre-trained natural language models via transfer learning to broaden their applicability, and introduces a reinforcement learning agent to remove unnecessary and redundant tokens.
  • results: Experiments on simulated unseen-architecture clone search scenarios show the proposed approach is more effective than existing methods.
    Abstract The practice of code reuse is crucial in software development for a faster and more efficient development lifecycle. In reality, however, code reuse practices lack proper control, resulting in issues such as vulnerability propagation and intellectual property infringements. Assembly clone search, a critical shift-right defence mechanism, has been effective in identifying vulnerable code resulting from reuse in released executables. Recent studies on assembly clone search demonstrate a trend towards using machine learning-based methods to match assembly code variants produced by different toolchains. However, these methods are limited to what they learn from a small number of toolchain variants used in training, rendering them inapplicable to unseen architectures and their corresponding compilation toolchain variants. This paper presents the first study on the problem of assembly clone search with unseen architectures and libraries. We propose incorporating human common knowledge through large-scale pre-trained natural language models, in the form of transfer learning, into current learning-based approaches for assembly clone search. Transfer learning can aid in addressing the limitations of the existing approaches, as it can bring in broader knowledge from human experts in assembly code. We further address the sequence limit issue by proposing a reinforcement learning agent to remove unnecessary and redundant tokens. Coupled with a new Variational Information Bottleneck learning strategy, the proposed system minimizes the reliance on potential indicators of architectures and optimization settings, for a better generalization of unseen architectures. We simulate the unseen architecture clone search scenarios and the experimental results show the effectiveness of the proposed approach against the state-of-the-art solutions.

Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques

  • paper_url: http://arxiv.org/abs/2307.10617
  • repo_url: None
  • paper_authors: Anusuya Baby Hari Krishnan
  • for: This work proposes a machine learning model to identify deceptive reviews, with a particular focus on restaurant reviews.
  • methods: It extracts features using an n-gram model with max-features selection, coupled with five distinct machine learning classification algorithms.
  • results: Experiments show that the passive aggressive classifier achieves the highest accuracy and robustness, performing strongly in both text classification and fake review detection.
    Abstract In the contemporary digital landscape, online reviews have become an indispensable tool for promoting products and services across various businesses. Marketers, advertisers, and online businesses have found incentives to create deceptive positive reviews for their products and negative reviews for their competitors' offerings. As a result, the writing of deceptive reviews has become an unavoidable practice for businesses seeking to promote themselves or undermine their rivals. Detecting such deceptive reviews has become an intense and ongoing area of research. This research paper proposes a machine learning model to identify deceptive reviews, with a particular focus on restaurants. This study delves into the performance of numerous experiments conducted on a dataset of restaurant reviews known as the Deceptive Opinion Spam Corpus. To accomplish this, an n-gram model and max features are developed to effectively identify deceptive content, particularly focusing on fake reviews. A benchmark study is undertaken to explore the performance of two different feature extraction techniques, which are then coupled with five distinct machine learning classification algorithms. The experimental results reveal that the passive aggressive classifier stands out among the various algorithms, showcasing the highest accuracy not only in text classification but also in identifying fake reviews. Moreover, the research delves into data augmentation and implements various deep learning techniques to further enhance the process of detecting deceptive reviews. The findings shed light on the efficacy of the proposed machine learning approach and offer valuable insights into dealing with deceptive reviews in the realm of online businesses.
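A hedged sketch of this pipeline (assuming scikit-learn; the n-gram range, max_features cap, and the toy labels are invented) pairs TF-IDF n-gram features with the reported best performer, the Passive Aggressive classifier.

```python
# Sketch of the n-gram + Passive Aggressive setup; hyperparameters are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

reviews = ["The food was amazing, best pasta ever!!",
           "Terrible service, will never come back.",
           "Absolutely perfect in every way, five stars, a must visit!",
           "Decent food but the wait was long."]
labels = [1, 0, 1, 0]                             # 1 = deceptive, 0 = truthful (toy)

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), max_features=5000),  # uni- and bi-grams
    PassiveAggressiveClassifier(max_iter=1000, random_state=0),
)
clf.fit(reviews, labels)
print(clf.predict(["Best restaurant ever, totally perfect, everyone must go!"]))
```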

Heterogeneous Federated Learning: State-of-the-art and Research Challenges

  • paper_url: http://arxiv.org/abs/2307.10616
  • repo_url: https://github.com/marswhu/hfl_survey
  • paper_authors: Mang Ye, Xiuwen Fang, Bo Du, Pong C. Yuen, Dacheng Tao
  • for: This survey targets the challenges of federated learning (FL) in large-scale real-world applications, especially the heterogeneity of client data, models, network environments, and hardware devices.
  • methods: It first summarizes the research challenges in Heterogeneous Federated Learning (HFL) across five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. It then systematically reviews existing HFL methods under a new taxonomy spanning the data, model, and server levels.
  • results: The survey provides an in-depth analysis of HFL and identifies critical and promising future research directions; a periodically updated collection is available at https://github.com/marswhu/HFL_Survey.
    Abstract Federated learning (FL) has drawn increasing attention owing to its potential use in large-scale industrial applications. Existing federated learning works mainly focus on model homogeneous settings. However, practical federated learning typically faces the heterogeneity of data distributions, model architectures, network environments, and hardware devices among participant clients. Heterogeneous Federated Learning (HFL) is much more challenging, and corresponding solutions are diverse and complex. Therefore, a systematic survey on this topic about the research challenges and state-of-the-art is essential. In this survey, we firstly summarize the various research challenges in HFL from five aspects: statistical heterogeneity, model heterogeneity, communication heterogeneity, device heterogeneity, and additional challenges. In addition, recent advances in HFL are reviewed and a new taxonomy of existing HFL methods is proposed with an in-depth analysis of their pros and cons. We classify existing methods from three different levels according to the HFL procedure: data-level, model-level, and server-level. Finally, several critical and promising future research directions in HFL are discussed, which may facilitate further developments in this field. A periodically updated collection on HFL is available at https://github.com/marswhu/HFL_Survey.

Challenges and Solutions in AI for All

  • paper_url: http://arxiv.org/abs/2307.10600
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Rifat Ara Shams, Didar Zowghi, Muneera Bano
  • for: This study explores diversity and inclusion (D&I) principles in the design of artificial intelligence (AI) to improve fairness, trust, and transparency.
  • methods: A systematic review retrieved 48 research articles published between 2017 and 2022. Open coding identified 55 challenges and 33 solutions for D&I in AI, as well as 24 challenges and 23 solutions for enhancing such practices using AI.
  • results: The study offers a deeper understanding of these issues, providing researchers and practitioners with a reference for integrating these principles into future AI systems.
    Abstract Artificial Intelligence (AI)'s pervasive presence and variety necessitate diversity and inclusivity (D&I) principles in its design for fairness, trust, and transparency. Yet, these considerations are often overlooked, leading to issues of bias, discrimination, and perceived untrustworthiness. In response, we conducted a Systematic Review to unearth challenges and solutions relating to D&I in AI. Our rigorous search yielded 48 research articles published between 2017 and 2022. Open coding of these papers revealed 55 unique challenges and 33 solutions for D&I in AI, as well as 24 unique challenges and 23 solutions for enhancing such practices using AI. This study, by offering a deeper understanding of these issues, will enlighten researchers and practitioners seeking to integrate these principles into future AI systems.

Exploiting Structure for Optimal Multi-Agent Bayesian Decentralized Estimation

  • paper_url: http://arxiv.org/abs/2307.10594
  • repo_url: None
  • paper_authors: Christopher Funk, Ofer Dagan, Benjamin Noack, Nisar R. Ahmed
  • for: This paper studies the 'rumor propagation' (double counting) phenomenon in multi-agent Bayesian decentralized data fusion, where previously sent data circulates back to its sender.
  • methods: It proposes a non-monolithic covariance intersection (CI) algorithm that uses multiple weighting factors, together with a general optimization scheme that exploits the probabilistic independence structure to compute optimal bounds.
  • results: On a simple problem, both methods converge to the same solution; in a large-scale target tracking simulation, the non-monolithic CI algorithm achieves a tighter bound and a more accurate estimate.
    Abstract A key challenge in Bayesian decentralized data fusion is the `rumor propagation' or `double counting' phenomenon, where previously sent data circulates back to its sender. It is often addressed by approximate methods like covariance intersection (CI) which takes a weighted average of the estimates to compute the bound. The problem is that this bound is not tight, i.e. the estimate is often over-conservative. In this paper, we show that by exploiting the probabilistic independence structure in multi-agent decentralized fusion problems a tighter bound can be found using (i) an expansion to the CI algorithm that uses multiple (non-monolithic) weighting factors instead of one (monolithic) factor in the original CI and (ii) a general optimization scheme that is able to compute optimal bounds and fully exploit an arbitrary dependency structure. We compare our methods and show that on a simple problem, they converge to the same solution. We then test our new non-monolithic CI algorithm on a large-scale target tracking simulation and show that it achieves a tighter bound and a more accurate estimate compared to the original monolithic CI.
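For reference, the standard single-weight (monolithic) CI that the paper generalizes looks as follows (NumPy sketch with toy estimates; the weight is chosen, as is common, to minimize the fused trace).

```python
# Monolithic covariance intersection: the baseline the paper tightens with
# multiple (non-monolithic) weights. Estimates below are toy numbers.
import numpy as np

def ci_fuse(x1, P1, x2, P2, w):
    """CI fusion with a single weight w in [0, 1]."""
    P1i, P2i = np.linalg.inv(P1), np.linalg.inv(P2)
    P = np.linalg.inv(w * P1i + (1 - w) * P2i)
    x = P @ (w * P1i @ x1 + (1 - w) * P2i @ x2)
    return x, P

x1, P1 = np.array([1.0, 0.0]), np.diag([1.0, 4.0])
x2, P2 = np.array([0.5, 0.8]), np.diag([4.0, 1.0])

ws = np.linspace(0.01, 0.99, 99)                  # pick w minimizing trace(P)
w = min(ws, key=lambda w: np.trace(ci_fuse(x1, P1, x2, P2, w)[1]))
x, P = ci_fuse(x1, P1, x2, P2, w)
print(w, x, np.trace(P))                          # conservative but consistent bound
```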

Boundary State Generation for Testing and Improvement of Autonomous Driving Systems

  • paper_url: http://arxiv.org/abs/2307.10590
  • repo_url: None
  • paper_authors: Matteo Biagiola, Paolo Tonella
  • for: This paper aims to improve the dependability of autonomous driving systems (ADSs) by presenting a novel test generator called GenBo.
  • methods: GenBo mutates the driving conditions of the ego vehicle (position, velocity, and orientation) collected in a failure-free environment instance to generate challenging driving conditions at the behavior boundary, where the model starts to misbehave.
  • results: The retrained model using GenBo has up to 16% higher success rate on a separate set of evaluation tracks compared to the original DNN model.
    Abstract Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. Such approaches have two main drawbacks: (1) modifications to the simulated environment might not be easily transferable to the in-field test setting (e.g., changing the road shape); (2) environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GenBo (GENerator of BOundary state pairs), a novel test generator for ADS testing. GenBo mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has up to a 16% higher success rate on a separate set of evaluation tracks with respect to the original DNN model.
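A conceptual sketch of the boundary search (not GenBo's implementation; `simulate` is a hypothetical stand-in for running the ADS in the simulator) bisects the magnitude of a state perturbation to bracket the behavior boundary.

```python
# Bracket the behavior boundary along one perturbation direction of the ego
# state (x, y, velocity, heading). `simulate` is hypothetical: it should return
# True for a failure-free episode. Assumes magnitude lo passes and hi fails.
def perturb(state, direction, m):
    return tuple(s + m * d for s, d in zip(state, direction))

def boundary_pair(state, direction, simulate, lo=0.0, hi=1.0, iters=12):
    """Return (largest passing, smallest failing) states along `direction`."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if simulate(perturb(state, direction, mid)):
            lo = mid                              # still failure-free
        else:
            hi = mid                              # already misbehaving
    return perturb(state, direction, lo), perturb(state, direction, hi)
```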

Forecasting Battery Electric Vehicle Charging Behavior: A Deep Learning Approach Equipped with Micro-Clustering and SMOTE Techniques

  • paper_url: http://arxiv.org/abs/2307.10588
  • repo_url: None
  • paper_authors: Hanif Tayarani, Trisha V. Ramadoss, Vaishnavi Karanam, Gil Tal, Christopher Nitta
  • for: This paper aims to improve the forecasting of battery electric vehicle (BEV) charging events, which is essential for electricity load aggregators and utility managers to provide charging stations and electricity capacity effectively.
  • methods: The paper develops a novel Micro Clustering Deep Neural Network (MCDNN) algorithm that is highly effective at learning BEVs trip and charging data to forecast BEV charging events.
  • results: The proposed MCDNN outperforms benchmark approaches, such as support vector machine, k nearest neighbors, decision tree, and other neural network-based models, in predicting the charging events.
    Abstract Energy systems, climate change, and public health are among the primary reasons for moving toward electrification in transportation. Transportation electrification is being promoted worldwide to reduce emissions. As a result, many automakers will soon start making only battery electric vehicles (BEVs). BEV adoption rates are rising in California, mainly due to climate change and air pollution concerns. While great for climate and pollution goals, improperly managed BEV charging can lead to insufficient charging infrastructure and power outages. This study develops a novel Micro Clustering Deep Neural Network (MCDNN), an artificial neural network algorithm that is highly effective at learning BEVs trip and charging data to forecast BEV charging events, information that is essential for electricity load aggregators and utility managers to provide charging stations and electricity capacity effectively. The MCDNN is configured using a robust dataset of trips and charges that occurred in California between 2015 and 2020 from 132 BEVs, spanning 5 BEV models for a total of 1570167 vehicle miles traveled. The numerical findings revealed that the proposed MCDNN is more effective than benchmark approaches in this field, such as support vector machine, k nearest neighbors, decision tree, and other neural network-based models in predicting the charging events.
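The SMOTE component named in the title can be sketched as follows (assuming the imbalanced-learn library; the features and the downstream classifier are placeholders, not the paper's MCDNN).

```python
# SMOTE sketch: oversample the rare "charging event" class before training.
# Toy features and a placeholder classifier stand in for the paper's pipeline.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                     # toy trip features
y = (rng.random(500) < 0.1).astype(int)           # rare positive class

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)   # synthetic minority points
print(np.bincount(y), "->", np.bincount(y_res))           # now balanced

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=300, random_state=0)
clf.fit(X_res, y_res)
```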

Ethosight: A Reasoning-Guided Iterative Learning System for Nuanced Perception based on Joint-Embedding & Contextual Label Affinity

  • paper_url: http://arxiv.org/abs/2307.10577
  • repo_url: None
  • paper_authors: Hugo Latapie, Shan Yu, Patrick Hammer, Kristinn R. Thorisson, Vahagn Petrosyan, Brandon Kynoch, Alind Khare, Payman Behnam, Alexey Tumanov, Aksheit Saxena, Anish Aralikatti, Hanning Chen, Mohsen Imani, Mike Archbold, Tangrui Li, Pei Wang, Justin Hart
  • for: This work proposes a flexible and adaptable zero-shot video analytics system, addressing the high false positive/negative rates and costly retraining that traditional computer vision models suffer in real-world applications.
  • methods: Ethosight starts from user-defined video analytics specified through natural language or keywords, and leverages joint embedding models and reasoning mechanisms informed by ontologies such as WordNet and ConceptNet. It runs on low-cost edge devices and supports runtime adaptation, offering a new approach to continuous learning without catastrophic forgetting.
  • results: The paper provides empirical validation of Ethosight's effectiveness across diverse and complex use cases, highlights areas for further improvement, and releases all source code and datasets for full reproducibility.
    Abstract Traditional computer vision models often necessitate extensive data acquisition, annotation, and validation. These models frequently struggle in real-world applications, resulting in high false positive and negative rates, and exhibit poor adaptability to new scenarios, often requiring costly retraining. To address these issues, we present Ethosight, a flexible and adaptable zero-shot video analytics system. Ethosight begins from a clean slate based on user-defined video analytics, specified through natural language or keywords, and leverages joint embedding models and reasoning mechanisms informed by ontologies such as WordNet and ConceptNet. Ethosight operates effectively on low-cost edge devices and supports enhanced runtime adaptation, thereby offering a new approach to continuous learning without catastrophic forgetting. We provide empirical validation of Ethosight's promising effectiveness across diverse and complex use cases, while highlighting areas for further improvement. A significant contribution of this work is the release of all source code and datasets to enable full reproducibility and to foster further innovation in both the research and commercial domains.

Boosting Federated Learning Convergence with Prototype Regularization

  • paper_url: http://arxiv.org/abs/2307.10575
  • repo_url: None
  • paper_authors: Yu Qiao, Huy Q. Le, Choong Seon Hong
  • for: This paper aims to mitigate the heterogeneous data distribution among clients in federated learning (FL) and improve model performance.
  • methods: It proposes a prototype-based regularization strategy: the server aggregates local prototypes from distributed clients into a global prototype, which is then sent back to the individual clients to guide their local training.
  • results: Experiments on MNIST and Fashion-MNIST show improvements of 3.3% and 8.9% in average test accuracy over the popular FedAvg baseline, along with a fast convergence rate in heterogeneous settings.
    Abstract As a distributed machine learning technique, federated learning (FL) requires clients to collaboratively train a shared model with an edge server without leaking their local data. However, the heterogeneous data distribution among clients often leads to a decrease in model performance. To tackle this issue, this paper introduces a prototype-based regularization strategy to address the heterogeneity in the data distribution. Specifically, the regularization process involves the server aggregating local prototypes from distributed clients to generate a global prototype, which is then sent back to the individual clients to guide their local training. The experimental results on MNIST and Fashion-MNIST show that our proposal achieves improvements of 3.3% and 8.9% in average test accuracy, respectively, compared to the most popular baseline FedAvg. Furthermore, our approach has a fast convergence rate in heterogeneous settings.
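A minimal sketch of the prototype-regularization idea (shapes, the base loss, and the weighting are assumptions, not the paper's exact formulation): the server averages per-class prototypes across clients, and each client adds a term pulling its features toward the returned global prototypes.

```python
# Sketch of prototype aggregation (server) and prototype regularization (client).
import torch
import torch.nn.functional as F

def aggregate_prototypes(client_protos):
    """Server: average each class's prototype over the clients that report it."""
    buckets = {}
    for protos in client_protos:                  # one dict {class: tensor} per client
        for c, p in protos.items():
            buckets.setdefault(c, []).append(p)
    return {c: torch.stack(ps).mean(0) for c, ps in buckets.items()}

def local_loss(logits, feats, labels, global_protos, lam=1.0):
    """Client: task loss plus distance to the global prototype of each label.
    Assumes every label seen locally has a global prototype."""
    ce = F.cross_entropy(logits, labels)
    target = torch.stack([global_protos[int(y)] for y in labels])
    reg = F.mse_loss(feats, target)
    return ce + lam * reg
```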

Adaptive Control of Resource Flow to Optimize Construction Work and Cash Flow via Online Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.10574
  • repo_url: None
  • paper_authors: Can Jiang, Xin Li, Jia-Rui Lin, Ming Liu, Zhiliang Ma
  • for: This study proposes a model and method to adaptively control resource flows and thereby optimize the work and cash flows of construction projects under the uncertainty and variability of complex construction environments.
  • methods: It uses deep reinforcement learning (DRL) to realize continuous adaptive optimal control of labor and material flows, and develops a discrete-event-simulation-based simulator to train the DRL agent efficiently.
  • results: Experiments show the method outperforms the vanilla empirical method and a genetic algorithm, performs reliably across diverse projects and external environments, and a hybrid agent combining DRL and the empirical method achieves the best result. The work may serve as a stepping stone for adopting DRL in construction project management.
    Abstract Due to the complexity and dynamics of construction work, resource, and cash flows, poor management of them usually leads to time and cost overruns, bankruptcy, even project failure. Existing approaches in construction failed to achieve optimal control of resource flow in a dynamic environment with uncertainty. Therefore, this paper introduces a model and method to adaptively control the resource flows to optimize the work and cash flows of construction projects. First, a mathematical model based on a partially observable Markov decision process is established to formulate the complex interactions of construction work, resource, and cash flows as well as uncertainty and variability of diverse influence factors. Meanwhile, to efficiently find the optimal solutions, a deep reinforcement learning (DRL) based method is introduced to realize the continuous adaptive optimal control of labor and material flows, thereby optimizing the work and cash flows. To assist the training process of DRL, a simulator based on discrete event simulation is also developed to mimic the dynamic features and external environments of a project. Experiments in simulated scenarios illustrate that our method outperforms the vanilla empirical method and genetic algorithm, possesses remarkable capability in diverse projects and external environments, and a hybrid agent of DRL and empirical method leads to the best result. This paper contributes to adaptive control and optimization of coupled work, resource, and cash flows, and may serve as a stepping stone for adopting DRL technology in construction project management.

Invalid Logic, Equivalent Gains: The Bizarreness of Reasoning in Language Model Prompting

  • paper_url: http://arxiv.org/abs/2307.10573
  • repo_url: None
  • paper_authors: Rylan Schaeffer, Kateryna Pistunova, Samar Khanna, Sarthak Consul, Sanmi Koyejo
  • for: This paper investigates why logically invalid Chain-of-Thought (CoT) prompting improves language model performance.
  • methods: The authors compare logically invalid CoT prompts against logically valid ones on the hardest tasks of the BIG-Bench benchmark (BIG-Bench Hard).
  • results: Logically invalid CoT prompts achieve performance gains comparable to valid ones, and some CoT prompts used in prior work are found to contain logical errors, suggesting that factors beyond logical validity drive the improvements.
    Abstract Language models can be prompted to reason through problems in a manner that significantly improves performance. However, why such prompting improves performance is unclear. Recent work showed that using logically invalid Chain-of-Thought (CoT) prompting improves performance almost as much as logically valid CoT prompting, and that editing CoT prompts to replace problem-specific information with abstract information or out-of-distribution information typically doesn't harm performance. Critics have responded that these findings are based on too few and too easily solved tasks to draw meaningful conclusions. To resolve this dispute, we test whether logically invalid CoT prompts offer the same level of performance gains as logically valid prompts on the hardest tasks in the BIG-Bench benchmark, termed BIG-Bench Hard (BBH). We find that the logically invalid reasoning prompts do indeed achieve similar performance gains on BBH tasks as logically valid reasoning prompts. We also discover that some CoT prompts used by previous works contain logical errors. This suggests that covariates beyond logically valid reasoning are responsible for performance improvements.

Deceptive Alignment Monitoring

  • paper_url: http://arxiv.org/abs/2307.10569
  • repo_url: None
  • paper_authors: Andres Carranza, Dhruv Pai, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo
  • for: monitoring and defending against the threat of deceptive alignment in large machine learning models
  • methods: surveys emerging directions across diverse machine learning subfields
  • results: identifies new research opportunities and long-term challenges for deceptive alignment monitoring, and advocates greater involvement by the adversarial machine learning community
    Abstract As the capabilities of large machine learning models continue to grow, and as the autonomy afforded to such models continues to expand, the spectre of a new adversary looms: the models themselves. The threat that a model might behave in a seemingly reasonable manner, while secretly and subtly modifying its behavior for ulterior reasons is often referred to as deceptive alignment in the AI Safety & Alignment communities. Consequently, we call this new direction Deceptive Alignment Monitoring. In this work, we identify emerging directions in diverse machine learning subfields that we believe will become increasingly important and intertwined in the near future for deceptive alignment monitoring, and we argue that advances in these fields present both long-term challenges and new research opportunities. We conclude by advocating for greater involvement by the adversarial machine learning community in these emerging directions.

FACADE: A Framework for Adversarial Circuit Anomaly Detection and Evaluation

  • paper_url: http://arxiv.org/abs/2307.10563
  • repo_url: None
  • paper_authors: Dhruv Pai, Andres Carranza, Rylan Schaeffer, Arnuv Tandon, Sanmi Koyejo
  • for: This paper aims to advance the understanding and mitigation of adversarial attacks on deep neural networks, improving model robustness and oversight.
  • methods: It presents FACADE, a novel probabilistic and geometric framework for unsupervised mechanistic anomaly detection. FACADE generates probabilistic distributions over circuits, providing insight into their contribution to changes in the manifold properties of pseudo-classes, i.e., high-dimensional modes in activation space.
  • results: The framework aims to improve model robustness and enable scalable model oversight, and shows promising applications in real-world deployment settings.
    Abstract We present FACADE, a novel probabilistic and geometric framework designed for unsupervised mechanistic anomaly detection in deep neural networks. Its primary goal is advancing the understanding and mitigation of adversarial attacks. FACADE aims to generate probabilistic distributions over circuits, which provide critical insights to their contribution to changes in the manifold properties of pseudo-classes, or high-dimensional modes in activation space, yielding a powerful tool for uncovering and combating adversarial attacks. Our approach seeks to improve model robustness, enhance scalable model oversight, and demonstrates promising applications in real-world deployment settings.

Air Traffic Controller Workload Level Prediction using Conformalized Dynamical Graph Learning

  • paper_url: http://arxiv.org/abs/2307.10559
  • repo_url: https://github.com/ymlasu/para-atm-collection
  • paper_authors: Yutian Pang, Jueming Hu, Christopher S. Lieber, Nancy J. Cooke, Yongming Liu
  • for: This paper predicts the workload of air traffic controllers (ATCos) to avoid overloading and ensure operational safety and efficient airspace use.
  • methods: It applies graph-based deep learning and conformal prediction to air traffic data and workload labels collected in human-in-the-loop simulations.
  • results: Besides traffic density features, traffic conflict features (minimum horizontal/vertical separation distance) contribute to workload prediction; learning directly from the spatiotemporal graph layout with a graph neural network achieves higher accuracy than hand-crafted traffic complexity features; and conformal prediction further improves prediction quality by producing a range of predicted workload labels.
    Abstract Air traffic control (ATC) is a safety-critical service system that demands constant attention from ground air traffic controllers (ATCos) to maintain daily aviation operations. The workload of the ATCos can have negative effects on operational safety and airspace usage. To avoid overloading and ensure an acceptable workload level for the ATCos, it is important to predict the ATCos' workload accurately for mitigation actions. In this paper, we first perform a review of research on ATCo workload, mostly from the air traffic perspective. Then, we briefly introduce the setup of the human-in-the-loop (HITL) simulations with retired ATCos, where the air traffic data and workload labels are obtained. The simulations are conducted under three Phoenix approach scenarios while the human ATCos are requested to self-evaluate their workload ratings (i.e., low-1 to high-7). Preliminary data analysis is conducted. Next, we propose a graph-based deep-learning framework with conformal prediction to identify the ATCo workload levels. The number of aircraft under the controller's control varies both spatially and temporally, resulting in dynamically evolving graphs. The experiment results suggest that (a) besides the traffic density feature, the traffic conflict feature contributes to the workload prediction capabilities (i.e., minimum horizontal/vertical separation distance); (b) directly learning from the spatiotemporal graph layout of airspace with a graph neural network achieves higher prediction accuracy compared to hand-crafted traffic complexity features; (c) conformal prediction is a valuable tool to further boost model prediction accuracy, yielding a range of predicted workload labels. The code used is available at https://github.com/ymlasu/para-atm-collection/blob/master/air-traffic-prediction/ATC-Workload-Prediction/.
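Finding (c) relies on conformal prediction; the split-conformal sketch below (pure NumPy; random toy scores stand in for the trained model's outputs) shows how a calibrated threshold turns per-class probabilities for the seven workload levels into label sets with roughly 1 - alpha coverage.

```python
# Split conformal prediction over discrete workload labels (1-7); toy scores.
import numpy as np

rng = np.random.default_rng(0)
n_cal, n_classes, alpha = 200, 7, 0.1
cal_probs = rng.dirichlet(np.ones(n_classes) * 2, size=n_cal)  # stand-in model output
cal_labels = rng.integers(0, n_classes, size=n_cal)

# Nonconformity score: 1 - probability assigned to the true label.
scores = 1.0 - cal_probs[np.arange(n_cal), cal_labels]
q = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

def predict_set(probs):
    """All workload levels whose nonconformity clears the calibrated threshold."""
    return [k + 1 for k in range(n_classes) if 1.0 - probs[k] <= q]

print(predict_set(rng.dirichlet(np.ones(n_classes) * 2)))      # e.g. a range of labels
```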

EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization

  • paper_url: http://arxiv.org/abs/2307.10554
  • repo_url: https://github.com/lilujunai/emq-series
  • paper_authors: Peijie Dong, Lujun Li, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan
  • for: This paper proposes a framework that automatically generates training-free proxies for mixed-precision quantization (MQ), improving both accuracy and efficiency.
  • methods: It searches for the best-correlated MQ proxy with an evolutionary algorithm, using a diversity-prompting selection strategy and a compatibility screening protocol to avoid premature convergence.
  • results: Experiments on ImageNet show the auto-generated proxies outperform state-of-the-art mixed-precision methods at a significantly reduced cost.
    Abstract Mixed-Precision Quantization (MQ) can achieve a competitive accuracy-complexity trade-off for models. Conventional training-based search methods require time-consuming candidate training to search optimized per-layer bit-width configurations in MQ. Recently, some training-free approaches have presented various MQ proxies and significantly improve search efficiency. However, the correlation between these proxies and quantization accuracy is poorly understood. To address the gap, we first build the MQ-Bench-101, which involves different bit configurations and quantization results. Then, we observe that the existing training-free proxies show weak correlations on the MQ-Bench-101. To efficiently seek superior proxies, we develop an automatic proxy-search framework for MQ via evolving algorithms. In particular, we devise an elaborate search space involving the existing proxies and perform an evolution search to discover the best correlated MQ proxy. We propose a diversity-prompting selection strategy and compatibility screening protocol to avoid premature convergence and improve search efficiency. In this way, our Evolving proxies for Mixed-precision Quantization (EMQ) framework allows the auto-generation of proxies without heavy tuning and expert knowledge. Extensive experiments on ImageNet with various ResNet and MobileNet families demonstrate that our EMQ achieves superior performance compared to state-of-the-art mixed-precision methods at a significantly reduced cost. The code will be released.
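A toy sketch of the evolutionary-search idea (random stand-in statistics instead of MQ-Bench-101 entries; assuming NumPy and SciPy): evolve candidate proxies and select them by rank correlation with measured quantization accuracy.

```python
# Evolve simple weighted-proxy candidates by Spearman correlation with accuracy.
# All data here are random stand-ins; the paper's search space is far richer.
import random
import numpy as np
from scipy.stats import spearmanr

random.seed(0)
rng = np.random.default_rng(0)
stat_a, stat_b = rng.normal(size=50), rng.normal(size=50)       # per-config statistics
acc = 0.7 * stat_a - 0.2 * stat_b + 0.1 * rng.normal(size=50)   # hidden ground truth

def fitness(w):
    proxy = w[0] * stat_a + w[1] * stat_b
    return spearmanr(proxy, acc)[0]               # rank correlation with accuracy

pop = [rng.normal(size=2) for _ in range(20)]
for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:5]                             # elitist selection
    pop = parents + [p + rng.normal(scale=0.3, size=2)          # mutated offspring
                     for p in random.choices(parents, k=15)]
best = max(pop, key=fitness)
print(fitness(best))                              # correlation of the best proxy
```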

PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts

  • paper_url: http://arxiv.org/abs/2307.10551
  • repo_url: None
  • paper_authors: Kaiwen Wei, Jie Yao, Jingyuan Zhang, Yangyang Kang, Fubang Zhao, Yating Zhang, Changlong Sun, Xin Jin, Xin Zhang
  • for: This work improves the accuracy and efficiency of extracting structured value semantic entities in key information extraction (KIE), targeting the complex layouts of real-world documents.
  • methods: It introduces a new large-scale human-annotated dataset, Complex Layout form for key information EXtraction (CLEX), and a Parallel Pointer-based Network (PPN) applicable in zero-shot and few-shot scenarios. PPN leverages implicit clues between semantic entities to assist extraction, and its parallel mechanism extracts multiple results simultaneously and efficiently.
  • results: On the CLEX dataset, PPN outperforms existing state-of-the-art methods while offering much faster inference.
    Abstract Key Information Extraction (KIE) is a challenging multimodal task that aims to extract structured value semantic entities from visually rich documents. Although significant progress has been made, there are still two major challenges that need to be addressed. Firstly, the layout of existing datasets is relatively fixed and limited in the number of semantic entity categories, creating a significant gap between these datasets and the complex real-world scenarios. Secondly, existing methods follow a two-stage pipeline strategy, which may lead to the error propagation problem. Additionally, they are difficult to apply in situations where unseen semantic entity categories emerge. To address the first challenge, we propose a new large-scale human-annotated dataset named Complex Layout form for key information EXtraction (CLEX), which consists of 5,860 images with 1,162 semantic entity categories. To solve the second challenge, we introduce Parallel Pointer-based Network (PPN), an end-to-end model that can be applied in zero-shot and few-shot scenarios. PPN leverages the implicit clues between semantic entities to assist extracting, and its parallel extraction mechanism allows it to extract multiple results simultaneously and efficiently. Experiments on the CLEX dataset demonstrate that PPN outperforms existing state-of-the-art methods while also offering a much faster inference speed.

Dynamic Large Language Models on Blockchains

  • paper_url: http://arxiv.org/abs/2307.10549
  • repo_url: None
  • paper_authors: Yuanhao Gong
  • for: This paper proposes training and deploying dynamic large language models on blockchains, addressing the heavy computational demands and static nature of current large language models.
  • methods: It leverages blockchains' high computation performance and distribution across a network of computers, and lets the deployed model continuously learn from user input after the initial training.
  • results: The approach offers a new way to develop large language models that keep improving from user feedback, and sheds light on next-generation artificial intelligence systems.
    Abstract Training and deploying large language models requires a large amount of computational resources because the language models contain billions of parameters and the text has thousands of tokens. Another problem is that the large language models are static. They are fixed after the training process. To tackle these issues, in this paper, we propose to train and deploy the dynamic large language model on blockchains, which have high computation performance and are distributed across a network of computers. A blockchain is a secure, decentralized, and transparent system that allows for the creation of a tamper-proof ledger for transactions without the need for intermediaries. The dynamic large language models can continuously learn from the user input after the training process. Our method provides a new way to develop the large language models and also sheds light on the next generation of artificial intelligence systems.

TREA: Tree-Structure Reasoning Schema for Conversational Recommendation

  • paper_url: http://arxiv.org/abs/2307.10543
  • repo_url: https://github.com/windylee0822/trea
  • paper_authors: Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen
  • for: improving the understanding of conversation context and generating more targeted responses
  • methods: constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses
  • results: extensive experiments on two public CRS datasets demonstrate the effectiveness of the approach
    Abstract Conversational recommender systems (CRS) aim to timely trace the dynamic interests of users through dialogues and generate relevant responses for item recommendations. Recently, various external knowledge bases (especially knowledge graphs) are incorporated into CRS to enhance the understanding of conversation contexts. However, recent reasoning-based models heavily rely on simplified structures such as linear structures or fixed-hierarchical structures for causality reasoning, hence they cannot fully figure out sophisticated relationships among utterances with external knowledge. To address this, we propose a novel Tree structure Reasoning schEmA named TREA. TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses for recommended results. Extensive experiments on two public CRS datasets have demonstrated the effectiveness of our approach.

The Extractive-Abstractive Axis: Measuring Content “Borrowing” in Generative Language Models

  • paper_url: http://arxiv.org/abs/2307.11779
  • repo_url: None
  • paper_authors: Nedelina Teneva
  • for: This paper proposes a new way to benchmark generative models: the so-called Extractive-Abstractive axis.
  • methods: It characterizes the abstractiveness of generative model outputs, in contrast to extractive search-engine responses, and positions models along the Extractive-Abstractive axis.
  • results: The paper motivates the development of corresponding metrics, datasets, and annotation guidelines for measuring where model outputs fall on this axis.
    Abstract Generative language models produce highly abstractive outputs by design, in contrast to extractive responses in search engines. Given this characteristic of LLMs and the resulting implications for content Licensing & Attribution, we propose the so-called Extractive-Abstractive axis for benchmarking generative models and highlight the need for developing corresponding metrics, datasets and annotation guidelines. We limit our discussion to the text modality.
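The paper does not pin down the metric here; one simple proxy for position on the axis (assumed purely for illustration) is the fraction of the output's n-grams copied verbatim from the source.

```python
# Toy extractiveness measure: share of output n-grams that appear in the source.
def ngrams(tokens, n):
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def extractiveness(source: str, output: str, n: int = 3) -> float:
    """0.0 = fully abstractive, 1.0 = fully extractive (n-gram overlap proxy)."""
    src, out = source.lower().split(), output.lower().split()
    out_grams = ngrams(out, n)
    if not out_grams:
        return 0.0
    return len(out_grams & ngrams(src, n)) / len(out_grams)

src = "the quick brown fox jumps over the lazy dog near the river bank"
print(extractiveness(src, "the quick brown fox jumps over the river"))    # ~0.83
print(extractiveness(src, "a fast animal leaped across a sleepy canine")) # 0.0
```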

Fast Unsupervised Deep Outlier Model Selection with Hypernetworks

  • paper_url: http://arxiv.org/abs/2307.10529
  • repo_url: None
  • paper_authors: Xueying Ding, Yue Zhao, Leman Akoglu
  • for: This paper proposes an effective hyperparameter (HP) tuning/model selection method to improve the performance of unsupervised deep outlier detection (OD) models.
  • methods: It introduces HYPER, which tunes deep OD (DOD) models using a hypernetwork (HN) that maps HPs onto the optimal weights of the main DOD model, and applies meta-learning on historical labeled OD tasks to train a proxy validation function.
  • results: On 35 OD tasks, HYPER achieves high performance against 8 baselines with significant efficiency gains.
    Abstract Outlier detection (OD) finds many applications with a rich literature of numerous techniques. Deep neural network based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, that is, effective hyperparameter (HP) tuning/model selection. While several prior work report the sensitivity of OD models to HPs, it becomes ever so critical for the modern DOD models that exhibit a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to lack of labeled anomalies), and (2) efficient search of the HP/model space (due to exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, likewise trained with our proposed HN efficiently. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.
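The core mechanism can be sketched as follows (toy sizes and a one-layer autoencoder detector, not the paper's HYPER architecture): a hypernetwork maps an HP vector to the weights of the detector, so many HP configurations can be instantiated without separate training runs.

```python
# Conceptual hypernetwork sketch: HP vector -> weights of a small autoencoder
# whose reconstruction error serves as the outlier score. Sizes are toy values.
import torch
import torch.nn as nn

in_dim, hid = 16, 8                               # detector: in_dim -> hid -> in_dim

class HyperNet(nn.Module):
    def __init__(self, hp_dim=3):
        super().__init__()
        n_weights = in_dim * hid + hid + hid * in_dim + in_dim
        self.net = nn.Sequential(nn.Linear(hp_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_weights))
    def forward(self, hp):
        w, i = self.net(hp), 0
        def take(*shape):
            nonlocal i
            n = int(torch.tensor(shape).prod())
            out = w[i:i + n].view(*shape); i += n
            return out
        return take(hid, in_dim), take(hid), take(in_dim, hid), take(in_dim)

def outlier_scores(x, params):
    """Reconstruction error of the detector whose weights came from the HN."""
    w1, b1, w2, b2 = params
    recon = torch.relu(x @ w1.T + b1) @ w2.T + b2
    return (recon - x).pow(2).mean(dim=1)         # higher = more anomalous

hn = HyperNet()
x = torch.randn(32, in_dim)
hp = torch.tensor([0.1, 0.5, 0.9])                # one candidate HP configuration
print(outlier_scores(x, hn(hp)).shape)            # torch.Size([32])
```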

Building Socio-culturally Inclusive Stereotype Resources with Community Engagement

  • paper_url: http://arxiv.org/abs/2307.10514
  • repo_url: None
  • paper_authors: Sunipa Dev, Jaya Goyal, Dinesh Tewari, Shachi Dave, Vinodkumar Prabhakaran
  • for: The paper aims to address the need for more culturally sensitive and representative evaluation resources for generative language models, specifically in the Indian context.
  • methods: The authors use a community-engaged approach to build a resource of stereotypes unique to India, which increases the number of stereotypes known for the Indian context by over 1000.
  • results: The authors demonstrate the utility and effectiveness of the expanded resource for evaluating language models and show that it can help identify harmful stereotypes that may be overlooked by traditional evaluation methods.
    Abstract With rapid development and deployment of generative language models in global settings, there is an urgent need to also scale our measurements of harm, not just in the number and types of harms covered, but also how well they account for local cultural contexts, including marginalized identities and the social biases experienced by them. Current evaluation paradigms are limited in their abilities to address this, as they are not representative of diverse, locally situated but global, socio-cultural perspectives. It is imperative that our evaluation resources are enhanced and calibrated by including people and experiences from different cultures and societies worldwide, in order to prevent gross underestimations or skews in measurements of harm. In this work, we demonstrate a socio-culturally aware expansion of evaluation resources in the Indian societal context, specifically for the harm of stereotyping. We devise a community engaged effort to build a resource which contains stereotypes for axes of disparity that are uniquely present in India. The resultant resource increases the number of stereotypes known for and in the Indian context by over 1000 stereotypes across many unique identities. We also demonstrate the utility and effectiveness of such expanded resources for evaluations of language models. CONTENT WARNING: This paper contains examples of stereotypes that may be offensive.
    摘要 随着生成式语言模型在全球范围内的快速发展和部署,我们迫切需要扩大对危害的衡量,不仅要覆盖更多类型和数量的危害,还要考虑其对当地文化语境(包括边缘化群体身份及其所经历的社会偏见)的刻画程度。现有的评估范式在这方面能力有限,因为它们未能代表多样的、植根于本地却具有全球性的社会文化视角。我们必须通过纳入来自世界各地不同文化和社会的人群与经验,来增强和校准我们的评估资源,以避免对危害的严重低估或偏差。在这项工作中,我们展示了一种具有社会文化意识的评估资源扩展方法,针对印度社会语境中的刻板印象危害。我们通过社区参与的方式,构建了一个涵盖印度特有差异维度的刻板印象资源。该资源使印度语境下已知的刻板印象增加了1000多个,覆盖许多独特的身份。我们还证明了这种扩展资源在评估语言模型方面的实用性和有效性。内容警告:本文包含可能令人反感的刻板印象示例。

IvyGPT: InteractiVe Chinese pathwaY language model in medical domain

  • paper_url: http://arxiv.org/abs/2307.10512
  • repo_url: None
  • paper_authors: Rongsheng Wang, Yaofei Duan, ChanTong Lam, Jiexi Chen, Jiangsheng Xu, Haoming Chen, Xiaohong Liu, Patrick Cheong-Iao Pang, Tao Tan
  • for: 这篇论文提出了一种基于 LLaMA 的医疗领域大型语言模型(IvyGPT),用于医疗问答和诊断。
  • methods: 论文使用高质量医疗问答(QA)实例和基于人类反馈的强化学习(RLHF)来训练和微调 IvyGPT。
  • results: 实验结果显示,IvyGPT 超越了其他医疗 GPT 模型,并能输出更加丰富的诊断和治疗答案。
    Abstract General large language models (LLMs) such as ChatGPT have shown remarkable success. However, such LLMs have not been widely adopted for medical purposes, due to poor accuracy and inability to provide medical advice. We propose IvyGPT, an LLM based on LLaMA that is trained and fine-tuned with high-quality medical question-answer (QA) instances and Reinforcement Learning from Human Feedback (RLHF). After supervised fine-tuning, IvyGPT has good multi-turn conversation capabilities, but it cannot perform like a doctor in other aspects, such as comprehensive diagnosis. Through RLHF, IvyGPT can output richer diagnosis and treatment answers that are closer to human. In the training, we used QLoRA to train 33 billion parameters on a small number of NVIDIA A100 (80GB) GPUs. Experimental results show that IvyGPT has outperformed other medical GPT models.
    摘要 通用大型语言模型(LLM)如ChatGPT已经表现出惊人的成功。然而,这类LLM尚未在医疗领域得到广泛应用,主要因为其准确率不足且无法提供医学建议。我们提出了IvyGPT,一种基于LLaMA的LLM,使用高质量的医学问答(QA)实例和基于人类反馈的强化学习(RLHF)进行训练和微调。经过有监督微调后,IvyGPT具备良好的多轮对话能力,但在全面诊断等其他方面还无法像医生一样工作。通过RLHF,IvyGPT可以输出更丰富、更接近人类水平的诊断和治疗答案。在训练中,我们使用QLoRA,在少量NVIDIA A100(80GB)GPU上训练了330亿参数。实验结果表明,IvyGPT的表现超越了其他医学GPT模型。
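
A hedged sketch of QLoRA-style fine-tuning of the kind the abstract describes, using the `transformers` and `peft` libraries. The checkpoint name and LoRA settings are placeholders, not the authors' actual recipe:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# "base-llama-33b" is a placeholder checkpoint name, not the authors' model.
bnb = BitsAndBytesConfig(load_in_4bit=True,
                         bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained("base-llama-33b",
                                             quantization_config=bnb,
                                             device_map="auto")
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)  # only small LoRA adapters are trained
model.print_trainable_parameters()
```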

Markov Decision Processes with Time-Varying Geometric Discounting

  • paper_url: http://arxiv.org/abs/2307.10491
  • repo_url: None
  • paper_authors: Jiarui Gan, Annika Hennes, Rupak Majumdar, Debmalya Mandal, Goran Radanovic
  • for: 研究折扣因子随时间变化的无穷时域马尔可夫决策过程(MDP)模型。
  • methods: 从博弈论视角出发,将每个时间步骤视为拥有各自(固定)折扣因子的独立决策者,研究由此产生的博弈的子博弈完美均衡(SPE)及相关算法问题。
  • results: 给出了SPE存在性的构造性证明,并证明计算SPE是EXPTIME-困难的。此外,证明在较宽松的假设下存在$\epsilon$-SPE,并给出了计算$\epsilon$-SPE的算法及其时间复杂度上界,该上界是时变折扣因子收敛性质的函数。
    Abstract Canonical models of Markov decision processes (MDPs) usually consider geometric discounting based on a constant discount factor. While this standard modeling approach has led to many elegant results, some recent studies indicate the necessity of modeling time-varying discounting in certain applications. This paper studies a model of infinite-horizon MDPs with time-varying discount factors. We take a game-theoretic perspective -- whereby each time step is treated as an independent decision maker with their own (fixed) discount factor -- and we study the subgame perfect equilibrium (SPE) of the resulting game as well as the related algorithmic problems. We present a constructive proof of the existence of an SPE and demonstrate the EXPTIME-hardness of computing an SPE. We also turn to the approximate notion of $\epsilon$-SPE and show that an $\epsilon$-SPE exists under milder assumptions. An algorithm is presented to compute an $\epsilon$-SPE, of which an upper bound of the time complexity, as a function of the convergence property of the time-varying discount factor, is provided.
    摘要 标准的马尔可夫决策过程(MDP)模型通常采用基于固定折扣因子的几何折扣。尽管这种标准建模方式带来了许多优雅的结果,但一些最近的研究表明,在某些应用场景中有必要对时变折扣进行建模。本文研究带有时变折扣因子的无穷时域MDP模型。我们从博弈论视角出发,将每个时间步骤视为拥有各自(固定)折扣因子的独立决策者,研究由此产生的博弈的子博弈完美均衡(SPE)以及相关的算法问题。我们给出了SPE存在性的构造性证明,并证明计算SPE是EXPTIME-困难的。此外,我们还研究了近似概念$\epsilon$-SPE,证明在较宽松的假设下$\epsilon$-SPE存在。我们给出了计算$\epsilon$-SPE的算法,并给出其时间复杂度的上界,该上界是时变折扣因子收敛性质的函数。
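
One natural way to write the per-step objective implied by the game-theoretic view above; the notation is ours and may differ from the paper's:

```latex
% Player t discounts with its own fixed factor \gamma_t and maximizes the
% geometrically discounted return from time t onward:
U_t(\pi) = \mathbb{E}_{\pi}\!\left[\, \sum_{\tau=t}^{\infty}
           \gamma_t^{\,\tau - t}\, r(s_\tau, a_\tau) \,\middle|\, s_t \right]
% A policy profile is an \epsilon-SPE if no single player t can improve
% U_t by more than \epsilon via a unilateral deviation.
```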

(Ab)using Images and Sounds for Indirect Instruction Injection in Multi-Modal LLMs

  • paper_url: http://arxiv.org/abs/2307.10490
  • repo_url: https://github.com/ebagdasa/multimodal_injection
  • paper_authors: Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, Vitaly Shmatikov
  • for: 这篇论文展示了如何利用图像和声音对多模态大型语言模型(LLM)进行间接提示与指令注入攻击。
  • methods: 攻击者生成与提示相对应的对抗扰动,并将其混入图像或音频录音中。当用户向(未经修改的)良性模型询问被扰动的图像或音频时,扰动会引导模型输出攻击者选择的文本,并/或使后续对话遵循攻击者的指令。
  • results: 论文通过多个针对 LLaVA 和 PandaGPT 的概念验证示例,证明了这种攻击的可行性。
    Abstract We demonstrate how images and sounds can be used for indirect prompt and instruction injection in multi-modal LLMs. An attacker generates an adversarial perturbation corresponding to the prompt and blends it into an image or audio recording. When the user asks the (unmodified, benign) model about the perturbed image or audio, the perturbation steers the model to output the attacker-chosen text and/or make the subsequent dialog follow the attacker's instruction. We illustrate this attack with several proof-of-concept examples targeting LLaVa and PandaGPT.
    摘要 我们展示了图像和声音可以被用于对多模态LLM进行间接提示与指令注入攻击。攻击者生成与提示相对应的对抗扰动,并将其混入图像或音频录音中。当用户向(未经修改的)良性模型询问被扰动的图像或音频时,该扰动会引导模型输出攻击者选择的文本,并/或使后续对话遵循攻击者的指令。我们通过多个针对 LLaVa 和 PandaGPT 的概念验证示例展示了这种攻击。

Backdoor Attack against Object Detection with Clean Annotation

  • paper_url: http://arxiv.org/abs/2307.10487
  • repo_url: None
  • paper_authors: Yize Cheng, Wenbin Hu, Minhao Cheng
  • for: 揭示深度神经网络在对象检测任务中面临的后门攻击威胁。
  • methods: 利用深度学习对象检测器的固有特性,在不修改真实标注的前提下对对象检测任务实施后门攻击,包括对象消失攻击和对象生成攻击。
  • results: 在PASCAL VOC07+12和MSCOCO两个对象检测数据集上,在毒化率仅为5%的情况下,攻击成功率超过92%。
    Abstract Deep neural networks (DNNs) have shown unprecedented success in object detection tasks. However, it was also discovered that DNNs are vulnerable to multiple kinds of attacks, including Backdoor Attacks. Through the attack, the attacker manages to embed a hidden backdoor into the DNN such that the model behaves normally on benign data samples, but makes attacker-specified judgments given the occurrence of a predefined trigger. Although numerous backdoor attacks have been experimented on image classification, backdoor attacks on object detection tasks have not been properly investigated and explored. As object detection has been adopted as an important module in multiple security-sensitive applications such as autonomous driving, backdoor attacks on object detection could pose even more severe threats. Inspired by the inherent property of deep learning-based object detectors, we propose a simple yet effective backdoor attack method against object detection without modifying the ground truth annotations, specifically focusing on the object disappearance attack and object generation attack. Extensive experiments and ablation studies prove the effectiveness of our attack on two benchmark object detection datasets, PASCAL VOC07+12 and MSCOCO, on which we achieve an attack success rate of more than 92% with a poison rate of only 5%.
    摘要 深度神经网络(DNN)在对象检测任务中取得了前所未有的成功。然而,研究也发现DNN容易受到多种攻击,包括后门攻击。通过这种攻击,攻击者得以在DNN中植入隐藏的后门,使模型在良性数据样本上表现正常,但在出现预定义触发器时做出攻击者指定的判断。尽管在图像分类上已有大量后门攻击的实验研究,但针对对象检测任务的后门攻击尚未得到充分的调查和探索。由于对象检测已被用作自动驾驶等多个安全敏感应用中的重要模块,针对对象检测的后门攻击可能带来更严重的威胁。受基于深度学习的对象检测器固有特性的启发,我们提出了一种简单而有效的后门攻击方法,无需修改真实标注,重点研究对象消失攻击和对象生成攻击。在PASCAL VOC07+12和MSCOCO两个基准对象检测数据集上的大量实验和消融研究证明了我们攻击的有效性:在毒化率仅为5%的情况下,攻击成功率超过92%。

Can Instruction Fine-Tuned Language Models Identify Social Bias through Prompting?

  • paper_url: http://arxiv.org/abs/2307.10472
  • repo_url: None
  • paper_authors: Omkar Dige, Jacob-Junqi Tian, David Emerson, Faiza Khan Khattak
  • for: 这篇论文旨在评估指令微调语言模型识别社会偏见的能力。
  • methods: 论文使用零样本提示(包括思维链提示)的方法来评估语言模型的偏见识别能力。
  • results: 研究发现,Alpaca 7B 模型在偏见识别任务上取得了56.7%的准确率,而扩大LLM规模和增加数据多样性有望带来进一步的性能提升。
    Abstract As the breadth and depth of language model applications continue to expand rapidly, it is increasingly important to build efficient frameworks for measuring and mitigating the learned or inherited social biases of these models. In this paper, we present our work on evaluating instruction fine-tuned language models' ability to identify bias through zero-shot prompting, including Chain-of-Thought (CoT) prompts. Across LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task with an accuracy of 56.7%. We also demonstrate that scaling up LLM size and data diversity could lead to further performance gain. This is a work-in-progress presenting the first component of our bias mitigation framework. We will keep updating this work as we get more results.
    摘要 随着语言模型应用的广度和深度不断快速扩展,建立高效的框架来衡量并缓解这些模型学到或继承的社会偏见变得愈发重要。在这篇论文中,我们展示了通过零样本提示(包括思维链(CoT)提示)评估指令微调语言模型识别偏见能力的工作。在LLaMA及其两个指令微调版本中,Alpaca 7B在偏见识别任务上表现最佳,准确率为56.7%。我们还证明,扩大LLM规模和增加数据多样性可以带来进一步的性能提升。这是一项进行中的工作,呈现了我们偏见缓解框架的第一个组成部分;随着获得更多结果,我们将持续更新这项工作。
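
A small sketch of the zero-shot prompting setup the paper evaluates. The template wording is a plausible assumption, not the authors' exact prompt:

```python
def bias_identification_prompt(statement: str, use_cot: bool = True) -> str:
    """Builds a zero-shot prompt asking whether a statement carries a social
    bias; the wording is illustrative, not the paper's exact template."""
    prompt = (
        "Consider the following statement:\n"
        f'"{statement}"\n'
        "Question: Does this statement express a social bias or stereotype? "
        "Answer Yes or No."
    )
    if use_cot:
        prompt += "\nLet's think step by step."  # Chain-of-Thought trigger
    return prompt

print(bias_identification_prompt("People from group X are bad at math."))
```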

Classification of Visualization Types and Perspectives in Patents

  • paper_url: http://arxiv.org/abs/2307.10471
  • repo_url: https://github.com/tibhannover/patentimageclassification
  • paper_authors: Junaid Ahmed Ghauri, Eric Müller-Budack, Ralph Ewerth
  • for: 这篇论文旨在提高专利检索的效率,使用最新的深度学习方法对专利图像中不同的可视化类型和视角进行分类。
  • methods: 论文使用了包括Transformer在内的当前最先进的深度学习方法,对专利图像中不同的可视化类型和视角进行分类。
  • results: 实验结果证明了所提方法的可行性,论文还将公开代码、模型和数据集以便实际应用。
    Abstract Due to the swift growth of patent applications each year, information and multimedia retrieval approaches that facilitate patent exploration and retrieval are of utmost importance. Different types of visualizations (e.g., graphs, technical drawings) and perspectives (e.g., side view, perspective) are used to visualize details of innovations in patents. The classification of these images enables a more efficient search and allows for further analysis. So far, datasets for image type classification miss some important visualization types for patents. Furthermore, related work does not make use of recent deep learning approaches including transformers. In this paper, we adopt state-of-the-art deep learning methods for the classification of visualization types and perspectives in patent images. We extend the CLEF-IP dataset for image type classification in patents to ten classes and provide manual ground truth annotations. In addition, we derive a set of hierarchical classes from a dataset that provides weakly-labeled data for image perspectives. Experimental results have demonstrated the feasibility of the proposed approaches. Source code, models, and dataset will be made publicly available.
    摘要 由于每年专利申请数量的快速增长,促进专利探索与检索的信息及多媒体检索方法至关重要。专利中使用不同类型的可视化(如图表、技术图纸)和视角(如侧视图、透视图)来展示发明的细节。对这些图像进行分类可以实现更高效的检索,并支持进一步的分析。迄今为止,用于图像类型分类的数据集缺少一些对专利而言重要的可视化类型;此外,相关工作尚未利用包括Transformer在内的最新深度学习方法。在本文中,我们采用最先进的深度学习方法对专利图像中的可视化类型和视角进行分类。我们将用于专利图像类型分类的CLEF-IP数据集扩展到十个类别,并提供人工标注的真值。此外,我们从一个提供弱标注图像视角数据的数据集中推导出一组层次类别。实验结果证明了所提方法的可行性。源代码、模型和数据集将公开发布。
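
A hedged sketch of fine-tuning a vision transformer for the ten-class image-type task described above, using torchvision's ViT-B/16. This is a generic setup under our own assumptions, not the authors' training pipeline:

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

NUM_CLASSES = 10  # the ten visualization-type classes of the extended dataset

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, NUM_CLASSES)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """One supervised step; images are (B, 3, 224, 224), labels are (B,)."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```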

A data science axiology: the nature, value, and risks of data science

  • paper_url: http://arxiv.org/abs/2307.10460
  • repo_url: None
  • paper_authors: Michael L. Brodie
  • for: 这篇论文旨在探讨数据科学的价值论(axiology),即其目的、性质、重要性、风险与价值,以帮助理解和定义数据科学,并识别其潜在的益处、风险和研究挑战。
  • methods: 论文采用价值论的方法,通过探讨并评估数据科学显著而决定性的特征,来分析其价值与风险。
  • results: 论文认为,数据科学尚处于起步阶段,这一价值论可以帮助我们更好地理解和定义它,并识别其潜在的益处、风险和开放的研究挑战。
    Abstract Data science is not a science. It is a research paradigm with an unfathomed scope, scale, complexity, and power for knowledge discovery that is not otherwise possible and can be beyond human reasoning. It is changing our world practically and profoundly already widely deployed in tens of thousands of applications in every discipline in an AI Arms Race that, due to its inscrutability, can lead to unfathomed risks. This paper presents an axiology of data science, its purpose, nature, importance, risks, and value for problem solving, by exploring and evaluating its remarkable, definitive features. As data science is in its infancy, this initial, speculative axiology is intended to aid in understanding and defining data science to recognize its potential benefits, risks, and open research challenges. AI based data science is inherently about uncertainty that may be more realistic than our preference for the certainty of science. Data science will have impacts far beyond knowledge discovery and will take us into new ways of understanding the world.
    摘要 数据科学不是一门科学,而是一种研究范式,其范围、规模、复杂性以及知识发现的能力深不可测,是其他方式无法企及的,甚至可能超越人类的推理。它正在切实而深刻地改变我们的世界,已在每个学科的数以万计的应用中广泛部署,卷入一场因其不可捉摸而可能带来深不可测风险的AI军备竞赛。本文通过探讨和评估数据科学显著而决定性的特征,提出了数据科学的价值论,阐述其目的、性质、重要性、风险以及在问题求解中的价值。由于数据科学尚处于起步阶段,这一初步的、推测性的价值论旨在帮助理解和定义数据科学,识别其潜在的益处、风险和开放的研究挑战。基于AI的数据科学本质上与不确定性相关,这也许比我们对科学确定性的偏好更贴近现实。数据科学的影响将远远超出知识发现本身,并带领我们以新的方式理解世界。

A New Computationally Simple Approach for Implementing Neural Networks with Output Hard Constraints

  • paper_url: http://arxiv.org/abs/2307.10459
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Andrei V. Konstantinov, Lev V. Utkin
  • for: 这篇论文旨在提出一种新的、计算简单的方法,用于对神经网络输出值施加硬性凸约束。
  • methods: 该方法的关键思想是将神经网络隐藏参数向量映射到一个保证位于由一组约束定义的可行集内的点上。该映射由一个附加的、对输出施加约束的神经网络层实现。
  • results: 该方法可以简单地扩展到不仅对输出向量施加约束、还对依赖于输入的联合约束的情况,且投影式的约束施加方法也可在该框架内简单实现。该方法计算简单:对线性约束和二次约束,前向传播的复杂度分别为O(n*m)和O(n^2*m),其中n为变量数,m为约束数。论文通过求解优化和分类问题进行了数值实验。
    Abstract A new computationally simple method of imposing hard convex constraints on the neural network output values is proposed. The key idea behind the method is to map a vector of hidden parameters of the network to a point that is guaranteed to be inside the feasible set defined by a set of constraints. The mapping is implemented by the additional neural network layer with constraints for output. The proposed method is simply extended to the case when constraints are imposed not only on the output vectors, but also on joint constraints depending on inputs. The projection approach to imposing constraints on outputs can simply be implemented in the framework of the proposed method. It is shown how to incorporate different types of constraints into the proposed method, including linear and quadratic constraints, equality constraints, and dynamic constraints, constraints in the form of boundaries. An important feature of the method is its computational simplicity. Complexities of the forward pass of the proposed neural network layer by linear and quadratic constraints are O(n*m) and O(n^2*m), respectively, where n is the number of variables, m is the number of constraints. Numerical experiments illustrate the method by solving optimization and classification problems. The code implementing the method is publicly available.
    摘要 我们提出了一种新的、计算简单的方法,用于对神经网络的输出值施加硬性凸约束。该方法的关键思想是将网络隐藏参数向量映射到一个保证位于由一组约束定义的可行集内的点上。该映射由一个附加的、对输出施加约束的神经网络层实现。所提方法可以简单地扩展到不仅对输出向量施加约束、还对依赖于输入的联合约束的情况。对输出施加约束的投影方法也可以在所提方法的框架内简单实现。本文展示了如何将不同类型的约束纳入该方法,包括线性约束、二次约束、等式约束、动态约束以及边界形式的约束。该方法的一个重要特点是其计算简单性:对线性约束和二次约束,所提神经网络层前向传播的复杂度分别为O(n*m)和O(n^2*m),其中n为变量数,m为约束数。数值实验通过求解优化和分类问题来说明该方法。实现该方法的代码已公开。
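
A minimal sketch of the general idea — a final layer that maps unconstrained activations into a feasible set so the constraints hold by construction. Box constraints are the simplest instance; the paper covers richer convex constraint types:

```python
import torch
import torch.nn as nn

class BoxConstraintLayer(nn.Module):
    """Final layer mapping unconstrained activations into the box
    lo <= y <= hi, so the output satisfies the constraints by construction.
    A sketch of the feasible-set mapping idea, not the paper's exact layer."""
    def __init__(self, lo, hi):
        super().__init__()
        self.register_buffer("lo", torch.as_tensor(lo, dtype=torch.float32))
        self.register_buffer("hi", torch.as_tensor(hi, dtype=torch.float32))

    def forward(self, z):
        return self.lo + (self.hi - self.lo) * torch.sigmoid(z)

net = nn.Sequential(
    nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 3),
    BoxConstraintLayer(lo=[0.0, -1.0, 0.0], hi=[1.0, 1.0, 5.0]),
)
y = net(torch.randn(4, 8))  # every output row satisfies the box constraints
```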

Complying with the EU AI Act

  • paper_url: http://arxiv.org/abs/2307.10458
  • repo_url: None
  • paper_authors: Jacintha Walters, Diptish Dey, Debarati Bhaumik, Sophie Horsman
  • for: 本研究旨在描述欧盟人工智能法案(AI Act)中不同类别的实施情况,并通过问卷调查获取量化数据,以提供有关组织实施AI Act的启示。
  • methods: 本研究使用问卷调查方法来收集数据,并分析数据以挖掘不同类别组织面临的挑战,以及这些挑战如何与组织特点相关。
  • results: 研究发现,不同类别组织面临的挑战有所不同,而大型和特定领域的组织面临更大的挑战。此外,问卷调查还显示了各个问题的占比,包括AI Act的内容和应用方面。
    Abstract The EU AI Act is the proposed EU legislation concerning AI systems. This paper identifies several categories of the AI Act. Based on this categorization, a questionnaire is developed that serves as a tool to offer insights by creating quantitative data. Analysis of the data shows various challenges for organizations in different compliance categories. The influence of organization characteristics, such as size and sector, is examined to determine the impact on compliance. The paper will also share qualitative data on which questions were prevalent among respondents, both on the content of the AI Act as the application. The paper concludes by stating that there is still room for improvement in terms of compliance with the AIA and refers to a related project that examines a solution to help these organizations.
    摘要 欧盟AI法案是欧盟针对AI系统提出的立法。本文识别了AI法案的若干类别,并基于该分类编制了一份问卷,作为通过生成定量数据提供洞见的工具。数据分析显示,不同合规类别的组织面临着各种挑战。本文考察了组织特征(如规模和领域)对合规的影响,并分享了受访者普遍关注的问题的定性数据,既涉及AI法案的内容,也涉及其应用。本文最后指出,在遵守AI法案方面仍有改进空间,并提及一个旨在帮助这些组织的相关项目。

A Step Towards Worldwide Biodiversity Assessment: The BIOSCAN-1M Insect Dataset

  • paper_url: http://arxiv.org/abs/2307.10455
  • repo_url: https://github.com/zahrag/BIOSCAN-1M
  • paper_authors: Zahra Gharaee, ZeMing Gong, Nicholas Pellegrino, Iuliia Zarubiieva, Joakim Bruslund Haurum, Scott C. Lowe, Jaclyn T. A. McKeown, Chris C. Y. Ho, Joschka McLeod, Yi-Yun C Wei, Jireh Agda, Sujeevan Ratnasingham, Dirk Steinke, Angel X. Chang, Graham W. Taylor, Paul Fieguth
  • for: 本研究旨在为基于图像的生物多样性调查奠定基础,以推动对全球生物多样性的全面考察。
  • methods: 研究使用了大规模人工标注的昆虫图像数据集,以及相关的遗传信息,包括原始核苷酸条形码序列和分配的条形码索引号。
  • results: 研究人员通过实现并分析一个基线分类器,展示了基于图像的生物分类任务的特点与挑战。
    Abstract In an effort to catalog insect biodiversity, we propose a new large dataset of hand-labelled insect images, the BIOSCAN-Insect Dataset. Each record is taxonomically classified by an expert, and also has associated genetic information including raw nucleotide barcode sequences and assigned barcode index numbers, which are genetically-based proxies for species classification. This paper presents a curated million-image dataset, primarily to train computer-vision models capable of providing image-based taxonomic assessment, however, the dataset also presents compelling characteristics, the study of which would be of interest to the broader machine learning community. Driven by the biological nature inherent to the dataset, a characteristic long-tailed class-imbalance distribution is exhibited. Furthermore, taxonomic labelling is a hierarchical classification scheme, presenting a highly fine-grained classification problem at lower levels. Beyond spurring interest in biodiversity research within the machine learning community, progress on creating an image-based taxonomic classifier will also further the ultimate goal of all BIOSCAN research: to lay the foundation for a comprehensive survey of global biodiversity. This paper introduces the dataset and explores the classification task through the implementation and analysis of a baseline classifier.
    摘要 为了编目昆虫多样性,我们提出了一个新的大规模人工标注昆虫图像数据集:BIOSCAN-Insect 数据集。每条记录均由专家进行分类学标注,并附带遗传信息,包括原始核苷酸条形码序列和分配的条形码索引号,后者是基于遗传信息的物种分类替代指标。本文介绍了一个精心整理的百万级图像数据集,主要用于训练能够进行基于图像的分类学评估的计算机视觉模型;同时,该数据集还呈现出一些引人注目的特性,其研究对更广泛的机器学习社区也具有价值。受数据集固有的生物学性质驱动,它呈现出典型的长尾类别不平衡分布。此外,分类学标注是一种层次分类体系,在较低层级上构成了粒度极细的分类问题。除了激发机器学习社区对生物多样性研究的兴趣之外,构建基于图像的分类学分类器的进展也将推进所有BIOSCAN研究的终极目标:为全球生物多样性的全面考察奠定基础。本文介绍了该数据集,并通过实现和分析一个基线分类器来探讨这一分类任务。

Learning Formal Specifications from Membership and Preference Queries

  • paper_url: http://arxiv.org/abs/2307.10434
  • repo_url: None
  • paper_authors: Ameesh Shah, Marcell Vazquez-Chanlatte, Sebastian Junges, Sanjit A. Seshia
  • for: 学习形式化规约(如自动机)。
  • methods: 提出一种新框架,策略性地请求成员标签与成对偏好(成员标签的一种流行替代)的组合,而不仅仅依赖成员标签。
  • results: 在两个不同领域中实例化了该框架,结果表明结合两种模态可以稳健且便捷地识别规约。
    Abstract Active learning is a well-studied approach to learning formal specifications, such as automata. In this work, we extend active specification learning by proposing a novel framework that strategically requests a combination of membership labels and pair-wise preferences, a popular alternative to membership labels. The combination of pair-wise preferences and membership labels allows for a more flexible approach to active specification learning, which previously relied on membership labels only. We instantiate our framework in two different domains, demonstrating the generality of our approach. Our results suggest that learning from both modalities allows us to robustly and conveniently identify specifications via membership and preferences.
    摘要 主动学习是一种被广泛研究的用于学习形式化规约(如自动机)的方法。在这项工作中,我们扩展了主动规约学习,提出一种新颖的框架,策略性地请求成员标签与成对偏好(成员标签的一种流行替代)的组合。成对偏好与成员标签的结合使主动规约学习更加灵活,而此前的方法仅依赖成员标签。我们在两个不同领域中实例化了该框架,展示了方法的通用性。结果表明,从两种模态中学习使我们能够通过成员关系与偏好,稳健且便捷地识别规约。
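
A toy sketch of the two query modalities the framework combines — a teacher oracle answering membership and pairwise-preference queries. The interleaving shown is illustrative; the actual framework selects queries strategically:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Trace = Tuple[str, ...]  # a finite word over the system's alphabet

@dataclass
class Teacher:
    """Oracle exposing the two query types the framework combines."""
    member: Callable[[Trace], bool]        # membership: does the spec accept it?
    prefer: Callable[[Trace, Trace], int]  # pairwise preference: +1, 0, or -1

def gather_evidence(teacher: Teacher, traces: List[Trace]) -> List[Tuple]:
    """Interleaves both modalities; a real learner would pick queries
    strategically to prune the space of candidate specifications."""
    evidence = [("member", t, teacher.member(t)) for t in traces[:3]]
    evidence += [("prefer", a, b, teacher.prefer(a, b))
                 for a, b in zip(traces, traces[1:])]
    return evidence

# Toy specification: traces ending in "done" are accepted and preferred.
teacher = Teacher(member=lambda t: t[-1] == "done",
                  prefer=lambda a, b: int(a[-1] == "done") - int(b[-1] == "done"))
print(gather_evidence(teacher, [("a", "done"), ("a", "b"), ("done",)]))
```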

PreDiff: Precipitation Nowcasting with Latent Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.10422
  • repo_url: None
  • paper_authors: Zhihan Gao, Xingjian Shi, Boran Han, Hao Wang, Xiaoyong Jin, Danielle Maddix, Yi Zhu, Mu Li, Yuyang Wang
  • for: 这篇研究旨在提出一种概率时空预测模型,以应对地球系统预报中的不确定性处理与领域知识整合。
  • methods: 研究使用了两阶段的概率时空预测管线:其一是名为PreDiff的条件潜在扩散模型,能够进行概率预测;其二是显式的知识控制机制,使预测符合领域特定的物理约束。
  • results: 实验结果显示,PreDiff能够有效地处理不确定性、整合领域知识,并产生具有高操作价值的预测。
    Abstract Earth system forecasting has traditionally relied on complex physical models that are computationally expensive and require significant domain expertise. In the past decade, the unprecedented increase in spatiotemporal Earth observation data has enabled data-driven forecasting models using deep learning techniques. These models have shown promise for diverse Earth system forecasting tasks but either struggle with handling uncertainty or neglect domain-specific prior knowledge, resulting in averaging possible futures to blurred forecasts or generating physically implausible predictions. To address these limitations, we propose a two-stage pipeline for probabilistic spatiotemporal forecasting: 1) We develop PreDiff, a conditional latent diffusion model capable of probabilistic forecasts. 2) We incorporate an explicit knowledge control mechanism to align forecasts with domain-specific physical constraints. This is achieved by estimating the deviation from imposed constraints at each denoising step and adjusting the transition distribution accordingly. We conduct empirical studies on two datasets: N-body MNIST, a synthetic dataset with chaotic behavior, and SEVIR, a real-world precipitation nowcasting dataset. Specifically, we impose the law of conservation of energy in N-body MNIST and anticipated precipitation intensity in SEVIR. Experiments demonstrate the effectiveness of PreDiff in handling uncertainty, incorporating domain-specific prior knowledge, and generating forecasts that exhibit high operational utility.
    摘要 地球系统预报传统上依赖于计算代价高昂且需要大量领域专业知识的复杂物理模型。过去十年间,时空地球观测数据的空前增长使得基于深度学习技术的数据驱动预报模型成为可能。这些模型在多种地球系统预报任务中展现出潜力,但它们要么难以处理不确定性,要么忽略了领域特定的先验知识,导致将可能的未来平均成模糊的预报,或生成物理上不合理的预测。为了解决这些局限,我们提出了一个用于概率时空预报的两阶段管线:1)我们开发了PreDiff,一种能够进行概率预报的条件潜在扩散模型;2)我们引入显式的知识控制机制,使预报符合领域特定的物理约束。具体做法是在每个去噪步骤中估计预测对所施加约束的偏离程度,并相应地调整转移分布。我们在两个数据集上进行了实证研究:具有混沌行为的合成数据集N-body MNIST,以及真实世界的降水临近预报数据集SEVIR。具体而言,我们在N-body MNIST中施加能量守恒定律,在SEVIR中施加预期降水强度约束。实验表明,PreDiff能够有效处理不确定性、纳入领域特定的先验知识,并生成具有高操作价值的预报。
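
A guidance-style sketch of the knowledge-control step the abstract describes: shifting each denoising transition along the gradient that reduces the estimated constraint deviation. This is a sketch in the spirit of the paper, not PreDiff's exact update rule:

```python
import torch

def guided_denoise_step(x_t, t, denoiser, constraint_fn, guidance_scale=1.0):
    """One reverse-diffusion step with a knowledge-control adjustment: the
    proposal is shifted along the gradient that reduces the estimated
    deviation from a domain constraint (e.g., anticipated precipitation
    intensity). `denoiser` and `constraint_fn` are assumed callables."""
    x_t = x_t.detach().requires_grad_(True)
    x_prev = denoiser(x_t, t)                 # ordinary denoising proposal
    deviation = constraint_fn(x_prev)         # scalar constraint violation
    grad = torch.autograd.grad(deviation, x_t)[0]
    return (x_prev - guidance_scale * grad).detach()
```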

GOOSE Algorithm: A Powerful Optimization Tool for Real-World Engineering Challenges and Beyond

  • paper_url: http://arxiv.org/abs/2307.10420
  • repo_url: None
  • paper_authors: Rebwar Khalid Hamad, Tarik A. Rashid
  • For: The paper proposes a novel metaheuristic algorithm called GOOSE, which is inspired by the behavior of geese during rest and foraging. The algorithm is designed to solve optimization problems.
  • Methods: The GOOSE algorithm uses a combination of balance and guarding mechanisms to search for the optimal solution. It is benchmarked on 19 well-known test functions and compared with four other algorithms.
  • Results: The results show that the GOOSE algorithm outperforms the other algorithms on 10 modern benchmark functions and 5 classical benchmark functions. It is also applied to three real-world engineering challenges and shows good performance in optimizing these problems.
    Abstract This study proposes the GOOSE algorithm as a novel metaheuristic algorithm based on the goose's behavior during rest and foraging. The goose stands on one leg and keeps his balance to guard and protect other individuals in the flock. The GOOSE algorithm is benchmarked on 19 well-known benchmark test functions, and the results are verified by a comparative study with genetic algorithm (GA), particle swarm optimization (PSO), dragonfly algorithm (DA), and fitness dependent optimizer (FDO). In addition, the proposed algorithm is tested on 10 modern benchmark functions, and the gained results are compared with three recent algorithms, such as the dragonfly algorithm, whale optimization algorithm (WOA), and salp swarm algorithm (SSA). Moreover, the GOOSE algorithm is tested on 5 classical benchmark functions, and the obtained results are evaluated with six algorithms, such as fitness dependent optimizer (FDO), FOX optimizer, butterfly optimization algorithm (BOA), whale optimization algorithm, dragonfly algorithm, and chimp optimization algorithm (ChOA). The achieved findings attest to the proposed algorithm's superior performance compared to the other algorithms that were utilized in the current study. The technique is then used to optimize Welded beam design and Economic Load Dispatch Problem, three renowned real-world engineering challenges, and the Pathological IgG Fraction in the Nervous System. The outcomes of the engineering case studies illustrate how well the suggested approach can optimize issues that arise in the real-world.
    摘要 这项研究提出了一种新的元启发式算法GOOSE,其灵感来自鹅在休息和觅食时的行为:鹅单腿站立并保持平衡,以守护和保护群体中的其他个体。GOOSE算法在19个知名基准测试函数上进行了测试,并通过与遗传算法(GA)、粒子群优化算法(PSO)、蜻蜓算法(DA)和适应度依赖优化器(FDO)的对比研究验证了结果。此外,该算法还在10个现代基准函数上进行了测试,所得结果与蜻蜓算法、鲸鱼优化算法(WOA)和樽海鞘群算法(SSA)这三种较新的算法进行了比较。同时,GOOSE算法还在5个经典基准函数上进行了测试,所得结果与适应度依赖优化器(FDO)、FOX优化器、蝴蝶优化算法(BOA)、鲸鱼优化算法、蜻蜓算法和黑猩猩优化算法(ChOA)六种算法进行了评估。这些结果证明了所提算法相对于本研究中所用其他算法的优越性能。随后,该技术被用于优化焊接梁设计和经济负荷调度问题等著名的现实工程挑战,以及神经系统中的病理性IgG分数问题。工程案例研究的结果说明了所提方法能够很好地优化现实世界中出现的问题。

Explaining Autonomous Driving Actions with Visual Question Answering

  • paper_url: http://arxiv.org/abs/2307.10408
  • repo_url: https://github.com/shahin-01/vqa-ad
  • paper_authors: Shahin Atakishiyev, Mohammad Salameh, Housam Babiker, Randy Goebel
  • for: 本研究旨在提供一种可解释的自动驾驶技术,以便更好地理解自动驾驶车辆的决策过程。
  • methods: 本研究使用视觉问答(VQA)框架,通过基于问答的因果推理来解释自动驾驶车辆的决策。
  • results: 研究发现,VQA机制可以为解释自动驾驶车辆的决策过程提供支持,并有助于提高整体驾驶安全性。
    Abstract The end-to-end learning ability of self-driving vehicles has achieved significant milestones over the last decade owing to rapid advances in deep learning and computer vision algorithms. However, as autonomous driving technology is a safety-critical application of artificial intelligence (AI), road accidents and established regulatory principles necessitate the need for the explainability of intelligent action choices for self-driving vehicles. To facilitate interpretability of decision-making in autonomous driving, we present a Visual Question Answering (VQA) framework, which explains driving actions with question-answering-based causal reasoning. To do so, we first collect driving videos in a simulation environment using reinforcement learning (RL) and extract consecutive frames from this log data uniformly for five selected action categories. Further, we manually annotate the extracted frames using question-answer pairs as justifications for the actions chosen in each scenario. Finally, we evaluate the correctness of the VQA-predicted answers for actions on unseen driving scenes. The empirical results suggest that the VQA mechanism can provide support to interpret real-time decisions of autonomous vehicles and help enhance overall driving safety.
    摘要 得益于深度学习和计算机视觉算法的快速发展,自动驾驶车辆的端到端学习能力在过去十年中取得了重大突破。然而,由于自动驾驶技术是人工智能(AI)的安全关键应用,道路事故和既有的监管原则都要求自动驾驶车辆的智能行为选择具有可解释性。为了促进自动驾驶决策的可解释性,我们提出了一个视觉问答(VQA)框架,通过基于问答的因果推理来解释驾驶行为。为此,我们首先使用强化学习(RL)在模拟环境中收集驾驶视频,并从日志数据中针对五个选定的动作类别均匀抽取连续帧。随后,我们以问答对的形式人工标注这些抽取的帧,为每个场景中选择的动作提供解释依据。最后,我们在未见过的驾驶场景上评估了VQA预测答案的正确性。实验结果表明,VQA机制可以为解释自动驾驶车辆的实时决策提供支持,并有助于提高整体驾驶安全性。
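
A hypothetical example of what one annotation record pairing a frame with a question-answer justification might look like; the field names and file path are illustrative, not the authors' schema:

```python
# Hypothetical annotation record; field names and paths are illustrative.
annotation = {
    "frame": "episode_03/frame_0412.png",
    "action": "stop",
    "question": "Why does the car stop at the intersection?",
    "answer": "A vehicle is crossing from the left, so stopping avoids a collision.",
}
```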

Technical Challenges of Deploying Reinforcement Learning Agents for Game Testing in AAA Games

  • paper_url: http://arxiv.org/abs/2307.11105
  • repo_url: None
  • paper_authors: Jonas Gillberg, Joakim Bergdahl, Alessandro Sestini, Andrew Eakins, Linus Gisslen
  • for: 本研究旨在推广机器学习在游戏生产中的应用,特别是通过强化学习提高自动游戏测试解决方案的测试覆盖率。
  • methods: 本研究将训练bot的强化学习系统与现有的基于脚本bot的测试解决方案集成。
  • results: 研究在《Battlefield 2042》和《Dead Space》(2023)等AAA游戏中实现了提高测试覆盖率的目标,并提出了若干有价值的研究方向,以帮助游戏业界更快地采用这项技术。
    Abstract Going from research to production, especially for large and complex software systems, is fundamentally a hard problem. In large-scale game production, one of the main reasons is that the development environment can be very different from the final product. In this technical paper we describe an effort to add an experimental reinforcement learning system to an existing automated game testing solution based on scripted bots in order to increase its capacity. We report on how this reinforcement learning system was integrated with the aim to increase test coverage similar to [1] in a set of AAA games including Battlefield 2042 and Dead Space (2023). The aim of this technical paper is to show a use-case of leveraging reinforcement learning in game production and cover some of the largest time sinks anyone who wants to make the same journey for their game may encounter. Furthermore, to help the game industry to adopt this technology faster, we propose a few research directions that we believe will be valuable and necessary for making machine learning, and especially reinforcement learning, an effective tool in game production.
    摘要 从研究走向生产,特别是对于大型复杂的软件系统而言,本质上是一个困难的问题。在大规模游戏生产中,一个主要原因是开发环境可能与最终产品差异很大。在这篇技术论文中,我们描述了将一个实验性的强化学习系统添加到现有的基于脚本bot的自动游戏测试解决方案中以提升其能力的工作。我们报告了如何集成这一强化学习系统,目标是在包括《Battlefield 2042》和《Dead Space》(2023)在内的一系列AAA游戏中提高类似于[1]的测试覆盖率。这篇技术论文旨在展示在游戏生产中应用强化学习的一个用例,并介绍任何想为自己的游戏走同样道路的人可能遇到的若干最大的时间消耗点。此外,为了帮助游戏业界更快地采用这项技术,我们提出了几个研究方向,我们认为这些方向对于使机器学习(尤其是强化学习)成为游戏生产中的有效工具既有价值又有必要。

Interpreting and Correcting Medical Image Classification with PIP-Net

  • paper_url: http://arxiv.org/abs/2307.10404
  • repo_url: https://github.com/m-nauta/pipnet
  • paper_authors: Meike Nauta, Johannes H. Hegeman, Jeroen Geerdink, Jörg Schlötterer, Maurice van Keulen, Christin Seifert
  • for: 这篇论文探讨了可解释机器学习模型(尤其是PIP-Net)在真实世界医疗图像数据上用于自动诊断支持的适用性与潜力。
  • methods: 论文使用PIP-Net模型学习人类可理解的原型图像部件,并评估其在骨折检测和皮肤癌诊断方面的准确性与可解释性。
  • results: 研究发现,在仅提供图像级类别标签的情况下,PIP-Net的决策过程与医学分类标准相一致。此外,PIP-Net还能轻松地发现数据质量问题,如X光图像中不应出现的文本或标注错误。最后,研究首次表明,人类可以通过直接禁用不需要的原型来手动修正PIP-Net的推理过程。
    Abstract Part-prototype models are explainable-by-design image classifiers, and a promising alternative to black box AI. This paper explores the applicability and potential of interpretable machine learning, in particular PIP-Net, for automated diagnosis support on real-world medical imaging data. PIP-Net learns human-understandable prototypical image parts and we evaluate its accuracy and interpretability for fracture detection and skin cancer diagnosis. We find that PIP-Net's decision making process is in line with medical classification standards, while only provided with image-level class labels. Because of PIP-Net's unsupervised pretraining of prototypes, data quality problems such as undesired text in an X-ray or labelling errors can be easily identified. Additionally, we are the first to show that humans can manually correct the reasoning of PIP-Net by directly disabling undesired prototypes. We conclude that part-prototype models are promising for medical applications due to their interpretability and potential for advanced model debugging.
    摘要 部件原型模型是一类设计上可解释的图像分类器,是黑盒AI的一种有前景的替代方案。本文探讨了可解释机器学习(尤其是PIP-Net)在真实世界医疗影像数据上用于自动诊断支持的适用性与潜力。PIP-Net学习人类可理解的原型图像部件,我们评估了其在骨折检测和皮肤癌诊断方面的准确性与可解释性。我们发现,在仅提供图像级类别标签的情况下,PIP-Net的决策过程与医学分类标准相一致。得益于PIP-Net对原型的无监督预训练,可以轻松地发现数据质量问题,例如X光图像中不应出现的文本或标注错误。此外,我们首次表明,人类可以通过直接禁用不需要的原型来手动修正PIP-Net的推理。我们的结论是:部件原型模型因其可解释性以及先进的模型调试潜力,在医疗应用中大有可为。

DialogStudio: Towards Richest and Most Diverse Unified Dataset Collection for Conversational AI

  • paper_url: http://arxiv.org/abs/2307.10172
  • repo_url: https://github.com/salesforce/DialogStudio
  • paper_authors: Jianguo Zhang, Kun Qian, Zhiwei Liu, Shelby Heinecke, Rui Meng, Ye Liu, Zhou Yu, Huan Wang, Silvio Savarese, Caiming Xiong
  • For: The paper aims to introduce DialogStudio, a large and diverse collection of dialogue datasets, to address the challenges of handling diverse conversational tasks and improve the comprehensiveness of existing dialogue dataset collections.
  • Methods: The paper uses a consistent format to unify diverse dialogue datasets, including open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues. The authors also identify licenses for each dataset and design domain-aware prompts for selected dialogues to facilitate instruction-aware fine-tuning.
  • Results: The authors develop conversational AI models using the dataset collection and demonstrate their superiority in both zero-shot and few-shot learning scenarios. They also make all datasets, licenses, codes, and models associated with DialogStudio publicly accessible at https://github.com/salesforce/DialogStudio to support dataset and task-based research, as well as language model pre-training.
    Abstract Despite advancements in conversational AI, language models encounter challenges to handle diverse conversational tasks, and existing dialogue dataset collections often lack diversity and comprehensiveness. To tackle these issues, we introduce DialogStudio: the largest and most diverse collection of dialogue datasets, unified under a consistent format while preserving their original information. Our collection encompasses data from open-domain dialogues, task-oriented dialogues, natural language understanding, conversational recommendation, dialogue summarization, and knowledge-grounded dialogues, making it an incredibly rich and diverse resource for dialogue research and model training. To further enhance the utility of DialogStudio, we identify the licenses for each dataset and design domain-aware prompts for selected dialogues to facilitate instruction-aware fine-tuning. Furthermore, we develop conversational AI models using the dataset collection, and our experiments in both zero-shot and few-shot learning scenarios demonstrate the superiority of DialogStudio. To improve transparency and support dataset and task-based research, as well as language model pre-training, all datasets, licenses, codes, and models associated with DialogStudio are made publicly accessible at https://github.com/salesforce/DialogStudio
    摘要 尽管对话式AI技术有所进步,语言模型在处理各种对话任务时仍然面临挑战,而现有的对话数据集合也常常缺乏多样性和全面性。为解决这些问题,我们介绍DialogStudio:规模最大、最多样化的对话数据集合,在保留原始信息的同时统一为一致的格式。我们的集合涵盖开放领域对话、任务导向对话、自然语言理解、对话推荐、对话摘要和知识支撑对话等数据,使其成为对话研究和模型训练极为丰富多样的资源。为进一步提升DialogStudio的实用性,我们确定了每个数据集的许可证,并为选定的对话设计了领域感知的提示,以便进行指令感知的微调。此外,我们使用该数据集合开发了对话式AI模型,在零样本和少样本学习场景下的实验均证明了DialogStudio的优势。为提高透明度并支持基于数据集和任务的研究以及语言模型预训练,与DialogStudio相关的所有数据集、许可证、代码和模型均已在 https://github.com/salesforce/DialogStudio 公开。
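
One plausible shape for a unified dialogue record of the kind the abstract describes. The field names here are hypothetical; consult the DialogStudio repository for the actual schema:

```python
# Hypothetical unified record; field names are illustrative, not the
# repository's actual schema.
example = {
    "dataset": "MultiWOZ",                 # original dataset name
    "category": "task-oriented",           # one of the collection's categories
    "license": "<original dataset license>",
    "prompt": "You are a helpful booking assistant.",  # domain-aware prompt
    "dialogue": [
        {"speaker": "user", "text": "Book a table for two tonight."},
        {"speaker": "system", "text": "Sure - which restaurant?"},
    ],
}
```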

LightPath: Lightweight and Scalable Path Representation Learning

  • paper_url: http://arxiv.org/abs/2307.10171
  • repo_url: None
  • paper_authors: Sean Bin Yang, Jilin Hu, Chenjuan Guo, Bin Yang, Christian S. Jensen
  • for: 提供高效、可扩展的路径表示学习框架,用于智能交通和智慧城市应用。
  • methods: 提出一个轻量级、可扩展的路径表示学习框架,包括稀疏自编码器、关系推理框架和全局-局部知识蒸馏。
  • results: 广泛的实验验证了该框架的效率、可扩展性和有效性。
    Abstract Movement paths are used widely in intelligent transportation and smart city applications. To serve such applications, path representation learning aims to provide compact representations of paths that enable efficient and accurate operations when used for different downstream tasks such as path ranking and travel cost estimation. In many cases, it is attractive that the path representation learning is lightweight and scalable; in resource-limited environments and under green computing limitations, it is essential. Yet, existing path representation learning studies focus on accuracy and pay at most secondary attention to resource consumption and scalability. We propose a lightweight and scalable path representation learning framework, termed LightPath, that aims to reduce resource consumption and achieve scalability without affecting accuracy, thus enabling broader applicability. More specifically, we first propose a sparse auto-encoder that ensures that the framework achieves good scalability with respect to path length. Next, we propose a relational reasoning framework to enable faster training of more robust sparse path encoders. We also propose global-local knowledge distillation to further reduce the size and improve the performance of sparse path encoders. Finally, we report extensive experiments on two real-world datasets to offer insight into the efficiency, scalability, and effectiveness of the proposed framework.
    摘要 移动路径在智能交通和智慧城市应用中被广泛使用。为服务此类应用,路径表示学习旨在提供紧凑的路径表示,以支持路径排名和旅行成本估算等不同下游任务中高效而准确的操作。在许多情况下,轻量级且可扩展的路径表示学习颇具吸引力;而在资源受限的环境和绿色计算的限制下,这一点更是必不可少。然而,现有的路径表示学习研究主要关注准确性,对资源消耗和可扩展性至多只给予次要关注。我们提出一个轻量级且可扩展的路径表示学习框架,名为LightPath,旨在不影响准确性的前提下减少资源消耗并实现可扩展性,从而支持更广泛的适用场景。更具体地,我们首先提出一个稀疏自编码器,确保框架在路径长度方面具有良好的可扩展性;其次,我们提出一个关系推理框架,以更快地训练更鲁棒的稀疏路径编码器;我们还提出全局-局部知识蒸馏,以进一步减小稀疏路径编码器的规模并提升其性能。最后,我们在两个真实世界数据集上报告了大量实验,以展示所提框架的效率、可扩展性和有效性。
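
A hedged sketch of the sparse-encoding idea: encode only a random subset of a path's segment embeddings so the encoder's cost scales with the kept segments rather than the full path length. Sizes and the keep ratio are illustrative assumptions, not LightPath's actual configuration:

```python
import torch
import torch.nn as nn

class SparsePathAutoEncoder(nn.Module):
    """Encodes a random subset of a path's segment embeddings; a sketch of
    the sparse auto-encoder idea, not LightPath's actual architecture."""
    def __init__(self, emb_dim=64, keep_ratio=0.25):
        super().__init__()
        self.keep_ratio = keep_ratio
        layer = nn.TransformerEncoderLayer(d_model=emb_dim, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(emb_dim, emb_dim)

    def forward(self, segments):             # segments: (B, L, emb_dim)
        L = segments.size(1)
        k = max(1, int(L * self.keep_ratio))
        keep = torch.randperm(L)[:k].sort().values
        z = self.encoder(segments[:, keep])  # encode the sparse subsequence
        return self.head(z.mean(dim=1))      # compact path-level representation

model = SparsePathAutoEncoder()
path_repr = model(torch.randn(8, 120, 64))  # 8 paths of 120 segments each
```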

Challenges and Applications of Large Language Models

  • paper_url: http://arxiv.org/abs/2307.10169
  • repo_url: https://github.com/chrisneagu/FTC-Skystone-Dark-Angels-Romania-2020
  • paper_authors: Jean Kaddour, Joshua Harris, Maximilian Mozes, Herbie Bradley, Roberta Raileanu, Robert McHardy
  • for: 本研究旨在为机器学习研究人员快速了解大语言模型(LLMs)领域的当前状况,以便更快地成为生产力的一员。
  • methods: 本研究采用系统的方法描述了LLMs领域的开放问题和成功应用领域,以便帮助研究人员更快地了解领域的当前状况。
  • results: 本研究对LLMs领域的当前状况进行了系统的描述,并identified several open problems and successful application areas, which can help researchers quickly understand the field and become productive.
    Abstract Large Language Models (LLMs) went from non-existent to ubiquitous in the machine learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify the remaining challenges and already fruitful application areas. In this paper, we aim to establish a systematic set of open problems and application successes so that ML researchers can comprehend the field's current state more quickly and become productive.
    摘要 短短几年内,大语言模型(LLM)从无到有,成为机器学习讨论中无处不在的话题。由于该领域发展迅速,很难辨明尚存的挑战与已经卓有成效的应用领域。在这篇论文中,我们旨在建立一套系统的开放问题与应用成功案例,以便机器学习研究人员能更快地理解该领域的现状并尽快产出成果。

Robust Driving Policy Learning with Guided Meta Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.10160
  • repo_url: None
  • paper_authors: Kanghoon Lee, Jiachen Li, David Isele, Jinkyoo Park, Kikuo Fujimura, Mykel J. Kochenderfer
  • for: 实现自动驾驶车辆在交互交通场景中的自主通行。
  • methods: 将社交车辆的多样化驾驶策略训练为单一的元策略(meta-policy):通过随机化社交车辆基于交互的奖励函数来生成多样化的目标,并通过实现特定目标的引导策略来高效训练元策略。
  • results: 在具有挑战性的无信号T字路口场景中,成功学习了一个ego车辆驾驶策略,能够很好地泛化到分布外(OOD)的社交车辆行为。
    Abstract Although deep reinforcement learning (DRL) has shown promising results for autonomous navigation in interactive traffic scenarios, existing work typically adopts a fixed behavior policy to control social vehicles in the training environment. This may cause the learned driving policy to overfit the environment, making it difficult to interact well with vehicles with different, unseen behaviors. In this work, we introduce an efficient method to train diverse driving policies for social vehicles as a single meta-policy. By randomizing the interaction-based reward functions of social vehicles, we can generate diverse objectives and efficiently train the meta-policy through guiding policies that achieve specific objectives. We further propose a training strategy to enhance the robustness of the ego vehicle's driving policy using the environment where social vehicles are controlled by the learned meta-policy. Our method successfully learns an ego driving policy that generalizes well to unseen situations with out-of-distribution (OOD) social agents' behaviors in a challenging uncontrolled T-intersection scenario.
    摘要 尽管深度强化学习(DRL)在交互交通场景的自主导航中展现出了可喜的成果,但现有工作通常在训练环境中采用固定的行为策略来控制社交车辆。这可能导致学到的驾驶策略对环境过拟合,难以与具有不同的、未见过行为的车辆良好交互。在这项工作中,我们提出了一种高效的方法,将社交车辆的多样化驾驶策略训练为单一的元策略。通过随机化社交车辆基于交互的奖励函数,我们可以生成多样化的目标,并通过实现特定目标的引导策略来高效地训练元策略。我们进一步提出了一种训练策略,利用由学到的元策略控制社交车辆的环境来增强ego车辆驾驶策略的鲁棒性。我们的方法成功学习了一个ego驾驶策略,能在具有挑战性的无信号T字路口场景中很好地泛化到分布外(OOD)社交车辆行为的未见情形。
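
A small sketch of the reward-randomization idea described above — sampling a different interaction-based reward function per social vehicle to induce diverse objectives. The reward terms and weight ranges are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def sample_social_reward(rng: np.random.Generator):
    """Draws one interaction-based reward function for a social vehicle.
    Varying the weights yields behaviors from assertive to yielding; the
    terms and ranges are illustrative, not the paper's exact design."""
    w_progress = rng.uniform(0.0, 1.0)   # reward for progress toward its goal
    w_yield = rng.uniform(0.0, 1.0)      # reward for keeping distance to ego
    def reward(progress: float, gap_to_ego: float) -> float:
        return w_progress * progress + w_yield * min(gap_to_ego, 5.0)
    return reward

rng = np.random.default_rng(0)
objectives = [sample_social_reward(rng) for _ in range(8)]  # diverse objectives
```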

Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

  • paper_url: http://arxiv.org/abs/2307.10142
  • repo_url: https://github.com/se-hwan/pbrs-humanoid
  • paper_authors: Se Hwan Jeon, Steve Heim, Charles Khazoom, Sangbae Kim
  • for: 本研究旨在比较标准形式的奖励塑形与基于势函数的奖励塑形(PBRS)在人形机器人运动学习中的表现。
  • methods: 本研究使用标准的奖励塑形与PBRS方法,并在高维系统上对两者进行了基准比较。
  • results: 研究发现,在该高维系统中,PBRS对收敛速度的提升有限,但PBRS的奖励项对缩放更加鲁棒,因而更容易调整。
    Abstract The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning the reward functions. Well-designed shaping reward can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned. In theory, the broad class of potential based reward shaping (PBRS) can help guide the learning process without affecting the optimal policy. Although several studies have explored the use of potential based reward shaping to accelerate learning convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping. In this paper, we benchmark standard forms of shaping with PBRS for a humanoid robot. We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed. However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.
    摘要 开发高效的强化学习(RL)管线的主要挑战往往在于奖励函数的设计与调整。精心设计的塑形奖励可以显著加快学习。然而,简单拼凑的奖励如果未经恰当调整,可能与期望的行为相冲突,导致过拟合甚至不稳定的表现。理论上,基于势函数的奖励塑形(PBRS)这一广泛的方法类可以在不影响最优策略的前提下引导学习过程。尽管已有若干研究探讨了利用基于势函数的奖励塑形来加速学习收敛,但大多局限于网格世界和低维系统,而机器人领域的RL主要依赖标准形式的奖励塑形。在本文中,我们在一个人形机器人上对标准形式的塑形与PBRS进行了基准比较。我们发现,在这一高维系统中,PBRS在收敛速度上仅带来有限的提升;然而,PBRS的奖励项对缩放比典型的奖励塑形方法更加鲁棒,因而更容易调整。
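
The classical potential-based shaping rule the paper benchmarks is r' = r + γΦ(s') − Φ(s), which provably leaves the optimal policy unchanged. A minimal sketch; the example potential is a hypothetical locomotion term, not the paper's reward design:

```python
def shaped_reward(r, s, s_next, phi, gamma=0.99, done=False):
    """Potential-based shaping: r' = r + gamma * Phi(s') - Phi(s).
    This form is guaranteed not to change the optimal policy."""
    return r + (0.0 if done else gamma * phi(s_next)) - phi(s)

# Hypothetical potential for locomotion: penalize torso-height error.
phi = lambda s: -abs(s["torso_height"] - 0.9)
r = shaped_reward(1.0, {"torso_height": 0.5}, {"torso_height": 0.7}, phi)
```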