cs.AI - 2023-07-08

PCG-based Static Underground Garage Scenario Generation

  • paper_url: http://arxiv.org/abs/2307.03988
  • repo_url: None
  • paper_authors: Wenjin Li, Kai Li
  • for: This study aims to use the Sarsa algorithm to solve the procedural content generation (PCG) problem of static underground parking garage scenario simulation.
  • methods: The paper applies the Sarsa algorithm to PCG in order to generate underground garage scenes with sufficient detail.
  • results: The study implements a Sarsa-based PCG method that generates high-quality underground garage scenarios, providing additional training data for autonomous driving.
    Abstract Autonomous driving technology has five levels, from L0 to L5. Currently, only the L2 level (partial automation) can be achieved, and there is a long way to go before reaching the final level of L5 (full automation). The key to crossing these levels lies in training the autonomous driving model. However, relying solely on real-world road data to train the model is far from enough and consumes a great deal of resources. Although there are already examples of training autonomous driving models through simulators that simulate real-world scenarios, these scenarios require complete manual construction. Directly converting 3D scenes from road network formats will lack a large amount of detail and cannot be used as training sets. Underground parking garage static scenario simulation is regarded as a procedural content generation (PCG) problem. This paper will use the Sarsa algorithm to solve procedural content generation on underground garage structures.
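The paper releases no code (repo_url: None), so the following is a minimal sketch of the tabular Sarsa update the method builds on. The toy ChainEnv and the state/action encoding are illustrative assumptions; the actual garage-layout MDP is not specified in the abstract.

```python
import random
from collections import defaultdict

class ChainEnv:
    """Toy stand-in environment: fill n cells left to right, with a
    reward for the final placement choice. Only exercises the update
    rule; the paper's garage-layout MDP is not public."""
    def __init__(self, n=8):
        self.n = n
    def actions(self, state):
        return [0, 1]  # e.g., two layout choices per cell
    def reset(self):
        return 0
    def step(self, state, action):
        nxt = state + 1
        done = nxt >= self.n
        reward = 1.0 if (done and action == 1) else 0.0
        return nxt, reward, done

def sarsa(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    Q = defaultdict(float)
    def policy(s):
        if random.random() < epsilon:
            return random.choice(env.actions(s))
        return max(env.actions(s), key=lambda a: Q[(s, a)])
    for _ in range(episodes):
        s = env.reset()
        a = policy(s)
        done = False
        while not done:
            s2, r, done = env.step(s, a)
            a2 = policy(s2) if not done else None
            # on-policy TD update: bootstrap from the action actually chosen next
            Q[(s, a)] += alpha * (r + gamma * Q[(s2, a2)] * (not done) - Q[(s, a)])
            s, a = s2, a2
    return Q

Q = sarsa(ChainEnv())
print(max(Q.items(), key=lambda kv: kv[1]))  # best (state, action) found
```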

Integrating Curricula with Replays: Its Effects on Continual Learning

  • paper_url: http://arxiv.org/abs/2307.05747
  • repo_url: https://github.com/zhanglab-deepneurocoglab/integrating-curricula-with-replays
  • paper_authors: Ren Jie Tee, Mengmi Zhang
  • for: This study examines how integrating curricula with replay methods during continual learning affects knowledge retention and learning transfer.
  • methods: The study varies three aspects of curriculum design in the replay process: the interleaving frequency of replayed exemplars with training data, the order in which exemplars are replayed, and the strategy for selecting exemplars into the replay buffer.
  • results: Integrating curricula with replay effectively mitigated catastrophic forgetting and enhanced positive knowledge transfer, suggesting curricula can advance continual learning methods.
    Abstract Humans engage in learning and reviewing processes with curricula when acquiring new skills or knowledge. This human learning behavior has inspired the integration of curricula with replay methods in continual learning agents. The goal is to emulate the human learning process, thereby improving knowledge retention and facilitating learning transfer. Existing replay methods in continual learning agents involve the random selection and ordering of data from previous tasks, which has been shown to be effective. However, limited research has explored the integration of different curricula with replay methods to enhance continual learning. Our study takes initial steps in examining the impact of integrating curricula with replay methods on continual learning in three specific aspects: the interleaved frequency of replayed exemplars with training data, the sequence in which exemplars are replayed, and the strategy for selecting exemplars into the replay buffer. These aspects of curricula design align with cognitive psychology principles and leverage the benefits of interleaved practice during replays, easy-to-hard rehearsal, and exemplar selection strategy involving exemplars from a uniform distribution of difficulties. Based on our results, these three curricula effectively mitigated catastrophic forgetting and enhanced positive knowledge transfer, demonstrating the potential of curricula in advancing continual learning methodologies. Our code and data are available: https://github.com/ZhangLab-DeepNeuroCogLab/Integrating-Curricula-with-Replays
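As a concrete reading of two of the three curricula (buffer selection from a uniform spread of difficulties, and easy-to-hard rehearsal), here is a minimal sketch; the per-exemplar difficulty score and function names are illustrative assumptions, not the authors' code (see the repo_url above for their implementation).

```python
import numpy as np

def fill_buffer_uniform_difficulty(exemplars, difficulty, size, bins=10):
    # Select exemplars into the replay buffer from a uniform spread of
    # difficulties (one of the three curricula studied).
    order = np.argsort(difficulty)
    chosen = []
    for band in np.array_split(order, bins):   # equal difficulty bands
        chosen.extend(band[: max(1, size // bins)])
    return [exemplars[i] for i in chosen[:size]]

def easy_to_hard(buffer, buffer_difficulty):
    # Rehearse buffered exemplars in ascending difficulty order.
    return [buffer[i] for i in np.argsort(buffer_difficulty)]

data = [f"exemplar_{i}" for i in range(100)]
scores = np.random.rand(100)                   # assumed difficulty scores
buffer = fill_buffer_uniform_difficulty(data, scores, size=20)
print(len(buffer))                             # 20
```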

Autonomy 2.0: The Quest for Economies of Scale

  • paper_url: http://arxiv.org/abs/2307.03973
  • repo_url: None
  • paper_authors: Shuang Wu, Bo Yu, Shaoshan Liu, Yuhao Zhu
  • for: This article examines the technical challenges and economic impact of the autonomous machines industry.
  • methods: The article combines technical and economic analysis to study the scalability and economic potential of the autonomous machines field.
  • results: The article argues that scalability is the key factor for the autonomy industry, but the current development paradigm (Autonomy 1.0) cannot fully exploit the economies of scale of cheapening compute and exploding data; a new development paradigm (Autonomy 2.0) that removes the key scalability blockers can greatly boost the industry.
    Abstract With the advancement of robotics and AI technologies in the past decade, we have now entered the age of autonomous machines. In this new age of information technology, autonomous machines, such as service robots, autonomous drones, delivery robots, and autonomous vehicles, rather than humans, will provide services. In this article, through examining the technical challenges and economic impact of the digital economy, we argue that scalability is both highly necessary from a technical perspective and significantly advantageous from an economic perspective, thus is the key for the autonomy industry to achieve its full potential. Nonetheless, the current development paradigm, dubbed Autonomy 1.0, scales with the number of engineers, instead of with the amount of data or compute resources, hence preventing the autonomy industry from fully benefiting from the economies of scale, especially the exponentially cheapening compute cost and the explosion of available data. We further analyze the key scalability blockers and explain how a new development paradigm, dubbed Autonomy 2.0, can address these problems to greatly boost the autonomy industry.

Multi-Intent Detection in User Provided Annotations for Programming by Examples Systems

  • paper_url: http://arxiv.org/abs/2307.03966
  • repo_url: None
  • paper_authors: Nischal Ashok Kumar, Nitin Gupta, Shanmukha Guttula, Hima Patel
  • for: This paper addresses the problem of data mapping in integration development, especially when applications lack naming standards and have nested field structures.
  • methods: The paper builds on Programming by Example (PBE) techniques that automatically generate data transformation programs, learning the intended program from user-provided input and output samples.
  • results: The paper proposes a deep neural network based ambiguity prediction model that analyzes input-output strings and maps them to a set of properties responsible for multiple intents, helping resolve ambiguity in PBE systems.
    Abstract In mapping enterprise applications, data mapping remains a fundamental part of integration development, but it is time-consuming. An increasing number of applications lack naming standards, and nested field structures further add complexity for the integration developers. Once the mapping is done, data transformation is the next challenge for the users since each application expects data to be in a certain format. Also, while building an integration flow, developers need to understand the format of the source and target data fields and come up with a transformation program that can change data from source to target format. The problem of automatic generation of a transformation program through the program synthesis paradigm from some specifications has been studied since the early days of Artificial Intelligence (AI). Programming by Example (PBE) is one such technique that targets automatic inference of a computer program to accomplish a format or string conversion task from user-provided input and output samples. To learn the correct intent, a diverse set of samples from the user is required. However, there is a possibility that the user fails to provide a diverse set of samples. This can lead to multiple intents or ambiguity in the input and output samples. Hence, PBE systems can get confused in generating the correct intent program. In this paper, we propose a deep neural network based ambiguity prediction model, which analyzes the input-output strings and maps them to a different set of properties responsible for multiple intent. Users can analyze these properties and accordingly can provide new samples or modify existing samples which can help in building a better PBE system for mapping enterprise applications.
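A tiny illustration of the ambiguity problem the paper targets: two different transformation programs ("intents") can agree on a user's single sample yet diverge on new inputs. The programs below are toy examples, not from the paper.

```python
def prog_initials(s):
    # Intent A: first letter of every word, joined with periods
    return ".".join(w[0] for w in s.split()) + "."

def prog_first_last(s):
    # Intent B: first letter of the first and last words only
    words = s.split()
    return words[0][0] + "." + words[-1][0] + "."

sample_in, sample_out = "John Smith", "J.S."
# Both programs satisfy the single sample -> the intent is ambiguous
assert prog_initials(sample_in) == sample_out
assert prog_first_last(sample_in) == sample_out
# They diverge on a three-word name, so more (diverse) samples are needed
print(prog_initials("Mary Jane Watson"))    # M.J.W.
print(prog_first_last("Mary Jane Watson"))  # M.W.
```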

Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions

  • paper_url: http://arxiv.org/abs/2307.03941
  • repo_url: None
  • paper_authors: Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu
  • for: This paper explores the challenges of implementing the Right to Be Forgotten (RTBF) in Large Language Models (LLMs) and provides insights on how to implement technical solutions for the RTBF.
  • methods: The paper discusses machine unlearning, model editing, and prompt engineering as potential solutions for the RTBF in LLMs.
  • results: The paper identifies the challenges of implementing the RTBF in LLMs and suggests potential solutions for compliance with the RTBF.
    Abstract The Right to be Forgotten (RTBF) was first established as the result of the ruling of Google Spain SL, Google Inc. v AEPD, Mario Costeja González, and was later included as the Right to Erasure under the General Data Protection Regulation (GDPR) of the European Union to allow individuals the right to request personal data be deleted by organizations. Specifically for search engines, individuals can send requests to organizations to exclude their information from the query results. With the recent development of Large Language Models (LLMs) and their use in chatbots, LLM-enabled software systems have become popular. But they are not excluded from the RTBF. Compared with the indexing approach used by search engines, LLMs store and process information in a completely different way. This poses new challenges for compliance with the RTBF. In this paper, we explore these challenges and provide our insights on how to implement technical solutions for the RTBF, including the use of machine unlearning, model editing, and prompting engineering.

Copilot for Xcode: Exploring AI-Assisted Programming by Prompting Cloud-based Large Language Models

  • paper_url: http://arxiv.org/abs/2307.14349
  • repo_url: None
  • paper_authors: Chee Wei Tan, Shangxin Guo, Man Fai Wong, Ching Nam Hang
  • for: An AI-assisted tool to support human software developers, helping them complete software development tasks faster and more efficiently.
  • methods: The tool integrates cloud-based Large Language Models (LLMs) with Apple's local development environment, Xcode, using advanced natural language processing (NLP) to process source code tokens and patterns, enabling code generation, autocompletion, documentation, and error detection.
  • results: Integrating LLMs into Xcode enhances productivity and unleashes creativity; developers can also make "small" program-composition decisions, some simultaneously, facilitated through prompt engineering in a chat interface.
    Abstract This paper presents an AI-assisted programming tool called Copilot for Xcode for program composition and design to support human software developers. By seamlessly integrating cloud-based Large Language Models (LLM) with Apple's local development environment, Xcode, this tool enhances productivity and unleashes creativity for software development in Apple software ecosystem (e.g., iOS apps, macOS). Leveraging advanced natural language processing (NLP) techniques, Copilot for Xcode effectively processes source code tokens and patterns within code repositories, enabling features such as code generation, autocompletion, documentation, and error detection. Software developers can also query and make "small" decisions for program composition, some of which can be made simultaneously, and this is facilitated through prompt engineering in a chat interface of Copilot for Xcode. Finally, we present simple case studies as evidence of the effectiveness of utilizing NLP in Xcode to prompt popular LLM services like OpenAI ChatGPT for program composition and design.
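The tool itself is closed-source (repo_url: None); the sketch below shows the general pattern it describes, prompting a cloud LLM with editor context through the 2023-era OpenAI chat API. The prompt wording and helper name are illustrative assumptions, not the tool's actual code.

```python
import openai  # pip install openai; assumes OPENAI_API_KEY is configured

def suggest(instruction, swift_snippet):
    """Send editor context plus a natural-language instruction to a
    cloud LLM -- the prompting pattern Copilot for Xcode is built on."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You are a Swift programming assistant inside Xcode."},
            {"role": "user",
             "content": f"{instruction}\n\nSwift code:\n{swift_snippet}"},
        ],
    )
    return response.choices[0].message["content"]

print(suggest("Add documentation comments to this function.",
              "func area(r: Double) -> Double { return .pi * r * r }"))
```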

Inductive Meta-path Learning for Schema-complex Heterogeneous Information Networks

  • paper_url: http://arxiv.org/abs/2307.03937
  • repo_url: None
  • paper_authors: Shixuan Liu, Changjun Fan, Kewei Cheng, Yunfei Wang, Peng Cui, Yizhou Sun, Zhong Liu
  • for: This paper addresses meta-path learning on schema-complex Heterogeneous Information Networks (HINs), such as knowledge bases with hundreds of entity and relation types.
  • methods: The paper proposes an inductive meta-path learning framework, SchemaWalk, that uses schema-level representations to learn meta-path scores for varying relations, together with a reinforcement learning based path-finding agent that learns policies for establishing meta-paths with high coverage and confidence for multiple relations.
  • results: Experiments on real data sets show the proposed method effectively learns meta-paths on schema-complex HINs, improving meta-path coverage and confidence.
    Abstract Heterogeneous Information Networks (HINs) are information networks with multiple types of nodes and edges. The concept of meta-path, i.e., a sequence of entity types and relation types connecting two entities, is proposed to provide the meta-level explainable semantics for various HIN tasks. Traditionally, meta-paths are primarily used for schema-simple HINs, e.g., bibliographic networks with only a few entity types, where meta-paths are often enumerated with domain knowledge. However, the adoption of meta-paths for schema-complex HINs, such as knowledge bases (KBs) with hundreds of entity and relation types, has been limited due to the computational complexity associated with meta-path enumeration. Additionally, effectively assessing meta-paths requires enumerating relevant path instances, which adds further complexity to the meta-path learning process. To address these challenges, we propose SchemaWalk, an inductive meta-path learning framework for schema-complex HINs. We represent meta-paths with schema-level representations to support the learning of the scores of meta-paths for varying relations, mitigating the need of exhaustive path instance enumeration for each relation. Further, we design a reinforcement-learning based path-finding agent, which directly navigates the network schema (i.e., schema graph) to learn policies for establishing meta-paths with high coverage and confidence for multiple relations. Extensive experiments on real data sets demonstrate the effectiveness of our proposed paradigm.
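To make the path-finding idea concrete, here is a minimal epsilon-greedy rollout over a toy schema graph; the reward computation and policy updates that SchemaWalk learns are omitted, and the graph, value table `q`, and names are illustrative assumptions.

```python
import random

SCHEMA = {  # toy schema graph: entity type -> [(relation type, next entity type)]
    "Person": [("worksFor", "Company"), ("bornIn", "City")],
    "Company": [("locatedIn", "City")],
    "City": [("inCountry", "Country")],
    "Country": [],
}

def rollout(head, tail, max_len=4, epsilon=0.3, q=None):
    """One epsilon-greedy rollout of a path-finding agent over the
    schema graph, in the spirit of SchemaWalk's agent."""
    q = q or {}
    node, meta_path = head, []
    for _ in range(max_len):
        candidates = SCHEMA[node]
        if not candidates:
            break
        if random.random() < epsilon:
            rel, node = random.choice(candidates)
        else:
            rel, node = max(candidates, key=lambda c: q.get((node, c), 0.0))
        meta_path.append(rel)
        if node == tail:
            return meta_path            # candidate meta-path head -> tail
    return None

print(rollout("Person", "Country"))     # e.g. ['bornIn', 'inCountry']
```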

Towards Efficient In-memory Computing Hardware for Quantized Neural Networks: State-of-the-art, Open Challenges and Perspectives

  • paper_url: http://arxiv.org/abs/2307.03936
  • repo_url: None
  • paper_authors: Olga Krestinskaya, Li Zhang, Khaled Nabil Salama
  • for: This review targets efficient neural network processing on edge devices with limited energy and computational resources.
  • methods: The review covers In-memory Computing (IMC) hardware combined with Quantized Neural Networks (QNNs) for neural network processing at the edge.
  • results: The review provides a comprehensive assessment of QNN and IMC hardware implementations, links software-based quantization approaches to IMC hardware, and offers open challenges, design requirements, recommendations, and perspectives along with an IMC-based QNN hardware roadmap.
    Abstract The amount of data processed in the cloud, the development of Internet-of-Things (IoT) applications, and growing data privacy concerns force the transition from cloud-based to edge-based processing. Limited energy and computational resources on edge push the transition from traditional von Neumann architectures to In-memory Computing (IMC), especially for machine learning and neural network applications. Network compression techniques are applied to implement a neural network on limited hardware resources. Quantization is one of the most efficient network compression techniques allowing to reduce the memory footprint, latency, and energy consumption. This paper provides a comprehensive review of IMC-based Quantized Neural Networks (QNN) and links software-based quantization approaches to IMC hardware implementation. Moreover, open challenges, QNN design requirements, recommendations, and perspectives along with an IMC-based QNN hardware roadmap are provided.
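As background for the review's central compression technique, a minimal symmetric uniform quantization sketch (generic, not tied to any specific IMC hardware):

```python
import numpy as np

def quantize_uniform(w, bits=8):
    """Symmetric uniform quantization of a weight tensor: round to a
    signed integer grid, keeping a float scale for dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale  # dequantize with q * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_uniform(w)
print(np.abs(w - q * scale).max())  # worst-case quantization error
```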

Bounding data reconstruction attacks with the hypothesis testing interpretation of differential privacy

  • paper_url: http://arxiv.org/abs/2307.03928
  • repo_url: None
  • paper_authors: Georgios Kaissis, Jamie Hayes, Alexander Ziller, Daniel Rueckert
  • for: This work studies Reconstruction Robustness (ReRo), an upper bound on the success of data reconstruction attacks against machine learning models.
  • methods: Prior work only gave asymptotic Monte Carlo estimates of a tight ReRo bound for specific DP mechanisms; this paper establishes a connection between hypothesis-testing differential privacy (DP) and ReRo, and derives closed-form, analytic or numerical ReRo bounds for the Laplace and Gaussian mechanisms and their subsampled variants.
  • results: The work provides directly computable ReRo bounds for common DP mechanisms, which can be used to assess the success of data reconstruction attacks and to guide the choice of DP mechanism.
    Abstract We explore Reconstruction Robustness (ReRo), which was recently proposed as an upper bound on the success of data reconstruction attacks against machine learning models. Previous research has demonstrated that differential privacy (DP) mechanisms also provide ReRo, but so far, only asymptotic Monte Carlo estimates of a tight ReRo bound have been shown. Directly computable ReRo bounds for general DP mechanisms are thus desirable. In this work, we establish a connection between hypothesis testing DP and ReRo and derive closed-form, analytic or numerical ReRo bounds for the Laplace and Gaussian mechanisms and their subsampled variants.
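For reference, one of the mechanisms the paper derives bounds for is the classical Gaussian mechanism; a minimal sketch using the standard noise calibration for epsilon < 1 (the ReRo bounds themselves are not reproduced here):

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, epsilon, delta):
    """Release a query answer under (epsilon, delta)-DP by adding
    Gaussian noise calibrated to the query's sensitivity."""
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return value + np.random.normal(0.0, sigma)

print(gaussian_mechanism(value=42.0, sensitivity=1.0, epsilon=0.5, delta=1e-5))
```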

Applying human-centered AI in developing effective human-AI teaming: A perspective of human-AI joint cognitive systems

  • paper_url: http://arxiv.org/abs/2307.03913
  • repo_url: None
  • paper_authors: Wei Xu, Zaifeng Gao
  • for: The paper focuses on human-AI teaming (HAT) as a new paradigm for developing AI systems, and on the challenges and limitations of each member in human-AI collaboration.
  • methods: The paper proposes a conceptual framework of human-AI joint cognitive systems (HAIJCS) to represent and implement HAT for developing effective human-AI teaming.
  • results: The paper discusses the implications and future work for HAIJCS, and argues that HAIJCS may help adopt HAI while enabling HCAI.
    Abstract Research and application have used human-AI teaming (HAT) as a new paradigm to develop AI systems. HAT recognizes that AI will function as a teammate instead of simply a tool in collaboration with humans. Effective human-AI teams need to be capable of taking advantage of the unique abilities of both humans and AI while overcoming the known challenges and limitations of each member, augmenting human capabilities, and raising joint performance beyond that of either entity. The National AI Research and Strategic Plan 2023 update has recognized that research programs focusing primarily on the independent performance of AI systems generally fail to consider the functionality that AI must provide within the context of dynamic, adaptive, and collaborative teams and calls for further research on human-AI teaming and collaboration. However, there has been debate about whether AI can work as a teammate with humans. The primary concern is that adopting the "teaming" paradigm contradicts the human-centered AI (HCAI) approach, resulting in humans losing control of AI systems. This article further analyzes the HAT paradigm and the debates. Specifically, we elaborate on our proposed conceptual framework of human-AI joint cognitive systems (HAIJCS) and apply it to represent HAT under the HCAI umbrella. We believe that HAIJCS may help adopt HAI while enabling HCAI. The implications and future work for HAIJCS are also discussed.
    Insights:
  • AI has led to the emergence of a new form of human-machine relationship: human-AI teaming (HAT), a paradigmatic shift in human-AI systems
  • We must follow a human-centered AI (HCAI) approach when applying HAT as a new design paradigm
  • We propose a conceptual framework of human-AI joint cognitive systems (HAIJCS) to represent and implement HAT for developing effective human-AI teaming

ScriptWorld: Text Based Environment For Learning Procedural Knowledge

  • paper_url: http://arxiv.org/abs/2307.03906
  • repo_url: https://github.com/exploration-lab/scriptworld
  • paper_authors: Abhinav Joshi, Areeb Ahmad, Umang Pandey, Ashutosh Modi
  • for: This work develops a text-based reinforcement learning environment to help agents acquire commonsense knowledge of everyday activities and natural language understanding.
  • methods: The work introduces ScriptWorld, a text-based environment built from a scripts dataset covering 10 daily real-world activities, and provides a detailed analysis of the proposed environment; RL-based baseline models/agents play the games, leveraging features from pre-trained language models to study the role of language models in such environments.
  • results: Experiments show that prior knowledge from pre-trained language models helps agents solve real-world text-based gaming environments.
    Abstract Text-based games provide a framework for developing natural language understanding and commonsense knowledge about the world in reinforcement learning based agents. Existing text-based environments often rely on fictional situations and characters to create a gaming framework and are far from real-world scenarios. In this paper, we introduce ScriptWorld: a text-based environment for teaching agents about real-world daily chores and hence imparting commonsense knowledge. To the best of our knowledge, it is the first interactive text-based gaming framework that consists of daily real-world human activities designed using scripts dataset. We provide gaming environments for 10 daily activities and perform a detailed analysis of the proposed environment. We develop RL-based baseline models/agents to play the games in Scriptworld. To understand the role of language models in such environments, we leverage features obtained from pre-trained language models in the RL agents. Our experiments show that prior knowledge obtained from a pre-trained language model helps to solve real-world text-based gaming environments. We release the environment via Github: https://github.com/Exploration-Lab/ScriptWorld

Improving Prototypical Part Networks with Reward Reweighing, Reselection, and Retraining

  • paper_url: http://arxiv.org/abs/2307.03887
  • repo_url: None
  • paper_authors: Robin Netzorg, Jiaxun Li, Bin Yu
  • for: This paper aims to improve interpretable image classification methods that attribute a model's output to meaningful parts of the image.
  • methods: The approach builds on the prototypical part network (ProtoPNet), which classifies images from prototypical parts but often learns spurious or inconsistent parts. Drawing on recent developments in reinforcement learning with human feedback (RLHF), the authors collect 1-5 scale human annotations of prototype quality on the CUB-200-2011 dataset and train a reward model that identifies non-spurious prototypes.
  • results: Adding reward-based reweighting, reselection, and retraining to the ProtoPNet training loop yields R3-ProtoPNet, which improves the consistency and meaningfulness of prototypes; used independently it lowers test accuracy, but an ensemble of multiple R3-ProtoPNets increases test predictive performance while maintaining interpretability.
    Abstract In recent years, work has gone into developing deep interpretable methods for image classification that clearly attribute a model's output to specific features of the data. One such method is the prototypical part network (ProtoPNet), which attempts to classify images based on meaningful parts of the input. While this method results in interpretable classifications, it often learns to classify from spurious or inconsistent parts of the image. Hoping to remedy this, we take inspiration from the recent developments in Reinforcement Learning with Human Feedback (RLHF) to fine-tune these prototypes. By collecting human annotations of prototype quality via a 1-5 scale on the CUB-200-2011 dataset, we construct a reward model that learns to identify non-spurious prototypes. In place of a full RL update, we propose the reweighted, reselected, and retrained prototypical part network (R3-ProtoPNet), which adds an additional three steps to the ProtoPNet training loop. The first two steps are reward-based reweighting and reselection, which align prototypes with human feedback. The final step is retraining to realign the model's features with the updated prototypes. We find that R3-ProtoPNet improves the overall consistency and meaningfulness of the prototypes, but lowers the test predictive accuracy when used independently. When multiple R3-ProtoPNets are incorporated into an ensemble, we find an increase in test predictive performance while maintaining interpretability.
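A miniature sketch of the reward-based reweighting/reselection idea (steps 1-2 of R3); the threshold, interfaces, and toy reward model are illustrative assumptions, and step 3 (retraining the backbone on the new prototypes) is omitted:

```python
import numpy as np

def reweight_and_reselect(prototypes, pool, reward_model, keep_threshold=3.0):
    """Score prototypes with a reward model trained on 1-5 human
    ratings, keep the well-rated ones, and refill from the best-rated
    candidate patch embeddings."""
    rewards = np.array([reward_model(p) for p in prototypes])
    kept = [p for p, r in zip(prototypes, rewards) if r >= keep_threshold]
    n_replace = len(prototypes) - len(kept)
    pool_rewards = np.array([reward_model(c) for c in pool])
    refill = [pool[i] for i in np.argsort(-pool_rewards)[:n_replace]]
    return kept + refill

# toy reward model: rates a prototype embedding on a 1-5 scale
reward_model = lambda p: 1.0 + 4.0 * p[0]
protos = [np.random.rand(8) for _ in range(10)]
pool = [np.random.rand(8) for _ in range(50)]
print(len(reweight_and_reselect(protos, pool, reward_model)))  # 10
```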

Designing Mixed-Initiative Video Games

  • paper_url: http://arxiv.org/abs/2307.03877
  • repo_url: None
  • paper_authors: Daijin Yang
  • for: This paper aims to explore the use of gamification in mixed-initiative co-creation to make human-AI interactions more accessible and fun.
  • methods: The author prototyped a game called Snake Story, where players select AI-generated texts to write a story about a snake while playing a "Snake"-like game; a controlled experiment compared player-AI interactions with and without the game component.
  • results: Players utilized different strategies with the two versions; game mechanics significantly affected the output stories, players' creative process, and role perceptions, and players with different backgrounds showed different preferences for the two versions.
    Abstract The development of Artificial Intelligence (AI) enables humans to co-create content with machines. The unexpectedness of AI-generated content can bring inspiration and entertainment to users. However, the co-creation interactions are always designed for content creators and have poor accessibility. To explore gamification of mixed-initiative co-creation and make human-AI interactions accessible and fun for players, I prototyped Snake Story, a mixed-initiative game where players can select AI-generated texts to write a story of a snake by playing a "Snake" like game. A controlled experiment was conducted to investigate the dynamics of player-AI interactions with and without the game component in the designed interface. As a result of a study with 11 players (n=11), I found that players utilized different strategies when playing with the two versions, game mechanics significantly affected the output stories, players' creative process, as well as role perceptions, and players with different backgrounds showed different preferences for the two versions. Based on these results, I further discussed considerations for mixed-initiative game design. This work aims to inspire the design of engaging co-creation experiences.

Large Language Models for Supply Chain Optimization

  • paper_url: http://arxiv.org/abs/2307.03875
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Beibin Li, Konstantina Mellou, Bo Zhang, Jeevan Pathuri, Ishai Menache
  • for: This work uses large language models (LLMs) to improve the explainability of, and trust in, supply chain automation.
  • methods: The work proposes OptiGuide, a framework that accepts plain-text queries and outputs insights about the underlying optimization outcomes. Rather than forgoing state-of-the-art combinatorial optimization technology, it leverages it to quantitatively answer what-if scenarios (e.g., how would the cost change if supplier B were used instead of supplier A for a given demand?), without sending proprietary data to the LLM.
  • results: The framework is demonstrated on a real server placement scenario within Microsoft's cloud supply chain, together with a general evaluation benchmark for assessing the accuracy of LLM outputs in other scenarios.
    Abstract Supply chain operations traditionally involve a variety of complex decision making problems. Over the last few decades, supply chains greatly benefited from advances in computation, which allowed the transition from manual processing to automation and cost-effective optimization. Nonetheless, business operators still need to spend substantial efforts in explaining and interpreting the optimization outcomes to stakeholders. Motivated by the recent advances in Large Language Models (LLMs), we study how this disruptive technology can help bridge the gap between supply chain automation and human comprehension and trust thereof. We design OptiGuide -- a framework that accepts as input queries in plain text, and outputs insights about the underlying optimization outcomes. Our framework does not forgo the state-of-the-art combinatorial optimization technology, but rather leverages it to quantitatively answer what-if scenarios (e.g., how would the cost change if we used supplier B instead of supplier A for a given demand?). Importantly, our design does not require sending proprietary data over to LLMs, which can be a privacy concern in some circumstances. We demonstrate the effectiveness of our framework on a real server placement scenario within Microsoft's cloud supply chain. Along the way, we develop a general evaluation benchmark, which can be used to evaluate the accuracy of the LLM output in other scenarios.
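A toy illustration of the what-if pattern the framework relies on: the LLM translates a plain-text question into an edit of the optimization input, and the solver, not the LLM, produces the quantitative answer. The single-item "solver" below is an illustrative stand-in, not the paper's model:

```python
def solve(costs, demand, banned=()):
    """Toy single-item sourcing 'solver': meet demand from the cheapest
    allowed supplier. Stands in for the combinatorial solver that
    OptiGuide keeps in the loop."""
    allowed = {s: c for s, c in costs.items() if s not in banned}
    supplier = min(allowed, key=allowed.get)
    return supplier, allowed[supplier] * demand

costs, demand = {"supplier_A": 2.0, "supplier_B": 2.4}, 100
_, base_cost = solve(costs, demand)
# What-if from the user's question: "use supplier B instead of A?"
_, scenario_cost = solve(costs, demand, banned=("supplier_A",))
print(f"Cost change: {scenario_cost - base_cost:+.2f}")  # +40.00
```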

Domain Adaptation using Silver Standard Labels for Ki-67 Scoring in Digital Pathology: A Step Closer to Widescale Deployment

  • paper_url: http://arxiv.org/abs/2307.03872
  • repo_url: None
  • paper_authors: Amanda Dy, Ngoc-Nhu Jennifer Nguyen, Seyed Hossein Mirjahanmardi, Melanie Dawe, Anthony Fyles, Wei Shi, Fei-Fei Liu, Dimitrios Androutsos, Susan Done, April Khademi
  • for: To improve the objectivity and efficiency of Ki-67 proliferation index (PI) scoring in digital pathology.
  • methods: An unsupervised framework generates silver standard (SS) pseudo-labels in the target domain, which are used together with gold standard (GS) source-domain labels for training.
  • results: Across five training regimes tested on two validated Ki-67 scoring architectures (UV-Net and piNET), the SS+GS method achieved the highest PI accuracy (95.9%) and more consistent results on target data; t-SNE analysis indicates features learned by SS+GS models are better aligned between source and target data, improving generalization.
    Abstract Deep learning systems have been proposed to improve the objectivity and efficiency of Ki-67 PI scoring. The challenge is that while very accurate, deep learning techniques suffer from reduced performance when applied to out-of-domain data. This is a critical challenge for clinical translation, as models are typically trained using data available to the vendor, which is not from the target domain. To address this challenge, this study proposes a domain adaptation pipeline that employs an unsupervised framework to generate silver standard (pseudo) labels in the target domain, which is used to augment the gold standard (GS) source domain data. Five training regimes were tested on two validated Ki-67 scoring architectures (UV-Net and piNET), (1) SS Only: trained on target silver standard (SS) labels, (2) GS Only: trained on source GS labels, (3) Mixed: trained on target SS and source GS labels, (4) GS+SS: trained on source GS labels and fine-tuned on target SS labels, and our proposed method (5) SS+GS: trained on source SS labels and fine-tuned on source GS labels. The SS+GS method yielded significantly (p < 0.05) higher PI accuracy (95.9%) and more consistent results compared to the GS Only model on target data. Analysis of t-SNE plots showed features learned by the SS+GS models are more aligned for source and target data, resulting in improved generalization. The proposed pipeline provides an efficient method for learning the target distribution without manual annotations, which are time-consuming and costly to generate for medical images. This framework can be applied to any target site as a per-laboratory calibration method, for widescale deployment.
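A minimal sketch of the silver-standard idea: pseudo-label unlabeled target-domain images with a source-trained model, then combine them with gold-standard data for training. The thresholding rule and toy model are illustrative assumptions:

```python
import numpy as np

def make_silver_standard(source_model, target_images, threshold=0.5):
    """Generate silver-standard (pseudo) labels on unlabeled target-
    domain images with a source-trained model."""
    return [(img, source_model(img) > threshold) for img in target_images]

# SS+GS regime in outline: train on silver-standard labels first,
# then fine-tune on gold-standard (expert-annotated) labels.
source_model = lambda img: img.mean()          # toy stand-in scorer
targets = [np.random.rand(16, 16) for _ in range(4)]
silver = make_silver_standard(source_model, targets)
print([label for _, label in silver])
```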

Personalized Resource Allocation in Wireless Networks: An AI-Enabled and Big Data-Driven Multi-Objective Optimization

  • paper_url: http://arxiv.org/abs/2307.03867
  • repo_url: None
  • paper_authors: Rawan Alkurd, Ibrahim Abualhaol, Halim Yanikomeroglu
  • for: This article discusses how Artificial Intelligence (AI) can be used for wireless network design and optimization.
  • methods: The article proposes using AI to enable user-level personalization, supported by an intelligent big-data-driven layer that micro-manages the scarce wireless network resources.
  • results: With AI, wireless networks can personalize services to the user level in real time, deciding the service quality needed to reach each user's target satisfaction level, thereby jointly improving resource savings and user satisfaction.
    Abstract The design and optimization of wireless networks have mostly been based on strong mathematical and theoretical modeling. Nonetheless, as novel applications emerge in the era of 5G and beyond, unprecedented levels of complexity will be encountered in the design and optimization of the network. As a result, the use of Artificial Intelligence (AI) is envisioned for wireless network design and optimization due to the flexibility and adaptability it offers in solving extremely complex problems in real-time. One of the main future applications of AI is enabling user-level personalization for numerous use cases. AI will revolutionize the way we interact with computers in which computers will be able to sense commands and emotions from humans in a non-intrusive manner, making the entire process transparent to users. By leveraging this capability, and accelerated by the advances in computing technologies, wireless networks can be redesigned to enable the personalization of network services to the user level in real-time. While current wireless networks are being optimized to achieve a predefined set of quality requirements, the personalization technology advocated in this article is supported by an intelligent big data-driven layer designed to micro-manage the scarce network resources. This layer provides the intelligence required to decide the necessary service quality that achieves the target satisfaction level for each user. Due to its dynamic and flexible design, personalized networks are expected to achieve unprecedented improvements in optimizing two contradicting objectives in wireless networks: saving resources and improving user satisfaction levels.

Reinforcement and Deep Reinforcement Learning-based Solutions for Machine Maintenance Planning, Scheduling Policies, and Optimization

  • paper_url: http://arxiv.org/abs/2307.03860
  • repo_url: None
  • paper_authors: Oluwaseyi Ogunfowora, Homayoun Najjaran
  • for: This work reviews and analyzes the literature on applying reinforcement learning to maintenance planning and optimization problems.
  • methods: The reviewed studies use reinforcement learning and deep reinforcement learning as data-driven decision-making algorithms to develop intelligent maintenance plans.
  • results: The review categorizes and summarizes the literature using developed taxonomies, identifies common themes and methodologies, and highlights research gaps, key insights, and directions for future work.
    Abstract Systems and machines undergo various failure modes that result in machine health degradation, so maintenance actions are required to restore them back to a state where they can perform their expected functions. Since maintenance tasks are inevitable, maintenance planning is essential to ensure the smooth operations of the production system and other industries at large. Maintenance planning is a decision-making problem that aims at developing optimum maintenance policies and plans that help reduces maintenance costs, extend asset life, maximize their availability, and ultimately ensure workplace safety. Reinforcement learning is a data-driven decision-making algorithm that has been increasingly applied to develop dynamic maintenance plans while leveraging the continuous information from condition monitoring of the system and machine states. By leveraging the condition monitoring data of systems and machines with reinforcement learning, smart maintenance planners can be developed, which is a precursor to achieving a smart factory. This paper presents a literature review on the applications of reinforcement and deep reinforcement learning for maintenance planning and optimization problems. To capture the common ideas without losing touch with the uniqueness of each publication, taxonomies used to categorize the systems were developed, and reviewed publications were highlighted, classified, and summarized based on these taxonomies. Adopted methodologies, findings, and well-defined interpretations of the reviewed studies were summarized in graphical and tabular representations to maximize the utility of the work for both researchers and practitioners. This work also highlights the research gaps, key insights from the literature, and areas for future work.

Teach Me How to Learn: A Perspective Review towards User-centered Neuro-symbolic Learning for Robotic Surgical Systems

  • paper_url: http://arxiv.org/abs/2307.03853
  • repo_url: None
  • paper_authors: Amr Gomaa, Bilal Mahdy, Niko Kleer, Michael Feld, Frank Kirchner, Antonio Krüger
  • for: This work aims at a human-in-the-loop learning paradigm for teaching robots on both the perceptual and conceptual levels, to improve robot performance in surgical procedures.
  • methods: The work proposes hybrid neuro-symbolic learning approaches that combine perceptual non-symbolic and conceptual symbolic learning with expert feedback (i.e., human-in-the-loop learning).
  • results: Surveying related research on human-in-the-loop surgical robotic systems, the authors highlight the most prominent solutions for autonomous surgical robots and the challenges surgeons face when interacting with them, and envision solutions based on online apprenticeship learning from implicit and explicit expert feedback.
    Abstract Recent advances in machine learning models allowed robots to identify objects on a perceptual nonsymbolic level (e.g., through sensor fusion and natural language understanding). However, these primarily black-box learning models still lack interpretation and transferability and require high data and computational demand. An alternative solution is to teach a robot on both perceptual nonsymbolic and conceptual symbolic levels through hybrid neurosymbolic learning approaches with expert feedback (i.e., human-in-the-loop learning). This work proposes a concept for this user-centered hybrid learning paradigm that focuses on robotic surgical situations. While most recent research focused on hybrid learning for non-robotic and some generic robotic domains, little work focuses on surgical robotics. We survey this related research while focusing on human-in-the-loop surgical robotic systems. This evaluation highlights the most prominent solutions for autonomous surgical robots and the challenges surgeons face when interacting with these systems. Finally, we envision possible ways to address these challenges using online apprenticeship learning based on implicit and explicit feedback from expert surgeons.

Optimal Learners for Realizable Regression: PAC Learning and Online Learning

  • paper_url: http://arxiv.org/abs/2307.03848
  • repo_url: None
  • paper_authors: Idan Attias, Steve Hanneke, Alkis Kalavasis, Amin Karbasi, Grigoris Velegkas
  • for: The paper characterizes the statistical complexity of realizable regression in both the PAC learning setting and the online learning setting.
  • methods: The paper introduces a minimax instance optimal learner for realizable regression, a novel dimension that qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable, and, for online learning, a dimension that characterizes the minimax instance optimal cumulative loss up to a constant factor.
  • results: The paper establishes a necessary condition for learnability based on a combinatorial dimension related to the DS dimension and conjectures it may also be sufficient; it also resolves an open question raised by Daskalakis and Golowich in STOC '22.
    Abstract In this work, we aim to characterize the statistical complexity of realizable regression both in the PAC learning setting and the online learning setting. Previous work had established the sufficiency of finiteness of the fat shattering dimension for PAC learnability and the necessity of finiteness of the scaled Natarajan dimension, but little progress had been made towards a more complete characterization since the work of Simon 1997 (SICOMP '97). To this end, we first introduce a minimax instance optimal learner for realizable regression and propose a novel dimension that both qualitatively and quantitatively characterizes which classes of real-valued predictors are learnable. We then identify a combinatorial dimension related to the Graph dimension that characterizes ERM learnability in the realizable setting. Finally, we establish a necessary condition for learnability based on a combinatorial dimension related to the DS dimension, and conjecture that it may also be sufficient in this context. Additionally, in the context of online learning we provide a dimension that characterizes the minimax instance optimal cumulative loss up to a constant factor and design an optimal online learner for realizable regression, thus resolving an open question raised by Daskalakis and Golowich in STOC '22.

RADAR: Robust AI-Text Detection via Adversarial Learning

  • paper_url: http://arxiv.org/abs/2307.03838
  • repo_url: None
  • paper_authors: Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
  • for: This paper proposes a new framework, RADAR, for detecting whether text was generated by a large language model (LLM), with robustness against LLM-based paraphrasing.
  • methods: RADAR jointly trains two components, a robust AI-text detector and a paraphraser, which update each other adversarially during training to improve the detector's accuracy and coverage.
  • results: Evaluated across 8 different LLMs and 4 datasets, RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place; it also transfers strongly from instruction-tuned LLMs to other LLMs and improves further via GPT-3.5.
    Abstract Recent advances in large language models (LLMs) and the intensifying popularity of ChatGPT-like applications have blurred the boundary of high-quality text generation between humans and machines. However, in addition to the anticipated revolutionary changes to our technology and society, the difficulty of distinguishing LLM-generated texts (AI-text) from human-generated texts poses new challenges of misuse and fairness, such as fake content generation, plagiarism, and false accusation of innocent writers. While existing works show that current AI-text detectors are not robust to LLM-based paraphrasing, this paper aims to bridge this gap by proposing a new framework called RADAR, which jointly trains a Robust AI-text Detector via Adversarial leaRning. RADAR is based on adversarial training of a paraphraser and a detector. The paraphraser's goal is to generate realistic contents to evade AI-text detection. RADAR uses the feedback from the detector to update the paraphraser, and vice versa. Evaluated with 8 different LLMs (Pythia, Dolly 2.0, Palmyra, Camel, GPT-J, Dolly 1.0, LLaMA, and Vicuna) across 4 datasets, experimental results show that RADAR significantly outperforms existing AI-text detection methods, especially when paraphrasing is in place. We also identify the strong transferability of RADAR from instruction-tuned LLMs to other LLMs, and evaluate the improved capability of RADAR via GPT-3.5.
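A schematic toy of one adversarial round in the spirit of RADAR, with plain functions standing in for the trained detector and paraphraser so the example runs; the real system updates both models with gradients and rewards rather than these fixed stand-ins:

```python
def radar_round(detect, rewrite, human_texts, ai_texts):
    """detect: text -> P(AI-written); rewrite: paraphrase a text.
    The paraphraser tries to evade the detector, and the detector is
    trained to catch both raw and paraphrased AI text."""
    evasive = [rewrite(t) for t in ai_texts]
    # detector objective: score AI text (raw or paraphrased) high, human text low
    detector_loss = (sum(1 - detect(t) for t in ai_texts + evasive)
                     + sum(detect(t) for t in human_texts))
    # paraphraser objective: make the detector score its rewrites as human
    paraphraser_reward = sum(1 - detect(t) for t in evasive)
    return detector_loss, paraphraser_reward

# toy stand-ins: a keyword 'detector' and a synonym 'paraphraser'
detect = lambda t: 1.0 if "delve" in t else 0.0
rewrite = lambda t: t.replace("delve", "dig")
print(radar_round(detect, rewrite,
                  ["we ate lunch"], ["we delve into results"]))
```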

Back to Optimization: Diffusion-based Zero-Shot 3D Human Pose Estimation

  • paper_url: http://arxiv.org/abs/2307.03833
  • repo_url: https://github.com/ipl-uw/ZeDO-Release
  • paper_authors: Zhongyu Jiang, Zhuoran Zhou, Lei Li, Wenhao Chai, Cheng-Yen Yang, Jenq-Neng Hwang
  • for: 3D human pose estimation (HPE) tasks in the wild, where traditional optimization-based methods have limited performance and learning-based methods have difficulty generalizing to new domains and scenarios.
  • methods: Zero-shot Diffusion-based Optimization (ZeDO) pipeline, which combines the advantages of optimization-based and learning-based methods by using a diffusion process to refine the pose estimates and a multi-hypothesis framework to handle cross-domain and in-the-wild variations.
  • results: state-of-the-art (SOTA) performance on the Human3.6M and 3DPW datasets, with minMPJPE 51.4mm and PA-MPJPE 42.6mm, respectively, without requiring any 2D-3D or image-3D pairs for training.
    Abstract Learning-based methods have dominated the 3D human pose estimation (HPE) tasks with significantly better performance in most benchmarks than traditional optimization-based methods. Nonetheless, 3D HPE in the wild is still the biggest challenge of learning-based models, whether with 2D-3D lifting, image-to-3D, or diffusion-based methods, since the trained networks implicitly learn camera intrinsic parameters and domain-based 3D human pose distributions and estimate poses by statistical average. On the other hand, the optimization-based methods estimate results case-by-case, which can predict more diverse and sophisticated human poses in the wild. By combining the advantages of optimization-based and learning-based methods, we propose the Zero-shot Diffusion-based Optimization (ZeDO) pipeline for 3D HPE to solve the problem of cross-domain and in-the-wild 3D HPE. Our multi-hypothesis ZeDO achieves state-of-the-art (SOTA) performance on Human3.6M as minMPJPE $51.4$mm without training with any 2D-3D or image-3D pairs. Moreover, our single-hypothesis ZeDO achieves SOTA performance on 3DPW dataset with PA-MPJPE $42.6$mm on cross-dataset evaluation, which even outperforms learning-based methods trained on 3DPW.

Effect of Intensity Standardization on Deep Learning for WML Segmentation in Multi-Centre FLAIR MRI

  • paper_url: http://arxiv.org/abs/2307.03827
  • repo_url: None
  • paper_authors: Abdollah Ghazvanchahi, Pejman Jahbedar Maralani, Alan R. Moody, April Khademi
  • for: This paper evaluates intensity standardization of FLAIR MRI as a preprocessing step for white matter lesion (WML) segmentation, to improve the performance of DL methods on data from imaging centres outside the training distribution.
  • methods: The study compares several normalization methods, including IAMLAB (developed specifically for FLAIR MRI), White-strip, Nyul, and Z-score, plus an Ensemble model combining their predictions, with a skip-connection UNet (SC UNet) trained on the standardized images.
  • results: IAMLAB and the Ensemble yield higher WML segmentation performance across all measures, particularly on out-of-distribution multi-centre data, mitigating MRI domain shift.
    Abstract Deep learning (DL) methods for white matter lesion (WML) segmentation in MRI suffer a reduction in performance when applied on data from a scanner or centre that is out-of-distribution (OOD) from the training data. This is critical for translation and widescale adoption, since current models cannot be readily applied to data from new institutions. In this work, we evaluate several intensity standardization methods for MRI as a preprocessing step for WML segmentation in multi-centre Fluid-Attenuated Inversion Recovery (FLAIR) MRI. We evaluate a method specifically developed for FLAIR MRI called IAMLAB along with other popular normalization techniques such as White-strip, Nyul and Z-score. We proposed an Ensemble model that combines predictions from each of these models. A skip-connection UNet (SC UNet) was trained on the standardized images, as well as the original data and segmentation performance was evaluated over several dimensions. The training (in-distribution) data consists of a single study, of 60 volumes, and the test (OOD) data is 128 unseen volumes from three clinical cohorts. Results show IAMLAB and Ensemble provide higher WML segmentation performance compared to models from original data or other normalization methods. IAMLAB & Ensemble have the highest dice similarity coefficient (DSC) on the in-distribution data (0.78 & 0.80) and on clinical OOD data. DSC was significantly higher for IAMLAB compared to the original data (p<0.05) for all lesion categories (LL>25mL: 0.77 vs. 0.71; 10mL<= LL<25mL: 0.66 vs. 0.61; LL<10mL: 0.53 vs. 0.52). The IAMLAB and Ensemble normalization methods are mitigating MRI domain shift and are optimal for DL-based WML segmentation in unseen FLAIR data.
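For reference, the Z-score baseline the study compares against can be sketched in a few lines (the brain mask and toy volume are illustrative):

```python
import numpy as np

def zscore_standardize(volume, brain_mask):
    """Z-score intensity standardization: shift and scale a FLAIR
    volume by the mean and std of intensities inside the brain mask."""
    voxels = volume[brain_mask]
    return (volume - voxels.mean()) / (voxels.std() + 1e-8)

vol = np.random.rand(64, 64, 32) * 255            # toy FLAIR volume
mask = vol > 20                                   # toy brain mask
print(zscore_standardize(vol, mask)[mask].std())  # ~1.0 inside the mask
```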

How does AI chat change search behaviors?

  • paper_url: http://arxiv.org/abs/2307.03826
  • repo_url: None
  • paper_authors: Robert Capra, Jaime Arguello
  • for: This study is an early investigation into how people use a generative AI chat system (referred to simply as chat) as part of a search process, and how combining chat systems with existing search tools affects users' search behaviors and strategies.
  • methods: An exploratory user study with 10 participants who used a combined Chat+Search system built on the OpenAI GPT-3.5 API and the Bing Web Search v5 API; participants completed three search tasks.
  • results: This pre-print of preliminary results reports on how users integrated AI chat into their search process, what they liked and disliked about the chat system, their trust in the chat responses, and their mental models of how the chat system generated responses.
    Abstract Generative AI tools such as chatGPT are poised to change the way people engage with online information. Recently, Microsoft announced their "new Bing" search system which incorporates chat and generative AI technology from OpenAI. Google has announced plans to deploy search interfaces that incorporate similar types of technology. These new technologies will transform how people can search for information. The research presented here is an early investigation into how people make use of a generative AI chat system (referred to simply as chat from here on) as part of a search process, and how the incorporation of chat systems with existing search tools may effect users search behaviors and strategies. We report on an exploratory user study with 10 participants who used a combined Chat+Search system that utilized the OpenAI GPT-3.5 API and the Bing Web Search v5 API. Participants completed three search tasks. In this pre-print paper of preliminary results, we report on ways that users integrated AI chat into their search process, things they liked and disliked about the chat system, their trust in the chat responses, and their mental models of how the chat system generated responses.

Exploring and Characterizing Large Language Models For Embedded System Development and Debugging

  • paper_url: http://arxiv.org/abs/2307.03817
  • repo_url: None
  • paper_authors: Zachary Englhardt, Richard Li, Dilini Nissanka, Zhihan Zhang, Girish Narayanswamy, Joseph Breda, Xin Liu, Shwetak Patel, Vikram Iyer
  • for: This paper evaluates leading large language models (GPT-3.5, GPT-4, PaLM 2) for embedded system development and studies how human programmers interact with these tools.
  • methods: The authors build an end-to-end hardware-in-the-loop (HIL) evaluation platform that verifies LLM-generated programs using sensor-actuator pairs, compare the three models over N=450 experiments, and develop an AI-based software engineering workflow for building embedded systems.
  • results: GPT-4 in particular shows exceptional cross-domain understanding and reasoning, in some cases generating fully correct programs from a single prompt; in N=50 trials GPT-4 produced functional I2C interfaces 66% of the time, and it generated register-level drivers, LoRa communication code, and context-specific power optimizations that cut an nRF52 program's current over 740x to 12.2 uA.
    Abstract Large language models (LLMs) have shown remarkable abilities to generate code, however their ability to develop software for embedded systems, which requires cross-domain knowledge of hardware and software, has not been studied. In this paper we systematically evaluate leading LLMs (GPT-3.5, GPT-4, PaLM 2) to assess their performance for embedded system development, study how human programmers interact with these tools, and develop an AI-based software engineering workflow for building embedded systems. We develop an end-to-end hardware-in-the-loop evaluation platform for verifying LLM generated programs using sensor actuator pairs. We compare all three models with N=450 experiments and find surprisingly that GPT-4 especially shows an exceptional level of cross-domain understanding and reasoning, in some cases generating fully correct programs from a single prompt. In N=50 trials, GPT-4 produces functional I2C interfaces 66% of the time. GPT-4 also produces register-level drivers, code for LoRa communication, and context-specific power optimizations for an nRF52 program resulting in over 740x current reduction to 12.2 uA. We also characterize the models' limitations to develop a generalizable workflow for using LLMs in embedded system development. We evaluate the workflow with 15 users including novice and expert programmers. We find that our workflow improves productivity for all users and increases the success rate for building a LoRa environmental sensor from 25% to 100%, including for users with zero hardware or C/C++ experience.
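To illustrate the hardware-in-the-loop idea, the sketch below shows one way a single trial could run: flash an LLM-generated program, let it drive an actuator, and verify the effect through a paired sensor. The build commands, serial protocol, and pass criterion are all hypothetical placeholders, not the paper's platform.

```python
# Hedged sketch of a hardware-in-the-loop (HIL) trial: an LLM-generated program
# drives an actuator, and a paired sensor verifies the effect.
import subprocess
import serial  # pyserial

def flash_firmware(source_path: str) -> None:
    """Compile and flash the LLM-generated program (toolchain is hypothetical)."""
    subprocess.run(["make", "flash", f"SRC={source_path}"], check=True)

def sensor_reading(port: str = "/dev/ttyACM0") -> float:
    """Read one line from a monitoring MCU attached to the paired sensor."""
    with serial.Serial(port, 115200, timeout=5) as link:
        return float(link.readline().decode().strip())

def hil_trial(generated_c_file: str, expected: float, tol: float = 0.1) -> bool:
    """One trial: flash, observe the sensor, compare against the expected effect."""
    flash_firmware(generated_c_file)
    return abs(sensor_reading() - expected) <= tol

# Aggregate over many generations, as in the paper's N-trial comparisons.
passes = sum(hil_trial(f"gen_{i}.c", expected=1.0) for i in range(50))
print(f"functional rate: {passes / 50:.0%}")
```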

For Women, Life, Freedom: A Participatory AI-Based Social Web Analysis of a Watershed Moment in Iran’s Gender Struggles

  • paper_url: http://arxiv.org/abs/2307.03764
  • repo_url: None
  • paper_authors: Adel Khorramrouz, Sujan Dutta, Ashiqur R. KhudaBukhsh
  • for: This paper presents a computational analysis of Persian-language Twitter discourse to estimate the shift in stance toward gender equality following the death of Mahsa Amini in police custody.
  • methods: An ensemble active learning pipeline trains a stance classifier. Iranian women take an active role as annotators: beyond providing labels, they suggest valuable keywords for more meaningful corpus creation and supply short example documents for a guided sampling step.
  • results: The analysis indicates that Mahsa Amini's death triggered polarized Persian-language discourse, with both negative and positive tweets toward gender equality increasing; the increase in positive tweets was slightly greater. With respect to account creation time, pro-protest Twitter accounts resemble baseline Persian Twitter activity more closely than state-aligned accounts do.
    Abstract In this paper, we present a computational analysis of the Persian language Twitter discourse with the aim to estimate the shift in stance toward gender equality following the death of Mahsa Amini in police custody. We present an ensemble active learning pipeline to train a stance classifier. Our novelty lies in the involvement of Iranian women in an active role as annotators in building this AI system. Our annotators not only provide labels, but they also suggest valuable keywords for more meaningful corpus creation as well as provide short example documents for a guided sampling step. Our analyses indicate that Mahsa Amini's death triggered polarized Persian language discourse where both fractions of negative and positive tweets toward gender equality increased. The increase in positive tweets was slightly greater than the increase in negative tweets. We also observe that with respect to account creation time, between the state-aligned Twitter accounts and pro-protest Twitter accounts, pro-protest accounts are more similar to baseline Persian Twitter activity.
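A minimal sketch of one round of the kind of annotator-in-the-loop active learning described above: annotator-suggested keywords shape the candidate pool, and the most uncertain tweets are routed back for labeling. The keywords, binary-stance assumption, and model choice are illustrative, not the paper's ensemble pipeline.

```python
# One round of pool-based active learning for a stance classifier, with
# annotator-suggested keywords guiding which unlabeled tweets enter the pool.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

annotator_keywords = ["#MahsaAmini", "equality"]  # illustrative placeholders

def keyword_filter(pool):
    """Corpus creation guided by annotator-suggested keywords."""
    return [t for t in pool if any(k.lower() in t.lower() for k in annotator_keywords)]

def active_learning_round(labeled_texts, labels, pool, k=20):
    """Fit a stance classifier, return the k most uncertain pool tweets."""
    pool = keyword_filter(pool)
    vec = TfidfVectorizer().fit(labeled_texts + pool)
    clf = LogisticRegression(max_iter=1000).fit(vec.transform(labeled_texts), labels)
    probs = clf.predict_proba(vec.transform(pool))   # binary stance assumed
    margin = np.abs(probs[:, 1] - probs[:, 0])       # small margin = uncertain
    return [pool[i] for i in np.argsort(margin)[:k]] # send these to annotators
```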

URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates

  • paper_url: http://arxiv.org/abs/2307.03810
  • repo_url: https://github.com/mkirchhof/url
  • paper_authors: Michael Kirchhof, Bálint Mucsányi, Seong Joon Oh, Enkelejda Kasneci
  • for: This work proposes the Uncertainty-aware Representation Learning (URL) benchmark to guide the development of pretrained models that provide not only transferable embeddings but also transferable uncertainty estimates.
  • methods: Eleven uncertainty quantification methods are pretrained on ImageNet and transferred to eight downstream datasets; a novel metric measures the zero-shot transferability of the uncertainty estimates.
  • results: Approaches that focus on the uncertainty of the representation itself, or that estimate the prediction risk directly, outperform those based on the probabilities of upstream classes, but achieving transferable uncertainty quantification remains an open challenge.
    Abstract Representation learning has significantly driven the field to develop pretrained models that can act as a valuable starting point when transferring to new datasets. With the rising demand for reliable machine learning and uncertainty quantification, there is a need for pretrained models that not only provide embeddings but also transferable uncertainty estimates. To guide the development of such models, we propose the Uncertainty-aware Representation Learning (URL) benchmark. Besides the transferability of the representations, it also measures the zero-shot transferability of the uncertainty estimate using a novel metric. We apply URL to evaluate eleven uncertainty quantifiers that are pretrained on ImageNet and transferred to eight downstream datasets. We find that approaches that focus on the uncertainty of the representation itself or estimate the prediction risk directly outperform those that are based on the probabilities of upstream classes. Yet, achieving transferable uncertainty quantification remains an open challenge. Our findings indicate that it is not necessarily in conflict with traditional representation learning goals. Code is provided under https://github.com/mkirchhof/url .
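As a rough intuition for what "transferable uncertainty" means operationally, the sketch below scores whether a pretrained quantifier's per-sample uncertainty ranks downstream errors, via AUROC on toy data. This is a common proxy only; URL's own zero-shot transferability metric is defined in the paper and differs in detail.

```python
# Does uncertainty from a frozen, pretrained quantifier predict downstream
# mistakes? AUROC of uncertainty as an error detector is one common proxy.
import numpy as np
from sklearn.metrics import roc_auc_score

def uncertainty_auroc(uncertainty: np.ndarray, correct: np.ndarray) -> float:
    """AUROC of uncertainty as a detector of errors (label 1 = error)."""
    return roc_auc_score(1 - correct, uncertainty)

# uncertainty: shape (n,), from the pretrained quantifier, no downstream tuning
# correct:     shape (n,), 0/1, whether a frozen-representation probe was right
rng = np.random.default_rng(0)
u = rng.random(1000)
c = (rng.random(1000) > u * 0.5).astype(int)  # toy data: errors grow with u
print(f"uncertainty-error AUROC: {uncertainty_auroc(u, c):.3f}")
```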

CLIPMasterPrints: Fooling Contrastive Language-Image Pre-training Using Latent Variable Evolution

  • paper_url: http://arxiv.org/abs/2307.03798
  • repo_url: https://github.com/matfrei/clipmasterprints
  • paper_authors: Matthias Freiberger, Peter Kun, Anders Sundnes Løvlie, Sebastian Risi
  • for: This paper demonstrates the vulnerability of Contrastive Language-Image Pre-training (CLIP) models to “fooling master images” that can manipulate the model’s confidence score for a wide range of prompts, while being unrecognizable to humans.
  • methods: The authors mine fooling master images by searching the latent space of generative models using evolution strategies or stochastic gradient descent. They investigate the properties of these mined images and find that they can generalize to a large number of semantically related captions.
  • results: The authors evaluate two possible mitigation strategies and find that the vulnerability to fooling master examples is closely related to a modality gap in contrastive pre-trained multi-modal networks. They argue for mitigating modality gaps in CLIP and related multi-modal approaches to improve robustness.
    Abstract Models leveraging both visual and textual data such as Contrastive Language-Image Pre-training (CLIP), are increasingly gaining importance. In this work, we show that despite their versatility, such models are vulnerable to what we refer to as fooling master images. Fooling master images are capable of maximizing the confidence score of a CLIP model for a significant number of widely varying prompts, while being unrecognizable for humans. We demonstrate how fooling master images can be mined by searching the latent space of generative models by means of an evolution strategy or stochastic gradient descent. We investigate the properties of the mined fooling master images, and find that images trained on a small number of image captions potentially generalize to a much larger number of semantically related captions. Further, we evaluate two possible mitigation strategies and find that vulnerability to fooling master examples is closely related to a modality gap in contrastive pre-trained multi-modal networks. From the perspective of vulnerability to off-manifold attacks, we therefore argue for the mitigation of modality gaps in CLIP and related multi-modal approaches. Source code and mined CLIPMasterPrints are available at https://github.com/matfrei/CLIPMasterPrints.
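The mining procedure can be sketched as gradient ascent on CLIP similarity against many prompts at once. The paper searches the latent space of a generative model (via evolution strategies or SGD); for a self-contained illustration, the toy version below optimizes raw pixels directly and skips CLIP's input normalization.

```python
# Toy sketch: optimize one image so CLIP scores it highly for *many* prompts.
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model = model.float()  # avoid fp16/fp32 mismatch when optimizing inputs on GPU

prompts = ["a photo of a dog", "a famous painting", "a bowl of soup"]
with torch.no_grad():
    text_feat = model.encode_text(clip.tokenize(prompts).to(device))
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

x = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for step in range(200):
    img_feat = model.encode_image(x.clamp(0, 1))  # CLIP's normalization omitted
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    loss = -(img_feat @ text_feat.T).mean()       # raise similarity to every prompt
    opt.zero_grad()
    loss.backward()
    opt.step()
```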

When does the ID algorithm fail?

  • paper_url: http://arxiv.org/abs/2307.03750
  • repo_url: https://github.com/SOYJUN/Implement-ODR-protocol
  • paper_authors: Ilya Shpitser
  • for: This note concerns the ID algorithm, which solves the identification problem for interventional distributions of the form p(Y | do(a)) in graphical causal models.
  • methods: The author outlines the modern presentation of the ID algorithm, gives a simple counterexample to the "hedge criterion" (Corollary 3 of [9]), and provides several graphical characterizations of when the algorithm fails to identify its input distribution.
  • results: The ID algorithm is sound (it outputs the correct functional of the observed data distribution whenever p(Y | do(a)) is identified in the causal model represented by the input graph) and complete (it explicitly flags as a failure any input p(Y | do(a)) that is not identified), and the hedge structure does arise whenever identification fails; however, Corollary 3 of [9] is incorrect as stated.
    Abstract The ID algorithm solves the problem of identification of interventional distributions of the form p(Y | do(a)) in graphical causal models, and has been formulated in a number of ways [12, 9, 6]. The ID algorithm is sound (outputs the correct functional of the observed data distribution whenever p(Y | do(a)) is identified in the causal model represented by the input graph), and complete (explicitly flags as a failure any input p(Y | do(a)) whenever this distribution is not identified in the causal model represented by the input graph). The reference [9] provides a result, the so called "hedge criterion" (Corollary 3), which aims to give a graphical characterization of situations when the ID algorithm fails to identify its input in terms of a structure in the input graph called the hedge. While the ID algorithm is, indeed, a sound and complete algorithm, and the hedge structure does arise whenever the input distribution is not identified, Corollary 3 presented in [9] is incorrect as stated. In this note, I outline the modern presentation of the ID algorithm, discuss a simple counterexample to Corollary 3, and provide a number of graphical characterizations of the ID algorithm failing to identify its input distribution.
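For orientation, the fully observed case shows what "identified" means here: without hidden variables, every p(y | do(a)) reduces to the truncated factorization (g-formula), and failures can only arise from latent confounding.

```latex
% In a Markovian causal model (all variables observed), every interventional
% distribution is identified via the truncated factorization (g-formula):
\[
  p(y \mid \operatorname{do}(a))
  = \sum_{v \setminus (y \cup a)} \prod_{V_i \notin A}
    p\bigl(v_i \mid \operatorname{pa}(v_i)\bigr) \Big|_{A=a} .
\]
% Identification can therefore only fail with hidden variables. The simplest
% failure is the "bow-arc" graph A -> Y with a latent common cause of A and Y:
% there p(y | do(a)) is not any function of the observed p(a, y), and the ID
% algorithm flags the input as not identified.
```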

AI and the EU Digital Markets Act: Addressing the Risks of Bigness in Generative AI

  • paper_url: http://arxiv.org/abs/2308.02033
  • repo_url: None
  • paper_authors: Ayse Gizem Yasar, Andrew Chong, Evan Dong, Thomas Krendl Gilbert, Sarah Hladikova, Roland Maio, Carlos Mougan, Xudong Shen, Shubham Singh, Ana-Andreea Stoica, Savannah Thais, Miri Zilka
  • for: This paper addresses the risks of bigness in digital markets, particularly in relation to generative AI systems that could become gateways for AI-based services.
  • methods: The authors propose integrating certain AI software as core platform services and classifying certain developers as gatekeepers under the EU's Digital Markets Act (DMA).
  • results: The paper proposes an assessment of gatekeeper obligations to ensure they cover generative AI services, offering insights toward diversity and openness as the EU considers generative-AI-specific rules and possible DMA amendments.
    Abstract As AI technology advances rapidly, concerns over the risks of bigness in digital markets are also growing. The EU's Digital Markets Act (DMA) aims to address these risks. Still, the current framework may not adequately cover generative AI systems that could become gateways for AI-based services. This paper argues for integrating certain AI software as core platform services and classifying certain developers as gatekeepers under the DMA. We also propose an assessment of gatekeeper obligations to ensure they cover generative AI services. As the EU considers generative AI-specific rules and possible DMA amendments, this paper provides insights towards diversity and openness in generative AI services.

Intelligent Robotic Sonographer: Mutual Information-based Disentangled Reward Learning from Few Demonstrations

  • paper_url: http://arxiv.org/abs/2307.03705
  • repo_url: None
  • paper_authors: Zhongliang Jiang, Yuan Bi, Mingchuan Zhou, Ying Hu, Michael Burke, Nassir Navab
  • for: The paper proposes an intelligent robotic sonographer that autonomously explores target anatomies and navigates a US probe to a relevant 2D plane by learning from experts.
  • methods: The approach learns a neural reward function from ranked pairwise image comparisons in a self-supervised fashion, capturing the "language of sonography"; mutual information is estimated to disentangle task-related and domain features and overcome inter-patient variations, and a Gaussian distribution-based filter evaluates the quality of the expert's demonstrations.
  • results: Representative experiments for a "line" target on a vascular phantom and a "point" target on two ex-vivo animal organ phantoms show that the framework works robustly on different kinds of known and unseen phantoms.
    Abstract Ultrasound (US) imaging is widely used for biometric measurement and diagnosis of internal organs due to the advantages of being real-time and radiation-free. However, due to high inter-operator variability, resulting images highly depend on operators' experience. In this work, an intelligent robotic sonographer is proposed to autonomously "explore" target anatomies and navigate a US probe to a relevant 2D plane by learning from experts. The underlying high-level physiological knowledge from experts is inferred by a neural reward function, using a ranked pairwise image comparisons approach in a self-supervised fashion. This process can be referred to as understanding the "language of sonography". Considering the generalization capability needed to overcome inter-patient variations, mutual information is estimated by a network to explicitly extract the task-related and domain features in latent space. Besides, a Gaussian distribution-based filter is developed to automatically evaluate and take the quality of the expert's demonstrations into account. The robotic localization is carried out in coarse-to-fine mode based on the predicted reward associated with B-mode images. To demonstrate the performance of the proposed approach, representative experiments for the "line" target and "point" target are performed on a vascular phantom and two ex-vivo animal organ phantoms (chicken heart and lamb kidney), respectively. The results demonstrate that the proposed framework can robustly work on different kinds of known and unseen phantoms.
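The reward-learning step can be sketched with a standard Bradley-Terry style loss over ranked pairs of B-mode images, which is one common way to realize "ranked pairwise image comparisons"; the encoder, data, and hyperparameters below are placeholders rather than the paper's architecture.

```python
# Learning a scalar reward from pairwise preferences over ultrasound planes.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(  # stand-in for the paper's image encoder
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1),
        )

    def forward(self, us_image):  # B-mode image -> scalar reward
        return self.body(us_image).squeeze(-1)

def pairwise_loss(net, preferred, other):
    """The preferred US plane should score higher than the other one."""
    return -torch.log(torch.sigmoid(net(preferred) - net(other))).mean()

net = RewardNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
better = torch.randn(8, 1, 64, 64)  # toy batch: "closer to the target plane"
worse = torch.randn(8, 1, 64, 64)
loss = pairwise_loss(net, better, worse)
loss.backward()
opt.step()
```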

Unveiling the Potential of Knowledge-Prompted ChatGPT for Enhancing Drug Trafficking Detection on Social Media

  • paper_url: http://arxiv.org/abs/2307.03699
  • repo_url: None
  • paper_authors: Chuanbo Hu, Bin Liu, Xin Li, Yanfang Ye
  • for: The paper aims to detect illicit drug trafficking activities on social media platforms such as Instagram and Twitter.
  • methods: The authors leverage large language models (LLMs) such as ChatGPT, proposing an analytical framework that composes knowledge-informed, scenario-based prompts, together with a Monte Carlo dropout based prompt optimization method.
  • results: The framework outperforms other baseline language models in drug trafficking detection accuracy, with an improvement of nearly 12%; prior knowledge and scenario-based prompts help ChatGPT identify and label trafficking activities even amid the deceptive language and euphemisms dealers use to evade detection.
    Abstract Social media platforms such as Instagram and Twitter have emerged as critical channels for drug marketing and illegal sale. Detecting and labeling online illicit drug trafficking activities becomes important in addressing this issue. However, the effectiveness of conventional supervised learning methods in detecting drug trafficking heavily relies on having access to substantial amounts of labeled data, while data annotation is time-consuming and resource-intensive. Furthermore, these models often face challenges in accurately identifying trafficking activities when drug dealers use deceptive language and euphemisms to avoid detection. To overcome this limitation, we conduct the first systematic study on leveraging large language models (LLMs), such as ChatGPT, to detect illicit drug trafficking activities on social media. We propose an analytical framework to compose knowledge-informed prompts, which serve as the interface through which humans can interact with and use LLMs to perform the detection task. Additionally, we design a Monte Carlo dropout based prompt optimization method to further improve performance and interpretability. Our experimental findings demonstrate that the proposed framework outperforms other baseline language models in terms of drug trafficking detection accuracy, showing a remarkable improvement of nearly 12%. By integrating prior knowledge and the proposed prompts, ChatGPT can effectively identify and label drug trafficking activities on social networks, even in the presence of deceptive language and euphemisms used by drug dealers to evade detection. The implications of our research extend to social networks, emphasizing the importance of incorporating prior knowledge and scenario-based prompts into analytical tools to improve online security and public safety.
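A hedged sketch of what composing a knowledge-informed prompt might look like: a small slang lexicon and a scenario framing are prepended before the post to be labeled. The lexicon, wording, and example post are invented for illustration and are not the paper's prompts.

```python
# Compose a knowledge-informed prompt and send it to the chat model.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

SLANG = {"snow": "cocaine", "addy": "Adderall"}  # toy prior-knowledge lexicon

def knowledge_informed_prompt(post: str) -> str:
    """Prepend domain knowledge and a scenario before the post to label."""
    knowledge = "; ".join(f"'{k}' often refers to {v}" for k, v in SLANG.items())
    return (
        "You are a content-safety analyst reviewing social media posts.\n"
        f"Background knowledge: {knowledge}.\n"
        "Dealers often use euphemisms and emojis to evade detection.\n"
        f'Post: "{post}"\n'
        "Answer 'trafficking' or 'benign', then briefly justify."
    )

prompt = knowledge_informed_prompt("fresh snow in stock, DM for menu")
resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message["content"])
```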

Scalable Membership Inference Attacks via Quantile Regression

  • paper_url: http://arxiv.org/abs/2307.03694
  • repo_url: None
  • paper_authors: Martin Bertran, Shuai Tang, Michael Kearns, Jamie Morgenstern, Aaron Roth, Zhiwei Steven Wu
  • for: This work studies membership inference attacks, which use black-box access to a trained model to determine whether a particular example was used in training.
  • methods: The attack performs quantile regression on the distribution of confidence scores that the attacked model induces on points not used in training; it requires no knowledge of the attacked model's architecture.
  • results: Experiments show the method is competitive with state-of-the-art shadow-model attacks while requiring far less compute, since it trains only a single model rather than the many shadow models prior attacks need.
    Abstract Membership inference attacks are designed to determine, using black box access to trained models, whether a particular example was used in training or not. Membership inference can be formalized as a hypothesis testing problem. The most effective existing attacks estimate the distribution of some test statistic (usually the model's confidence on the true label) on points that were (and were not) used in training by training many shadow models, i.e., models of the same architecture as the model being attacked, trained on a random subsample of data. While effective, these attacks are extremely computationally expensive, especially when the model under attack is large. We introduce a new class of attacks based on performing quantile regression on the distribution of confidence scores induced by the model under attack on points that are not used in training. We show that our method is competitive with state-of-the-art shadow model attacks, while requiring substantially less compute because our attack requires training only a single model. Moreover, unlike shadow model attacks, our proposed attack does not require any knowledge of the architecture of the model under attack and is therefore truly "black-box". We show the efficacy of this approach in an extensive series of experiments on various datasets and model architectures.
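The attack's core step can be sketched in a few lines: fit a conditional quantile of the target model's confidence on known non-members, then flag points whose observed confidence exceeds their predicted quantile. Feature choice, the quantile level, and the regressor below are illustrative assumptions; see the paper for the exact recipe.

```python
# Quantile-regression membership inference: points whose confidence exceeds
# the per-point non-member quantile are flagged as likely training members.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fit_quantile_model(features: np.ndarray, confidences: np.ndarray, q=0.95):
    """Regress the q-th quantile of non-member confidence given features."""
    model = GradientBoostingRegressor(loss="quantile", alpha=q)
    return model.fit(features, confidences)

def is_member(model, features: np.ndarray, confidence: np.ndarray) -> np.ndarray:
    """Predict membership: confidence above the per-point q-quantile threshold."""
    return confidence > model.predict(features)

# features:    public per-example covariates (e.g., an embedding of the input)
# confidences: target model's confidence on the true label for those points
rng = np.random.default_rng(0)
X_nonmember, c_nonmember = rng.normal(size=(500, 8)), rng.random(500)
attack = fit_quantile_model(X_nonmember, c_nonmember)
X_test, c_test = rng.normal(size=(10, 8)), rng.random(10)
print(is_member(attack, X_test, c_test))
```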