cs.AI - 2023-09-20

RAI4IoE: Responsible AI for Enabling the Internet of Energy

  • paper_url: http://arxiv.org/abs/2309.11691
  • repo_url: None
  • paper_authors: Minhui Xue, Surya Nepal, Ling Liu, Subbu Sethuvenkatraman, Xingliang Yuan, Carsten Rudolph, Ruoxi Sun, Greg Eisenhauer
  • for: This project aims to develop an equitable and responsible AI framework (RAI4IoE) so that the Internet of Energy (IoE) can deliver reliable energy distribution.
  • methods: The study leverages advanced 5G-6G networks and AI technology to connect and integrate renewable distributed energy resources (DERs) such as electric vehicles, storage batteries, wind turbines and photovoltaics, allowing DER owners to participate in the energy market as prosumers and derive economic incentives.
  • results: The project's goal is to ensure equitable participation of community members and responsible use of their data, so that the IoE can provide safe, reliable and sustainable energy services.
    Abstract This paper plans to develop an Equitable and Responsible AI framework with enabling techniques and algorithms for the Internet of Energy (IoE), in short, RAI4IoE. The energy sector is going through substantial changes fueled by two key drivers: building a zero-carbon energy sector and the digital transformation of the energy infrastructure. We expect to see the convergence of these two drivers resulting in the IoE, where renewable distributed energy resources (DERs), such as electric cars, storage batteries, wind turbines and photovoltaics (PV), can be connected and integrated for reliable energy distribution by leveraging advanced 5G-6G networks and AI technology. This allows DER owners as prosumers to participate in the energy market and derive economic incentives. DERs are inherently asset-driven and face equitable challenges (i.e., fair, diverse and inclusive). Without equitable access, privileged individuals, groups and organizations can participate and benefit at the cost of disadvantaged groups. The real-time management of DER resources not only brings out the equity problem to the IoE, it also collects highly sensitive location, time, activity dependent data, which requires to be handled responsibly (e.g., privacy, security and safety), for AI-enhanced predictions, optimization and prioritization services, and automated management of flexible resources. The vision of our project is to ensure equitable participation of the community members and responsible use of their data in IoE so that it could reap the benefits of advances in AI to provide safe, reliable and sustainable energy services.

LLM Guided Inductive Inference for Solving Compositional Problems

  • paper_url: http://arxiv.org/abs/2309.11688
  • repo_url: None
  • paper_authors: Abhigya Sodani, Lauren Moos, Matthew Mirman
  • for: Large language models (LLMs) perform well on question answering, but their performance is limited when a question requires knowledge that is absent from the training data and can only be acquired through direct observation of, or interaction with, the real world.
  • methods: The authors propose Recursion-based extensible LLM (REBEL), which handles open-world, deep reasoning tasks with automated reasoning techniques such as dynamic planning and forward-chaining strategies; REBEL lets LLMs reason via recursive problem decomposition and the use of external tools specified only by natural-language descriptions (a sketch of the recursion follows this entry).
  • results: REBEL's capabilities are demonstrated on a set of problems that require deeply nested use of external tools in a compositional and conversational setting.
    Abstract While large language models (LLMs) have demonstrated impressive performance in question-answering tasks, their performance is limited when the questions require knowledge that is not included in the model's training data and can only be acquired through direct observation or interaction with the real world. Existing methods decompose reasoning tasks through the use of modules invoked sequentially, limiting their ability to answer deep reasoning tasks. We introduce a method, Recursion based extensible LLM (REBEL), which handles open-world, deep reasoning tasks by employing automated reasoning techniques like dynamic planning and forward-chaining strategies. REBEL allows LLMs to reason via recursive problem decomposition and utilization of external tools. The tools that REBEL uses are specified only by natural language description. We further demonstrate REBEL capabilities on a set of problems that require a deeply nested use of external tools in a compositional and conversational setting.
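    Code sketch (illustrative, not from the paper): the recursion below shows the general shape of recursive problem decomposition with tools specified only in natural language. The llm callable, the prompt wording, the "SUBQUESTIONS:" output convention and the tool registry are assumptions made for this sketch.

        # Tools are described only in natural language, as in REBEL.
        TOOLS = {
            "search": "Look up a fact on the web given a short query.",
            "calculator": "Evaluate an arithmetic expression.",
        }

        def solve(question, llm, depth=0, max_depth=5):
            """Recursively decompose a question; llm is any callable prompt -> str."""
            if depth >= max_depth:
                return llm("Answer directly: " + question)
            plan = llm(
                "Either answer the question directly, or reply starting with 'SUBQUESTIONS:' "
                "followed by the sub-questions (one per line) that must be answered first. "
                "Available tools: " + str(TOOLS) + "\nQuestion: " + question
            )
            if not plan.startswith("SUBQUESTIONS:"):
                return plan                                   # base case: a direct answer
            facts = []
            for sub in plan[len("SUBQUESTIONS:"):].strip().splitlines():
                facts.append(sub + " -> " + solve(sub, llm, depth + 1, max_depth))  # recurse
            return llm("Question: " + question + "\nKnown facts:\n"
                       + "\n".join(facts) + "\nFinal answer:")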

Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework

  • paper_url: http://arxiv.org/abs/2309.11682
  • repo_url: None
  • paper_authors: Sina Baharlouei, Meisam Razaviyayn
  • for: This paper aims to address the issue of fair machine learning models behaving unfairly on test data due to distribution shifts.
  • methods: The proposed method is based on distributionally robust optimization under $L_p$ norm uncertainty sets with respect to the Exponential Renyi Mutual Information (ERMI) as the measure of fairness violation. The method does not require knowledge of the causal graph and can be implemented in a stochastic fashion (an illustrative formulation follows this entry).
  • results: The proposed framework has been evaluated through extensive experiments on real datasets consisting of distribution shifts, and the results show that it performs well in terms of fairness and efficiency.
    Abstract While training fair machine learning models has been studied extensively in recent years, most developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. There have been some developments for fair learning robust to distribution shifts to address this shortcoming. However, most proposed solutions are based on the assumption of having access to the causal graph describing the interaction of different features. Moreover, existing algorithms require full access to data and cannot be used when small batches are used (stochastic/batch implementation). This paper proposes the first stochastic distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph. More specifically, we formulate the fair inference in the presence of the distribution shift as a distributionally robust optimization problem under $L_p$ norm uncertainty sets with respect to the Exponential Renyi Mutual Information (ERMI) as the measure of fairness violation. We then discuss how the proposed method can be implemented in a stochastic fashion. We have evaluated the presented framework's performance and efficiency through extensive experiments on real datasets consisting of distribution shifts.
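    Formulation sketch (an assumption, not copied from the paper): one plausible reading of the objective described above, written in LaTeX with a loss \ell, robustness radius \epsilon and trade-off weight \lambda introduced here purely for illustration:

        % empirical risk plus a fairness term made robust to distribution shift
        \min_{\theta}\;
          \mathbb{E}_{(x,y)\sim\hat{P}}\!\left[\ell(\theta;x,y)\right]
          \;+\;
          \lambda \max_{Q:\,\|Q-\hat{P}\|_{p}\le\epsilon}
          \mathrm{ERMI}_{Q}\!\left(\hat{y}_{\theta},\,s\right)
        % \hat{P}: empirical training distribution; Q ranges over an L_p-norm
        % uncertainty set around it; ERMI measures the dependence between the
        % predictions \hat{y}_\theta and the sensitive attribute s (fairness violation).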

Federated Learning with Neural Graphical Models

  • paper_url: http://arxiv.org/abs/2309.11680
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Urszula Chajewska, Harsh Shrivastava
  • for: The paper addresses building models on proprietary data so that multiple clients retain exclusive control over their data while all benefit from the improved accuracy of pooled resources.
  • methods: It proposes a federated learning framework, FedNGMs, built on Neural Graphical Models (NGMs), probabilistic graphical models that use the expressive power of neural networks to learn complex non-linear dependencies between input features; a global NGM learns the averaged information from the local client models while the training data stays in each client's environment (an averaging sketch follows this entry).
  • results: FedNGMs avoids the parameter explosion of neuron-matching frameworks such as Federated Matched Averaging, keeps the global model size constant, offers a "Stitching" algorithm to personalize the global model with client-specific local variables, and is robust to data heterogeneity, large numbers of participants and limited communication bandwidth.
    Abstract Federated Learning (FL) addresses the need to create models based on proprietary data in such a way that multiple clients retain exclusive control over their data, while all benefit from improved model accuracy due to pooled resources. Recently proposed Neural Graphical Models (NGMs) are Probabilistic Graphical models that utilize the expressive power of neural networks to learn complex non-linear dependencies between the input features. They learn to capture the underlying data distribution and have efficient algorithms for inference and sampling. We develop a FL framework which maintains a global NGM model that learns the averaged information from the local NGM models while keeping the training data within the client's environment. Our design, FedNGMs, avoids the pitfalls and shortcomings of neuron matching frameworks like Federated Matched Averaging that suffers from model parameter explosion. Our global model size remains constant throughout the process. In the cases where clients have local variables that are not part of the combined global distribution, we propose a `Stitching' algorithm, which personalizes the global NGM models by merging the additional variables using the client's data. FedNGM is robust to data heterogeneity, large number of participants, and limited communication bandwidth.
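    Code sketch (illustrative): the constant-size global model implies some form of parameter averaging across clients; the NumPy sketch below shows a weighted average over per-client parameter dictionaries. The client update and the "Stitching" step are omitted, and the function names and weighting scheme are assumptions, not the paper's algorithm.

        import numpy as np

        def federated_average(client_params, client_sizes):
            """Average per-client parameter dicts into a global model of constant size.

            client_params : list of dicts {name: np.ndarray}, one per client, all sharing
                            the same names and shapes (the global variables).
            client_sizes  : number of local training samples per client, used as weights.
            """
            weights = np.asarray(client_sizes, dtype=float)
            weights /= weights.sum()
            global_params = {}
            for name in client_params[0]:
                stacked = np.stack([p[name] for p in client_params], axis=0)
                global_params[name] = np.tensordot(weights, stacked, axes=1)
            return global_params

        # Tiny usage example with two fake clients.
        clients = [
            {"W": np.ones((2, 2)), "b": np.zeros(2)},
            {"W": 3 * np.ones((2, 2)), "b": np.ones(2)},
        ]
        print(federated_average(clients, client_sizes=[100, 300])["W"])  # 2.5 everywhere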

Generative AI in Mafia-like Game Simulation

  • paper_url: http://arxiv.org/abs/2309.11672
  • repo_url: https://github.com/MunyeongKim/Gen-AI-in-Mafia-like-Game
  • paper_authors: Munyeong Kim, Sungsu Kim
  • for: This study explores the efficacy and potential of generative AI models in role-playing simulations, using the mafia-style game Spyfall as a case study.
  • methods: Leveraging GPT-4's advanced capabilities, the study compares GPT-4 with its predecessor GPT-3.5-turbo on understanding, decision-making and interaction during game scenarios.
  • results: GPT-4 shows markedly better adaptability to the game environment, posing more relevant questions and forming more human-like responses, but remains limited in bluffing and in predicting opponents' moves. The study also reflects on game development, financial constraints and non-verbal limitations, concluding that while GPT-4 improves on earlier models, there is room for further development, especially in instilling more human-like attributes in AI.
    Abstract In this research, we explore the efficacy and potential of Generative AI models, specifically focusing on their application in role-playing simulations exemplified through Spyfall, a renowned mafia-style game. By leveraging GPT-4's advanced capabilities, the study aimed to showcase the model's potential in understanding, decision-making, and interaction during game scenarios. Comparative analyses between GPT-4 and its predecessor, GPT-3.5-turbo, demonstrated GPT-4's enhanced adaptability to the game environment, with significant improvements in posing relevant questions and forming human-like responses. However, challenges such as the model's limitations in bluffing and predicting opponent moves emerged. Reflections on game development, financial constraints, and non-verbal limitations of the study were also discussed. The findings suggest that while GPT-4 exhibits promising advancements over earlier models, there remains potential for further development, especially in instilling more human-like attributes in AI.

“It’s a Fair Game”, or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents

  • paper_url: http://arxiv.org/abs/2309.11653
  • repo_url: None
  • paper_authors: Zhiping Zhang, Michelle Jia, Hao-Ping, Lee, Bingsheng Yao, Sauvik Das, Ada Lerner, Dakuo Wang, Tianshi Li
  • for: The study aims to inform the design of LLM-based conversational agents (CAs) that prioritize user privacy, addressing the gap left by existing research that is primarily model-centered and overlooks users' perspectives.
  • methods: The authors analyzed sensitive disclosures in real-world ChatGPT conversations and conducted semi-structured interviews with 19 users of LLM-based CAs.
  • results: Users constantly face trade-offs between privacy, utility and convenience; erroneous mental models and dark patterns in system design limit their awareness and comprehension of privacy risks, and human-like interactions encourage more sensitive disclosures, further complicating users' ability to navigate the trade-offs.
    Abstract The widespread use of Large Language Model (LLM)-based conversational agents (CAs), especially in high-stakes domains, raises many privacy concerns. Building ethical LLM-based CAs that respect user privacy requires an in-depth understanding of the privacy risks that concern users the most. However, existing research, primarily model-centered, does not provide insight into users' perspectives. To bridge this gap, we analyzed sensitive disclosures in real-world ChatGPT conversations and conducted semi-structured interviews with 19 LLM-based CA users. We found that users are constantly faced with trade-offs between privacy, utility, and convenience when using LLM-based CAs. However, users' erroneous mental models and the dark patterns in system design limited their awareness and comprehension of the privacy risks. Additionally, the human-like interactions encouraged more sensitive disclosures, which complicated users' ability to navigate the trade-offs. We discuss practical design guidelines and the needs for paradigmatic shifts to protect the privacy of LLM-based CA users.

Orbital AI-based Autonomous Refuelling Solution

  • paper_url: http://arxiv.org/abs/2309.11648
  • repo_url: None
  • paper_authors: Duarte Rondao, Lei He, Nabil Aouf
  • for: This paper explores using on-board visible-wavelength cameras, supported by artificial intelligence, as the main sensor for space rendezvous, docking and on-orbit servicing (OOS).
  • methods: Multiple convolutional neural network (CNN) backbone architectures are benchmarked on synthetically generated docking manoeuvres with the International Space Station (ISS) to estimate relative position and attitude.
  • results: AI lets the relative navigation solution generalize to multiple classes of scenarios, such as different targets or illumination conditions, reducing dependency on lidar and greatly reducing costs; position and attitude estimates come close to 1% range-normalised error and 1 deg respectively, and the solution is validated in the laboratory with a robotic arm simulating a berthing procedure with a physical prototype of the refuelling mechanism.
    Abstract Cameras are rapidly becoming the choice for on-board sensors towards space rendezvous due to their small form factor and inexpensive power, mass, and volume costs. When it comes to docking, however, they typically serve a secondary role, whereas the main work is done by active sensors such as lidar. This paper documents the development of a proposed AI-based (artificial intelligence) navigation algorithm intending to mature the use of on-board visible wavelength cameras as a main sensor for docking and on-orbit servicing (OOS), reducing the dependency on lidar and greatly reducing costs. Specifically, the use of AI enables the expansion of the relative navigation solution towards multiple classes of scenarios, e.g., in terms of targets or illumination conditions, which would otherwise have to be crafted on a case-by-case manner using classical image processing methods. Multiple convolutional neural network (CNN) backbone architectures are benchmarked on synthetically generated data of docking manoeuvres with the International Space Station (ISS), achieving position and attitude estimates close to 1% range-normalised and 1 deg, respectively. The integration of the solution with a physical prototype of the refuelling mechanism is validated in laboratory using a robotic arm to simulate a berthing procedure.

Attentive VQ-VAE

  • paper_url: http://arxiv.org/abs/2309.11641
  • repo_url: None
  • paper_authors: Mariano Rivera, Angello Hoyos
  • for: To enhance the capabilities of VQ-VAE models while keeping parameter counts at practical levels.
  • methods: The approach integrates an Attentive Residual Encoder (AREN) and a Residual Pixel Attention layer, uses additional encoding levels, and adopts an inter-pixel auto-attention mechanism to efficiently capture and exploit contextual information across latent vectors (an attention-layer sketch follows this entry).
  • results: Experiments show that the proposed modifications lead to significant improvements in data representation and generation, making VQ-VAEs even more suitable for a wide range of applications.
    Abstract We present a novel approach to enhance the capabilities of VQVAE models through the integration of an Attentive Residual Encoder (AREN) and a Residual Pixel Attention layer. The objective of our research is to improve the performance of VQVAE while maintaining practical parameter levels. The AREN encoder is designed to operate effectively at multiple levels, accommodating diverse architectural complexities. The key innovation is the integration of an inter-pixel auto-attention mechanism into the AREN encoder. This approach allows us to efficiently capture and utilize contextual information across latent vectors. Additionally, our models uses additional encoding levels to further enhance the model's representational power. Our attention layer employs a minimal parameter approach, ensuring that latent vectors are modified only when pertinent information from other pixels is available. Experimental results demonstrate that our proposed modifications lead to significant improvements in data representation and generation, making VQVAEs even more suitable for a wide range of applications.
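    Code sketch (illustrative): a rough PyTorch rendering of an inter-pixel self-attention layer over a latent feature map, applied as a residual update. The single-head design, layer sizes and naming are assumptions for this sketch, not the paper's architecture.

        import torch
        import torch.nn as nn

        class ResidualPixelAttention(nn.Module):
            """Self-attention across the spatial positions of a latent map, added residually."""

            def __init__(self, channels):
                super().__init__()
                self.q = nn.Conv2d(channels, channels, kernel_size=1)
                self.k = nn.Conv2d(channels, channels, kernel_size=1)
                self.v = nn.Conv2d(channels, channels, kernel_size=1)
                self.scale = channels ** -0.5

            def forward(self, z):                              # z: (B, C, H, W) latents
                b, c, h, w = z.shape
                q = self.q(z).flatten(2).transpose(1, 2)       # (B, HW, C)
                k = self.k(z).flatten(2)                       # (B, C, HW)
                v = self.v(z).flatten(2).transpose(1, 2)       # (B, HW, C)
                attn = torch.softmax(q @ k * self.scale, dim=-1)   # (B, HW, HW) pixel weights
                out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
                return z + out                                 # residual inter-pixel context

        layer = ResidualPixelAttention(channels=64)
        print(layer(torch.randn(2, 64, 8, 8)).shape)           # torch.Size([2, 64, 8, 8])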

A survey on the semantics of sequential patterns with negation

  • paper_url: http://arxiv.org/abs/2309.11638
  • repo_url: None
  • paper_authors: Thomas Guyet
  • for: The study investigates how potential users perceive negation in sequential patterns, and whether particular semantics are more intuitive than others.
  • methods: A questionnaire was designed to reveal which semantics each user finds intuitive; 124 responses were collected and analyzed.
  • results: Two of the semantics are predominantly intuitive, but neither aligns with the semantics of the primary state-of-the-art algorithms, so the authors provide recommendations to account for this disparity in the conclusions drawn.
    Abstract A sequential pattern with negation, or negative sequential pattern, takes the form of a sequential pattern for which the negation symbol may be used in front of some of the pattern's itemsets. Intuitively, such a pattern occurs in a sequence if negated itemsets are absent in the sequence. Recent work has shown that different semantics can be attributed to these pattern forms, and that state-of-the-art algorithms do not extract the same sets of patterns. This raises the important question of the interpretability of sequential pattern with negation. In this study, our focus is on exploring how potential users perceive negation in sequential patterns. Our aim is to determine whether specific semantics are more "intuitive" than others and whether these align with the semantics employed by one or more state-of-the-art algorithms. To achieve this, we designed a questionnaire to reveal the semantics' intuition of each user. This article presents both the design of the questionnaire and an in-depth analysis of the 124 responses obtained. The outcomes indicate that two of the semantics are predominantly intuitive; however, neither of them aligns with the semantics of the primary state-of-the-art algorithms. As a result, we provide recommendations to account for this disparity in the conclusions drawn.

Cloud-Based Hierarchical Imitation Learning for Scalable Transfer of Construction Skills from Human Workers to Assisting Robots

  • paper_url: http://arxiv.org/abs/2309.11619
  • repo_url: None
  • paper_authors: Hongrui Yu, Vineet R. Kamat, Carol C. Menassa
  • for: The work aims to assign repetitive and physically demanding construction tasks to robots, reducing human workers' exposure to occupational injuries.
  • methods: It transfers workers' dexterous, adaptive craft skills to robots through Imitation Learning (IL), using an immersive, cloud-robotics-based virtual demonstration framework.
  • results: The framework digitalizes the demonstration process, eliminating repetitive physical manipulation of heavy construction objects; maintains a federated collection of reusable demonstrations transferable to similar future tasks; and uses a Hierarchical Imitation Learning (HIL) model to decompose manipulation skills into sequential and reactive sub-skills, promoting the inclusion of workers with diverse physical capabilities and educational backgrounds in the construction industry.
    Abstract Assigning repetitive and physically-demanding construction tasks to robots can alleviate human workers's exposure to occupational injuries. Transferring necessary dexterous and adaptive artisanal construction craft skills from workers to robots is crucial for the successful delegation of construction tasks and achieving high-quality robot-constructed work. Predefined motion planning scripts tend to generate rigid and collision-prone robotic behaviors in unstructured construction site environments. In contrast, Imitation Learning (IL) offers a more robust and flexible skill transfer scheme. However, the majority of IL algorithms rely on human workers to repeatedly demonstrate task performance at full scale, which can be counterproductive and infeasible in the case of construction work. To address this concern, this paper proposes an immersive, cloud robotics-based virtual demonstration framework that serves two primary purposes. First, it digitalizes the demonstration process, eliminating the need for repetitive physical manipulation of heavy construction objects. Second, it employs a federated collection of reusable demonstrations that are transferable for similar tasks in the future and can thus reduce the requirement for repetitive illustration of tasks by human agents. Additionally, to enhance the trustworthiness, explainability, and ethical soundness of the robot training, this framework utilizes a Hierarchical Imitation Learning (HIL) model to decompose human manipulation skills into sequential and reactive sub-skills. These two layers of skills are represented by deep generative models, enabling adaptive control of robot actions. By delegating the physical strains of construction work to human-trained robots, this framework promotes the inclusion of workers with diverse physical capabilities and educational backgrounds within the construction industry.

Hand Gesture Recognition with Two Stage Approach Using Transfer Learning and Deep Ensemble Learning

  • paper_url: http://arxiv.org/abs/2309.11610
  • repo_url: None
  • paper_authors: Serkan Savaş, Atilla Ergüzen
  • for: The study aims to improve the performance of human-computer interaction (HCI) through deep learning.
  • methods: Pre-trained convolutional neural network architectures are applied to hand gesture recognition on the HG14 dataset via transfer learning, and the four most successful models are combined with a Dirichlet ensemble technique (an ensemble sketch follows this entry).
  • results: The ensemble achieves an accuracy of 98.88%, demonstrating the potential of deep ensemble learning for HCI and for applications such as augmented reality, virtual reality and game technologies.
    Abstract Human-Computer Interaction (HCI) has been the subject of research for many years, and recent studies have focused on improving its performance through various techniques. In the past decade, deep learning studies have shown high performance in various research areas, leading researchers to explore their application to HCI. Convolutional neural networks can be used to recognize hand gestures from images using deep architectures. In this study, we evaluated pre-trained high-performance deep architectures on the HG14 dataset, which consists of 14 different hand gesture classes. Among 22 different models, versions of the VGGNet and MobileNet models attained the highest accuracy rates. Specifically, the VGG16 and VGG19 models achieved accuracy rates of 94.64% and 94.36%, respectively, while the MobileNet and MobileNetV2 models achieved accuracy rates of 96.79% and 94.43%, respectively. We performed hand gesture recognition on the dataset using an ensemble learning technique, which combined the four most successful models. By utilizing these models as base learners and applying the Dirichlet ensemble technique, we achieved an accuracy rate of 98.88%. These results demonstrate the effectiveness of the deep ensemble learning technique for HCI and its potential applications in areas such as augmented reality, virtual reality, and game technologies.
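    Code sketch (illustrative): one way a Dirichlet-weighted ensemble over the base classifiers' softmax outputs can be realised in NumPy; the random-search weighting shown here is an assumption, not the exact ensemble procedure used in the paper.

        import numpy as np

        rng = np.random.default_rng(0)

        def dirichlet_ensemble(probs, val_labels, n_trials=200):
            """Combine per-model class probabilities with Dirichlet-sampled weights.

            probs      : (n_models, n_samples, n_classes) softmax outputs, e.g. from
                         VGG16, VGG19, MobileNet and MobileNetV2 on HG14's 14 classes.
            val_labels : validation labels used to keep the best weight vector.
            """
            n_models = probs.shape[0]
            best_w, best_acc = np.full(n_models, 1.0 / n_models), -1.0
            for _ in range(n_trials):
                w = rng.dirichlet(np.ones(n_models))           # random convex combination
                preds = np.tensordot(w, probs, axes=1).argmax(axis=-1)
                acc = (preds == val_labels).mean()
                if acc > best_acc:
                    best_w, best_acc = w, acc
            return np.tensordot(best_w, probs, axes=1).argmax(axis=-1)

        # Toy example: 4 models, 5 samples, 14 gesture classes.
        fake = rng.random((4, 5, 14))
        fake /= fake.sum(axis=-1, keepdims=True)
        print(dirichlet_ensemble(fake, val_labels=rng.integers(0, 14, size=5)))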

Dataset Factory: A Toolchain For Generative Computer Vision Datasets

  • paper_url: http://arxiv.org/abs/2309.11608
  • repo_url: None
  • paper_authors: Daniel Kharitonov, Ryan Turner
  • for: The paper addresses data wrangling in generative AI workflows, where computer vision datasets are approaching petabyte volumes and iterative data preparation requires robust dataset sharing and versioning.
  • methods: It proposes a "dataset factory" approach that separates the storage and processing of samples from metadata and enables data-centric operations at scale.
  • results: The approach is intended to make data handling in generative AI workflows more efficient and reusable for machine learning teams and individual researchers.
    Abstract Generative AI workflows heavily rely on data-centric tasks - such as filtering samples by annotation fields, vector distances, or scores produced by custom classifiers. At the same time, computer vision datasets are quickly approaching petabyte volumes, rendering data wrangling difficult. In addition, the iterative nature of data preparation necessitates robust dataset sharing and versioning mechanisms, both of which are hard to implement ad-hoc. To solve these challenges, we propose a "dataset factory" approach that separates the storage and processing of samples from metadata and enables data-centric operations at scale for machine learning teams and individual researchers.

CATS: Conditional Adversarial Trajectory Synthesis for Privacy-Preserving Trajectory Data Publication Using Deep Learning Approaches

  • paper_url: http://arxiv.org/abs/2309.11587
  • repo_url: https://github.com/geods/cats
  • paper_authors: Jinmeng Rao, Song Gao, Sijia Zhu
  • for: The study uses deep learning to protect privacy in individual-level human mobility data while generating high-quality synthetic trajectory data.
  • methods: CATS applies K-anonymity to the underlying spatiotemporal distributions of human movements for a distributional-level privacy guarantee, and combines conditional adversarial training on K-anonymized mobility matrices, attention-based trajectory global context learning, and recurrent bipartite graph matching of adjacent trajectory points to reconstruct trajectory topology.
  • results: Experiments on over 90k GPS trajectories show better performance than baseline methods in privacy preservation, spatiotemporal characteristic preservation and downstream utility, bringing new insights to privacy-preserving human mobility research with generative AI and to data ethics issues in GIScience.
    Abstract The prevalence of ubiquitous location-aware devices and mobile Internet enables us to collect massive individual-level trajectory dataset from users. Such trajectory big data bring new opportunities to human mobility research but also raise public concerns with regard to location privacy. In this work, we present the Conditional Adversarial Trajectory Synthesis (CATS), a deep-learning-based GeoAI methodological framework for privacy-preserving trajectory data generation and publication. CATS applies K-anonymity to the underlying spatiotemporal distributions of human movements, which provides a distributional-level strong privacy guarantee. By leveraging conditional adversarial training on K-anonymized human mobility matrices, trajectory global context learning using the attention-based mechanism, and recurrent bipartite graph matching of adjacent trajectory points, CATS is able to reconstruct trajectory topology from conditionally sampled locations and generate high-quality individual-level synthetic trajectory data, which can serve as supplements or alternatives to raw data for privacy-preserving trajectory data publication. The experiment results on over 90k GPS trajectories show that our method has a better performance in privacy preservation, spatiotemporal characteristic preservation, and downstream utility compared with baseline methods, which brings new insights into privacy-preserving human mobility research using generative AI techniques and explores data ethics issues in GIScience.

Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge

  • paper_url: http://arxiv.org/abs/2309.11575
  • repo_url: None
  • paper_authors: Manuel Brack, Patrick Schramowski, Kristian Kersting
  • for: The work probes safety issues in text-conditioned image generation models by identifying adversarial inputs that expose the models' vulnerabilities.
  • methods: As a contribution to the Adversarial Nibbler challenge, the authors distill a set of over 1,000 potential adversarial prompts from existing safety benchmarks and analyze the prompts and the corresponding generated images.
  • results: The analysis demonstrates the fragility of input filters and provides further insight into systematic safety issues in current generative image models.
    Abstract Text-conditioned image generation models have recently achieved astonishing image quality and alignment results. Consequently, they are employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the web, they also produce unsafe content. As a contribution to the Adversarial Nibbler challenge, we distill a large set of over 1,000 potential adversarial inputs from existing safety benchmarks. Our analysis of the gathered prompts and corresponding images demonstrates the fragility of input filters and provides further insights into systematic safety issues in current generative image models.

BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model

  • paper_url: http://arxiv.org/abs/2309.11568
  • repo_url: None
  • paper_authors: Nolan Dey, Daria Soboleva, Faisal Al-Khateeb, Bowen Yang, Ribhu Pathria, Hemant Khachane, Shaheer Muhammad, Zhiming, Chen, Robert Myers, Jacob Robert Steeves, Natalia Vassilieva, Marvin Tom, Joel Hestness
  • for: The paper introduces BTLM-3B-8K, a new open-source language model that delivers strong downstream performance from a comparatively small parameter count and compute budget.
  • methods: The model combines ALiBi position embeddings and the SwiGLU nonlinearity with aggressively tuned µP hyperparameters and schedule, and is trained on 627B tokens of a cleaned and deduplicated SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths.
  • results: BTLM-3B-8K outperforms all existing 3B-parameter models by 2-5.5% on downstream tasks, is competitive with some 7B-parameter models, and beats MPT-7B-8K and XGen-7B-8K on long-context tasks up to 8,192 tokens, while needing only 3GB of memory at 4-bit precision and 2.5x less inference compute than 7B models.
    Abstract We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths. BTLM-3B-8K outperforms all existing 3B parameter models by 2-5.5% across downstream tasks. BTLM-3B-8K is even competitive with some 7B parameter models. Additionally, BTLM-3B-8K provides excellent long context performance, outperforming MPT-7B-8K and XGen-7B-8K on tasks up to 8,192 context length. We trained the model on a cleaned and deduplicated SlimPajama dataset; aggressively tuned the \textmu P hyperparameters and schedule; used ALiBi position embeddings; and adopted the SwiGLU nonlinearity. On Hugging Face, the most popular models have 7B parameters, indicating that users prefer the quality-size ratio of 7B models. Compacting the 7B parameter model to one with 3B parameters, with little performance impact, is an important milestone. BTLM-3B-8K needs only 3GB of memory with 4-bit precision and takes 2.5x less inference compute than 7B models, helping to open up access to a powerful language model on mobile and edge devices. BTLM-3B-8K is available under an Apache 2.0 license on Hugging Face: https://huggingface.co/cerebras/btlm-3b-8k-base.

Limitations in odour recognition and generalisation in a neuromorphic olfactory circuit

  • paper_url: http://arxiv.org/abs/2309.11555
  • repo_url: None
  • paper_authors: Nik Dennler, André van Schaik, Michael Schmuker
  • for: The paper examines a neuromorphic odour-learning algorithm, inspired by circuits of the mammalian olfactory bulb, and its ability to recognise gaseous odorants and odorless gases.
  • methods: The authors replicated parts of the original study, which assessed rapid online learning and identification of gases from gas sensor recordings corrupted by impulse noise.
  • results: The dataset suffers from sensor drift and a non-randomised measurement protocol, and the model is limited in generalising over repeated presentations of the same gas; a simple hash-table approach matches or exceeds the reported accuracy and runtime (a sketch of that baseline follows this entry), so some of the original conclusions require further validation.
    Abstract Neuromorphic computing is one of the few current approaches that have the potential to significantly reduce power consumption in Machine Learning and Artificial Intelligence. Imam & Cleland presented an odour-learning algorithm that runs on a neuromorphic architecture and is inspired by circuits described in the mammalian olfactory bulb. They assess the algorithm's performance in "rapid online learning and identification" of gaseous odorants and odorless gases (short "gases") using a set of gas sensor recordings of different odour presentations and corrupting them by impulse noise. We replicated parts of the study and discovered limitations that affect some of the conclusions drawn. First, the dataset used suffers from sensor drift and a non-randomised measurement protocol, rendering it of limited use for odour identification benchmarks. Second, we found that the model is restricted in its ability to generalise over repeated presentations of the same gas. We demonstrate that the task the study refers to can be solved with a simple hash table approach, matching or exceeding the reported results in accuracy and runtime. Therefore, a validation of the model that goes beyond restoring a learned data sample remains to be shown, in particular its suitability to odour identification tasks.
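    Code sketch (illustrative): the kind of simple lookup baseline the authors say solves the task, i.e. store every training response pattern and return the label of the closest stored pattern at test time. The median-threshold binarisation and Hamming-distance fallback are assumptions for this sketch.

        import numpy as np

        class HashTableClassifier:
            """Exact hash lookup of binarised sensor patterns, nearest pattern as fallback."""

            def fit(self, X, y):
                self.thresholds = np.median(X, axis=0)
                self.keys = (X > self.thresholds).astype(np.uint8)
                self.table = {k.tobytes(): label for k, label in zip(self.keys, y)}
                self.labels = np.asarray(y)
                return self

            def predict(self, X):
                bits = (X > self.thresholds).astype(np.uint8)
                out = []
                for b in bits:
                    hit = self.table.get(b.tobytes())
                    if hit is None:            # no exact hit: nearest key by Hamming distance
                        hit = self.labels[np.argmin((self.keys != b).sum(axis=1))]
                    out.append(hit)
                return np.asarray(out)

        rng = np.random.default_rng(1)
        X_train, y_train = rng.normal(size=(20, 16)), rng.integers(0, 4, size=20)
        clf = HashTableClassifier().fit(X_train, y_train)
        print(clf.predict(X_train + 0.01 * rng.normal(size=X_train.shape)))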

Chain-of-Verification Reduces Hallucination in Large Language Models

  • paper_url: http://arxiv.org/abs/2309.11495
  • repo_url: None
  • paper_authors: Shehzaad Dhuliawala, Mojtaba Komeili, Jing Xu, Roberta Raileanu, Xian Li, Asli Celikyilmaz, Jason Weston
  • for: The work addresses hallucination, the generation of plausible yet incorrect factual information, in large language models.
  • methods: The proposed Chain-of-Verification (CoVe) method has the model (i) draft an initial response, (ii) plan verification questions to fact-check the draft, (iii) answer those questions independently so the answers are not biased by other responses, and (iv) generate a final verified response (a pipeline sketch follows this entry).
  • results: Experiments show that CoVe decreases hallucinations across a variety of tasks, including list-based questions from Wikidata, closed-book MultiSpanQA and longform text generation.
    Abstract Generation of plausible yet incorrect factual information, termed hallucination, is an unsolved issue in large language models. We study the ability of language models to deliberate on the responses they give in order to correct their mistakes. We develop the Chain-of-Verification (CoVe) method whereby the model first (i) drafts an initial response; then (ii) plans verification questions to fact-check its draft; (iii) answers those questions independently so the answers are not biased by other responses; and (iv) generates its final verified response. In experiments, we show CoVe decreases hallucinations across a variety of tasks, from list-based questions from Wikidata, closed book MultiSpanQA and longform text generation.
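    Code sketch (illustrative): the four CoVe stages as plain Python; the llm callable, prompt wording and line-based parsing are assumptions, not the paper's prompts.

        def chain_of_verification(query, llm):
            """Run the four CoVe stages with any callable llm(prompt) -> str."""
            # (i) Draft an initial response.
            draft = llm("Answer the question.\nQ: " + query + "\nA:")

            # (ii) Plan verification questions that fact-check the draft.
            plan = llm("List short verification questions (one per line) that would check "
                       "the facts in this answer.\nQ: " + query + "\nDraft answer: " + draft)
            questions = [q.strip() for q in plan.splitlines() if q.strip()]

            # (iii) Answer each verification question independently, without showing the
            #       draft, so the answers are not biased by it.
            verifications = [(q, llm("Answer concisely.\nQ: " + q + "\nA:")) for q in questions]

            # (iv) Generate the final verified response conditioned on the checked facts.
            facts = "\n".join("- " + q + " " + a for q, a in verifications)
            return llm("Original question: " + query + "\nDraft answer: " + draft +
                       "\nVerified facts:\n" + facts + "\nWrite a corrected final answer:")

        # Usage: chain_of_verification("Name some politicians born in Boston.", llm=my_model)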

Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.11489
  • repo_url: None
  • paper_authors: Tianbao Xie, Siheng Zhao, Chen Henry Wu, Yitao Liu, Qian Luo, Victor Zhong, Yanchao Yang, Tao Yu
  • for: The work aims to remove the need for specialized knowledge or domain data when designing reward functions for reinforcement learning, lowering development costs with a data-free framework.
  • methods: Text2Reward uses large language models (LLMs) to automatically generate dense reward functions as executable programs grounded in a compact representation of the environment, producing interpretable, free-form reward code that can use existing packages and be iteratively refined with human feedback (a generation sketch follows this entry).
  • results: On two robotic manipulation benchmarks (ManiSkill2, MetaWorld) and two MuJoCo locomotion environments, policies trained with the generated rewards match or exceed expert-written reward code on 13 of 17 manipulation tasks, learn six novel locomotion behaviors with success rates above 94%, and transfer to real-world deployment; human feedback further improves the policies. Video results are available at https://text-to-reward.github.io
    Abstract Designing reward functions is a longstanding challenge in reinforcement learning (RL); it requires specialized knowledge or domain data, leading to high costs for development. To address this, we introduce Text2Reward, a data-free framework that automates the generation of dense reward functions based on large language models (LLMs). Given a goal described in natural language, Text2Reward generates dense reward functions as an executable program grounded in a compact representation of the environment. Unlike inverse RL and recent work that uses LLMs to write sparse reward codes, Text2Reward produces interpretable, free-form dense reward codes that cover a wide range of tasks, utilize existing packages, and allow iterative refinement with human feedback. We evaluate Text2Reward on two robotic manipulation benchmarks (ManiSkill2, MetaWorld) and two locomotion environments of MuJoCo. On 13 of the 17 manipulation tasks, policies trained with generated reward codes achieve similar or better task success rates and convergence speed than expert-written reward codes. For locomotion tasks, our method learns six novel locomotion behaviors with a success rate exceeding 94%. Furthermore, we show that the policies trained in the simulator with our method can be deployed in the real world. Finally, Text2Reward further improves the policies by refining their reward functions with human feedback. Video results are available at https://text-to-reward.github.io
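    Code sketch (illustrative): the generate-then-execute loop the abstract describes, i.e. ask an LLM to write a dense reward function as Python code, load it, and plug it into an ordinary RL loop. The prompt, the compute_reward signature, the exec-based loading and the gym-style environment are assumptions, not the framework's actual interface.

        def generate_reward_fn(goal, env_description, llm):
            """Ask the LLM for reward code and load it (trusting generated code is a simplification)."""
            code = llm(
                "Write a Python function `compute_reward(obs, action) -> float` giving a dense "
                "reward for the goal below.\nGoal: " + goal +
                "\nEnvironment (compact description): " + env_description
            )
            namespace = {}
            exec(code, namespace)
            return namespace["compute_reward"]

        def train_with_generated_reward(env, policy_update, reward_fn, episodes=10):
            """Use the generated dense reward inside a gym-style rollout loop."""
            for _ in range(episodes):
                obs, _ = env.reset()
                done = False
                while not done:
                    action = env.action_space.sample()          # placeholder policy
                    obs, _, terminated, truncated, _ = env.step(action)
                    policy_update(obs, action, reward_fn(obs, action))
                    done = terminated or truncated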

Fictional Worlds, Real Connections: Developing Community Storytelling Social Chatbots through LLMs

  • paper_url: http://arxiv.org/abs/2309.11478
  • repo_url: None
  • paper_authors: Yuqian Sun, Hanyi Wang, Pok Man Chan, Morteza Tabibi, Yan Zhang, Huan Lu, Yuheng Chen, Chang Hee Lee, Ali Asadipour
  • for: The paper develops Storytelling Social Chatbots (SSCs) that use stories to make social chatbots in community settings more engaging and believable.
  • methods: Two GPT-3-driven SSC prototypes, "David" and "Catherine", were built with a three-step story engineering process (character and story creation, presenting live stories to the community, and communication with community members) and evaluated in the online gaming community "DE (Alias)" on Discord.
  • results: A mixed-method analysis based on questionnaires (N=15) and interviews (N=8) with community members shows that storytelling significantly enhances the engagement and believability of social chatbots in community settings.
    Abstract We address the integration of storytelling and Large Language Models (LLMs) to develop engaging and believable Social Chatbots (SCs) in community settings. Motivated by the potential of fictional characters to enhance social interactions, we introduce Storytelling Social Chatbots (SSCs) and the concept of story engineering to transform fictional game characters into "live" social entities within player communities. Our story engineering process includes three steps: (1) Character and story creation, defining the SC's personality and worldview, (2) Presenting Live Stories to the Community, allowing the SC to recount challenges and seek suggestions, and (3) Communication with community members, enabling interaction between the SC and users. We employed the LLM GPT-3 to drive our SSC prototypes, "David" and "Catherine," and evaluated their performance in an online gaming community, "DE (Alias)," on Discord. Our mixed-method analysis, based on questionnaires (N=15) and interviews (N=8) with community members, reveals that storytelling significantly enhances the engagement and believability of SCs in community settings.

Multi-view Fuzzy Representation Learning with Rules based Model

  • paper_url: http://arxiv.org/abs/2309.11473
  • repo_url: None
  • paper_authors: Wei Zhang, Zhaohong Deng, Te Zhang, Kup-Sze Choi, Shitong Wang
  • for: The paper proposes a new multi-view fuzzy representation learning method to address key challenges in mining multi-view data.
  • methods: Built on the interpretable Takagi-Sugeno-Kang (TSK) fuzzy system, the method (MVRL_FS) transforms multi-view data into a high-dimensional fuzzy feature space while jointly exploring the common information between views and the specific information within each view, and introduces a new L_(2,1)-norm regression-based regularization to mine consistency information between views while preserving the geometric structure of the data through a Laplacian graph.
  • results: Extensive experiments on many benchmark multi-view datasets validate the superiority of the proposed method.
    Abstract Unsupervised multi-view representation learning has been extensively studied for mining multi-view data. However, some critical challenges remain. On the one hand, the existing methods cannot explore multi-view data comprehensively since they usually learn a common representation between views, given that multi-view data contains both the common information between views and the specific information within each view. On the other hand, to mine the nonlinear relationship between data, kernel or neural network methods are commonly used for multi-view representation learning. However, these methods are lacking in interpretability. To this end, this paper proposes a new multi-view fuzzy representation learning method based on the interpretable Takagi-Sugeno-Kang (TSK) fuzzy system (MVRL_FS). The method realizes multi-view representation learning from two aspects. First, multi-view data are transformed into a high-dimensional fuzzy feature space, while the common information between views and specific information of each view are explored simultaneously. Second, a new regularization method based on L_(2,1)-norm regression is proposed to mine the consistency information between views, while the geometric structure of the data is preserved through the Laplacian graph. Finally, extensive experiments on many benchmark multi-view datasets are conducted to validate the superiority of the proposed method.

Multi-Label Takagi-Sugeno-Kang Fuzzy System

  • paper_url: http://arxiv.org/abs/2309.11469
  • repo_url: None
  • paper_authors: Qiongdan Lou, Zhaohong Deng, Zhiyong Xiao, Kup-Sze Choi, Shitong Wang
  • for: To improve multi-label classification performance by better modeling the relationship between features and labels.
  • methods: The Multi-Label Takagi-Sugeno-Kang Fuzzy System (ML-TSK FS) models the feature-label relationship with fuzzy rules and is trained by integrating fuzzy-inference-based multi-label correlation learning with a multi-label regression loss (a TSK inference sketch follows this entry).
  • results: On 12 benchmark multi-label datasets, ML-TSK FS is competitive with existing methods across various evaluation metrics, indicating that fuzzy inference rules model the feature-label relationship effectively and enhance classification performance.
    Abstract Multi-label classification can effectively identify the relevant labels of an instance from a given set of labels. However,the modeling of the relationship between the features and the labels is critical to the classification performance. To this end, we propose a new multi-label classification method, called Multi-Label Takagi-Sugeno-Kang Fuzzy System (ML-TSK FS), to improve the classification performance. The structure of ML-TSK FS is designed using fuzzy rules to model the relationship between features and labels. The fuzzy system is trained by integrating fuzzy inference based multi-label correlation learning with multi-label regression loss. The proposed ML-TSK FS is evaluated experimentally on 12 benchmark multi-label datasets. 1 The results show that the performance of ML-TSK FS is competitive with existing methods in terms of various evaluation metrics, indicating that it is able to model the feature-label relationship effectively using fuzzy inference rules and enhances the classification performance.
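    Code sketch (illustrative): first-order TSK fuzzy inference producing one score per label, with Gaussian memberships giving rule firing strengths and linear rule consequents combined by normalized firing strength. The rule count, membership form and random parameters are assumptions, not the trained ML-TSK FS.

        import numpy as np

        rng = np.random.default_rng(0)

        def tsk_predict(X, centers, sigmas, consequents):
            """X: (n, d); centers, sigmas: (r, d); consequents: (r, d + 1, n_labels), bias first."""
            diff = X[:, None, :] - centers[None, :, :]                        # (n, r, d)
            firing = np.exp(-0.5 * np.sum((diff / sigmas) ** 2, axis=-1))     # rule firing strengths
            weights = firing / firing.sum(axis=1, keepdims=True)              # normalized (n, r)
            X1 = np.concatenate([np.ones((X.shape[0], 1)), X], axis=1)        # add bias column
            rule_out = np.einsum('nd,rdl->nrl', X1, consequents)              # per-rule label scores
            return np.einsum('nr,nrl->nl', weights, rule_out)                 # weighted combination

        d, n_rules, n_labels = 5, 4, 3
        scores = tsk_predict(
            rng.normal(size=(10, d)),
            centers=rng.normal(size=(n_rules, d)),
            sigmas=np.ones((n_rules, d)),
            consequents=rng.normal(size=(n_rules, d + 1, n_labels)),
        )
        print((scores > 0).astype(int))      # threshold the scores for multi-label predictions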

AudioFool: Fast, Universal and synchronization-free Cross-Domain Attack on Speech Recognition

  • paper_url: http://arxiv.org/abs/2309.11462
  • repo_url: None
  • paper_authors: Mohamad Fakih, Rouwaida Kanj, Fadi Kurdahi, Mohammed E. Fouda
  • for: The paper studies adversarial attacks on automatic speech recognition (ASR) systems, focusing on Over-The-Air (OTA) attacks that can manipulate the command executed on a device or deny service.
  • methods: Attacks are constructed in a modified frequency domain through an inverse Fourier transform so that they exhibit the desired properties of invariance to synchronization and robustness to filtering (a construction sketch follows this entry).
  • results: Evaluation on standard keyword classification tasks and OTA analysis shows that the frequency-domain attacks achieve these properties, enabling an effective Denial-of-Service attack against ASR systems.
    Abstract Automatic Speech Recognition systems have been shown to be vulnerable to adversarial attacks that manipulate the command executed on the device. Recent research has focused on exploring methods to create such attacks, however, some issues relating to Over-The-Air (OTA) attacks have not been properly addressed. In our work, we examine the needed properties of robust attacks compatible with the OTA model, and we design a method of generating attacks with arbitrary such desired properties, namely the invariance to synchronization, and the robustness to filtering: this allows a Denial-of-Service (DoS) attack against ASR systems. We achieve these characteristics by constructing attacks in a modified frequency domain through an inverse Fourier transform. We evaluate our method on standard keyword classification tasks and analyze it in OTA, and we analyze the properties of the cross-domain attacks to explain the efficiency of the approach.
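    Code sketch (illustrative): designing a perturbation by shaping its spectrum and bringing it to the time domain with an inverse FFT, the general construction the abstract points to. The band selection, flat magnitude and scaling are arbitrary illustrative choices, not the attack's actual optimisation.

        import numpy as np

        def frequency_domain_perturbation(n_samples, sample_rate, band_hz=(300.0, 3000.0),
                                          amplitude=0.01, seed=0):
            """Shape the spectrum inside a chosen band, then inverse-FFT to a waveform.

            Keeping the energy inside a band is one way to survive filtering; because the
            waveform is added independently of the speech content, it does not need to be
            synchronized with the carrier signal (illustrative reasoning only).
            """
            rng = np.random.default_rng(seed)
            freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)
            spectrum = np.zeros_like(freqs, dtype=complex)
            band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
            spectrum[band] = np.exp(1j * rng.uniform(0.0, 2.0 * np.pi, size=band.sum()))
            noise = np.fft.irfft(spectrum, n=n_samples)
            return amplitude * noise / np.max(np.abs(noise))

        t = np.arange(16000) / 16000.0
        adversarial_audio = 0.5 * np.sin(2 * np.pi * 440 * t) + frequency_domain_perturbation(16000, 16000)
        print(adversarial_audio.shape)       # (16000,)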

Generative Agent-Based Modeling: Unveiling Social System Dynamics through Coupling Mechanistic Models with Generative Artificial Intelligence

  • paper_url: http://arxiv.org/abs/2309.11456
  • repo_url: None
  • paper_authors: Navid Ghaffarzadegan, Aritra Majumdar, Ross Williams, Niyousha Hosseinichimeh
  • for: The paper discusses the emerging opportunity to build feedback-rich computational models of social systems using generative artificial intelligence.
  • methods: Generative Agent-Based Models (GABMs) couple a mechanistic model of human interactions with a pre-trained large language model such as ChatGPT to represent human decision-making in social settings.
  • results: A deliberately simple GABM of social norm diffusion in an organization is presented, together with a wide range of scenarios and an analysis of the sensitivity of the results to several changes in the prompt; the article and model are intended as a guide for building diffusion models that include realistic human reasoning and decision-making.
    Abstract We discuss the emerging new opportunity for building feedback-rich computational models of social systems using generative artificial intelligence. Referred to as Generative Agent-Based Models (GABMs), such individual-level models utilize large language models such as ChatGPT to represent human decision-making in social settings. We provide a GABM case in which human behavior can be incorporated in simulation models by coupling a mechanistic model of human interactions with a pre-trained large language model. This is achieved by introducing a simple GABM of social norm diffusion in an organization. For educational purposes, the model is intentionally kept simple. We examine a wide range of scenarios and the sensitivity of the results to several changes in the prompt. We hope the article and the model serve as a guide for building useful diffusion models that include realistic human reasoning and decision-making.

Using deep learning to construct stochastic local search SAT solvers with performance bounds

  • paper_url: http://arxiv.org/abs/2309.11452
  • repo_url: https://github.com/porscheofficial/sls_sat_solving_with_deep_learning
  • paper_authors: Maximilian Kramer, Paul Boes
  • for: The paper targets the Boolean Satisfiability problem (SAT), training Graph Neural Network (GNN) oracles to improve the performance of Stochastic Local Search (SLS) solvers.
  • methods: GNN-based oracles that provide samples from an instance-specific distribution, exploiting the instance's local structure, are trained and plugged into two SLS solvers, which are evaluated on random SAT instances of varying difficulty (an oracle-guided search sketch follows this entry).
  • results: Access to the GNN-based oracles significantly boosts both solvers, allowing them on average to solve 17% more difficult instances (measured by the clause-to-variable ratio) in 35% fewer steps, with improvements in the median number of steps of up to a factor of 8.
    Abstract The Boolean Satisfiability problem (SAT) is the most prototypical NP-complete problem and of great practical relevance. One important class of solvers for this problem are stochastic local search (SLS) algorithms that iteratively and randomly update a candidate assignment. Recent breakthrough results in theoretical computer science have established sufficient conditions under which SLS solvers are guaranteed to efficiently solve a SAT instance, provided they have access to suitable "oracles" that provide samples from an instance-specific distribution, exploiting an instance's local structure. Motivated by these results and the well established ability of neural networks to learn common structure in large datasets, in this work, we train oracles using Graph Neural Networks and evaluate them on two SLS solvers on random SAT instances of varying difficulty. We find that access to GNN-based oracles significantly boosts the performance of both solvers, allowing them, on average, to solve 17% more difficult instances (as measured by the ratio between clauses and variables), and to do so in 35% fewer steps, with improvements in the median number of steps of up to a factor of 8. As such, this work bridges formal results from theoretical computer science and practically motivated research on deep learning for constraint satisfaction problems and establishes the promise of purpose-trained SAT solvers with performance guarantees.
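    Code sketch (illustrative): how an oracle distribution over variables can steer a WalkSAT-style SLS loop. The uniform oracle stub stands in for a trained GNN, and the flip rule is a simplification, not the solvers used in the paper.

        import random

        def oracle_sampler(clauses, n_vars):
            """Stub for a trained GNN oracle returning per-variable flip probabilities
            (uniform here; in the paper's setting this would exploit the instance's structure)."""
            return [1.0 / n_vars] * n_vars

        def sls_solve(clauses, n_vars, max_flips=10000, seed=0):
            """WalkSAT-style search: inside an unsatisfied clause, pick the variable to flip
            according to the oracle's distribution."""
            rng = random.Random(seed)
            probs = oracle_sampler(clauses, n_vars)
            assignment = [rng.random() < 0.5 for _ in range(n_vars)]

            def satisfied(clause):
                return any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)

            for _ in range(max_flips):
                unsat = [c for c in clauses if not satisfied(c)]
                if not unsat:
                    return assignment
                clause = rng.choice(unsat)
                vars_in_clause = [abs(lit) - 1 for lit in clause]
                weights = [probs[v] for v in vars_in_clause]
                v = rng.choices(vars_in_clause, weights=weights, k=1)[0]   # oracle-guided pick
                assignment[v] = not assignment[v]
            return None   # no satisfying assignment found within the flip budget

        # (x1 or not x2) and (x2 or x3) and (not x1 or not x3), literals in DIMACS style
        print(sls_solve([[1, -2], [2, 3], [-1, -3]], n_vars=3))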

You Only Look at Screens: Multimodal Chain-of-Action Agents

  • paper_url: http://arxiv.org/abs/2309.11436
  • repo_url: https://github.com/cooelf/Auto-UI
  • paper_authors: Zhuosheng Zhang, Aston Zhang
  • for: The work aims to make autonomous user interface (UI) agents more efficient, so that tasks can be automated without manual intervention.
  • methods: The proposed multimodal agent, Auto-UI, interacts directly with the interface, bypassing environment parsing and application-specific APIs, and uses a chain-of-action technique that conditions each decision on previous action histories and future action plans (a prompt-construction sketch follows this entry).
  • results: On the new device-control benchmark AITW, with 30K unique instructions spanning multi-step tasks such as application operation, web searching and web shopping, Auto-UI achieves state-of-the-art performance with an action type prediction accuracy of 90% and an overall action success rate of 74%. Code is publicly available at https://github.com/cooelf/Auto-UI.
    Abstract Autonomous user interface (UI) agents aim to facilitate task automation by interacting with the user interface without manual intervention. Recent studies have investigated eliciting the capabilities of large language models (LLMs) for effective engagement in diverse environments. To align with the input-output requirement of LLMs, existing approaches are developed under a sandbox setting where they rely on external tools and application-specific APIs to parse the environment into textual elements and interpret the predicted actions. Consequently, those approaches often grapple with inference inefficiency and error propagation risks. To mitigate the challenges, we introduce Auto-UI, a multimodal solution that directly interacts with the interface, bypassing the need for environment parsing or reliance on application-dependent APIs. Moreover, we propose a chain-of-action technique -- leveraging a series of intermediate previous action histories and future action plans -- to help the agent decide what action to execute. We evaluate our approach on a new device-control benchmark AITW with 30K unique instructions, spanning multi-step tasks such as application operation, web searching, and web shopping. Experimental results show that Auto-UI achieves state-of-the-art performance with an action type prediction accuracy of 90% and an overall action success rate of 74%. Code is publicly available at https://github.com/cooelf/Auto-UI.
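    Code sketch (illustrative): the chain-of-action idea of keeping a history of executed actions and a plan of remaining ones, and folding both into the next prediction. The prompt format, action vocabulary and model interface are assumptions, not Auto-UI's actual implementation.

        from dataclasses import dataclass, field
        from typing import Callable, List

        @dataclass
        class ChainOfActionAgent:
            goal: str
            history: List[str] = field(default_factory=list)   # previous action history
            plan: List[str] = field(default_factory=list)      # future action plan

            def build_prompt(self, screen_description):
                return ("Goal: " + self.goal +
                        "\nPrevious actions: " + (", ".join(self.history) or "none") +
                        "\nPlanned next actions: " + (", ".join(self.plan) or "none") +
                        "\nCurrent screen: " + screen_description +
                        "\nPredict the next action (e.g. click(x, y), type(text), scroll(down)):")

            def step(self, screen_description, model: Callable[[str], str]):
                action = model(self.build_prompt(screen_description))
                self.history.append(action)                    # the executed chain grows
                if self.plan and self.plan[0] == action:
                    self.plan.pop(0)                           # consumed the first planned step
                return action

        # Usage: agent = ChainOfActionAgent(goal="search for weather",
        #                                   plan=["click(search_box)", "type(weather)"])
        #        agent.step("home screen with a search box", model=my_multimodal_model)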

A Systematic Review of Few-Shot Learning in Medical Imaging

  • paper_url: http://arxiv.org/abs/2309.11433
  • repo_url: None
  • paper_authors: Eva Pachetti, Sara Colantonio
  • for: This article gives a systematic review of few-shot learning in medical imaging, with particular attention to meta-learning approaches that mitigate data scarcity.
  • methods: The literature was searched systematically and 80 relevant articles published from 2018 to 2023 were selected and clustered by medical outcome (e.g., tumour segmentation, disease classification, image registration), anatomical structure investigated (e.g., heart, lung) and meta-learning method used; a generic pipeline shared among the studies was also identified.
  • results: Few-shot learning can overcome data scarcity in most outcomes; meta-learning is the most popular choice because it adapts to new tasks with few labelled samples, with supervised and semi-supervised learning standing out as the predominant and best-performing techniques, and the primary application areas are the cardiac, pulmonary and abdominal domains.
    Abstract The lack of annotated medical images limits the performance of deep learning models, which usually need large-scale labelled datasets. Few-shot learning techniques can reduce data scarcity issues and enhance medical image analysis, especially with meta-learning. This systematic review gives a comprehensive overview of few-shot learning in medical imaging. We searched the literature systematically and selected 80 relevant articles published from 2018 to 2023. We clustered the articles based on medical outcomes, such as tumour segmentation, disease classification, and image registration; anatomical structure investigated (i.e. heart, lung, etc.); and the meta-learning method used. For each cluster, we examined the papers' distributions and the results provided by the state-of-the-art. In addition, we identified a generic pipeline shared among all the studies. The review shows that few-shot learning can overcome data scarcity in most outcomes and that meta-learning is a popular choice to perform few-shot learning because it can adapt to new tasks with few labelled samples. In addition, following meta-learning, supervised learning and semi-supervised learning stand out as the predominant techniques employed to tackle few-shot learning challenges in medical imaging and also best performing. Lastly, we observed that the primary application areas predominantly encompass cardiac, pulmonary, and abdominal domains. This systematic review aims to inspire further research to improve medical image analysis and patient care.
    摘要 由于缺乏带标注的医疗影像，通常依赖大规模标注数据集的深度学习模型性能受到限制。少样本学习技术（尤其是元学习）可以缓解数据稀缺问题、提升医学影像分析。本系统综述全面回顾了医学影像中的少样本学习：我们系统检索了2018年至2023年间发表的80篇相关文献，并按医疗任务（例如肿瘤分割、疾病分类、影像配准）、所研究的解剖结构（例如心脏、肺部等）以及所用的元学习方法进行分组；对每个分组，我们考察了文献分布与最新成果，并总结出各研究共享的通用流程。综述表明，少样本学习可以在大多数任务中克服数据稀缺问题；元学习能凭借少量标注样本适应新任务，因而是最常用的选择；在元学习之后，监督学习与半监督学习是应对医学影像少样本学习挑战时最常用且表现最好的技术。主要应用领域集中在心脏、肺部和腹部。本综述旨在启发进一步研究，以改进医学影像分析与患者照护。
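For readers unfamiliar with the meta-learning methods this survey highlights, the sketch below shows a single prototypical-network episode, one common few-shot baseline. The identity `embed` function is a placeholder for a learned feature extractor and the data are synthetic; nothing here is specific to the reviewed papers.

```python
# A minimal prototypical-network episode in NumPy; "embed" stands in for any
# feature extractor (e.g., a CNN trained on base classes) and is an assumption here.

import numpy as np

def embed(x: np.ndarray) -> np.ndarray:
    # Placeholder embedding: identity. In practice this is a learned network.
    return x

def prototypical_episode(support_x, support_y, query_x):
    """support_x: (n_support, d), support_y: (n_support,), query_x: (n_query, d).
    Returns the predicted class index for each query sample."""
    classes = np.unique(support_y)
    # Class prototype = mean embedding of its support samples.
    prototypes = np.stack([embed(support_x[support_y == c]).mean(axis=0) for c in classes])
    dists = ((embed(query_x)[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(dists, axis=1)]

rng = np.random.default_rng(0)
support_x = np.concatenate([rng.normal(0, 1, (5, 16)), rng.normal(3, 1, (5, 16))])
support_y = np.array([0] * 5 + [1] * 5)
query_x = rng.normal(3, 1, (4, 16))
print(prototypical_episode(support_x, support_y, query_x))  # mostly class 1
```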

Generative Pre-Training of Time-Series Data for Unsupervised Fault Detection in Semiconductor Manufacturing

  • paper_url: http://arxiv.org/abs/2309.11427
  • repo_url: None
  • paper_authors: Sewoong Lee, JinKyou Choi, Min Su Kim
  • for: 这项研究旨在利用时间序列数据的特征来检测半导体制造中的异常。
  • methods: 研究使用时间卷积嵌入和生成式预训练Transformer对时间序列数据进行预训练，并使用交叉熵损失函数来区分异常与正常的时间序列。
  • results: 实验表明，该模型在UCR时间序列分类数据集和化学气相沉积（CVD）设备的过程日志上均优于以往的无监督模型；其在等错误率（EER）处的F1分数在所有数据集上最高，在公开数据集上仅比有监督的最先进基线低0.026。
    Abstract This paper introduces TRACE-GPT, which stands for Time-seRies Anomaly-detection with Convolutional Embedding and Generative Pre-trained Transformers. TRACE-GPT is designed to pre-train univariate time-series sensor data and detect faults on unlabeled datasets in semiconductor manufacturing. In semiconductor industry, classifying abnormal time-series sensor data from normal data is important because it is directly related to wafer defect. However, small, unlabeled, and even mixed training data without enough anomalies make classification tasks difficult. In this research, we capture features of time-series data with temporal convolutional embedding and Generative Pre-trained Transformer (GPT) to classify abnormal sequences from normal sequences using cross entropy loss. We prove that our model shows better performance than previous unsupervised models with both an open dataset, the University of California Riverside (UCR) time-series classification archive, and the process log of our Chemical Vapor Deposition (CVD) equipment. Our model has the highest F1 score at Equal Error Rate (EER) across all datasets and is only 0.026 below the supervised state-of-the-art baseline on the open dataset.
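The abstract describes scoring sequences with a temporal convolutional embedding, a GPT-style causal model, and a cross-entropy objective. The toy sketch below follows that recipe for univariate signals; the architecture sizes, value binning, and the untrained demo run are assumptions for illustration only, not the TRACE-GPT implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySensorGPT(nn.Module):
    """Toy causal scorer: temporal conv embedding + causal Transformer + next-bin head."""
    def __init__(self, n_bins=32, d_model=64, kernel=5):
        super().__init__()
        self.kernel, self.n_bins = kernel, n_bins
        self.conv = nn.Conv1d(1, d_model, kernel_size=kernel)          # temporal conv embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_bins)

    def forward(self, x):                                              # x: (batch, seq_len) in [0, 1]
        h = self.conv(F.pad(x.unsqueeze(1), (self.kernel - 1, 0)))     # left pad keeps the conv causal
        h = h.transpose(1, 2)                                          # (batch, seq_len, d_model)
        L = h.size(1)
        causal = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        return self.head(self.encoder(h, mask=causal))                 # next-bin logits per step

def anomaly_score(model, x):
    """Mean cross-entropy of next-step prediction; higher means more anomalous.
    In practice the model would first be pre-trained on unlabeled normal traces."""
    bins = (x.clamp(0, 1) * (model.n_bins - 1)).long()
    logits = model(x)[:, :-1, :]
    return F.cross_entropy(logits.reshape(-1, model.n_bins), bins[:, 1:].reshape(-1)).item()

model = TinySensorGPT()
trace = torch.linspace(0, 1, 100).repeat(2, 1)                         # two smooth "normal" ramps
faulty = trace.clone(); faulty[:, 50] = 0.0                            # injected spike fault
print(anomaly_score(model, trace), anomaly_score(model, faulty))       # untrained model, illustration only
```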

EDMP: Ensemble-of-costs-guided Diffusion for Motion Planning

  • paper_url: http://arxiv.org/abs/2309.11414
  • repo_url: None
  • paper_authors: Kallol Saha, Vishal Mandadi, Jayaram Reddy, Ajit Srikanth, Aditya Agarwal, Bipasha Sen, Arun Singh, Madhava Krishna
  • for: 这篇论文提出了一种结合经典方法与深度学习的运动规划方法，以提高运动规划的成功率和泛化能力。
  • methods: 本文提出了基于代价集合引导扩散的运动规划方法（EDMP），它结合经典运动规划算法与深度学习：用一组多样且满足运动学约束的轨迹训练基于扩散的网络；推理时针对新场景计算场景相关代价（如"碰撞成本"），引导扩散生成满足该场景约束的有效轨迹。
  • results: 结果显示，EDMP 的成功率可与最先进的深度学习方法相当，同时保留了经典规划器所特有的泛化能力。
    Abstract Classical motion planning for robotic manipulation includes a set of general algorithms that aim to minimize a scene-specific cost of executing a given plan. This approach offers remarkable adaptability, as they can be directly used off-the-shelf for any new scene without needing specific training datasets. However, without a prior understanding of what diverse valid trajectories are and without specially designed cost functions for a given scene, the overall solutions tend to have low success rates. While deep-learning-based algorithms tremendously improve success rates, they are much harder to adopt without specialized training datasets. We propose EDMP, an Ensemble-of-costs-guided Diffusion for Motion Planning that aims to combine the strengths of classical and deep-learning-based motion planning. Our diffusion-based network is trained on a set of diverse kinematically valid trajectories. Like classical planning, for any new scene at the time of inference, we compute scene-specific costs such as "collision cost" and guide the diffusion to generate valid trajectories that satisfy the scene-specific constraints. Further, instead of a single cost function that may be insufficient in capturing diversity across scenes, we use an ensemble of costs to guide the diffusion process, significantly improving the success rate compared to classical planners. EDMP performs comparably with SOTA deep-learning-based methods while retaining the generalization capabilities primarily associated with classical planners.
    摘要 经典运动规划 для机器人操作包括一组通用算法,旨在最小化Scene特定的执行计划的成本。这种方法具有很好的适应性,可以直接在新场景上使用,不需要特定的训练数据。然而,不知道多元有效轨迹的特点和场景特定的成本函数,全局的解决方案通常具有低成功率。深度学习基于算法在成功率上提供了很大的改善,但是它们更难于采用,需要特定的训练数据。我们提出了EDMP,一种ensemble-of-costs-guided Diffusion for Motion Planning,旨在结合经典和深度学习基于的运动规划。我们的扩散网络被训练在一组多元可行的轨迹上。在任何新场景的推理时,我们计算场景特定的碰撞成本和导引扩散来生成符合场景特定的约束的有效轨迹。此外,而不是单一的成本函数,我们使用一个ensemble of costs来引导扩散过程,明显提高成功率相比经典规划器。EDMP和SOTA深度学习基于方法相比,保留了经典规划器的总体化能力。
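A schematic of the ensemble-of-costs guidance step is sketched below: each reverse-diffusion step is followed by a nudge along the gradients of several scene-specific costs. The toy denoiser and cost functions are assumptions made for illustration; EDMP's trained network and cost set are considerably richer.

```python
import numpy as np

def collision_cost(traj, obstacle=np.array([0.5, 0.5]), radius=0.2):
    """Toy scene cost: penalize waypoints inside an obstacle's radius."""
    d = np.linalg.norm(traj - obstacle, axis=1)
    return np.maximum(0.0, radius - d).sum()

def smoothness_cost(traj):
    return np.sum(np.diff(traj, axis=0) ** 2)

def numerical_grad(cost, traj, eps=1e-4):
    g = np.zeros_like(traj)
    for idx in np.ndindex(traj.shape):
        t1, t2 = traj.copy(), traj.copy()
        t1[idx] += eps; t2[idx] -= eps
        g[idx] = (cost(t1) - cost(t2)) / (2 * eps)
    return g

def guided_denoise_step(traj, denoiser, costs, weights, guide_scale=0.5):
    """One reverse-diffusion step with ensemble-of-costs guidance (schematic)."""
    traj = denoiser(traj)                                  # network's denoised estimate
    guidance = sum(w * numerical_grad(c, traj) for c, w in zip(costs, weights))
    return traj - guide_scale * guidance                   # push toward low-cost trajectories

# Toy "denoiser": pull the trajectory toward a straight start-goal line.
start, goal = np.array([0.0, 0.0]), np.array([1.0, 1.0])
line = np.linspace(start, goal, 16)
denoiser = lambda t: 0.7 * t + 0.3 * line

rng = np.random.default_rng(0)
traj = line + rng.normal(0, 0.3, line.shape)               # noisy initial trajectory
for _ in range(20):
    traj = guided_denoise_step(traj, denoiser, [collision_cost, smoothness_cost], [1.0, 0.1])
print("final collision cost:", round(collision_cost(traj), 4))
```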

Long-Form End-to-End Speech Translation via Latent Alignment Segmentation

  • paper_url: http://arxiv.org/abs/2309.11384
  • repo_url: None
  • paper_authors: Peter Polák, Ondřej Bojar
  • for: 这个论文的目的是提出一种实时同声翻译方法,可以处理长于几秒钟的语音数据。
  • methods: 该方法利用现有的同步语音翻译 encoder-decoder 架构和 ST CTC 来完成切分，无需额外的监督或参数，可在同一模型中同时完成翻译与切分。
  • results: 在多种语言对以及域内和域外数据上，该方法在不增加计算成本的情况下达到了最先进的翻译质量。
    Abstract Current simultaneous speech translation models can process audio only up to a few seconds long. Contemporary datasets provide an oracle segmentation into sentences based on human-annotated transcripts and translations. However, the segmentation into sentences is not available in the real world. Current speech segmentation approaches either offer poor segmentation quality or have to trade latency for quality. In this paper, we propose a novel segmentation approach for a low-latency end-to-end speech translation. We leverage the existing speech translation encoder-decoder architecture with ST CTC and show that it can perform the segmentation task without supervision or additional parameters. To the best of our knowledge, our method is the first that allows an actual end-to-end simultaneous speech translation, as the same model is used for translation and segmentation at the same time. On a diverse set of language pairs and in- and out-of-domain data, we show that the proposed approach achieves state-of-the-art quality at no additional computational cost.
    摘要 目前的同步语音翻译模型只能处理几秒长的音频。现有数据集基于人工标注的转写和翻译提供了理想的句子切分，但真实场景中并不存在这样的切分。现有的语音切分方法要么切分质量不佳，要么必须以延迟换取质量。本文提出了一种用于低延迟端到端语音翻译的新切分方法：我们利用现有的语音翻译 encoder-decoder 架构和 ST CTC，证明它无需监督或额外参数即可完成切分任务。据我们所知，这是第一个真正端到端的同步语音翻译方法，因为同一个模型同时负责翻译与切分。在多种语言对以及域内外数据上，该方法在不增加计算成本的情况下达到了最先进的质量。

Discuss Before Moving: Visual Language Navigation via Multi-expert Discussions

  • paper_url: http://arxiv.org/abs/2309.11382
  • repo_url: None
  • paper_authors: Yuxing Long, Xiaoqi Li, Wenzhe Cai, Hao Dong
  • for: 这篇论文提出了一种新的零样本视觉语言导航（VLN）框架，以克服现有VLN方法依赖单一模型单轮自我思考的局限。
  • methods: 该框架借助领域专家的协助，通过讨论收集关键导航子任务的信息，包括指令理解、环境感知和完成度估计。
  • results: 大量实验表明，与领域专家讨论能够有效促进导航：更好地理解指令相关信息、纠正无意的错误、筛除不一致的移动决策；相比单轮自我思考，该方法在所有指标上表现更优。
    Abstract Visual language navigation (VLN) is an embodied task demanding a wide range of skills encompassing understanding, perception, and planning. For such a multifaceted challenge, previous VLN methods totally rely on one model's own thinking to make predictions within one round. However, existing models, even the most advanced large language model GPT4, still struggle with dealing with multiple tasks by single-round self-thinking. In this work, drawing inspiration from the expert consultation meeting, we introduce a novel zero-shot VLN framework. Within this framework, large models possessing distinct abilities are served as domain experts. Our proposed navigation agent, namely DiscussNav, can actively discuss with these experts to collect essential information before moving at every step. These discussions cover critical navigation subtasks like instruction understanding, environment perception, and completion estimation. Through comprehensive experiments, we demonstrate that discussions with domain experts can effectively facilitate navigation by perceiving instruction-relevant information, correcting inadvertent errors, and sifting through in-consistent movement decisions. The performances on the representative VLN task R2R show that our method surpasses the leading zero-shot VLN model by a large margin on all metrics. Additionally, real-robot experiments display the obvious advantages of our method over single-round self-thinking.
    摘要 视觉语言导航（VLN）是一项需要理解、感知和规划等多种技能的具身任务。先前的VLN方法完全依赖单一模型在一轮推理中自行做出预测，但即使是最先进的大语言模型GPT-4，在单轮自我思考中处理多项任务时仍然力不从心。受专家咨询会议的启发，我们提出了一种新的零样本VLN框架：让具备不同能力的大模型充当领域专家，我们提出的导航智能体DiscussNav在每一步移动前主动与这些专家讨论，收集关键信息；讨论内容涵盖指令理解、环境感知和完成度估计等关键导航子任务。大量实验证明，与领域专家的讨论能够有效促进导航：捕捉与指令相关的信息、纠正无意的错误、筛除不一致的移动决策。在代表性的VLN任务R2R上，我们的方法在所有指标上大幅超越领先的零样本VLN模型；真实机器人实验也显示了我们的方法相对单轮自我思考的明显优势。
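The discuss-then-move loop can be sketched as below; `ask_expert` stands in for calls to large models, and the expert roles and wording are illustrative rather than the paper's actual prompts.

```python
# Schematic multi-expert discussion loop in the spirit of DiscussNav; `ask_expert`
# is a placeholder for an LLM call and the expert roles are illustrative.

from typing import Callable, Dict

def discuss_before_moving(instruction: str,
                          observation: str,
                          ask_expert: Callable[[str, str], str]) -> Dict[str, str]:
    """Collect advice from specialized experts before committing to an action."""
    questions = {
        "instruction": f"What sub-goal does this instruction imply right now? {instruction}",
        "perception":  f"Which landmarks in this observation matter for the sub-goal? {observation}",
        "completion":  f"Given instruction '{instruction}' and view '{observation}', "
                       "is the episode likely finished?",
    }
    notes = {role: ask_expert(role, q) for role, q in questions.items()}
    # The final decision query is conditioned on all expert notes.
    summary = " | ".join(f"{r}: {a}" for r, a in notes.items())
    notes["decision"] = ask_expert("navigator", f"Choose the next movement. Context: {summary}")
    return notes

# Dummy expert for demonstration; a real system would route each role to an LLM.
dummy = lambda role, q: f"[{role} answer to: {q[:40]}...]"
for k, v in discuss_before_moving("walk past the sofa and stop at the lamp",
                                  "a sofa ahead, doorway on the left", dummy).items():
    print(k, "->", v)
```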

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff

  • paper_url: http://arxiv.org/abs/2309.11379
  • repo_url: None
  • paper_authors: Peter Polák, Brian Yan, Shinji Watanabe, Alex Waibel, Ondřej Bojar
  • for: simultaneous speech translation
  • methods: blockwise self-attentional encoder models, incremental blockwise beam search, local agreement or hold-$n$ policies
  • results: 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.
  • for: 这篇论文针对同步语音翻译。
  • methods: 论文使用块级自注意力编码器模型，并采用增量块级束搜索配合局部一致或 hold-$n$ 策略来控制质量与延迟的折中。
  • results: 在 MuST-C 上的实验显示，在不改变延迟的情况下可获得 0.6-3.6 BLEU 的提升，或在不改变质量的情况下减少 0.8-1.4 秒延迟。
    Abstract Blockwise self-attentional encoder models have recently emerged as one promising end-to-end approach to simultaneous speech translation. These models employ a blockwise beam search with hypothesis reliability scoring to determine when to wait for more input speech before translating further. However, this method maintains multiple hypotheses until the entire speech input is consumed -- this scheme cannot directly show a single \textit{incremental} translation to users. Further, this method lacks mechanisms for \textit{controlling} the quality vs. latency tradeoff. We propose a modified incremental blockwise beam search incorporating local agreement or hold-$n$ policies for quality-latency control. We apply our framework to models trained for online or offline translation and demonstrate that both types can be effectively used in online mode. Experimental results on MuST-C show 0.6-3.6 BLEU improvement without changing latency or 0.8-1.4 s latency improvement without changing quality.
    摘要 块级自注意力编码器模型近来成为端到端同步语音翻译中一种有前景的方法。这些模型使用带有假设可靠性打分的块级束搜索来决定是否等待更多输入语音后再继续翻译。然而，该方法会维持多个假设直到整段语音输入被消耗，因而无法直接向用户展示单一的增量翻译结果；同时它也缺乏控制质量与延迟折中的机制。我们提出一种改进的增量块级束搜索，引入局部一致（local agreement）或 hold-$n$ 策略来控制质量与延迟。我们将该框架应用于在线或离线训练的模型，并证明两类模型都能有效地用于在线模式。在 MuST-C 上的实验结果显示，在不改变延迟的情况下可获得 0.6-3.6 BLEU 的提升，或在不改变质量的情况下减少 0.8-1.4 秒延迟。
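The hold-$n$ policy itself is simple to state in code: after each block, commit everything except the last $n$ tokens of the current best hypothesis. The sketch below assumes a toy block decoder; the real system runs blockwise beam search over speech features rather than the placeholder used here.

```python
# Minimal sketch of the hold-n commitment policy for incremental decoding.
# "decode_prefix" is a stand-in for blockwise beam search over the audio so far.

from typing import List, Callable

def holdn_incremental(blocks: List[str],
                      decode_prefix: Callable[[List[str]], List[str]],
                      n: int = 2) -> List[str]:
    committed: List[str] = []
    audio_so_far: List[str] = []
    for block in blocks:
        audio_so_far.append(block)
        hypothesis = decode_prefix(audio_so_far)          # best hypothesis for all audio so far
        stable = hypothesis[:-n] if n > 0 else hypothesis  # withhold the last n tokens
        # Only ever append; previously committed tokens are never retracted.
        committed.extend(stable[len(committed):])
    # At the end of the utterance the full hypothesis can be flushed.
    committed.extend(decode_prefix(audio_so_far)[len(committed):])
    return committed

# Toy "decoder": each audio block contributes two output tokens, the last of
# which is marked as tentative until more context arrives.
def toy_decoder(audio: List[str]) -> List[str]:
    out = []
    for i, _ in enumerate(audio):
        out += [f"w{i}a", f"w{i}b?"]
    return out

print(holdn_incremental(["blk1", "blk2", "blk3"], toy_decoder, n=2))
```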

Preconditioned Federated Learning

  • paper_url: http://arxiv.org/abs/2309.11378
  • repo_url: None
  • paper_authors: Zeyi Tao, Jindi Wu, Qun Li
  • for: 训练分布式机器学习模型,保持通信效率和隐私性。
  • methods: 基于本地自适应和服务器端自适应两种框架，采用新颖的协方差矩阵预条件器，实现了更高的通信效率和更好的自适应性。
  • results: 在 i.i.d. 和非 i.i.d. 情况下,实验结果表明我们的方法可以达到领先的性能水平。
    Abstract Federated Learning (FL) is a distributed machine learning approach that enables model training in communication efficient and privacy-preserving manner. The standard optimization method in FL is Federated Averaging (FedAvg), which performs multiple local SGD steps between communication rounds. FedAvg has been considered to lack algorithm adaptivity compared to modern first-order adaptive optimizations. In this paper, we propose new communication-efficient FL algortithms based on two adaptive frameworks: local adaptivity (PreFed) and server-side adaptivity (PreFedOp). Proposed methods adopt adaptivity by using a novel covariance matrix preconditioner. Theoretically, we provide convergence guarantees for our algorithms. The empirical experiments show our methods achieve state-of-the-art performances on both i.i.d. and non-i.i.d. settings.
    摘要 联邦学习（FL）是一种分布式机器学习方法，能够以通信高效且保护隐私的方式训练模型。FL中的标准优化方法是联邦平均（FedAvg），它在两次通信之间执行多步本地SGD。与现代一阶自适应优化方法相比，FedAvg被认为缺乏算法自适应性。本文基于两种自适应框架——本地自适应（PreFed）和服务器端自适应（PreFedOp）——提出了新的通信高效FL算法，通过一种新颖的协方差矩阵预条件器引入自适应性。理论上，我们为算法提供了收敛性保证；实验表明，我们的方法在独立同分布（i.i.d.）和非独立同分布设置下均达到了当前最佳性能。
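A much-simplified picture of server-side preconditioning is sketched below: client updates are averaged and then rescaled by a diagonal statistic of those updates. This diagonal stand-in is an assumption made purely for illustration; the paper's PreFed/PreFedOp use a covariance-matrix preconditioner with convergence guarantees.

```python
import numpy as np

def local_sgd(w, data, lr=0.1, steps=5):
    """A few local least-squares SGD steps on one client's (X, y)."""
    X, y = data
    for _ in range(steps):
        w = w - lr * X.T @ (X @ w - y) / len(y)
    return w

def preconditioned_round(w_global, clients, eps=1e-3):
    """One round: average client updates, then rescale coordinates where clients
    disagree (high update variance) less strongly. This diagonal statistic is a
    simplified stand-in for a covariance-matrix preconditioner."""
    updates = np.stack([local_sgd(w_global.copy(), c) - w_global for c in clients])
    precond = 1.0 / np.sqrt(updates.var(axis=0) + eps)
    precond /= precond.mean()                       # keep the overall step size comparable
    return w_global + precond * updates.mean(axis=0)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    clients.append((X, X @ true_w + rng.normal(0, 0.1, 50)))

w = np.zeros(3)
for _ in range(30):
    w = preconditioned_round(w, clients)
print(np.round(w, 3))                               # approaches [2, -1, 0.5]
```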

  • paper_url: http://arxiv.org/abs/2309.11368
  • repo_url: None
  • paper_authors: Haolin Fei, Stefano Tedeschi, Yanpei Huang, Andrew Kennedy, Ziwei Wang
  • for: 这个论文目的是提高人机合作的效率,使用多种modal interaction方式,以便用户可以专注于任务执行,而不需要额外培训用户机器人界面。
  • methods: 该论文采用手势识别、语音识别和可切换的控制自适应策略，提供一个用户友好的人机协作框架。
  • results: 实验结果表明，静态手势识别模块的准确率为94.3%，动态动作识别模块的准确率为97.6%。相比人单独操作，所提方法能提高工具递送效率，且不会明显干扰人类意图。
    Abstract Human-robot collaboration has benefited users with higher efficiency towards interactive tasks. Nevertheless, most collaborative schemes rely on complicated human-machine interfaces, which might lack the requisite intuitiveness compared with natural limb control. We also expect to understand human intent with low training data requirements. In response to these challenges, this paper introduces an innovative human-robot collaborative framework that seamlessly integrates hand gesture and dynamic movement recognition, voice recognition, and a switchable control adaptation strategy. These modules provide a user-friendly approach that enables the robot to deliver the tools as per user need, especially when the user is working with both hands. Therefore, users can focus on their task execution without additional training in the use of human-machine interfaces, while the robot interprets their intuitive gestures. The proposed multimodal interaction framework is executed in the UR5e robot platform equipped with a RealSense D435i camera, and the effectiveness is assessed through a soldering circuit board task. The experiment results have demonstrated superior performance in hand gesture recognition, where the static hand gesture recognition module achieves an accuracy of 94.3\%, while the dynamic motion recognition module reaches 97.6\% accuracy. Compared with human solo manipulation, the proposed approach facilitates higher efficiency tool delivery, without significantly distracting from human intents.
    摘要 人机合作已经为用户带来更高的效率在互动任务中。然而,大多数合作方案依靠复杂的人机界面,可能缺乏自然的人机交互INTUITIVENESS。我们还期望在训练数据量少的情况下理解人类的意图。为回答这些挑战,本文介绍了一种创新的人机合作框架,它灵活地集成了手势认识、动态运动认识、语音识别和可调制控制策略。这些模块提供了一种用户友好的方法,使得机器人可以根据用户需要提供工具,特别是用户在双手工作时。因此,用户可以专注于任务执行而不需要额外培训人机界面的使用,而机器人可以理解用户的自然姿势。本文所提出的多模式互动框架在UR5e机器人平台上执行,装备了RealSense D435i摄像头,并通过焊接电路板任务进行评估。实验结果显示, static手势认识模块的准确率为94.3%,而动态运动认识模块的准确率达97.6%。相比人类独立操作,提议的方法可以提高工具交付效率,无需明显干扰人类意图。

Knowledge Graph Question Answering for Materials Science (KGQA4MAT): Developing Natural Language Interface for Metal-Organic Frameworks Knowledge Graph (MOF-KG)

  • paper_url: http://arxiv.org/abs/2309.11361
  • repo_url: None
  • paper_authors: Yuan An, Jane Greenberg, Alex Kalinowski, Xintong Zhao, Xiaohua Hu, Fernando J. Uribe-Romo, Kyle Langlois, Jacob Furst, Diego A. Gómez-Gualdrón
  • for: 本研究开发了一个包括161个复杂问题的知识Graph问答板(KGQA4MAT),旨在提高材料科学领域知识Graph(MOF-KG)的访问性。
  • methods: 本研究使用了一种自然语言界面来查询MOF-KG,并开发了一个系统来使用ChatGPT将自然语言问题翻译成正式的KG查询语言。
  • results: 研究发现ChatGPT可以有效地解决不同平台和查询语言的KG问答问题,并且可以帮助加速材料科学领域知识Graph的搜索和探索。
    Abstract We present a comprehensive benchmark dataset for Knowledge Graph Question Answering in Materials Science (KGQA4MAT), with a focus on metal-organic frameworks (MOFs). A knowledge graph for metal-organic frameworks (MOF-KG) has been constructed by integrating structured databases and knowledge extracted from the literature. To enhance MOF-KG accessibility for domain experts, we aim to develop a natural language interface for querying the knowledge graph. We have developed a benchmark comprised of 161 complex questions involving comparison, aggregation, and complicated graph structures. Each question is rephrased in three additional variations, resulting in 644 questions and 161 KG queries. To evaluate the benchmark, we have developed a systematic approach for utilizing ChatGPT to translate natural language questions into formal KG queries. We also apply the approach to the well-known QALD-9 dataset, demonstrating ChatGPT's potential in addressing KGQA issues for different platforms and query languages. The benchmark and the proposed approach aim to stimulate further research and development of user-friendly and efficient interfaces for querying domain-specific materials science knowledge graphs, thereby accelerating the discovery of novel materials.
    摘要 我们提出了一个面向材料科学知识图问答的综合基准数据集（KGQA4MAT），重点关注金属有机框架（MOFs）。我们通过整合结构化数据库与从文献中抽取的知识，构建了金属有机框架知识图（MOF-KG）。为了提升领域专家对MOF-KG的可访问性，我们致力于开发查询该知识图的自然语言接口。我们构建了一个包含161个复杂问题的基准，涉及比较、聚合和复杂图结构；每个问题另有三种改写变体，共计644个问题和161条KG查询。为评估该基准，我们开发了一种利用ChatGPT将自然语言问题翻译为形式化KG查询的系统化方法，并将该方法应用于知名的QALD-9数据集，展示了ChatGPT在不同平台与查询语言下解决KGQA问题的潜力。该基准与所提方法旨在推动面向特定领域材料科学知识图的友好、高效查询接口的进一步研究与开发，从而加速新材料的发现。

3D Face Reconstruction: the Road to Forensics

  • paper_url: http://arxiv.org/abs/2309.11357
  • repo_url: None
  • paper_authors: Simone Maurizio La Cava, Giulia Orrù, Martin Drahansky, Gian Luca Marcialis, Fabio Roli
  • for: 探讨3D面部重建在取证（法医）领域的应用。
  • methods: 基于监控视频和存档照片进行3D面部重建。
  • results: 仍存在诸多制约与限制，3D面部重建能否在取证中发挥积极作用尚不明确。
    Abstract 3D face reconstruction algorithms from images and videos are applied to many fields, from plastic surgery to the entertainment sector, thanks to their advantageous features. However, when looking at forensic applications, 3D face reconstruction must observe strict requirements that still make its possible role in bringing evidence to a lawsuit unclear. An extensive investigation of the constraints, potential, and limits of its application in forensics is still missing. Shedding some light on this matter is the goal of the present survey, which starts by clarifying the relation between forensic applications and biometrics, with a focus on face recognition. Therefore, it provides an analysis of the achievements of 3D face reconstruction algorithms from surveillance videos and mugshot images and discusses the current obstacles that separate 3D face reconstruction from an active role in forensic applications. Finally, it examines the underlying data sets, with their advantages and limitations, while proposing alternatives that could substitute or complement them.
    摘要 基于图像和视频的3D面部重建算法凭借其优势已应用于从整形外科到娱乐等诸多领域。然而在取证应用中，3D面部重建必须满足严格的要求，其在诉讼举证中可能扮演的角色仍不明朗；目前仍缺乏对其在取证中约束、潜力与局限的系统性考察。本综述旨在厘清这一问题：首先阐明取证应用与生物特征识别（特别是人脸识别）之间的关系，进而分析基于监控视频和存档照片的3D面部重建算法的研究进展，并讨论当前阻碍3D面部重建在取证应用中发挥积极作用的障碍；最后审视相关数据集的优势与局限，并提出可替代或补充它们的方案。

A Comprehensive Survey on Rare Event Prediction

  • paper_url: http://arxiv.org/abs/2309.11356
  • repo_url: None
  • paper_authors: Chathurangi Shyalika, Ruwan Wickramarachchi, Amit Sheth
  • for: 本研究主要针对频率低的罕见事件预测，即使用机器学习和数据分析方法来预测这些事件的发生。
  • methods: 本文综述了目前预测罕见事件的方法，包括数据处理、算法方法和评估方法等，并从不同的数据模态和预测方法角度进行了梳理和分析。
  • results: 本文显示，罕见事件预测存在许多挑战，如数据不均衡、模型偏差等问题，同时还存在许多研究不足或尚未充分发展的方向。
    Abstract Rare event prediction involves identifying and forecasting events with a low probability using machine learning and data analysis. Due to the imbalanced data distributions, where the frequency of common events vastly outweighs that of rare events, it requires using specialized methods within each step of the machine learning pipeline, i.e., from data processing to algorithms to evaluation protocols. Predicting the occurrences of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistical and machine learning. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions, which can help guide practitioners and researchers.
    摘要 罕见事件预测旨在利用机器学习和数据分析识别并预测发生概率很低的事件。由于数据分布失衡——常见事件的频率远高于罕见事件——机器学习流程的每一个环节（从数据处理、算法到评估协议）都需要使用专门的方法。预测罕见事件的发生对工业4.0等现实应用十分重要，也是统计与机器学习领域的活跃研究方向。本文从罕见事件数据、数据处理、算法方法和评估方法四个维度全面回顾了当前的罕见事件预测方法：具体而言，我们考察了来自数值、图像、文本和音频等不同模态的73个数据集、四大类数据处理方法、五大类算法以及两大类评估方法。本文旨在识别现有文献中的空白、指出罕见事件预测面临的挑战，并提出可供实践者和研究者参考的潜在研究方向。

C$\cdot$ASE: Learning Conditional Adversarial Skill Embeddings for Physics-based Characters

  • paper_url: http://arxiv.org/abs/2309.11351
  • repo_url: None
  • paper_authors: Zhiyang Dou, Xuelin Chen, Qingnan Fan, Taku Komura, Wenping Wang
  • for: 这个论文目标是为physics-based characters提供一个有效的学习推荐系统,使得这些角色可以学习多种技能并提供可控性。
  • methods: 这个系统使用了 conditional Adversarial Skill Embeddings(C$\cdot$ASE),将技能动作分成不同的子集,并使用低级别的条件模型来学习条件行为分布。
  • results: 论文表明,使用C$\cdot$ASE可以生成高度多样化和现实的技能动作,并且可以在不同的下游任务中重用。此外,该系统还提供了一个高级别的政策或用户可以使用某种技能特定的指定来控制角色的行为。
    Abstract We present C$\cdot$ASE, an efficient and effective framework that learns conditional Adversarial Skill Embeddings for physics-based characters. Our physically simulated character can learn a diverse repertoire of skills while providing controllability in the form of direct manipulation of the skills to be performed. C$\cdot$ASE divides the heterogeneous skill motions into distinct subsets containing homogeneous samples for training a low-level conditional model to learn conditional behavior distribution. The skill-conditioned imitation learning naturally offers explicit control over the character's skills after training. The training course incorporates the focal skill sampling, skeletal residual forces, and element-wise feature masking to balance diverse skills of varying complexities, mitigate dynamics mismatch to master agile motions and capture more general behavior characteristics, respectively. Once trained, the conditional model can produce highly diverse and realistic skills, outperforming state-of-the-art models, and can be repurposed in various downstream tasks. In particular, the explicit skill control handle allows a high-level policy or user to direct the character with desired skill specifications, which we demonstrate is advantageous for interactive character animation.
    摘要 我们提出C$\cdot$ASE框架,一种高效有效的框架,学习受条件敌意素嵌入,用于物理基础的角色。我们的物理模拟角色可以学习多种多样的技能,同时提供可控性,通过直接控制技能的执行。C$\cdot$ASE将不同的技能动作分成不同的子集,对具有相同性的样本进行训练低级别的条件模型,学习条件行为分布。通过技能条件学习,可以直接控制角色的技能,并且可以在训练过程中通过焦点技能采样、骨骼剩余力和元素特征掩码来平衡多种技能的复杂性,弥补动力匹配问题,捕捉更加普遍的行为特征。一旦训练完成,条件模型可以生成高度多样化和真实的技能,超越当前模型,并且可以在下游任务中重用。特别是,条件控制把手允许高级政策或用户指定角色的愿望技能规格,我们示示其对交互角色动画有利。

TRAVID: An End-to-End Video Translation Framework

  • paper_url: http://arxiv.org/abs/2309.11338
  • repo_url: None
  • paper_authors: Prottay Kumar Adhikary, Bandaru Sugandhi, Subhojit Ghimire, Santanu Pal, Partha Pakray
  • for: 这篇论文是为了提供一种实现语言翻译的视频翻译系统，以便在不同语言背景下进行有效的沟通。
  • methods: 该系统使用了一种综合语音和视频的翻译方法，通过具体的语音和视频对应关系来实现视频中的语言翻译。
  • results: 该系统可以帮助学生和用户在低资源环境中进行有效的学习和沟通，同时提供了一种更加真实和吸引人的学习环境，从而提高学习效果和参与度。
    Abstract In today's globalized world, effective communication with people from diverse linguistic backgrounds has become increasingly crucial. While traditional methods of language translation, such as written text or voice-only translations, can accomplish the task, they often fail to capture the complete context and nuanced information conveyed through nonverbal cues like facial expressions and lip movements. In this paper, we present an end-to-end video translation system that not only translates spoken language but also synchronizes the translated speech with the lip movements of the speaker. Our system focuses on translating educational lectures in various Indian languages, and it is designed to be effective even in low-resource system settings. By incorporating lip movements that align with the target language and matching them with the speaker's voice using voice cloning techniques, our application offers an enhanced experience for students and users. This additional feature creates a more immersive and realistic learning environment, ultimately making the learning process more effective and engaging.
    摘要 在当今全球化的世界中，与不同语言背景的人进行有效沟通变得愈发重要。传统的语言翻译方法（如文字翻译或仅有语音的翻译）虽然可以完成任务，却往往无法传达面部表情和口型等非语言线索所承载的完整语境与细微信息。本文提出一个端到端的视频翻译系统，它不仅翻译口语内容，还使翻译后的语音与说话人的口型保持同步。我们的系统专注于翻译多种印度语言的教学讲座，并针对低资源系统环境进行设计。通过让口型与目标语言对应，并利用声音克隆技术使其与说话人的嗓音匹配，我们的应用为学生和用户提供了更佳的体验。这一附加特性营造出更沉浸、更真实的学习环境，最终使学习过程更有效、更有吸引力。

Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism

  • paper_url: http://arxiv.org/abs/2309.11331
  • repo_url: https://github.com/huawei-noah/Efficient-Computing
  • paper_authors: Chengcheng Wang, Wei He, Ying Nie, Jianyuan Guo, Chuanjian Liu, Kai Han, Yunhe Wang
  • for: This paper aims to improve the object detection performance of YOLO-series models by introducing a new Gather-Distribute (GD) mechanism and implementing MAE-style pretraining.
  • methods: The proposed Gold-YOLO model uses a GD mechanism that combines convolution and self-attention operations to improve multi-scale feature fusion. The model also uses MAE-style pretraining to enhance the performance.
  • results: The Gold-YOLO model achieves an outstanding 39.9% AP on the COCO val2017 dataset and 1030 FPS on a T4 GPU, outperforming the previous SOTA model YOLOv6-3.0-N by +2.4% in terms of AP.
    Abstract In the past years, YOLO-series models have emerged as the leading approaches in the area of real-time object detection. Many studies pushed up the baseline to a higher level by modifying the architecture, augmenting data and designing new losses. However, we find previous models still suffer from information fusion problem, although Feature Pyramid Network (FPN) and Path Aggregation Network (PANet) have alleviated this. Therefore, this study provides an advanced Gatherand-Distribute mechanism (GD) mechanism, which is realized with convolution and self-attention operations. This new designed model named as Gold-YOLO, which boosts the multi-scale feature fusion capabilities and achieves an ideal balance between latency and accuracy across all model scales. Additionally, we implement MAE-style pretraining in the YOLO-series for the first time, allowing YOLOseries models could be to benefit from unsupervised pretraining. Gold-YOLO-N attains an outstanding 39.9% AP on the COCO val2017 datasets and 1030 FPS on a T4 GPU, which outperforms the previous SOTA model YOLOv6-3.0-N with similar FPS by +2.4%. The PyTorch code is available at https://github.com/huawei-noah/Efficient-Computing/tree/master/Detection/Gold-YOLO, and the MindSpore code is available at https://gitee.com/mindspore/models/tree/master/research/cv/Gold_YOLO.
    摘要 在过去的几年中,YOLO系列模型在实时对象检测领域取得了领先地位。许多研究尝试提高基线,通过修改架构、增强数据和设计新的损失函数。然而,我们发现先前的模型仍然受到信息融合问题的困扰,尽管Feature Pyramid Network(FPN)和Path Aggregation Network(PANet)已经减轻了这个问题。因此,本研究提出了一种高级的聚合分发机制(GD)机制,通过 convolution 和自注意操作实现。这新的设计的模型被称为 Gold-YOLO,它提高了多尺度特征融合能力,并在所有模型缩放水平上实现了理想的平衡 между延迟和准确率。此外,我们在 YOLO 系列模型中实施了 MAE 风格的预训练,让 YOLO 系列模型可以从无监督预训练中受益。Gold-YOLO-N 在 COCO val2017 数据集上达到了出色的 39.9% AP 和 T4 GPU 上的 1030 FPS,超过了先前的 SOTA 模型 YOLOv6-3.0-N 的相似 FPS 值 by +2.4%。PyTorch 代码可以在 找到,MindSpore 代码可以在 找到。
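The gather-and-distribute idea, stripped to its essentials, is sketched below: multi-scale features are aligned and fused into a global context, which is then injected back into every scale. The module layout and layer choices are illustrative assumptions and do not reproduce Gold-YOLO's actual branches.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatherDistribute(nn.Module):
    """Schematic gather-and-distribute fusion: align multi-scale maps to one size,
    fuse them globally, then inject the fused context back into every scale.
    This mirrors the idea only, not the exact Gold-YOLO design."""
    def __init__(self, channels):
        super().__init__()
        self.fuse = nn.Conv2d(sum(channels), channels[0], kernel_size=1)
        self.inject = nn.ModuleList(nn.Conv2d(channels[0] + c, c, kernel_size=1) for c in channels)

    def forward(self, feats):                          # feats: list of (B, C_i, H_i, W_i)
        target = feats[1].shape[-2:]                   # gather at the middle scale
        gathered = torch.cat([F.interpolate(f, size=target, mode="nearest") for f in feats], dim=1)
        global_ctx = self.fuse(gathered)               # fused global context
        out = []
        for f, inj in zip(feats, self.inject):
            ctx = F.interpolate(global_ctx, size=f.shape[-2:], mode="nearest")
            out.append(inj(torch.cat([ctx, f], dim=1)))  # distribute context to each scale
        return out

feats = [torch.randn(1, c, s, s) for c, s in [(64, 80), (128, 40), (256, 20)]]
gd = GatherDistribute([64, 128, 256])
print([tuple(o.shape) for o in gd(feats)])
```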

Dynamic Pricing of Applications in Cloud Marketplaces using Game Theory

  • paper_url: http://arxiv.org/abs/2309.11316
  • repo_url: None
  • paper_authors: Safiye Ghasemi, Mohammad Reza Meybodi, Mehdi Dehghan Takht-Fooladi, Amir Masoud Rahmani
  • for: 这个论文旨在研究云市场竞争对应的价格策略,以帮助企业更好地制定价格策略。
  • methods: 该论文采用了游戏理论来设计动态价格策略,并在委员会中考虑了多家提供商的竞争。
  • results: 该论文通过数学模型来研究云市场竞争,并证明了存在和uniqueness的纳什平衡,从而为企业提供了新的动态价格策略。
    Abstract The competitive nature of Cloud marketplaces as new concerns in delivery of services makes the pricing policies a crucial task for firms. so that, pricing strategies has recently attracted many researchers. Since game theory can handle such competing well this concern is addressed by designing a normal form game between providers in current research. A committee is considered in which providers register for improving their competition based pricing policies. The functionality of game theory is applied to design dynamic pricing policies. The usage of the committee makes the game a complete information one, in which each player is aware of every others payoff functions. The players enhance their pricing policies to maximize their profits. The contribution of this paper is the quantitative modeling of Cloud marketplaces in form of a game to provide novel dynamic pricing strategies; the model is validated by proving the existence and the uniqueness of Nash equilibrium of the game.
    摘要 云市场的竞争性新问题在服务交付中带来了价格策略的核心任务 для公司。因此,价格策略在最近吸引了许多研究人员。由于游戏理论可以良好处理这种竞争,因此在当前研究中,设计了一个委员会,让提供者为了改善其竞争基础价格策略进行注册。通过游戏理论的应用,设计了动态价格策略。由于委员会的存在,游戏变为完全信息游戏,每个玩家知道彼此的利益函数。玩家通过优化价格策略来 maximize 利润。本文的贡献在于以游戏的形式对云市场进行量化模型化,提供了新的动态价格策略;模型的存在和uniqueness 的证明,证明了这种游戏的稳定性。

A Competition-based Pricing Strategy in Cloud Markets using Regret Minimization Techniques

  • paper_url: http://arxiv.org/abs/2309.11312
  • repo_url: None
  • paper_authors: S. Ghasemi, M. R. Meybodi, M. Dehghan, A. M. Rahmani
  • for: This paper aims to address the challenge of pricing in Cloud computing marketplaces, where providers compete without knowing each other’s pricing policies.
  • methods: The paper proposes a pricing policy based on regret minimization and applies it to an incomplete-information game modeling the competition among Cloud providers. The algorithm updates the distribution of strategies based on experienced regret, leading to faster minimization of regret and increased profits for providers.
  • results: The experimental results show that the proposed pricing policy leads to much greater increases in provider profits compared to other pricing policies, and the efficiency of various regret minimization techniques in a simulated marketplace of Cloud is discussed. Additionally, the study examines the return on investment of providers in considered organizations and finds promising results.
  • for: 这篇论文旨在解决云计算市场中的定价难题：各提供商在互不了解对方定价策略的情况下相互竞争。
  • methods: 论文提出一种基于后悔最小化的价格策略,并应用到了不完全信息游戏中模拟云提供商的竞争。算法根据经验的后悔来更新策略分布,导致快速减少后悔。
  • results: 实验结果表明，与其他定价策略相比，所提策略能更大幅度地提高提供商利润；论文还在模拟的云市场中讨论了多种后悔最小化技术的效率，并考察了相关组织中提供商的投资回报，取得了有前景的结果。
    Abstract Cloud computing as a fairly new commercial paradigm, widely investigated by different researchers, already has a great range of challenges. Pricing is a major problem in Cloud computing marketplace; as providers are competing to attract more customers without knowing the pricing policies of each other. To overcome this lack of knowledge, we model their competition by an incomplete-information game. Considering the issue, this work proposes a pricing policy related to the regret minimization algorithm and applies it to the considered incomplete-information game. Based on the competition based marketplace of the Cloud, providers update the distribution of their strategies using the experienced regret. The idea of iteratively applying the algorithm for updating probabilities of strategies causes the regret get minimized faster. The experimental results show much more increase in profits of the providers in comparison with other pricing policies. Besides, the efficiency of a variety of regret minimization techniques in a simulated marketplace of Cloud are discussed which have not been observed in the studied literature. Moreover, return on investment of providers in considered organizations is studied and promising results appeared.
    摘要 云计算作为一种比较新的商业模式,已经广泛研究了不同的研究者。在云计算市场中,价格是一个主要的问题,Provider competing to attract more customers without knowing each other's pricing policies。为了解决这个问题,我们模拟了这个 incomplete-information game。基于云计算市场的竞争性,提供者通过经验的 regret 更新分布的策略。iteratively applying the algorithm for updating probabilities of strategies causes the regret get minimized faster。实验结果表明,与其他价格策略相比,提供者的利润增加了很多。此外,我们还发现了一些 regret minimization techniques 在云计算市场中的效率,这些result未经studied literature。此外,我们还研究了Provider的投资回报,并获得了扎实的结果。

Rating Prediction in Conversational Task Assistants with Behavioral and Conversational-Flow Features

  • paper_url: http://arxiv.org/abs/2309.11307
  • repo_url: https://github.com/rafaelhferreira/cta_rating_prediction
  • paper_authors: Rafael Ferreira, David Semedo, João Magalhães
  • for: 预测对话任务助手(CTA)的成功可以帮助我们理解用户行为并采取相应的行动。
  • methods: 这篇论文提出了TB-Rater模型,这是一种将对话流程特征与用户行为特征结合在一起的Transformer模型,用于在CTA场景下预测用户评分。具体来说,我们使用了真实的人类-机器人对话和在Alexa TaskBot挑战中收集的用户评分数据。
  • results: 我们的结果表明,模型对话流程和用户行为方面的特征可以在单个模型中结合,以预测Offline评分。此外,对CTA特有的行为特征进行分析,可以为未来系统提供参考。
    Abstract Predicting the success of Conversational Task Assistants (CTA) can be critical to understand user behavior and act accordingly. In this paper, we propose TB-Rater, a Transformer model which combines conversational-flow features with user behavior features for predicting user ratings in a CTA scenario. In particular, we use real human-agent conversations and ratings collected in the Alexa TaskBot challenge, a novel multimodal and multi-turn conversational context. Our results show the advantages of modeling both the conversational-flow and behavioral aspects of the conversation in a single model for offline rating prediction. Additionally, an analysis of the CTA-specific behavioral features brings insights into this setting and can be used to bootstrap future systems.
    摘要 预测对话任务助手(CTA)的成功可以帮助我们更好地理解用户行为,从而更好地行动。在这篇论文中,我们提出了TB-Rater模型,这是一个基于转换器模型,结合对话流程特征和用户行为特征来预测用户评分在CTA场景中。具体来说,我们使用了真实的人类-机器人对话和在Alexa TaskBot挑战中收集的用户评分数据,这是一个新的多模式和多轮对话上下文。我们的结果表明,将对话流程和行为方面的特征模型在单个模型中可以在线评分中获得优势。此外,对CTA特有的行为特征进行分析,可以为未来系统提供Bootstrap。

FaceDiffuser: Speech-Driven 3D Facial Animation Synthesis Using Diffusion

  • paper_url: http://arxiv.org/abs/2309.11306
  • repo_url: https://github.com/uuembodiedsocialai/FaceDiffuser
  • paper_authors: Stefan Stan, Kazi Injamamul Haque, Zerrin Yumak
  • for: 这篇论文旨在解决当前语音驱动3D面部动画合成方法大多依赖确定性深度学习、难以准确刻画非语言面部线索的问题。
  • methods: 该方法基于Diffusion Technique,使用预训练的大语音表示模型HuBERT对音频输入进行编码。
  • results: 我们的方法在对比于现有方法时达到了更好或相当的结果，并且引入了一个新的基于blendshape的rigged character的数据集。
    Abstract Speech-driven 3D facial animation synthesis has been a challenging task both in industry and research. Recent methods mostly focus on deterministic deep learning methods meaning that given a speech input, the output is always the same. However, in reality, the non-verbal facial cues that reside throughout the face are non-deterministic in nature. In addition, majority of the approaches focus on 3D vertex based datasets and methods that are compatible with existing facial animation pipelines with rigged characters is scarce. To eliminate these issues, we present FaceDiffuser, a non-deterministic deep learning model to generate speech-driven facial animations that is trained with both 3D vertex and blendshape based datasets. Our method is based on the diffusion technique and uses the pre-trained large speech representation model HuBERT to encode the audio input. To the best of our knowledge, we are the first to employ the diffusion method for the task of speech-driven 3D facial animation synthesis. We have run extensive objective and subjective analyses and show that our approach achieves better or comparable results in comparison to the state-of-the-art methods. We also introduce a new in-house dataset that is based on a blendshape based rigged character. We recommend watching the accompanying supplementary video. The code and the dataset will be publicly available.
    摘要 语音驱动的3D面部动画合成在工业界与研究界都是一项具有挑战性的任务。近期方法大多采用确定性深度学习方法，即给定同一语音输入，输出总是相同；然而现实中遍布面部的非语言线索本质上是非确定性的。此外，多数方法聚焦于基于3D顶点的数据集，与现有带绑定角色的面部动画流水线兼容的方法十分稀缺。为了解决这些问题，我们提出 FaceDiffuser，一种非确定性深度学习模型，可在基于3D顶点和基于blendshape的数据集上训练，用于生成语音驱动的面部动画。我们的方法基于扩散技术，使用预训练的大型语音表示模型 HuBERT 对音频输入进行编码。据我们所知，这是首个将扩散方法用于语音驱动3D面部动画合成任务的工作。我们进行了广泛的客观与主观分析，结果表明我们的方法取得了优于或可比于最先进方法的效果。我们还引入了一个基于blendshape绑定角色的内部新数据集。建议观看随附的补充视频；代码和数据集将公开发布。

A Cost-Aware Mechanism for Optimized Resource Provisioning in Cloud Computing

  • paper_url: http://arxiv.org/abs/2309.11299
  • repo_url: None
  • paper_authors: Safiye Ghasemi, Mohammad Reza Meybodi, Mehdi Dehghan Takht Fooladi, Amir Masoud Rahmani
  • for: 这篇论文旨在提出一种新的资源配置方法,以减少资源配置成本的方式来满足需求。
  • methods: 本文使用了学习自动过程来选择最适合的资源来主机每个服务,并考虑成本和服务需求。
  • results: 实验结果显示,我们的方法能够有效地运行许多不同类型的应用程序,并且可以适当地减少资源配置成本。
    Abstract Due to the recent wide use of computational resources in cloud computing, new resource provisioning challenges have been emerged. Resource provisioning techniques must keep total costs to a minimum while meeting the requirements of the requests. According to widely usage of cloud services, it seems more challenging to develop effective schemes for provisioning services cost-effectively; we have proposed a novel learning based resource provisioning approach that achieves cost-reduction guarantees of demands. The contributions of our optimized resource provisioning (ORP) approach are as follows. Firstly, it is designed to provide a cost-effective method to efficiently handle the provisioning of requested applications; while most of the existing models allow only workflows in general which cares about the dependencies of the tasks, ORP performs based on services of which applications comprised and cares about their efficient provisioning totally. Secondly, it is a learning automata-based approach which selects the most proper resources for hosting each service of the demanded application; our approach considers both cost and service requirements together for deploying applications. Thirdly, a comprehensive evaluation is performed for three typical workloads: data-intensive, process-intensive and normal applications. The experimental results show that our method adapts most of the requirements efficiently, and furthermore the resulting performance meets our design goals.
    摘要 The contributions of our optimized resource provisioning (ORP) approach are as follows:1. Cost-effective method: ORP provides a cost-effective method to efficiently handle the provisioning of requested applications, while most existing models only consider workflows in general and ignore the dependencies of tasks. ORP takes into account the services that applications comprise and cares about their efficient provisioning.2. Learning automata-based approach: ORP is a learning automata-based approach that selects the most appropriate resources for hosting each service of the demanded application. Our approach considers both cost and service requirements together for deploying applications.3. Comprehensive evaluation: We conducted a comprehensive evaluation for three typical workloads: data-intensive, process-intensive, and normal applications. The experimental results show that our method adapts to most of the requirements efficiently, and the resulting performance meets our design goals.
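The learning-automata selection step can be illustrated with a linear reward-inaction update, as sketched below; the resource set, demand model, and reward rule are toy assumptions, not the paper's environment or exact update scheme.

```python
import numpy as np

def select_resource(probs, rng):
    return rng.choice(len(probs), p=probs)

def update_lri(probs, chosen, rewarded, a=0.1):
    """Linear reward-inaction (L_RI) update: on reward, shift probability mass toward
    the chosen resource; on penalty, leave the distribution unchanged."""
    if rewarded:
        probs = probs * (1 - a)
        probs[chosen] += a
    return probs / probs.sum()

# Toy environment: three resource types with (cost, capacity); a service is satisfied
# when capacity covers its demand, and cheaper satisfying choices are rewarded more often.
resources = [(1.0, 2), (2.0, 4), (4.0, 8)]
demand = 3
rng = np.random.default_rng(0)
probs = np.full(len(resources), 1 / len(resources))
for _ in range(500):
    i = select_resource(probs, rng)
    cost, cap = resources[i]
    reward_prob = 0.0 if cap < demand else 1.0 / cost       # feasible and cheap -> likely reward
    probs = update_lri(probs, i, rng.random() < reward_prob)
print(np.round(probs, 3))   # mass concentrates on the cheapest feasible resource
```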

CPLLM: Clinical Prediction with Large Language Models

  • paper_url: http://arxiv.org/abs/2309.11295
  • repo_url: https://github.com/nadavlab/CPLLM
  • paper_authors: Ofir Ben Shoham, Nadav Rappoport
  • for: 这个论文是为了提出一种基于大语言模型的临床预测方法,以便预测患者是否会在下一次访问或接下来的诊断中被诊断出某种疾病。
  • methods: 这个方法是基于已经预训练的大语言模型(LLM),通过quantization和提示来进行微调,以便预测患者的疾病风险。
  • results: 对于不同的基线模型,包括Logistic Regression、RETAIN和Med-BERT,我们的CPLLM模型在PR-AUC和ROC-AUC metric上都显示出了明显的提升,较baseline模型更高。
    Abstract We present Clinical Prediction with Large Language Models (CPLLM), a method that involves fine-tuning a pre-trained Large Language Model (LLM) for clinical disease prediction. We utilized quantization and fine-tuned the LLM using prompts, with the task of predicting whether patients will be diagnosed with a target disease during their next visit or in the subsequent diagnosis, leveraging their historical diagnosis records. We compared our results versus various baselines, including Logistic Regression, RETAIN, and Med-BERT, which is the current state-of-the-art model for disease prediction using structured EHR data. Our experiments have shown that CPLLM surpasses all the tested models in terms of both PR-AUC and ROC-AUC metrics, displaying noteworthy enhancements compared to the baseline models.
    摘要 我们提出基于大语言模型的临床预测方法（CPLLM），即对预训练的大语言模型（LLM）进行微调以用于临床疾病预测。我们采用量化并通过提示对LLM进行微调，任务是利用患者的历史诊断记录预测其在下一次就诊或后续诊断中是否会被诊断出目标疾病。我们将结果与多种基线进行了比较，包括Logistic Regression、RETAIN以及目前使用结构化电子健康记录数据进行疾病预测的最先进模型Med-BERT。实验表明，CPLLM在PR-AUC和ROC-AUC指标上均超越了所有被测模型，相比基线模型有显著提升。
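The prompt-based setup can be illustrated as below; the template wording and record fields are hypothetical and are not taken from the CPLLM repository.

```python
# Illustrative prompt/label construction for clinical next-diagnosis prediction,
# in the spirit of CPLLM; the template and fields are hypothetical, not the paper's code.

from typing import List, Dict

def build_example(history: List[str], target_disease: str, diagnosed_next: bool) -> Dict[str, str]:
    prompt = (
        "The patient has the following diagnosis history, in order: "
        + "; ".join(history)
        + f". Will the patient be diagnosed with {target_disease} at the next visit? Answer yes or no."
    )
    return {"prompt": prompt, "completion": "yes" if diagnosed_next else "no"}

record = ["essential hypertension", "type 2 diabetes", "hyperlipidemia"]
example = build_example(record, "chronic kidney disease", diagnosed_next=True)
print(example["prompt"])
print(example["completion"])
# Such (prompt, completion) pairs would then be used to fine-tune a quantized LLM
# with a causal-language-modeling loss restricted to the completion tokens.
```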

Overview of AuTexTification at IberLEF 2023: Detection and Attribution of Machine-Generated Text in Multiple Domains

  • paper_url: http://arxiv.org/abs/2309.11285
  • repo_url: https://github.com/autextification/AuTexTification-Overview
  • paper_authors: Areg Mikael Sarvazyan, José Ángel González, Marc Franco-Salvador, Francisco Rangel, Berta Chulvi, Paolo Rosso
  • for: 这篇论文描述了2023年的IberLEF工作坊中的AuTexTification分类任务,这是一个iberian语言评估论坛(SEPLN)2023年会议的一部分。
  • methods: 这篇论文描述了AuTexTification任务的两个子任务:第一个子任务是判断文本是人工生成的还是大语言模型生成的;第二个子任务是归属一个机器生成文本到六种不同的文本生成模型中。
  • results: 这篇论文描述了AuTexTification2023数据集,包含了英语和西班牙语的160,000多个文本,来自五个领域(微博、评论、新闻、法律和使用教程)。总共有114个团队参加了比赛,其中36个团队发送了175个运行,20个团队发送了工作笔记。在这篇报告中,我们介绍了AuTexTification数据集和任务,参与系统,以及结果。
    Abstract This paper presents the overview of the AuTexTification shared task as part of the IberLEF 2023 Workshop in Iberian Languages Evaluation Forum, within the framework of the SEPLN 2023 conference. AuTexTification consists of two subtasks: for Subtask 1, participants had to determine whether a text is human-authored or has been generated by a large language model. For Subtask 2, participants had to attribute a machine-generated text to one of six different text generation models. Our AuTexTification 2023 dataset contains more than 160.000 texts across two languages (English and Spanish) and five domains (tweets, reviews, news, legal, and how-to articles). A total of 114 teams signed up to participate, of which 36 sent 175 runs, and 20 of them sent their working notes. In this overview, we present the AuTexTification dataset and task, the submitted participating systems, and the results.

Rethinking Sensors Modeling: Hierarchical Information Enhanced Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2309.11284
  • repo_url: https://github.com/VAN-QIAN/CIKM23-HIEST
  • paper_authors: Qian Ma, Zijian Zhang, Xiangyu Zhao, Haoliang Li, Hongwei Zhao, Yiqi Wang, Zitao Liu, Wanyu Wang
  • for: 这篇论文主要关注于城市化加速时的交通预测,并在空间时间预测中提出了一个新的方法。
  • methods: 本文提出了一个 Hierarchical Information Enhanced Spatio-Temporal prediction 方法(HIEST),它将感应器之间的依赖性分为两层:地域层和全球层。
  • results: 实验结果显示，HIEST 方法在与现有基线方法的比较中取得了领先的性能。
    Abstract With the acceleration of urbanization, traffic forecasting has become an essential role in smart city construction. In the context of spatio-temporal prediction, the key lies in how to model the dependencies of sensors. However, existing works basically only consider the micro relationships between sensors, where the sensors are treated equally, and their macroscopic dependencies are neglected. In this paper, we argue to rethink the sensor's dependency modeling from two hierarchies: regional and global perspectives. Particularly, we merge original sensors with high intra-region correlation as a region node to preserve the inter-region dependency. Then, we generate representative and common spatio-temporal patterns as global nodes to reflect a global dependency between sensors and provide auxiliary information for spatio-temporal dependency learning. In pursuit of the generality and reality of node representations, we incorporate a Meta GCN to calibrate the regional and global nodes in the physical data space. Furthermore, we devise the cross-hierarchy graph convolution to propagate information from different hierarchies. In a nutshell, we propose a Hierarchical Information Enhanced Spatio-Temporal prediction method, HIEST, to create and utilize the regional dependency and common spatio-temporal patterns. Extensive experiments have verified the leading performance of our HIEST against state-of-the-art baselines. We publicize the code to ease reproducibility.
    摘要 随着城市化进程的加速，交通预测已成为智慧城市建设中的一项重要任务。在时空预测中，关键在于如何建模传感器之间的依赖关系。然而，现有工作基本上只考虑传感器之间的微观关系，将所有传感器同等对待，忽略了它们的宏观依赖。本文主张从区域和全局两个层级重新审视传感器的依赖建模：我们将区域内高度相关的原始传感器合并为区域节点，以保留区域间依赖；再生成具有代表性的通用时空模式作为全局节点，以刻画传感器之间的全局依赖，并为时空依赖学习提供辅助信息。为使节点表示兼具通用性与真实性，我们引入 Meta GCN 在物理数据空间中对区域节点和全局节点进行校准，并设计跨层级图卷积以在不同层级间传播信息。简而言之，我们提出了分层信息增强的时空预测方法 HIEST，用于构建并利用区域依赖与通用时空模式。大量实验验证了 HIEST 相对最先进基线的领先性能。我们公开了代码以便复现。
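The regional-node idea (merging highly correlated sensors) can be illustrated with a simple correlation-threshold grouping, sketched below; HIEST's actual construction and Meta GCN calibration are learned rather than rule-based, so treat this only as intuition.

```python
import numpy as np

def build_region_nodes(readings: np.ndarray, threshold: float = 0.9):
    """Greedy grouping of sensors whose historical readings are highly correlated.
    readings: (timesteps, n_sensors). Returns sensor-index groups, each of which
    would be merged into one region node."""
    corr = np.corrcoef(readings, rowvar=False)
    unassigned = set(range(corr.shape[0]))
    regions = []
    while unassigned:
        seed = min(unassigned)
        group = [s for s in unassigned if corr[seed, s] >= threshold]
        regions.append(sorted(group))
        unassigned -= set(group)
    return regions

rng = np.random.default_rng(0)
base_a = rng.normal(size=200)                 # traffic pattern of region A
base_b = rng.normal(size=200)                 # traffic pattern of region B
readings = np.stack([base_a + 0.05 * rng.normal(size=200),
                     base_a + 0.05 * rng.normal(size=200),
                     base_b + 0.05 * rng.normal(size=200),
                     base_b + 0.05 * rng.normal(size=200)], axis=1)
print(build_region_nodes(readings))           # expected: [[0, 1], [2, 3]]
```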

Open-endedness induced through a predator-prey scenario using modular robots

  • paper_url: http://arxiv.org/abs/2309.11275
  • repo_url: None
  • paper_authors: Dimitri Kachler, Karine Miras
  • for: 这个研究探讨了如何通过探险-猎食情况引发开放演化(OEE)。
  • methods: 研究使用固定 morphology 的模块机器人,其控制器被进行进化。机器人可以发送和接收信号,并在环境中识别其他机器人的相对位置。研究还引入了一个标记系统,它改变了个体如何识别彼此的方式,并预计会增加行为复杂性。
  • results: 研究发现了适应策略的出现,证明了通过探险-猎食 dinamics 使用模块机器人来引发 OEE 的可能性。然而,这种emergence似乎需要根据行为标准来条件繁殖。
    Abstract This work investigates how a predator-prey scenario can induce the emergence of Open-Ended Evolution (OEE). We utilize modular robots of fixed morphologies whose controllers are subject to evolution. In both species, robots can send and receive signals and perceive the relative positions of other robots in the environment. Specifically, we introduce a feature we call a tagging system: it modifies how individuals can perceive each other and is expected to increase behavioral complexity. Our results show the emergence of adaptive strategies, demonstrating the viability of inducing OEE through predator-prey dynamics using modular robots. Such emergence, nevertheless, seemed to depend on conditioning reproduction to an explicit behavioral criterion.
    摘要 这项研究探讨了捕食者-猎物情境如何诱发开放式演化（OEE）。我们使用形态固定的模块化机器人，并对其控制器进行演化。两类机器人都可以收发信号，并感知环境中其他机器人的相对位置。我们特别引入了一种称为标记系统的特性：它改变了个体相互感知的方式，预期会提升行为复杂性。结果显示出适应策略的涌现，证明了利用模块化机器人通过捕食者-猎物动态诱发OEE的可行性；然而，这种涌现似乎依赖于将繁殖与明确的行为标准挂钩。

Machine Learning Data Suitability and Performance Testing Using Fault Injection Testing Framework

  • paper_url: http://arxiv.org/abs/2309.11274
  • repo_url: None
  • paper_authors: Manal Rahal, Bestoun S. Ahmed, Jorgen Samuelsson
  • for: This paper aims to address the gap in testing approaches for input data in machine learning (ML) systems, specifically the resilience of ML models to intentionally-triggered data faults.
  • methods: The proposed framework, called FIUL-Data, uses data mutators to explore vulnerabilities of ML systems against data fault injections. The framework is designed with three main ideas: mutators are not random, one mutator is applied at a time, and selected ML models are optimized beforehand.
  • results: The FIUL-Data framework is evaluated using data from analytical chemistry, and the results show that the framework allows for the evaluation of the resilience of ML models. In most experiments, ML models show higher resilience at larger training datasets, and gradient boost performed better than support vector regression in smaller training sets. The mean squared error metric is found to be useful in evaluating the resilience of models due to its higher sensitivity to data mutation.
  • for: 这篇论文旨在填补机器学习（ML）系统中输入数据测试方法的空白，具体是测试ML模型对有意注入的数据故障的韧性。
  • methods: 所提出的框架 FIUL-Data 使用数据变异器来探索ML系统在数据故障注入下的脆弱性。框架基于三个主要设计思想：变异器并非随机；同一时刻只应用一个变异器；所选ML模型事先经过优化。
  • results: FIUL-Data 框架使用分析化学数据进行评估，结果显示该框架能够评估ML模型的韧性。在大多数实验中，ML模型在更大的训练集上表现出更高的韧性；在较小的训练集上，梯度提升的表现优于支持向量回归。总体而言，均方误差指标因对数据变异更敏感而适于评估模型韧性。
    Abstract Creating resilient machine learning (ML) systems has become necessary to ensure production-ready ML systems that acquire user confidence seamlessly. The quality of the input data and the model highly influence the successful end-to-end testing in data-sensitive systems. However, the testing approaches of input data are not as systematic and are few compared to model testing. To address this gap, this paper presents the Fault Injection for Undesirable Learning in input Data (FIUL-Data) testing framework that tests the resilience of ML models to multiple intentionally-triggered data faults. Data mutators explore vulnerabilities of ML systems against the effects of different fault injections. The proposed framework is designed based on three main ideas: The mutators are not random; one data mutator is applied at an instance of time, and the selected ML models are optimized beforehand. This paper evaluates the FIUL-Data framework using data from analytical chemistry, comprising retention time measurements of anti-sense oligonucleotide. Empirical evaluation is carried out in a two-step process in which the responses of selected ML models to data mutation are analyzed individually and then compared with each other. The results show that the FIUL-Data framework allows the evaluation of the resilience of ML models. In most experiments cases, ML models show higher resilience at larger training datasets, where gradient boost performed better than support vector regression in smaller training sets. Overall, the mean squared error metric is useful in evaluating the resilience of models due to its higher sensitivity to data mutation.
    摘要 创建可恢复的机器学习(ML)系统已经成为确保生产准备的ML系统获得用户信任的必要手段。输入数据质量和模型对生成端到端测试的成功产生很大影响。然而,输入数据测试的方法并不够系统化,与模型测试相比相对落后。为解决这个差距,本文提出了输入数据中的异常投入测试框架(FIUL-Data),用于测试ML模型对多种意外触发的数据异常的抗性。数据变换器探索了ML系统对各种异常投入的敏感性。该框架基于以下三个主要想法:变换器不是随机的,只有一个变换器在一个时间点上应用,并且选择的ML模型在先前优化。本文通过使用分析化学数据,包括抑制肽的释放时间测量,对FIUL-Data框架进行了实证评估。实验在两步进行,先分别分析选择的ML模型对数据变换的响应,然后对各模型进行比较。结果表明,FIUL-Data框架可以评估ML模型的抗性。大多数实验情况下,ML模型在更大的训练集上显示更高的抗性,其中梯度拟合在小训练集中表现更好。总的来说,平均方差误差度量是评估ML模型抗性的有用指标。
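The "one mutator at a time" design can be illustrated as below with two toy mutators and a least-squares model; the mutators, model, and scoring are assumptions for illustration and are not the framework's actual mutator set.

```python
import numpy as np

def gaussian_noise_mutator(X, rng, scale=0.1):
    """Inject additive Gaussian noise into every feature."""
    return X + rng.normal(0.0, scale * X.std(axis=0), X.shape)

def dropout_mutator(X, rng, frac=0.05):
    """Zero out a random fraction of entries, simulating missing measurements."""
    X = X.copy()
    X[rng.random(X.shape) < frac] = 0.0
    return X

def resilience_report(model_fit, model_score, X, y, mutators, seed=0):
    """Apply one mutator at a time (never stacked) and report the score drop,
    mirroring the 'one mutator per instance of time' design idea."""
    rng = np.random.default_rng(seed)
    model = model_fit(X, y)
    baseline = model_score(model, X, y)
    return {m.__name__: baseline - model_score(model, m(X, rng), y) for m in mutators}

# Toy model: ordinary least squares via lstsq, scored with an R^2-style metric.
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
score = lambda w, X, y: 1.0 - np.mean((X @ w - y) ** 2) / np.var(y)

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.1, 200)
print(resilience_report(fit, score, X, y, [gaussian_noise_mutator, dropout_mutator]))
```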

Grounded Complex Task Segmentation for Conversational Assistants

  • paper_url: http://arxiv.org/abs/2309.11271
  • repo_url: https://github.com/rafaelhferreira/grounded_task_segmentation_cta
  • paper_authors: Rafael Ferreira, David Semedo, João Magalhães
  • for: 这 paper 是为了改进 web-based instrucional text,使其更适合 conversational Setting。
  • methods: 该 paper 使用 Transformer-based 架构进行计算模型,以及按照 conversational enario 进行 instrucional 结构标注。
  • results: 经过测试,用户对 step 的 complexity 和 length 有所偏好,并且提出的方法可以改善原始的 web-based instrucional text,提高了 86% 的评价。
    Abstract Following complex instructions in conversational assistants can be quite daunting due to the shorter attention and memory spans when compared to reading the same instructions. Hence, when conversational assistants walk users through the steps of complex tasks, there is a need to structure the task into manageable pieces of information of the right length and complexity. In this paper, we tackle the recipes domain and convert reading structured instructions into conversational structured ones. We annotated the structure of instructions according to a conversational scenario, which provided insights into what is expected in this setting. To computationally model the conversational step's characteristics, we tested various Transformer-based architectures, showing that a token-based approach delivers the best results. A further user study showed that users tend to favor steps of manageable complexity and length, and that the proposed methodology can improve the original web-based instructional text. Specifically, 86% of the evaluated tasks were improved from a conversational suitability point of view.
    摘要 与阅读相比，在对话助手中听取复杂指令时用户的注意力和记忆跨度更短，执行起来可能相当吃力。因此，当对话助手引导用户完成复杂任务的各个步骤时，需要把任务组织成长度与复杂度合适、便于掌握的信息单元。本文以菜谱领域为例，将面向阅读的结构化指令转换为面向对话的结构化指令：我们按照对话场景对指令结构进行标注，从中获得了对该场景预期形式的洞察；为对对话步骤的特征进行计算建模，我们测试了多种基于 Transformer 的架构，结果显示基于 token 的方法效果最佳。进一步的用户研究表明，用户倾向于复杂度和长度适中的步骤，且所提方法能够改进原始的网页指令文本——从对话适配性的角度看，86% 的被评估任务得到了改善。

Sequence-to-Sequence Spanish Pre-trained Language Models

  • paper_url: http://arxiv.org/abs/2309.11259
  • repo_url: https://github.com/vgaraujov/Seq2Seq-Spanish-PLMs
  • paper_authors: Vladimir Araujo, Maria Mihaela Trusca, Rodrigo Tufiño, Marie-Francine Moens
  • for: 这篇论文旨在开发针对西班牙语序列训练的encoder-decoder模型,用于进行文本摘要、重叙和生成问答等序列转换任务。
  • methods: 该论文采用了BERT、RoBERTa和GPT等批处理语言模型的encoder-decoder架构,并对其进行了适应性的预训练,以便在西班牙语文本中进行更好的表现。
  • results: 论文通过对各模型进行了广泛的评估,发现BERT和T5模型在所有评估任务中表现最佳,而BART模型也在某些任务中表现出色。此外,该论文还将所有模型公开发布到研究社区,以促进未来的西班牙语处理研究。
    Abstract In recent years, substantial advancements in pre-trained language models have paved the way for the development of numerous non-English language versions, with a particular focus on encoder-only and decoder-only architectures. While Spanish language models encompassing BERT, RoBERTa, and GPT have exhibited prowess in natural language understanding and generation, there remains a scarcity of encoder-decoder models designed for sequence-to-sequence tasks involving input-output pairs. This paper breaks new ground by introducing the implementation and evaluation of renowned encoder-decoder architectures, exclusively pre-trained on Spanish corpora. Specifically, we present Spanish versions of BART, T5, and BERT2BERT-style models and subject them to a comprehensive assessment across a diverse range of sequence-to-sequence tasks, spanning summarization, rephrasing, and generative question answering. Our findings underscore the competitive performance of all models, with BART and T5 emerging as top performers across all evaluated tasks. As an additional contribution, we have made all models publicly available to the research community, fostering future exploration and development in Spanish language processing.

Hierarchical Multi-Agent Reinforcement Learning for Air Combat Maneuvering

  • paper_url: http://arxiv.org/abs/2309.11247
  • repo_url: https://github.com/IDSIA/marl
  • paper_authors: Ardian Selmonaj, Oleg Szehr, Giacomo Del Rio, Alessandro Antonucci, Adrian Schneider, Michael Rüegsegger
  • for: This work provides a hierarchical multi-agent reinforcement learning framework for accurate air-to-air combat decision-making with multiple heterogeneous agents.
  • methods: Decision-making is split into two levels of abstraction: low-level policies control individual combat units, while a high-level commander policy issues macro commands given the mission targets. Low-level policies are trained with a curriculum of increasingly complex scenarios and league-based self-play; the commander policy is then trained on mission targets on top of the pre-trained low-level policies.
  • results: Empirical validation demonstrates the advantages of these design choices for air combat decision-making.
    Abstract The application of artificial intelligence to simulate air-to-air combat scenarios is attracting increasing attention. To date the high-dimensional state and action spaces, the high complexity of situation information (such as imperfect and filtered information, stochasticity, incomplete knowledge about mission targets) and the nonlinear flight dynamics pose significant challenges for accurate air combat decision-making. These challenges are exacerbated when multiple heterogeneous agents are involved. We propose a hierarchical multi-agent reinforcement learning framework for air-to-air combat with multiple heterogeneous agents. In our framework, the decision-making process is divided into two stages of abstraction, where heterogeneous low-level policies control the action of individual units, and a high-level commander policy issues macro commands given the overall mission targets. Low-level policies are trained for accurate unit combat control. Their training is organized in a learning curriculum with increasingly complex training scenarios and league-based self-play. The commander policy is trained on mission targets given pre-trained low-level policies. The empirical validation advocates the advantages of our design choices.
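A bare-bones sketch of the two-level decision structure described above, written in PyTorch. The network sizes, the macro-command vocabulary, and the observation dimensions are invented for illustration and do not correspond to the paper's actual architecture.

```python
# Sketch: commander policy picks a macro command; each unit's low-level policy
# conditions on its own observation plus that command. All dimensions are illustrative.
import torch
import torch.nn as nn

N_MACRO = 4        # e.g. engage / evade / support / patrol (hypothetical)
OBS_DIM = 32       # per-unit observation size (hypothetical)
ACT_DIM = 6        # continuous control outputs per unit (hypothetical)

class Commander(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_MACRO))
    def forward(self, global_obs):
        return torch.distributions.Categorical(logits=self.net(global_obs))

class UnitPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM + N_MACRO, 64), nn.ReLU(), nn.Linear(64, ACT_DIM))
    def forward(self, obs, macro_cmd):
        cmd_onehot = torch.nn.functional.one_hot(macro_cmd, N_MACRO).float()
        return self.net(torch.cat([obs, cmd_onehot.expand(obs.shape[0], -1)], dim=-1))

commander, units = Commander(), [UnitPolicy() for _ in range(3)]
global_obs = torch.randn(1, OBS_DIM)
macro = commander(global_obs).sample()              # high-level macro command
for i, unit in enumerate(units):
    action = unit(torch.randn(1, OBS_DIM), macro)   # low-level continuous action
    print(f"unit {i}: macro={macro.item()}, action shape={tuple(action.shape)}")
```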

Colour Passing Revisited: Lifted Model Construction with Commutative Factors

  • paper_url: http://arxiv.org/abs/2309.11236
  • repo_url: None
  • paper_authors: Malte Luttermann, Tanya Braun, Ralf Möller, Marcel Gehrke
  • for: This paper targets lifted probabilistic inference, which exploits symmetries in a probabilistic model to make inference tractable with respect to domain sizes.
  • methods: The state-of-the-art colour passing algorithm for constructing lifted representations is bound to a specific inference algorithm and ignores commutativity of factors. The paper proposes a modified colour passing algorithm that uses logical variables to build a lifted representation independent of any particular inference algorithm, while exploiting commutativity of factors in an offline step.
  • results: The proposed algorithm detects more symmetries than the state of the art, yielding drastically higher compression and significantly faster online query times for probabilistic inference.
    Abstract Lifted probabilistic inference exploits symmetries in a probabilistic model to allow for tractable probabilistic inference with respect to domain sizes. To apply lifted inference, a lifted representation has to be obtained, and to do so, the so-called colour passing algorithm is the state of the art. The colour passing algorithm, however, is bound to a specific inference algorithm and we found that it ignores commutativity of factors while constructing a lifted representation. We contribute a modified version of the colour passing algorithm that uses logical variables to construct a lifted representation independent of a specific inference algorithm while at the same time exploiting commutativity of factors during an offline-step. Our proposed algorithm efficiently detects more symmetries than the state of the art and thereby drastically increases compression, yielding significantly faster online query times for probabilistic inference when the resulting model is applied.
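A toy sketch of colour passing on a factor graph, where each node's new colour is derived from the multiset of its neighbours' colours; sorting the neighbour colours is one simple way to treat commutative factors as order-invariant when grouping. This is a generic illustration of the idea, not the paper's algorithm.

```python
# Sketch: iterative colour refinement on a tiny factor graph.
# Sorting neighbour colours makes the update invariant to argument order,
# which is how commutativity of factors can be exploited when grouping.
from collections import defaultdict

variables = ["A", "B", "C"]
factors = {"f1": ("A", "B"), "f2": ("B", "C")}   # two factors sharing the same potential

var_colour = {v: 0 for v in variables}           # all variables start identical
fac_colour = {f: 0 for f in factors}             # both factors share a potential, same colour

for _ in range(3):                               # a few refinement rounds
    # variables collect the (sorted) colours of the factors they appear in
    new_var = {}
    for v in variables:
        neigh = sorted(fac_colour[f] for f, args in factors.items() if v in args)
        new_var[v] = ("var", var_colour[v], tuple(neigh))
    # factors collect the (sorted) colours of their arguments -> order-invariant
    new_fac = {f: ("fac", fac_colour[f], tuple(sorted(var_colour[v] for v in args)))
               for f, args in factors.items()}
    # relabel signatures with small integers
    relabel = defaultdict(lambda: len(relabel))
    var_colour = {v: relabel[s] for v, s in new_var.items()}
    fac_colour = {f: relabel[s] for f, s in new_fac.items()}

print(var_colour)   # A and C end up in the same group; B is distinguished
print(fac_colour)   # f1 and f2 are grouped together
```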

ChatGPT-4 as a Tool for Reviewing Academic Books in Spanish

  • paper_url: http://arxiv.org/abs/2309.11231
  • repo_url: None
  • paper_authors: Jonnathan Berrezueta-Guzman, Laura Malache-Silva, Stephan Krusche
  • for: This study evaluates the potential of ChatGPT-4, the language model developed by OpenAI, as an editing tool for Spanish literary and academic books.
  • methods: The study analyzes the capabilities of ChatGPT-4 in terms of grammatical correction, stylistic coherence, and linguistic enrichment of Spanish texts, comparing its edits on 100 literary and academic texts with those of expert human reviewers and editors.
  • results: ChatGPT-4 makes grammatical and orthographic corrections with high accuracy and in very little time, but still struggles with context sensitivity, bibliometric analysis, deep contextual understanding, and interaction with visual content such as graphs and tables. Collaboration between ChatGPT-4 and human reviewers and editors is a promising strategy for improving efficiency without compromising quality.
    Abstract This study evaluates the potential of ChatGPT-4, an artificial intelligence language model developed by OpenAI, as an editing tool for Spanish literary and academic books. The need for efficient and accessible reviewing and editing processes in the publishing industry has driven the search for automated solutions. ChatGPT-4, being one of the most advanced language models, offers notable capabilities in text comprehension and generation. In this study, the features and capabilities of ChatGPT-4 are analyzed in terms of grammatical correction, stylistic coherence, and linguistic enrichment of texts in Spanish. Tests were conducted with 100 literary and academic texts, where the edits made by ChatGPT-4 were compared to those made by expert human reviewers and editors. The results show that while ChatGPT-4 is capable of making grammatical and orthographic corrections with high accuracy and in a very short time, it still faces challenges in areas such as context sensitivity, bibliometric analysis, deep contextual understanding, and interaction with visual content like graphs and tables. However, it is observed that collaboration between ChatGPT-4 and human reviewers and editors can be a promising strategy for improving efficiency without compromising quality. Furthermore, the authors consider that ChatGPT-4 represents a valuable tool in the editing process, but its use should be complementary to the work of human editors to ensure high-caliber editing in Spanish literary and academic books.

Leveraging Diversity in Online Interactions

  • paper_url: http://arxiv.org/abs/2309.11224
  • repo_url: None
  • paper_authors: Nardine Osman, Bruno Rosell i Gui, Carles Sierra
  • for: This work addresses connecting people online to help them find support with their day-to-day problems.
  • methods: Declarative norms are used to mediate online interactions, with a particular focus on leveraging diversity when connecting people.
  • results: Pilots run at different university sites show relative success in the diversity of the selected profiles, backed by high user satisfaction.
    Abstract This paper addresses the issue of connecting people online to help them find support with their day-to-day problems. We make use of declarative norms for mediating online interactions, and we specifically focus on the issue of leveraging diversity when connecting people. We run pilots at different university sites, and the results show relative success in the diversity of the selected profiles, backed by high user satisfaction.

Retrieve-Rewrite-Answer: A KG-to-Text Enhanced LLMs Framework for Knowledge Graph Question Answering

  • paper_url: http://arxiv.org/abs/2309.11206
  • repo_url: None
  • paper_authors: Yike Wu, Nan Hu, Sheng Bi, Guilin Qi, Jie Ren, Anhuan Xie, Wei Song
  • for: This work improves performance on knowledge graph question answering (KGQA), a task that requires rich world knowledge beyond what LLMs memorize.
  • methods: The paper proposes an answer-sensitive KG-to-Text approach that transforms retrieved knowledge graph (KG) facts into well-textualized statements that are most informative for KGQA, and builds a KG-to-Text enhanced LLM framework on top of it.
  • results: Experiments on several KGQA benchmarks show that the proposed KG-to-Text augmented LLM framework outperforms previous KG-augmented LLM approaches in both answer accuracy and the usefulness of the generated knowledge statements.
    Abstract Despite their competitive performance on knowledge-intensive tasks, large language models (LLMs) still have limitations in memorizing all world knowledge especially long tail knowledge. In this paper, we study the KG-augmented language model approach for solving the knowledge graph question answering (KGQA) task that requires rich world knowledge. Existing work has shown that retrieving KG knowledge to enhance LLMs prompting can significantly improve LLMs performance in KGQA. However, their approaches lack a well-formed verbalization of KG knowledge, i.e., they ignore the gap between KG representations and textual representations. To this end, we propose an answer-sensitive KG-to-Text approach that can transform KG knowledge into well-textualized statements most informative for KGQA. Based on this approach, we propose a KG-to-Text enhanced LLMs framework for solving the KGQA task. Experiments on several KGQA benchmarks show that the proposed KG-to-Text augmented LLMs approach outperforms previous KG-augmented LLMs approaches regarding answer accuracy and usefulness of knowledge statements.
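A toy sketch of the general retrieve-then-verbalize pattern: retrieved KG triples are turned into natural-language statements and prepended to the question before it is handed to an LLM. The template-based verbalizer below is a naive stand-in for the paper's learned, answer-sensitive KG-to-Text model, and the triples and templates are invented for illustration.

```python
# Sketch: verbalize KG triples into text and build an augmented prompt for an LLM.
# The triples, templates, and prompt format are illustrative only.
triples = [
    ("Marie Curie", "award_received", "Nobel Prize in Physics"),
    ("Marie Curie", "field_of_work", "radioactivity"),
]

def verbalize(head: str, relation: str, tail: str) -> str:
    """Naive template-based KG-to-Text; the paper uses a trained generator instead."""
    templates = {
        "award_received": "{h} received the {t}.",
        "field_of_work": "{h} worked in the field of {t}.",
    }
    template = templates.get(relation, "{h} {r} {t}.")
    return template.format(h=head, r=relation.replace("_", " "), t=tail)

question = "Which prize did Marie Curie receive?"
knowledge = " ".join(verbalize(*t) for t in triples)
prompt = f"Background knowledge: {knowledge}\nQuestion: {question}\nAnswer:"
print(prompt)   # this prompt would then be passed to the LLM of choice
```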

Using Artificial Intelligence for the Automation of Knitting Patterns

  • paper_url: http://arxiv.org/abs/2309.11202
  • repo_url: None
  • paper_authors: Uduak Uboh
  • for: This study investigates whether knitting patterns can be recognised and classified by an automated system.
  • methods: The study applies data augmentation and transfer learning, using Inception ResNet-V2 as the feature extraction and classification backbone.
  • results: The model evaluation shows high accuracy, precision, recall, and F1 score, with AUC scores for most classes in the range 0.7-0.9, surpassing other pretrained baselines including a ResNet-50 with transfer learning.
    Abstract Knitting patterns are a crucial component in the creation and design of knitted materials. Traditionally, these patterns were taught informally, but thanks to advancements in technology, anyone interested in knitting can use the patterns as a guide to start knitting. Perhaps because knitting is mostly a hobby, with the exception of industrial manufacturing utilising specialised knitting machines, the use of AI in knitting is less widespread than its application in other fields. However, it is important to determine whether knitted pattern classification using an automated system is viable. In order to recognise and classify knitting patterns, this study proposes a deep learning model that uses data augmentation and a transfer learning technique. The Inception ResNet-V2 is the main feature extraction and classification algorithm used in the model. Metrics like accuracy, logarithmic loss, F1-score, precision, and recall score were used to evaluate the model. The model evaluation's findings demonstrate high model accuracy, precision, recall, and F1 score. In addition, the AUC score for the majority of the classes was in the range (0.7-0.9). A comparative analysis was done using other pretrained models and a ResNet-50 model with transfer learning, and the proposed model's evaluation results surpassed all others. The major limitation for this project is time; with more time, there might have been better accuracy over a larger number of epochs.
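A compact sketch of the transfer-learning setup described above, using the Keras InceptionResNetV2 backbone. The number of classes, image size, and augmentation choices are placeholders rather than the study's exact configuration.

```python
# Sketch: InceptionResNetV2 transfer learning for knitting-pattern classification.
# NUM_CLASSES and the augmentation layers are illustrative assumptions.
import tensorflow as tf

NUM_CLASSES = 10                       # hypothetical number of pattern classes
IMG_SIZE = (299, 299)

augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
])

base = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,)
)
base.trainable = False                  # freeze the pretrained feature extractor

inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = augment(inputs)
x = tf.keras.applications.inception_resnet_v2.preprocess_input(x)
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()                         # then: model.fit(train_ds, validation_data=val_ds, ...)
```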

When to Trust AI: Advances and Challenges for Certification of Neural Networks

  • paper_url: http://arxiv.org/abs/2309.11196
  • repo_url: None
  • paper_authors: Marta Kwiatkowska, Xiyue Zhang
  • for: This paper surveys how to assure the safety of AI decisions, which is a prerequisite for deploying AI technology in real-world applications.
  • methods: The paper reviews certification and explainability techniques developed to ensure the safety of AI decisions, with a focus on neural networks.
  • results: The paper discusses future challenges and research directions for ensuring the safety and trustworthiness of AI decisions.
    Abstract Artificial intelligence (AI) has been advancing at a fast pace and it is now poised for deployment in a wide range of applications, such as autonomous systems, medical diagnosis and natural language processing. Early adoption of AI technology for real-world applications has not been without problems, particularly for neural networks, which may be unstable and susceptible to adversarial examples. In the longer term, appropriate safety assurance techniques need to be developed to reduce potential harm due to avoidable system failures and ensure trustworthiness. Focusing on certification and explainability, this paper provides an overview of techniques that have been developed to ensure safety of AI decisions and discusses future challenges.

Long-tail Augmented Graph Contrastive Learning for Recommendation

  • paper_url: http://arxiv.org/abs/2309.11177
  • repo_url: https://github.com/im0qianqian/LAGCL
  • paper_authors: Qian Zhao, Zhengwei Wu, Zhiqiang Zhang, Jun Zhou
  • for: This work improves Graph Convolutional Network (GCN) based recommendation and addresses the data sparsity issue encountered in real-world scenarios.
  • methods: The method combines contrastive learning with a learnable long-tail augmentation approach that enhances tail nodes by supplementing predicted neighbor information, and generates contrastive views from the resulting augmented graph.
  • results: Extensive experiments on three benchmark datasets demonstrate significant improvements over state-of-the-art methods; further analyses confirm the uniformity of the learned representations and the superiority of LAGCL on long-tail performance.
    Abstract Graph Convolutional Networks (GCNs) has demonstrated promising results for recommender systems, as they can effectively leverage high-order relationship. However, these methods usually encounter data sparsity issue in real-world scenarios. To address this issue, GCN-based recommendation methods employ contrastive learning to introduce self-supervised signals. Despite their effectiveness, these methods lack consideration of the significant degree disparity between head and tail nodes. This can lead to non-uniform representation distribution, which is a crucial factor for the performance of contrastive learning methods. To tackle the above issue, we propose a novel Long-tail Augmented Graph Contrastive Learning (LAGCL) method for recommendation. Specifically, we introduce a learnable long-tail augmentation approach to enhance tail nodes by supplementing predicted neighbor information, and generate contrastive views based on the resulting augmented graph. To make the data augmentation schema learnable, we design an auto drop module to generate pseudo-tail nodes from head nodes and a knowledge transfer module to reconstruct the head nodes from pseudo-tail nodes. Additionally, we employ generative adversarial networks to ensure that the distribution of the generated tail/head nodes matches that of the original tail/head nodes. Extensive experiments conducted on three benchmark datasets demonstrate the significant improvement in performance of our model over the state-of-the-arts. Further analyses demonstrate the uniformity of learned representations and the superiority of LAGCL on long-tail performance. Code is publicly available at https://github.com/im0qianqian/LAGCL
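A small PyTorch sketch of the InfoNCE-style contrastive objective commonly used in graph contrastive learning for recommendation, where two augmented views of the same node form a positive pair and all other nodes in the batch act as negatives. This shows the general loss shape only, not LAGCL's full pipeline (long-tail augmentation, auto-drop, knowledge transfer, adversarial matching).

```python
# Sketch: InfoNCE contrastive loss between two augmented views of node embeddings.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.2) -> torch.Tensor:
    """z1, z2: (num_nodes, dim) embeddings of the same nodes under two graph views."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature        # (N, N) cosine similarities
    targets = torch.arange(z1.size(0))        # node i in view 1 matches node i in view 2
    return F.cross_entropy(logits, targets)

# Toy usage: embeddings for 128 nodes under two augmented views of the user-item graph.
z_view1, z_view2 = torch.randn(128, 64), torch.randn(128, 64)
loss = info_nce(z_view1, z_view2)
print(float(loss))
```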

Are Large Language Models Really Robust to Word-Level Perturbations?

  • paper_url: http://arxiv.org/abs/2309.11166
  • repo_url: https://github.com/Harry-mic/TREval
  • paper_authors: Haoyu Wang, Guozheng Ma, Cong Yu, Ning Gui, Linrui Zhang, Zhiqi Huang, Suwei Ma, Yongzhe Chang, Sen Zhang, Li Shen, Xueqian Wang, Peilin Zhao, Dacheng Tao
  • for: This work provides a new method for evaluating the robustness of large language models (LLMs).
  • methods: The paper proposes a rational evaluation approach, TREvaL, that uses pre-trained reward models as diagnostic tools to evaluate LLM robustness, particularly on more challenging open questions.
  • results: Experiments show that TREvaL accurately measures LLM robustness and reveals that LLMs are frequently vulnerable to word-level perturbations that are commonplace in daily language use; notably, robustness tends to decrease after fine-tuning (SFT and RLHF).
    Abstract The swift advancement in the scale and capabilities of Large Language Models (LLMs) positions them as promising tools for a variety of downstream tasks. In addition to the pursuit of better performance and the avoidance of violent feedback on a certain prompt, to ensure the responsibility of the LLM, much attention is drawn to the robustness of LLMs. However, existing evaluation methods mostly rely on traditional question answering datasets with predefined supervised labels, which do not align with the superior generation capabilities of contemporary LLMs. To address this issue, we propose a novel rational evaluation approach that leverages pre-trained reward models as diagnostic tools to evaluate the robustness of LLMs, which we refer to as the Reward Model for Reasonable Robustness Evaluation (TREvaL). Our extensive empirical experiments have demonstrated that TREval provides an accurate method for evaluating the robustness of an LLM, especially when faced with more challenging open questions. Furthermore, our results demonstrate that LLMs frequently exhibit vulnerability to word-level perturbations, which are commonplace in daily language usage. Notably, we were surprised to discover that robustness tends to decrease as fine-tuning (SFT and RLHF) is conducted. The code of TREval is available in https://github.com/Harry-mic/TREval.
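A tiny sketch of the kind of word-level perturbations discussed above (purely mechanical edits such as swaps, deletions, and duplications). How the perturbed prompts are then scored by a reward model is specific to TREvaL and is only indicated by a placeholder comment here.

```python
# Sketch: generate word-level perturbations of a prompt (swap / drop / duplicate a word).
# Scoring the perturbed prompts with a pre-trained reward model is left as a placeholder.
import random

def perturb(prompt: str, seed: int = 0) -> list[str]:
    random.seed(seed)
    words = prompt.split()
    variants = []
    i = random.randrange(len(words) - 1)
    swapped = words.copy(); swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    variants.append(" ".join(swapped))                        # adjacent-word swap
    dropped = words.copy(); dropped.pop(random.randrange(len(words)))
    variants.append(" ".join(dropped))                        # word deletion
    j = random.randrange(len(words))
    variants.append(" ".join(words[: j + 1] + [words[j]] + words[j + 1 :]))  # duplication
    return variants

prompt = "Explain why the sky appears blue during the day"
for v in perturb(prompt):
    print(v)
    # reward_gap = reward_model(answer_to(prompt)) - reward_model(answer_to(v))  # placeholder
```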

ProtoExplorer: Interpretable Forensic Analysis of Deepfake Videos using Prototype Exploration and Refinement

  • paper_url: http://arxiv.org/abs/2309.11155
  • repo_url: None
  • paper_authors: Merel de Leeuw den Bouter, Javier Lloret Pardo, Zeno Geradts, Marcel Worring
  • for: This work aims to make deep learning models interpretable, which is especially important in high-stakes forensic settings.
  • methods: The paper proposes a visual analytics process model for prototype learning and, based on it, ProtoExplorer, a visual analytics system for exploring and refining prototype-based deepfake detection models.
  • results: Evaluations with forensic experts, based on think-aloud sessions and interviews, confirm the feasibility and effectiveness of the approach.
    Abstract In high-stakes settings, Machine Learning models that can provide predictions that are interpretable for humans are crucial. This is even more true with the advent of complex deep learning based models with a huge number of tunable parameters. Recently, prototype-based methods have emerged as a promising approach to make deep learning interpretable. We particularly focus on the analysis of deepfake videos in a forensics context. Although prototype-based methods have been introduced for the detection of deepfake videos, their use in real-world scenarios still presents major challenges, in that prototypes tend to be overly similar and interpretability varies between prototypes. This paper proposes a Visual Analytics process model for prototype learning, and, based on this, presents ProtoExplorer, a Visual Analytics system for the exploration and refinement of prototype-based deepfake detection models. ProtoExplorer offers tools for visualizing and temporally filtering prototype-based predictions when working with video data. It disentangles the complexity of working with spatio-temporal prototypes, facilitating their visualization. It further enables the refinement of models by interactively deleting and replacing prototypes with the aim to achieve more interpretable and less biased predictions while preserving detection accuracy. The system was designed with forensic experts and evaluated in a number of rounds based on both open-ended think aloud evaluation and interviews. These sessions have confirmed the strength of our prototype based exploration of deepfake videos while they provided the feedback needed to continuously improve the system.

CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought

  • paper_url: http://arxiv.org/abs/2309.11143
  • repo_url: https://github.com/ZBWpro/CoT-BERT
  • paper_authors: Bowen Zhang, Kehua Chang, Chunping Li
  • for: This work improves unsupervised sentence representation learning by using Chain-of-Thought-style prompting to unlock latent capabilities of pre-trained models such as BERT.
  • methods: A two-stage approach is proposed: a comprehension stage followed by a summarization stage, with the output of the latter used as the vectorized representation of the input sentence. The contrastive learning loss function and the template denoising technique for prompt engineering are also carefully refined.
  • results: Rigorous experiments against a suite of robust baselines show that CoT-BERT achieves superior performance without requiring other text representation models or external databases.
    Abstract Unsupervised sentence representation learning aims to transform input sentences into fixed-length vectors enriched with intricate semantic information while obviating the reliance on labeled data. Recent progress within this field, propelled by contrastive learning and prompt engineering, has significantly bridged the gap between unsupervised and supervised strategies. Nonetheless, the potential utilization of Chain-of-Thought, remains largely untapped within this trajectory. To unlock latent capabilities within pre-trained models, such as BERT, we propose a two-stage approach for sentence representation: comprehension and summarization. Subsequently, the output of the latter phase is harnessed as the vectorized representation of the input sentence. For further performance enhancement, we meticulously refine both the contrastive learning loss function and the template denoising technique for prompt engineering. Rigorous experimentation substantiates our method, CoT-BERT, transcending a suite of robust baselines without necessitating other text representation models or external databases.

Contrastive Pseudo Learning for Open-World DeepFake Attribution

  • paper_url: http://arxiv.org/abs/2309.11132
  • repo_url: https://github.com/TencentYoutuResearch/OpenWorld-DeepFakeAttribution
  • paper_authors: Zhimin Sun, Shen Chen, Taiping Yao, Bangjie Yin, Ran Yi, Shouhong Ding, Lizhuang Ma
  • for: This work targets attribution of forgery traces hidden in unknown attacks from open-world unlabeled faces, to push forward frontier research in deepfake attribution.
  • methods: The paper introduces a new benchmark, Open-World DeepFake Attribution (OW-DFA), and a novel contrastive-learning-based framework named Contrastive Pseudo Learning (CPL), extended with a multi-stage paradigm that leverages pre-training and iterative learning.
  • results: Extensive experiments verify the superiority of the proposed method on OW-DFA and demonstrate its benefit for improving the security of deepfake detection.
    Abstract The challenge in sourcing attribution for forgery faces has gained widespread attention due to the rapid development of generative techniques. While many recent works have taken essential steps on GAN-generated faces, more threatening attacks related to identity swapping or expression transferring are still overlooked. And the forgery traces hidden in unknown attacks from the open-world unlabeled faces still remain under-explored. To push the related frontier research, we introduce a new benchmark called Open-World DeepFake Attribution (OW-DFA), which aims to evaluate attribution performance against various types of fake faces under open-world scenarios. Meanwhile, we propose a novel framework named Contrastive Pseudo Learning (CPL) for the OW-DFA task through 1) introducing a Global-Local Voting module to guide the feature alignment of forged faces with different manipulated regions, 2) designing a Confidence-based Soft Pseudo-label strategy to mitigate the pseudo-noise caused by similar methods in unlabeled set. In addition, we extend the CPL framework with a multi-stage paradigm that leverages pre-train technique and iterative learning to further enhance traceability performance. Extensive experiments verify the superiority of our proposed method on the OW-DFA and also demonstrate the interpretability of deepfake attribution task and its impact on improving the security of deepfake detection area.

Language-Oriented Communication with Semantic Coding and Knowledge Distillation for Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2309.11127
  • repo_url: None
  • paper_authors: Hyelin Nam, Jihong Park, Jinho Choi, Mehdi Bennis, Seong-Lyun Kim
  • for: This paper proposes a language-oriented semantic communication (LSC) framework in which machines communicate using human language messages that can be interpreted and manipulated with NLP techniques.
  • methods: Three new algorithms are proposed: 1) semantic source coding (SSC), which compresses a text prompt into its key head words while keeping their order so the prompt's context is preserved; 2) semantic channel coding (SCC), which improves robustness against errors by substituting head words with lengthier synonyms; and 3) semantic knowledge distillation (SKD), which produces listener-customized prompts via in-context learning of the listener's language style.
  • results: In a progressive text-to-image generation task, the proposed methods achieve higher perceptual similarity with fewer transmissions while improving robustness in noisy communication channels.
    Abstract By integrating recent advances in large language models (LLMs) and generative models into the emerging semantic communication (SC) paradigm, in this article we put forward to a novel framework of language-oriented semantic communication (LSC). In LSC, machines communicate using human language messages that can be interpreted and manipulated via natural language processing (NLP) techniques for SC efficiency. To demonstrate LSC's potential, we introduce three innovative algorithms: 1) semantic source coding (SSC) which compresses a text prompt into its key head words capturing the prompt's syntactic essence while maintaining their appearance order to keep the prompt's context; 2) semantic channel coding (SCC) that improves robustness against errors by substituting head words with their lenghthier synonyms; and 3) semantic knowledge distillation (SKD) that produces listener-customized prompts via in-context learning the listener's language style. In a communication task for progressive text-to-image generation, the proposed methods achieve higher perceptual similarities with fewer transmissions while enhancing robustness in noisy communication channels.
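A minimal sketch of what a semantic source coder could look like: keep only the content-bearing head words of a prompt, in their original order. The spaCy-based part-of-speech filter below is a simple approximation chosen for illustration, not the exact SSC algorithm from the paper.

```python
# Sketch: compress a text prompt into its key head words, preserving word order.
# Requires: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
KEEP = {"NOUN", "PROPN", "ADJ", "NUM", "VERB"}   # coarse notion of "head words" (assumption)

def semantic_source_code(prompt: str) -> str:
    doc = nlp(prompt)
    return " ".join(tok.text for tok in doc if tok.pos_ in KEEP)

prompt = "A photorealistic painting of a red vintage car parked near a snowy mountain lake"
compressed = semantic_source_code(prompt)
print(compressed)   # e.g. "photorealistic painting red vintage car parked snowy mountain lake"
```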

  • paper_url: http://arxiv.org/abs/2309.11528
  • repo_url: None
  • paper_authors: Jie Wang, Hanzhu Chen, Qitan Lv, Zhihao Shi, Jiajun Chen, Huarui He, Hongtao Xie, Yongdong Zhang, Feng Wu
  • for: This work addresses inductive link prediction, completing evolving knowledge graphs in an entity-independent manner where entities at inference time may differ from those seen in training.
  • methods: The paper proposes TACO, a subgraph-based method that models topology-aware correlations between relations; semantic correlations between any two relations are categorized into seven topological patterns, a Relational Correlation Network (RCN) learns the importance of each pattern, and a Complete Common Neighbor induced subgraph preserves complete topological patterns within the subgraph.
  • results: Extensive experiments show that TACO outperforms existing state-of-the-art methods on the inductive link prediction task.
    Abstract Inductive link prediction -- where entities during training and inference stages can be different -- has shown great potential for completing evolving knowledge graphs in an entity-independent manner. Many popular methods mainly focus on modeling graph-level features, while the edge-level interactions -- especially the semantic correlations between relations -- have been less explored. However, we notice a desirable property of semantic correlations between relations is that they are inherently edge-level and entity-independent. This implies the great potential of the semantic correlations for the entity-independent inductive link prediction task. Inspired by this observation, we propose a novel subgraph-based method, namely TACO, to model Topology-Aware COrrelations between relations that are highly correlated to their topological structures within subgraphs. Specifically, we prove that semantic correlations between any two relations can be categorized into seven topological patterns, and then proposes Relational Correlation Network (RCN) to learn the importance of each pattern. To further exploit the potential of RCN, we propose Complete Common Neighbor induced subgraph that can effectively preserve complete topological patterns within the subgraph. Extensive experiments demonstrate that TACO effectively unifies the graph-level information and edge-level interactions to jointly perform reasoning, leading to a superior performance over existing state-of-the-art methods for the inductive link prediction task.

TrueLearn: A Python Library for Personalised Informational Recommendations with (Implicit) Feedback

  • paper_url: http://arxiv.org/abs/2309.11527
  • repo_url: None
  • paper_authors: Yuxiang Qiu, Karim Djemili, Denis Elezi, Aaneel Shalman, María Pérez-Ortiz, Sahan Bulathwela
  • for: This paper introduces the TrueLearn Python library, a family of online-learning Bayesian models for building educational (or, more generally, informational) recommendation systems.
  • methods: The models follow the "open learner" concept and use humanly-intuitive user representations; for interpretability and user control, the library also includes visualisations that help end-users inspect their learner models.
  • results: The library ships with a previously released implicit-feedback educational dataset and evaluation metrics, and its extensive documentation and coding examples make it accessible to machine learning developers as well as educational data mining and learning analytics practitioners.
    Abstract This work describes the TrueLearn Python library, which contains a family of online learning Bayesian models for building educational (or more generally, informational) recommendation systems. This family of models was designed following the "open learner" concept, using humanly-intuitive user representations. For the sake of interpretability and putting the user in control, the TrueLearn library also contains different representations to help end-users visualise the learner models, which may in the future facilitate user interaction with their own models. Together with the library, we include a previously publicly released implicit feedback educational dataset with evaluation metrics to measure the performance of the models. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytic practitioners. The library and the support documentation with examples are available at https://truelearn.readthedocs.io/en/latest.

AttentionMix: Data augmentation method that relies on BERT attention mechanism

  • paper_url: http://arxiv.org/abs/2309.11104
  • repo_url: None
  • paper_authors: Dominik Lewy, Jacek Mańdziuk
  • for: This paper studies how Mixup-style data augmentation can be transferred to the natural language processing (NLP) domain.
  • methods: The paper introduces AttentionMix, a novel mixing method that relies on attention-based information; while the work focuses on the BERT attention mechanism, the approach can be applied to any attention-based model.
  • results: Evaluated on three standard sentiment classification datasets, AttentionMix outperforms two Mixup-based benchmark approaches as well as the vanilla BERT method, confirming that attention-based information can be effectively used for data augmentation in NLP.
    Abstract The Mixup method has proven to be a powerful data augmentation technique in Computer Vision, with many successors that perform image mixing in a guided manner. One of the interesting research directions is transferring the underlying Mixup idea to other domains, e.g. Natural Language Processing (NLP). Even though there already exist several methods that apply Mixup to textual data, there is still room for new, improved approaches. In this work, we introduce AttentionMix, a novel mixing method that relies on attention-based information. While the paper focuses on the BERT attention mechanism, the proposed approach can be applied to generally any attention-based model. AttentionMix is evaluated on 3 standard sentiment classification datasets and in all three cases outperforms two benchmark approaches that utilize Mixup mechanism, as well as the vanilla BERT method. The results confirm that the attention-based information can be effectively used for data augmentation in the NLP domain.
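A rough sketch of attention-guided mixing in the spirit of the paper: per-token attention mass from BERT's [CLS] token is used to decide which tokens of one example are replaced by tokens of another, and the label weight follows the attention mass that was swapped out. This is one plausible realization written for illustration, not the exact AttentionMix procedure.

```python
# Sketch: attention-guided token mixing between two sentences for text augmentation.
# The mixing rule (replace the lowest-attention tokens of sentence A with tokens of
# sentence B, weight labels by attention mass) is an illustrative choice.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

def cls_attention(text: str):
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    att = torch.stack(out.attentions).mean(dim=(0, 2))[0]   # average layers & heads -> (seq, seq)
    return enc["input_ids"][0], att[0]                       # attention from [CLS] to every token

ids_a, att_a = cls_attention("the movie was absolutely wonderful")
ids_b, att_b = cls_attention("a dull and painfully slow film")

k = 2                                                  # number of tokens to swap in (assumption)
low = att_a[1:-1].topk(k, largest=False).indices + 1   # least-attended tokens of A (skip [CLS]/[SEP])
mixed = ids_a.clone()
mixed[low] = ids_b[1 : 1 + k]                          # splice in tokens from B
lam = float(1.0 - att_a[low].sum() / att_a[1:-1].sum())  # label weight for A's class

print(tokenizer.decode(mixed))
print(f"label = {lam:.2f} * label_A + {1 - lam:.2f} * label_B")
```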

A New Interpretable Neural Network-Based Rule Model for Healthcare Decision Making

  • paper_url: http://arxiv.org/abs/2309.11101
  • repo_url: None
  • paper_authors: Adrien Benamira, Tristan Guerand, Thomas Peyrin
  • for: This work proposes a neural network framework, $\textit{Truth Table rules}$ (TT-rules), that combines the global and exact interpretability of rule-based models with the high performance of deep neural networks.
  • methods: TT-rules builds on $\textit{Truth Table nets}$ (TTnet), a family of deep neural networks initially developed for formal verification; the necessary and sufficient rules $\mathcal{R}$ are extracted from the trained TTnet (global interpretability) so that they yield the same output as the network (exact interpretability), effectively turning it into a rule-based model supporting binary classification, multi-label classification, and regression on small to large tabular datasets.
  • results: Evaluated on healthcare applications and compared with state-of-the-art rule-based methods, TT-rules achieves equal or higher performance, and is notably the first accurate rule-based model able to fit large tabular datasets, including two real-life DNA datasets with over 20K features.
    Abstract In healthcare applications, understanding how machine/deep learning models make decisions is crucial. In this study, we introduce a neural network framework, $\textit{Truth Table rules}$ (TT-rules), that combines the global and exact interpretability properties of rule-based models with the high performance of deep neural networks. TT-rules is built upon $\textit{Truth Table nets}$ (TTnet), a family of deep neural networks initially developed for formal verification. By extracting the necessary and sufficient rules $\mathcal{R}$ from the trained TTnet model (global interpretability) to yield the same output as the TTnet (exact interpretability), TT-rules effectively transforms the neural network into a rule-based model. This rule-based model supports binary classification, multi-label classification, and regression tasks for small to large tabular datasets. After outlining the framework, we evaluate TT-rules' performance on healthcare applications and compare it to state-of-the-art rule-based methods. Our results demonstrate that TT-rules achieves equal or higher performance compared to other interpretable methods. Notably, TT-rules presents the first accurate rule-based model capable of fitting large tabular datasets, including two real-life DNA datasets with over 20K features.

Likelihood-based Sensor Calibration for Expert-Supported Distributed Learning Algorithms in IoT Systems

  • paper_url: http://arxiv.org/abs/2309.11526
  • repo_url: None
  • paper_authors: Rüdiger Machhamer, Lejla Begic Fazlic, Eray Guven, David Junk, Gunes Karabulut Kurt, Stefan Naumann, Stephan Didas, Klaus-Uwe Gollmer, Ralph Bergmann, Ingo J. Timm, Guido Dartmann
  • for: This paper addresses the efficient adaptation of measurements from one sensor to another sensor of identical design.
  • methods: The approach estimates an affine transformation between the different systems, improved by expert knowledge; it can be applied to software calibration of sensors, expert-based adaptation, and federated learning methods.
  • results: Evaluations on simulations and on real measured data from a multi-sensor board with 8 identical sensors show improvements in both cases.
    Abstract An important task in the field of sensor technology is the efficient implementation of adaptation procedures of measurements from one sensor to another sensor of identical design. One idea is to use the estimation of an affine transformation between different systems, which can be improved by the knowledge of experts. This paper presents an improved solution from Glacier Research that was published back in 1973. It is shown that this solution can be adapted for software calibration of sensors, implementation of expert-based adaptation, and federated learning methods. We evaluate our research with simulations and also with real measured data of a multi-sensor board with 8 identical sensors. The results show an improvement for both the simulation and the experiments with real data.
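A minimal numpy sketch of the basic idea: fit an affine transformation y ≈ a·x + b that maps the readings of one sensor onto a reference sensor of identical design via least squares. The synthetic readings below are placeholders for real calibration data, and the likelihood-based, expert-supported refinements from the paper are not included.

```python
# Sketch: least-squares estimate of an affine transform between two identical sensors.
import numpy as np

rng = np.random.default_rng(0)
reference = rng.uniform(20.0, 30.0, size=200)          # readings of the reference sensor
# second sensor: same quantity, but with gain/offset error and noise (synthetic)
sensor2 = 1.05 * reference - 0.8 + rng.normal(0.0, 0.05, size=200)

# solve sensor2 * a + b ~= reference  ->  design matrix [x, 1]
A = np.column_stack([sensor2, np.ones_like(sensor2)])
(a, b), *_ = np.linalg.lstsq(A, reference, rcond=None)

calibrated = a * sensor2 + b
print(f"estimated gain a = {a:.3f}, offset b = {b:.3f}")
print(f"RMSE before: {np.sqrt(np.mean((sensor2 - reference) ** 2)):.3f}")
print(f"RMSE after:  {np.sqrt(np.mean((calibrated - reference) ** 2)):.3f}")
```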

Practical Probabilistic Model-based Deep Reinforcement Learning by Integrating Dropout Uncertainty and Trajectory Sampling

  • paper_url: http://arxiv.org/abs/2309.11089
  • repo_url: https://github.com/mrjun123/DPETS
  • paper_authors: Wenjun Huang, Yunduan Cui, Huiyun Li, Xinyu Wu
  • for: This paper addresses the limited prediction stability, prediction accuracy, and control capability of current probabilistic model-based reinforcement learning (MBRL) built on neural networks.
  • methods: The proposed DPETS combines Monte-Carlo dropout and trajectory sampling in one framework to stably predict system uncertainty; its loss function corrects the fitting error of the neural networks for more accurate probabilistic models, and the state propagation in its policy is extended to filter aleatoric uncertainty for better control.
  • results: On several Mujoco benchmark control tasks with additional disturbances and a practical robot arm manipulation task, DPETS outperforms related MBRL approaches in both average return and convergence velocity, and surpasses well-known model-free baselines with significantly better sample efficiency.
    Abstract This paper addresses the prediction stability, prediction accuracy and control capability of the current probabilistic model-based reinforcement learning (MBRL) built on neural networks. A novel approach dropout-based probabilistic ensembles with trajectory sampling (DPETS) is proposed where the system uncertainty is stably predicted by combining the Monte-Carlo dropout and trajectory sampling in one framework. Its loss function is designed to correct the fitting error of neural networks for more accurate prediction of probabilistic models. The state propagation in its policy is extended to filter the aleatoric uncertainty for superior control capability. Evaluated by several Mujoco benchmark control tasks under additional disturbances and one practical robot arm manipulation task, DPETS outperforms related MBRL approaches in both average return and convergence velocity while achieving superior performance than well-known model-free baselines with significant sample efficiency. The open source code of DPETS is available at https://github.com/mrjun123/DPETS.
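A small PyTorch sketch of the Monte-Carlo dropout ingredient: keeping dropout active at prediction time and running several stochastic forward passes gives a mean prediction and an uncertainty estimate for the learned dynamics. The network, state/action sizes, and number of passes are illustrative; DPETS additionally combines this with trajectory sampling and a corrected loss.

```python
# Sketch: Monte-Carlo dropout forward passes to estimate predictive uncertainty
# of a one-step dynamics model s_{t+1} = f(s_t, a_t).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, PASSES = 8, 2, 30        # illustrative sizes

dynamics = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 128), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(128, 128), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(128, STATE_DIM),
)

def predict_with_uncertainty(state: torch.Tensor, action: torch.Tensor):
    dynamics.train()                             # keep dropout stochastic at inference
    x = torch.cat([state, action], dim=-1)
    with torch.no_grad():
        samples = torch.stack([dynamics(x) for _ in range(PASSES)])   # (PASSES, B, STATE_DIM)
    return samples.mean(dim=0), samples.std(dim=0)

mean, std = predict_with_uncertainty(torch.randn(1, STATE_DIM), torch.randn(1, ACTION_DIM))
print("predicted next state:", mean)
print("per-dimension uncertainty:", std)
```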

Embed-Search-Align: DNA Sequence Alignment using Transformer Models

  • paper_url: http://arxiv.org/abs/2309.11087
  • repo_url: None
  • paper_authors: Pavan Holur, K. C. Enevoldsen, Lajoyce Mboning, Thalia Georgiou, Louis-S. Bouchard, Matteo Pellegrini, Vwani Roychowdhury
  • for: This work develops DNA-ESA, a Transformer-based method that frames DNA sequence alignment as an Embed-Search-Align task.
  • methods: An encoder generates representations of reads and reference fragments that are projected into a shared vector space, using a contrastive loss for self-supervised training of DNA sequence representations and a DNA vector store to enable genome-wide search across fragments; the read-fragment distance then serves as a surrogate for alignment.
  • results: DNA-ESA is more than 97% accurate when aligning 250-length reads onto a 3-gigabase (single-haploid) human reference genome, far exceeds the performance of 6 recent DNA-Transformer baselines, and shows task transfer across chromosomes and species.
    Abstract DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics. Conventional methods, refined over decades, tackle this challenge in two steps: genome indexing followed by efficient search to locate likely positions for given reads. Building on the success of Large Language Models (LLM) in encoding text into embeddings, where the distance metric captures semantic similarity, recent efforts have explored whether the same Transformer architecture can produce numerical representations for DNA sequences. Such models have shown early promise in tasks involving classification of short DNA sequences, such as the detection of coding vs non-coding regions, as well as the identification of enhancer and promoter sequences. Performance at sequence classification tasks does not, however, translate to sequence alignment, where it is necessary to conduct a genome-wide search to successfully align every read. We address this open problem by framing it as an Embed-Search-Align task. In this framework, a novel encoder model DNA-ESA generates representations of reads and fragments of the reference, which are projected into a shared vector space where the read-fragment distance is used as surrogate for alignment. In particular, DNA-ESA introduces: (1) Contrastive loss for self-supervised training of DNA sequence representations, facilitating rich sequence-level embeddings, and (2) a DNA vector store to enable search across fragments on a global scale. DNA-ESA is >97% accurate when aligning 250-length reads onto a human reference genome of 3 gigabases (single-haploid), far exceeds the performance of 6 recent DNA-Transformer model baselines and shows task transfer across chromosomes and species.
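A toy numpy sketch of the Embed-Search-Align idea: reference fragments and reads are embedded into a shared vector space (here with a trivial k-mer count encoder standing in for the trained Transformer encoder), and each read is matched to the fragment whose embedding is closest. Fragment length, k-mer size, and the encoder itself are illustrative assumptions.

```python
# Sketch: embed reference fragments and reads, then nearest-neighbour search by cosine similarity.
# The k-mer count "encoder" is a stand-in for the paper's trained Transformer encoder.
import numpy as np
from itertools import product

K = 3
KMERS = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

def embed(seq: str) -> np.ndarray:
    v = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        v[KMERS[seq[i : i + K]]] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

rng = np.random.default_rng(1)
reference = "".join(rng.choice(list("ACGT"), size=2000))          # toy "genome"
frag_len, stride = 250, 125
fragments = [(s, reference[s : s + frag_len]) for s in range(0, len(reference) - frag_len, stride)]
store = np.stack([embed(f) for _, f in fragments])                # the "DNA vector store"

read_start = 730
read = reference[read_start : read_start + 250]                   # a read drawn from the genome
scores = store @ embed(read)                                      # cosine similarities
best = int(np.argmax(scores))
print(f"read starts at ~{read_start}; best-matching fragment starts at {fragments[best][0]} "
      f"(score {scores[best]:.3f})")
```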

Weak Supervision for Label Efficient Visual Bug Detection

  • paper_url: http://arxiv.org/abs/2309.11077
  • repo_url: None
  • paper_authors: Farrukh Rahman
  • for: This work targets visual quality in expansive video game worlds, where traditional testing methods are resource-limited and cannot cover the wide range of potential visual bugs.
  • methods: The proposed method uses unlabeled gameplay and domain-specific augmentations to generate datasets and self-supervised objectives for pre-training or multi-task settings, and scales datasets via weak supervision, supporting both autonomous and interactive weak supervision through unsupervised clustering and/or text and geometric prompts.
  • results: On first-person player clipping/collision bugs (FPPC) in the expansive Giantmap game world, the approach is highly effective, improving over a strong supervised baseline in a practical, very low-prevalence, low-data regime (0.336 $\rightarrow$ 0.550 F1 score); with just 5 labeled "good" exemplars (i.e., 0 bugs), the self-supervised objective alone captures enough signal to outperform low-labeled supervised settings, and the approach is adaptable across various visual bugs.
    Abstract As video games evolve into expansive, detailed worlds, visual quality becomes essential, yet increasingly challenging. Traditional testing methods, limited by resources, face difficulties in addressing the plethora of potential bugs. Machine learning offers scalable solutions; however, heavy reliance on large labeled datasets remains a constraint. Addressing this challenge, we propose a novel method, utilizing unlabeled gameplay and domain-specific augmentations to generate datasets & self-supervised objectives used during pre-training or multi-task settings for downstream visual bug detection. Our methodology uses weak-supervision to scale datasets for the crafted objectives and facilitates both autonomous and interactive weak-supervision, incorporating unsupervised clustering and/or an interactive approach based on text and geometric prompts. We demonstrate on first-person player clipping/collision bugs (FPPC) within the expansive Giantmap game world, that our approach is very effective, improving over a strong supervised baseline in a practical, very low-prevalence, low data regime (0.336 $\rightarrow$ 0.550 F1 score). With just 5 labeled "good" exemplars (i.e., 0 bugs), our self-supervised objective alone captures enough signal to outperform the low-labeled supervised settings. Building on large-pretrained vision models, our approach is adaptable across various visual bugs. Our results suggest applicability in curating datasets for broader image and video tasks within video games beyond visual bugs.

Dynamic Tiling: A Model-Agnostic, Adaptive, Scalable, and Inference-Data-Centric Approach for Efficient and Accurate Small Object Detection

  • paper_url: http://arxiv.org/abs/2309.11069
  • repo_url: None
  • paper_authors: Son The Nguyen, Theja Tulabandhula, Duy Nguyen
  • for: This paper proposes Dynamic Tiling, a model-agnostic, adaptive, and scalable approach for small object detection that improves both accuracy and efficiency.
  • methods: Detection starts with non-overlapping tiles for initial detections, then uses dynamic overlapping rates together with a tile minimizer; this dual approach resolves objects fragmented across tiles, improves detection accuracy, and reduces computational overhead by cutting the number of forward passes through the detector.
  • results: Adaptable to a variety of operational environments without laborious recalibration, and aided by a large-small filtering mechanism that boosts quality across object sizes, Dynamic Tiling outperforms existing model-agnostic uniform cropping methods in both efficiency and accuracy.
    Abstract We introduce Dynamic Tiling, a model-agnostic, adaptive, and scalable approach for small object detection, anchored in our inference-data-centric philosophy. Dynamic Tiling starts with non-overlapping tiles for initial detections and utilizes dynamic overlapping rates along with a tile minimizer. This dual approach effectively resolves fragmented objects, improves detection accuracy, and minimizes computational overhead by reducing the number of forward passes through the object detection model. Adaptable to a variety of operational environments, our method negates the need for laborious recalibration. Additionally, our large-small filtering mechanism boosts the detection quality across a range of object sizes. Overall, Dynamic Tiling outperforms existing model-agnostic uniform cropping methods, setting new benchmarks for efficiency and accuracy.
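A small sketch of the basic tiling step: split a large image into non-overlapping tiles for the first detection pass, then re-tile a region of interest with an overlap when a detection appears fragmented at a tile border. The detector call is a placeholder, and the overlap heuristic is an illustrative simplification of the paper's dynamic overlap and tile-minimizer logic.

```python
# Sketch: non-overlapping tiling for a first pass, then overlapping tiles for refinement.
# `run_detector` is a placeholder for any object detection model.
import numpy as np

def make_tiles(h: int, w: int, tile: int, overlap: int = 0):
    """Yield (y0, y1, x0, x1) tile windows covering an h x w image."""
    step = tile - overlap
    for y0 in range(0, max(h - overlap, 1), step):
        for x0 in range(0, max(w - overlap, 1), step):
            yield y0, min(y0 + tile, h), x0, min(x0 + tile, w)

def run_detector(patch: np.ndarray):
    return []   # placeholder: would return boxes in patch coordinates

image = np.zeros((2160, 3840, 3), dtype=np.uint8)     # e.g. a 4K frame
H, W = image.shape[:2]

# Pass 1: non-overlapping tiles -> fewer forward passes.
detections = []
for y0, y1, x0, x1 in make_tiles(H, W, tile=640, overlap=0):
    for box in run_detector(image[y0:y1, x0:x1]):
        detections.append((box, (y0, x0)))            # keep tile offset to map back

# Pass 2 (only where needed): overlapping tiles around a suspect region to merge
# objects fragmented at tile borders.
suspect = (0, 1280, 0, 1280)                          # hypothetical region of interest
y0s, y1s, x0s, x1s = suspect
refined = list(make_tiles(y1s - y0s, x1s - x0s, tile=640, overlap=128))
print(f"pass-1 tiles: {len(list(make_tiles(H, W, 640)))}, refinement tiles: {len(refined)}")
```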

Exploring the Relationship between LLM Hallucinations and Prompt Linguistic Nuances: Readability, Formality, and Concreteness

  • paper_url: http://arxiv.org/abs/2309.11064
  • repo_url: None
  • paper_authors: Vipula Rawte, Prachi Priya, S. M Towhidul Islam Tonmoy, S M Mehedi Zaman, Amit Sheth, Amitava Das
  • for: investigate the influence of linguistic factors in prompts on the occurrence of LLM hallucinations
  • methods: experimental study using prompts with varying levels of readability, formality, and concreteness
  • results: prompts with greater formality and concreteness tend to result in reduced hallucinations, while the outcomes pertaining to readability are mixed.
    Abstract As Large Language Models (LLMs) have advanced, they have brought forth new challenges, with one of the prominent issues being LLM hallucination. While various mitigation techniques are emerging to address hallucination, it is equally crucial to delve into its underlying causes. Consequently, in this preliminary exploratory investigation, we examine how linguistic factors in prompts, specifically readability, formality, and concreteness, influence the occurrence of hallucinations. Our experimental results suggest that prompts characterized by greater formality and concreteness tend to result in reduced hallucination. However, the outcomes pertaining to readability are somewhat inconclusive, showing a mixed pattern.

Design of Chain-of-Thought in Math Problem Solving

  • paper_url: http://arxiv.org/abs/2309.11054
  • repo_url: https://github.com/lqtrung1998/mwp_cot_design
  • paper_authors: Zhanming Jie, Trung Quoc Luong, Xinbo Zhang, Xiaoran Jin, Hang Li
  • for: This work examines the role of Chain-of-Thought (CoT) in math problem solving, comparing conventional natural language CoT with several program CoTs, including the self-describing program, the comment-describing program, and the non-describing program, and also investigates the impact of programming language by comparing Python and Wolfram Language.
  • methods: Extensive experiments are conducted on GSM8K, MATHQA, and SVAMP, showing that program CoTs often have superior effectiveness in math problem solving; notably, the best performing combination with 30B parameters beats GPT-3.5-turbo by a significant margin.
  • results: The self-describing program offers greater diversity and thus can generally achieve higher performance, and Python turns out to be a better choice of language than Wolfram for program CoTs; these findings provide a valuable guideline for future CoT designs that take both programming language and coding style into account. The datasets and code are publicly available.
    Abstract Chain-of-Thought (CoT) plays a crucial role in reasoning for math problem solving. We conduct a comprehensive examination of methods for designing CoT, comparing conventional natural language CoT with various program CoTs, including the self-describing program, the comment-describing program, and the non-describing program. Furthermore, we investigate the impact of programming language on program CoTs, comparing Python and Wolfram Language. Through extensive experiments on GSM8K, MATHQA, and SVAMP, we find that program CoTs often have superior effectiveness in math problem solving. Notably, the best performing combination with 30B parameters beats GPT-3.5-turbo by a significant margin. The results show that self-describing program offers greater diversity and thus can generally achieve higher performance. We also find that Python is a better choice of language than Wolfram for program CoTs. The experimental results provide a valuable guideline for future CoT designs that take into account both programming language and coding style for further advancements. Our datasets and code are publicly available.
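A small illustration of what a self-describing program CoT looks like for a simple math word problem: the reasoning is expressed as executable code whose variable names describe the quantities involved, so the final answer is obtained by running the program. The problem and variable names are invented for illustration, not taken from the benchmarks.

```python
# Question (illustrative): "A bakery sells muffins at $3 each. On Monday it sold 14 muffins
# and on Tuesday twice as many. How much money did it make in total?"
# Self-describing program CoT: variable names carry the reasoning.

price_per_muffin = 3
muffins_sold_monday = 14
muffins_sold_tuesday = 2 * muffins_sold_monday
total_muffins_sold = muffins_sold_monday + muffins_sold_tuesday
total_revenue = total_muffins_sold * price_per_muffin
print(total_revenue)   # 126
```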

Clustered FedStack: Intermediate Global Models with Bayesian Information Criterion

  • paper_url: http://arxiv.org/abs/2309.11044
  • repo_url: None
  • paper_authors: Thanveer Shaik, Xiaohui Tao, Lin Li, Niall Higgins, Raj Gururajan, Xujuan Zhou, Jianming Yong
  • for: This work improves Federated Learning (FL) performance under non-identically and non-independently distributed (non-IID) data and imbalanced labels among local clients.
  • methods: Building on the previously published Stacked Federated Learning (FedStack) framework, local clients send their model predictions and output-layer weights to a server, which builds a robust global model and clusters the clients based on their output-layer weights; three clustering mechanisms are evaluated (K-Means, Agglomerative, and Gaussian Mixture Models), with the Bayesian Information Criterion (BIC) used to determine the number of clusters.
  • results: The Clustered FedStack models outperform baseline models with clustering mechanisms, and cyclical learning rates are used to estimate the convergence of the framework.
    Abstract Federated Learning (FL) is currently one of the most popular technologies in the field of Artificial Intelligence (AI) due to its collaborative learning and ability to preserve client privacy. However, it faces challenges such as non-identically and non-independently distributed (non-IID) and data with imbalanced labels among local clients. To address these limitations, the research community has explored various approaches such as using local model parameters, federated generative adversarial learning, and federated representation learning. In our study, we propose a novel Clustered FedStack framework based on the previously published Stacked Federated Learning (FedStack) framework. The local clients send their model predictions and output layer weights to a server, which then builds a robust global model. This global model clusters the local clients based on their output layer weights using a clustering mechanism. We adopt three clustering mechanisms, namely K-Means, Agglomerative, and Gaussian Mixture Models, into the framework and evaluate their performance. We use Bayesian Information Criterion (BIC) with the maximum likelihood function to determine the number of clusters. The Clustered FedStack models outperform baseline models with clustering mechanisms. To estimate the convergence of our proposed framework, we use Cyclical learning rates.
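A brief scikit-learn sketch of the clustering step described above: clients' flattened output-layer weight vectors are clustered, with the number of clusters chosen by the Bayesian Information Criterion of a Gaussian Mixture Model. The synthetic weight vectors, dimensions, and the candidate range of cluster counts are illustrative.

```python
# Sketch: choose the number of client clusters via GMM + BIC, then cluster with K-Means.
# The "output-layer weights" below are synthetic stand-ins for real client uploads.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# 100 clients, each uploading a flattened output-layer weight vector (length 16 here),
# drawn from 3 latent groups to mimic non-IID clients.
centers = rng.normal(size=(3, 16))
client_weights = np.vstack([centers[i % 3] + 0.1 * rng.normal(size=16) for i in range(100)])

# Choose the number of clusters by minimizing the BIC of a Gaussian Mixture Model.
bics = {k: GaussianMixture(n_components=k, covariance_type="diag", random_state=0)
             .fit(client_weights).bic(client_weights)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print({k: round(v, 1) for k, v in bics.items()}, "-> chosen k =", best_k)

# Cluster clients with the chosen k (K-Means shown; Agglomerative or GMM work analogously).
labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(client_weights)
print("client cluster assignments:", labels[:10], "...")
```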

Making Small Language Models Better Multi-task Learners with Mixture-of-Task-Adapters

  • paper_url: http://arxiv.org/abs/2309.11042
  • repo_url: None
  • paper_authors: Yukang Xie, Chengyu Wang, Junbing Yan, Jiyong Zhou, Feiqi Deng, Jun Huang
  • for: This work proposes a multi-task learning system built on small language models (fewer than 1B parameters) to support domain-specific applications.
  • methods: A Mixture-of-Task-Adapters (MTA) module extends the transformer architecture to capture intra-task and inter-task knowledge, and a two-stage training method optimizes the collaboration between adapters (see the adapter sketch after the abstract).
  • results: Experiments show that the proposed MTA architecture and the two-stage training method achieve good performance, and MTA-equipped language models built with ALTER also perform well across different domains.
    Abstract Recently, Large Language Models (LLMs) have achieved amazing zero-shot learning performance over a variety of Natural Language Processing (NLP) tasks, especially for text generative tasks. Yet, the large size of LLMs often leads to the high computational cost of model training and online deployment. In our work, we present ALTER, a system that effectively builds the multi-tAsk Learners with mixTure-of-task-adaptERs upon small language models (with <1B parameters) to address multiple NLP tasks simultaneously, capturing the commonalities and differences between tasks, in order to support domain-specific applications. Specifically, in ALTER, we propose the Mixture-of-Task-Adapters (MTA) module as an extension to the transformer architecture for the underlying model to capture the intra-task and inter-task knowledge. A two-stage training method is further proposed to optimize the collaboration between adapters at a small computational cost. Experimental results over a mixture of NLP tasks show that our proposed MTA architecture and the two-stage training method achieve good performance. Based on ALTER, we have also produced MTA-equipped language models for various domains.
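
A hedged PyTorch sketch of one plausible Mixture-of-Task-Adapters layer follows: per-task bottleneck adapters plus a shared adapter, combined by a learned gate. The class and parameter names are assumptions; the actual MTA routing and the two-stage training in ALTER may differ.

```python
# A plausible Mixture-of-Task-Adapters layer: a sketch of the general idea
# (per-task adapters plus a gated shared adapter), not the exact ALTER architecture.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    def __init__(self, hidden: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.act = nn.GELU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual bottleneck adapter

class MixtureOfTaskAdapters(nn.Module):
    def __init__(self, hidden: int, num_tasks: int):
        super().__init__()
        self.task_adapters = nn.ModuleList([Adapter(hidden) for _ in range(num_tasks)])
        self.shared_adapter = Adapter(hidden)          # inter-task knowledge
        self.gate = nn.Linear(hidden, num_tasks + 1)   # mixes task-specific + shared

    def forward(self, x, task_id: int):
        task_out = self.task_adapters[task_id](x)
        shared_out = self.shared_adapter(x)
        # Softmax gate over the chosen task adapter and the shared adapter.
        weights = torch.softmax(self.gate(x)[..., [task_id, -1]], dim=-1)
        return weights[..., 0:1] * task_out + weights[..., 1:2] * shared_out

# Usage: insert after a transformer block's output, hidden size 768, 3 tasks.
mta = MixtureOfTaskAdapters(hidden=768, num_tasks=3)
h = torch.randn(2, 16, 768)            # (batch, seq_len, hidden)
print(mta(h, task_id=1).shape)         # torch.Size([2, 16, 768])
```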

Federated Learning in Intelligent Transportation Systems: Recent Applications and Open Problems

  • paper_url: http://arxiv.org/abs/2309.11039
  • repo_url: None
  • paper_authors: Shiying Zhang, Jun Li, Long Shi, Ming Ding, Dinh C. Nguyen, Wuzheng Tan, Jian Weng, Zhu Han
  • for: This survey examines the application prospects of distributed machine learning in intelligent transportation systems (ITS) and how Federated Learning (FL) can address ITS problems across different scenarios.
  • methods: Reviews existing deployments of FL in ITS, covering object recognition, traffic management, and service provisioning scenarios (a generic FedAvg sketch follows the abstract below).
  • results: Applying FL in ITS can improve object recognition accuracy, traffic management efficiency, and service quality, but FL still faces challenges such as uneven data distribution, limited computing power and storage, and privacy and security concerns.
    Abstract Intelligent transportation systems (ITSs) have been fueled by the rapid development of communication technologies, sensor technologies, and the Internet of Things (IoT). Nonetheless, due to the dynamic characteristics of the vehicle networks, it is rather challenging to make timely and accurate decisions of vehicle behaviors. Moreover, in the presence of mobile wireless communications, the privacy and security of vehicle information are at constant risk. In this context, a new paradigm is urgently needed for various applications in dynamic vehicle environments. As a distributed machine learning technology, federated learning (FL) has received extensive attention due to its outstanding privacy protection properties and easy scalability. We conduct a comprehensive survey of the latest developments in FL for ITS. Specifically, we initially research the prevalent challenges in ITS and elucidate the motivations for applying FL from various perspectives. Subsequently, we review existing deployments of FL in ITS across various scenarios, and discuss specific potential issues in object recognition, traffic management, and service providing scenarios. Furthermore, we conduct a further analysis of the new challenges introduced by FL deployment and the inherent limitations that FL alone cannot fully address, including uneven data distribution, limited storage and computing power, and potential privacy and security concerns. We then examine the existing collaborative technologies that can help mitigate these challenges. Lastly, we discuss the open challenges that remain to be addressed in applying FL in ITS and propose several future research directions.
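
For readers unfamiliar with the FL mechanism the survey builds on, the sketch below shows generic FedAvg-style aggregation, where vehicles train locally and a server averages parameters weighted by local sample counts. It is a textbook illustration, not a method taken from any surveyed paper.

```python
# Generic FedAvg-style aggregation, for illustration only: clients (e.g. vehicles)
# train locally; the server averages parameters weighted by local sample counts.
import numpy as np

def fedavg(client_params: list[dict[str, np.ndarray]], client_sizes: list[int]) -> dict[str, np.ndarray]:
    total = sum(client_sizes)
    keys = client_params[0].keys()
    return {
        k: sum(p[k] * (n / total) for p, n in zip(client_params, client_sizes))
        for k in keys
    }

# Example: three vehicles, each with a single weight matrix.
params = [{"w": np.random.randn(4, 2)} for _ in range(3)]
sizes = [100, 250, 50]
global_params = fedavg(params, sizes)
print(global_params["w"].shape)  # (4, 2)
```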

ModelGiF: Gradient Fields for Model Functional Distance

  • paper_url: http://arxiv.org/abs/2309.11013
  • repo_url: https://github.com/zju-vipa/modelgif
  • paper_authors: Jie Song, Zhengqi Xu, Sai Wu, Gang Chen, Mingli Song
  • for: To quantify the functional distance between pre-trained models for a variety of purposes.
  • methods: Inspired by the concept of a "field" in physics, the paper proposes the Model Gradient Field (ModelGiF) to extract homogeneous representations from heterogeneous pre-trained models (a simplified gradient-field sketch follows the abstract).
  • results: Experiments on task relatedness estimation, intellectual property protection, and model unlearning verification show that ModelGiF significantly outperforms state-of-the-art competitors.
    Abstract The last decade has witnessed the success of deep learning and the surge of publicly released trained models, which necessitates the quantification of the model functional distance for various purposes. However, quantifying the model functional distance is always challenging due to the opacity in inner workings and the heterogeneity in architectures or tasks. Inspired by the concept of "field" in physics, in this work we introduce Model Gradient Field (abbr. ModelGiF) to extract homogeneous representations from the heterogeneous pre-trained models. Our main assumption underlying ModelGiF is that each pre-trained deep model uniquely determines a ModelGiF over the input space. The distance between models can thus be measured by the similarity between their ModelGiFs. We validate the effectiveness of the proposed ModelGiF with a suite of testbeds, including task relatedness estimation, intellectual property protection, and model unlearning verification. Experimental results demonstrate the versatility of the proposed ModelGiF on these tasks, with significantly superior performance to state-of-the-art competitors. Codes are available at https://github.com/zju-vipa/modelgif.
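
One simplified way to realize the idea of comparing models through gradient fields is sketched below: compute input gradients of a scalar summary of each model's output at shared probe inputs and compare them with cosine similarity. The scalar summary and the similarity measure are assumptions; the paper's exact ModelGiF construction may differ.

```python
# A hedged sketch of comparing two models via input-gradient fields. The exact
# construction of ModelGiF in the paper may differ from this simplified version.
import torch

def gradient_field(model: torch.nn.Module, probes: torch.Tensor) -> torch.Tensor:
    """Gradient of a scalar summary of the model output w.r.t. each probe input."""
    probes = probes.clone().requires_grad_(True)
    out = model(probes)
    scalar = out.norm(dim=-1).sum()          # simple scalar summary (an assumption)
    (grad,) = torch.autograd.grad(scalar, probes)
    return grad.flatten(start_dim=1)         # one gradient vector per probe point

def modelgif_distance(model_a, model_b, probes) -> float:
    ga, gb = gradient_field(model_a, probes), gradient_field(model_b, probes)
    cos = torch.nn.functional.cosine_similarity(ga, gb, dim=-1)
    return float(1.0 - cos.mean())           # smaller = more functionally similar

# Example with two small MLPs over the same input space.
probes = torch.randn(32, 16)
mlp_a = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 4))
mlp_b = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 4))
print(modelgif_distance(mlp_a, mlp_b, probes))
```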

Spiking NeRF: Making Bio-inspired Neural Networks See through the Real World

  • paper_url: http://arxiv.org/abs/2309.10987
  • repo_url: None
  • paper_authors: Xingting Yao, Qinghao Hu, Tielong Liu, Zitao Mo, Zeyu Zhu, Zhengyang Zhuge, Jian Cheng
  • for: To propose an energy-efficient spiking neural network (SNN) approach for high-quality 3D scene rendering that remains biologically plausible.
  • methods: The method combines SNNs with Neural Radiance Fields (NeRF), aligning the radiance ray with the temporal dimension of the SNN so that computation becomes spike-based and multiplication-free, reducing energy consumption (a TCP sketch follows the abstract below).
  • results: Experiments show an average energy saving of 76.74% with synthesis quality comparable to the ANN baseline.
    Abstract Spiking neuron networks (SNNs) have been thriving on numerous tasks to leverage their promising energy efficiency and exploit their potentialities as biologically plausible intelligence. Meanwhile, the Neural Radiance Fields (NeRF) render high-quality 3D scenes with massive energy consumption, and few works delve into the energy-saving solution with a bio-inspired approach. In this paper, we propose spiking NeRF (SpikingNeRF), which aligns the radiance ray with the temporal dimension of SNN, to naturally accommodate the SNN to the reconstruction of Radiance Fields. Thus, the computation turns into a spike-based, multiplication-free manner, reducing the energy consumption. In SpikingNeRF, each sampled point on the ray is matched onto a particular time step, and represented in a hybrid manner where the voxel grids are maintained as well. Based on the voxel grids, sampled points are determined whether to be masked for better training and inference. However, this operation also incurs irregular temporal length. We propose the temporal condensing-and-padding (TCP) strategy to tackle the masked samples to maintain regular temporal length, i.e., regular tensors, for hardware-friendly computation. Extensive experiments on a variety of datasets demonstrate that our method reduces the $76.74\%$ energy consumption on average and obtains comparable synthesis quality with the ANN baseline.
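
The temporal condensing-and-padding (TCP) operation described in the abstract can be pictured with the simplified sketch below: unmasked samples along each ray are condensed and the remainder is zero-padded to a fixed temporal length, so all rays form a regular tensor. The function signature and the per-ray loop are assumptions made for clarity; the real implementation likely differs.

```python
# A simplified sketch of temporal condensing-and-padding (TCP): keep only unmasked
# samples along each ray, then pad to a fixed temporal length so all rays form a
# regular tensor. The real SpikingNeRF implementation likely differs in detail.
import torch

def temporal_condense_and_pad(features: torch.Tensor, mask: torch.Tensor, T: int) -> torch.Tensor:
    """
    features: (num_rays, num_samples, dim) per-sample features along each ray
    mask:     (num_rays, num_samples) boolean, True = keep the sample
    T:        target temporal length (number of SNN time steps)
    Returns:  (num_rays, T, dim) condensed and zero-padded features
    """
    num_rays, _, dim = features.shape
    out = features.new_zeros(num_rays, T, dim)
    for r in range(num_rays):                       # loop kept for clarity, not speed
        kept = features[r][mask[r]][:T]             # condense: drop masked samples
        out[r, : kept.shape[0]] = kept              # pad: remaining steps stay zero
    return out

# Example: 4 rays, 8 samples per ray, 3-dim features, condensed to 5 time steps.
feats = torch.randn(4, 8, 3)
mask = torch.rand(4, 8) > 0.4
print(temporal_condense_and_pad(feats, mask, T=5).shape)  # torch.Size([4, 5, 3])
```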

Is GPT4 a Good Trader?

  • paper_url: http://arxiv.org/abs/2309.10982
  • repo_url: None
  • paper_authors: Bingzhe Wu
  • for: To examine the fidelity of GPT-4's comprehension of classic trading theories and its proficiency in applying its code-interpreter abilities to real-world trading data analysis.
  • methods: GPT-4 is prompted to analyze daily candlestick (K-line) data of specific assets based on particular theories such as the Elliott Wave Theory (an illustrative prompt sketch follows the abstract below).
  • results: GPT-4 shows relatively high interpretative depth and accuracy when analyzing real-world trading data, and the study distills useful methodologies for applying trading theories.
    Abstract Recently, large language models (LLMs), particularly GPT-4, have demonstrated significant capabilities in various planning and reasoning tasks \cite{cheng2023gpt4,bubeck2023sparks}. Motivated by these advancements, there has been a surge of interest among researchers to harness the capabilities of GPT-4 for the automated design of quantitative factors that do not overlap with existing factor libraries, with an aspiration to achieve alpha returns \cite{webpagequant}. In contrast to these works, this study aims to examine the fidelity of GPT-4's comprehension of classic trading theories and its proficiency in applying its code interpreter abilities to real-world trading data analysis. Such an exploration is instrumental in discerning whether the underlying logic GPT-4 employs for trading is intrinsically reliable. Furthermore, given the acknowledged interpretative latitude inherent in most trading theories, we seek to distill more precise methodologies of deploying these theories from GPT-4's analytical process, potentially offering invaluable insights to human traders. To achieve this objective, we selected daily candlestick (K-line) data from specific periods for certain assets, such as the Shanghai Stock Index. Through meticulous prompt engineering, we guided GPT-4 to analyze the technical structures embedded within this data, based on specific theories like the Elliott Wave Theory. We then subjected its analytical output to manual evaluation, assessing its interpretative depth and accuracy vis-à-vis these trading theories from multiple dimensions. The results and findings from this study could pave the way for a synergistic amalgamation of human expertise and AI-driven insights in the realm of trading.
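
A hedged sketch of how an Elliott-Wave-style analysis prompt might be assembled from daily K-line rows is given below. The prompt wording, field names, and the OHLC numbers are dummy values invented for illustration; the study's actual prompts are not reproduced here.

```python
# Illustrative only: assembling an Elliott-Wave-style analysis prompt from daily
# candlestick (K-line) rows. The actual prompts used in the study may differ.

def build_kline_prompt(rows: list[dict], asset: str = "Shanghai Stock Index") -> str:
    lines = [
        f"{r['date']}: open={r['open']}, high={r['high']}, low={r['low']}, close={r['close']}"
        for r in rows
    ]
    return (
        f"Below is daily candlestick (K-line) data for the {asset}.\n"
        + "\n".join(lines)
        + "\n\nUsing the Elliott Wave Theory, identify the current wave structure, "
          "label impulse and corrective waves, and state the implied outlook."
    )

# Dummy OHLC values for illustration only.
sample_rows = [
    {"date": "2023-09-18", "open": 3120.1, "high": 3135.4, "low": 3112.7, "close": 3126.0},
    {"date": "2023-09-19", "open": 3126.5, "high": 3140.2, "low": 3119.8, "close": 3132.4},
]
print(build_kline_prompt(sample_rows))
```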

AI-Driven Patient Monitoring with Multi-Agent Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2309.10980
  • repo_url: None
  • paper_authors: Thanveer Shaik, Xiaohui Tao, Haoran Xie, Lin Li, Jianming Yong, Hong-Ning Dai
  • for: To improve the effectiveness of patient monitoring, enabling timely interventions and better healthcare outcomes.
  • methods: A multi-agent deep reinforcement learning (DRL) framework deploys multiple learning agents, each monitoring a specific physiological feature such as heart rate, respiration, or temperature; the agents interact with a generic healthcare monitoring environment, learn the patients' behavior patterns, and alert the corresponding Medical Emergency Teams (METs) according to the estimated level of emergency (a structural sketch follows the abstract below).
  • results: On the real-world physiological and motion datasets PPG-DaLiA and WESAD, the proposed DRL approach outperforms all baseline models in monitoring accuracy, and hyperparameter tuning further improves the agents' overall performance.
    Abstract Effective patient monitoring is vital for timely interventions and improved healthcare outcomes. Traditional monitoring systems often struggle to handle complex, dynamic environments with fluctuating vital signs, leading to delays in identifying critical conditions. To address this challenge, we propose a novel AI-driven patient monitoring framework using multi-agent deep reinforcement learning (DRL). Our approach deploys multiple learning agents, each dedicated to monitoring a specific physiological feature, such as heart rate, respiration, and temperature. These agents interact with a generic healthcare monitoring environment, learn the patients' behavior patterns, and make informed decisions to alert the corresponding Medical Emergency Teams (METs) based on the level of emergency estimated. In this study, we evaluate the performance of the proposed multi-agent DRL framework using real-world physiological and motion data from two datasets: PPG-DaLiA and WESAD. We compare the results with several baseline models, including Q-Learning, PPO, Actor-Critic, Double DQN, and DDPG, as well as monitoring frameworks like WISEML and CA-MAQL. Our experiments demonstrate that the proposed DRL approach outperforms all other baseline models, achieving more accurate monitoring of patient's vital signs. Furthermore, we conduct hyperparameter optimization to fine-tune the learning process of each agent. By optimizing hyperparameters, we enhance the learning rate and discount factor, thereby improving the agents' overall performance in monitoring patient health status. Our AI-driven patient monitoring system offers several advantages over traditional methods, including the ability to handle complex and uncertain environments, adapt to varying patient conditions, and make real-time decisions without external supervision.
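
The per-vital decision loop can be pictured with the structural sketch below, where one agent per physiological feature maps a discretized reading to an alert level. A tiny epsilon-greedy tabular Q-learner stands in for the paper's deep RL agents, and the action set and normal ranges are assumptions made for illustration.

```python
# Structural sketch only: one agent per vital sign chooses an alert level from the
# current reading. The paper uses deep RL agents; a tiny epsilon-greedy tabular
# Q-learner stands in here to show the shape of the multi-agent loop.
import random
from collections import defaultdict

ACTIONS = ["no_alert", "notify_nurse", "alert_MET"]

class VitalAgent:
    def __init__(self, name: str, epsilon: float = 0.1, alpha: float = 0.5, gamma: float = 0.9):
        self.name, self.eps, self.alpha, self.gamma = name, epsilon, alpha, gamma
        self.q = defaultdict(lambda: [0.0] * len(ACTIONS))  # state -> action values

    def act(self, state: int) -> int:
        if random.random() < self.eps:
            return random.randrange(len(ACTIONS))
        return max(range(len(ACTIONS)), key=lambda a: self.q[state][a])

    def learn(self, s: int, a: int, reward: float, s_next: int) -> None:
        target = reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

def discretize(value: float, low: float, high: float) -> int:
    return 0 if value < low else (2 if value > high else 1)  # low / normal / high

# One agent per physiological feature; normal ranges here are rough assumptions.
agents = {"heart_rate": VitalAgent("heart_rate"), "respiration": VitalAgent("respiration")}
ranges = {"heart_rate": (60, 100), "respiration": (12, 20)}

reading = {"heart_rate": 128.0, "respiration": 18.0}
for vital, agent in agents.items():
    state = discretize(reading[vital], *ranges[vital])
    action = agent.act(state)
    print(vital, "->", ACTIONS[action])
```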