cs.AI - 2023-07-12

DSSE: a drone swarm search environment

  • paper_url: http://arxiv.org/abs/2307.06240
  • repo_url: https://github.com/pfe-embraer/drone-swarm-search
  • paper_authors: Manuel Castanares, Luis F. S. Carrete, Enrico F. Damiani, Leonardo D. M. de Abreu, José Fernando B. Brancalion, Fabrício J. Barth
  • for: This project provides an environment for studying reinforcement learning algorithms that take dynamic probabilities as input, applied to the search for shipwrecked people.
  • methods: The environment supports multi-agent (or single-agent) reinforcement learning, in which the agents (drones) do not know the position of the targets (shipwrecked people) and receive no reward based on their distance to the targets. Instead, each agent receives the probability of the target(s) being in each cell of the map.
  • results: The aim of the project is to support the study of reinforcement learning algorithms that require dynamic probabilities as inputs.
    Abstract The Drone Swarm Search project is an environment, based on PettingZoo, that is to be used in conjunction with multi-agent (or single-agent) reinforcement learning algorithms. It is an environment in which the agents (drones), have to find the targets (shipwrecked people). The agents do not know the position of the target and do not receive rewards related to their own distance to the target(s). However, the agents receive the probabilities of the target(s) being in a certain cell of the map. The aim of this project is to aid in the study of reinforcement learning algorithms that require dynamic probabilities as inputs.
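To make the interaction pattern above concrete, here is a minimal sketch of a PettingZoo-style parallel-environment loop. The environment construction, observation contents, and the random placeholder policy are assumptions for illustration; the actual DSSE API is documented in the linked repository.

```python
# Illustrative only: a PettingZoo Parallel API rollout for an environment like
# DSSE. The environment itself is not constructed here; see the repository.
import numpy as np


class RandomPolicy:
    """Placeholder policy that ignores the per-cell probability matrix."""
    def __init__(self, n_actions: int, rng: np.random.Generator):
        self.n_actions = n_actions
        self.rng = rng

    def act(self, observation) -> int:
        # A real agent would use the target probabilities in `observation`.
        return int(self.rng.integers(self.n_actions))


def run_episode(env, policies: dict) -> dict:
    """Roll out one episode with a PettingZoo Parallel API environment."""
    observations, infos = env.reset(seed=0)
    totals = {agent: 0.0 for agent in env.agents}
    while env.agents:  # the agent list empties when all drones terminate
        actions = {agent: policies[agent].act(observations[agent])
                   for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, r in rewards.items():
            totals[agent] += r
    return totals

# env = ...  # construct the drone swarm search environment per the repo docs
```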

Testing different Log Bases For Vector Model Weighting Technique

  • paper_url: http://arxiv.org/abs/2307.06213
  • repo_url: None
  • paper_authors: Kamel Assaf
  • for: This study tests the TF-IDF weighting technique and examines how different logarithm bases affect the performance of the vector model.
  • methods: The study uses the MED, CRAN, NPL, LISA, and CISI test collections, assembled by researchers specifically for information retrieval experiments. The TF-IDF weighting technique is applied with the IDF computed under log bases ranging from 0.1 to 100.0 to test the system at different weighting values (a small sketch follows this entry).
  • results: The log base has a substantial effect on the vector model's performance: precision gradually increases for bases between 0.1 and 10, begins to decline for bases above 10, and also varies across the different test collections.
    Abstract Information retrieval systems retrieve relevant documents based on a query submitted by the user. The documents are initially indexed and the words in the documents are assigned weights using a weighting technique called TFIDF which is the product of Term Frequency (TF) and Inverse Document Frequency (IDF). TF represents the number of occurrences of a term in a document. IDF measures whether the term is common or rare across all documents. It is computed by dividing the total number of documents in the system by the number of documents containing the term and then computing the logarithm of the quotient. By default, we use base 10 to calculate the logarithm. In this paper, we are going to test this weighting technique by using a range of log bases from 0.1 to 100.0 to calculate the IDF. Testing different log bases for the vector model weighting technique highlights the importance of understanding the performance of the system at different weighting values. We use the documents of MED, CRAN, NPL, LISA, and CISI test collections that scientists assembled explicitly for experiments in data information retrieval systems.
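The IDF computation with a configurable logarithm base described above can be sketched in a few lines. This is an illustrative toy example (corpus, tokenization, and raw term frequency are simplifications), not the paper's experimental pipeline.

```python
# TF-IDF weighting where the IDF logarithm base is a parameter.
import math
from collections import Counter

def idf(term: str, docs: list[list[str]], base: float) -> float:
    df = sum(1 for d in docs if term in d)          # document frequency
    return math.log(len(docs) / df, base) if df else 0.0

def tf_idf(doc: list[str], docs: list[list[str]], base: float) -> dict[str, float]:
    tf = Counter(doc)                               # raw term frequency
    return {t: tf[t] * idf(t, docs, base) for t in tf}

docs = [["shipwreck", "search", "drone"],
        ["drone", "swarm", "search"],
        ["vector", "model", "weighting"]]
for base in (0.1, 2.0, 10.0, 100.0):                # range explored in the paper
    print(base, tf_idf(docs[0], docs, base))
```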

Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems

  • paper_url: http://arxiv.org/abs/2307.06187
  • repo_url: None
  • paper_authors: Nathalia Nascimento, Paulo Alencar, Donald Cowan
  • for: This work aims to improve the self-adaptation capabilities of multiagent systems (MASs) so they can cope with complex environments and requirements.
  • methods: The work proposes integrating large language models (LLMs), such as GPT-based technologies, into MASs to enable more expressive interaction and communication. The methodology is anchored on the MAPE-K model, which supports monitoring, analyzing, planning, and executing system adaptations.
  • results: The work implements and assesses a basic MAS-based application, showing that LLM capabilities can support MAS self-adaptation and improve the capability and effectiveness of self-adaptive systems.
    Abstract In autonomic computing, self-adaptation has been proposed as a fundamental paradigm to manage the complexity of multiagent systems (MASs). This is achieved by extending a system with support to monitor and adapt itself to achieve specific concerns of interest. Communication in these systems is key given that in scenarios involving agent interaction, it enhances cooperation and reduces coordination challenges by enabling direct, clear information exchange. However, improving the expressiveness of interaction and communication within MASs is not without challenges. In this sense, the interplay between self-adaptive systems and effective communication is crucial for future MAS advancements. In this paper, we propose the integration of large language models (LLMs) such as GPT-based technologies into multiagent systems. We anchor our methodology on the MAPE-K model, which is renowned for its robust support in monitoring, analyzing, planning, and executing system adaptations in response to dynamic environments. We also present a practical illustration of the proposed approach, in which we implement and assess a basic MAS-based application. The approach significantly advances the state-of-the-art of self-adaptive systems by proposing a new paradigm for MAS self-adaptation of autonomous systems based on LLM capabilities.
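A rough sketch of how an LLM could be slotted into a MAPE-K loop is shown below. The `ask_llm` placeholder and the prompts are assumptions for illustration only; the paper's actual integration and agent architecture are more elaborate.

```python
# A minimal MAPE-K style control loop in which Analyze/Plan are delegated to
# an LLM. `ask_llm` is a stand-in for whatever GPT-style endpoint is used.
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    history: list = field(default_factory=list)     # the shared K in MAPE-K

def ask_llm(prompt: str) -> str:
    # Placeholder: a real system would call an LLM API here.
    return "no adaptation needed"

def mape_k_step(sensor_reading: dict, knowledge: Knowledge, actuate) -> None:
    knowledge.history.append(sensor_reading)                               # Monitor
    analysis = ask_llm(f"Analyze this reading: {sensor_reading}")          # Analyze
    plan = ask_llm(f"Given the analysis '{analysis}', propose an adaptation.")  # Plan
    if plan != "no adaptation needed":
        actuate(plan)                                                      # Execute

mape_k_step({"latency_ms": 120}, Knowledge(), actuate=print)
```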

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

  • paper_url: http://arxiv.org/abs/2307.06166
  • repo_url: None
  • paper_authors: Gengyuan Zhang, Yurui Zhang, Kerui Zhang, Volker Tresp
  • for: This paper aims to investigate the ability of Vision-Language Models (VLMs) to reason with commonsense knowledge, specifically the ability to recognize times and locations based on visual cues.
  • methods: The authors propose a two-stage recognition and reasoning probing task to evaluate the ability of VLMs to recognize times and location-relevant features and reason about them. They use a well-curated image dataset called WikiTiLo, which contains images with rich socio-cultural cues.
  • results: The authors find that although VLMs can effectively retain relevant features in visual encoders, they still fail to make perfect reasoning. They also release their dataset and codes to facilitate future studies.
    Abstract Vision-Language Models (VLMs) are expected to be capable of reasoning with commonsense knowledge as human beings. One example is that humans can reason where and when an image is taken based on their knowledge. This makes us wonder if, based on visual cues, Vision-Language Models that are pre-trained with large-scale image-text resources can achieve and even outperform human's capability in reasoning times and location. To address this question, we propose a two-stage recognition and reasoning probing task, applied to discriminative and generative VLMs to uncover whether VLMs can recognize times and location-relevant features and further reason about it. To facilitate the investigation, we introduce WikiTiLo, a well-curated image dataset comprising images with rich socio-cultural cues. In the extensive experimental studies, we find that although VLMs can effectively retain relevant features in visual encoders, they still fail to make perfect reasoning. We will release our dataset and codes to facilitate future studies.
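As a hedged illustration of the recognition stage, the snippet below scores an image against candidate time and location descriptions with CLIP. The checkpoint, label phrasing, and dummy image are assumptions; the paper's WikiTiLo probing protocol is more involved.

```python
# Zero-shot scoring of one image against time/location prompts using CLIP.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224), color="gray")   # stand-in for a real photo
candidates = ["a photo taken in the 1950s", "a photo taken in the 2010s",
              "a photo taken in East Asia", "a photo taken in Western Europe"]

inputs = processor(text=candidates, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
for label, p in zip(candidates, probs[0].tolist()):
    print(f"{p:.3f}  {label}")
```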

Deep Generative Models for Physiological Signals: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2307.06162
  • repo_url: None
  • paper_authors: Nour Neifar, Afef Mdhaffar, Achraf Ben-Hamadou, Mohamed Jmaiel
  • for: This paper presents a systematic literature review of deep generative models for physiological signals, in particular the electrocardiogram, electroencephalogram, photoplethysmogram, and electromyogram.
  • methods: The paper analyses the state of the art in deep generative models, along with their main applications and challenges.
  • results: The review summarizes how deep generative models are applied to physiological signals and highlights the evaluation protocols and the most commonly used physiological databases, facilitating assessment and benchmarking.
    Abstract In this paper, we present a systematic literature review on deep generative models for physiological signals, particularly electrocardiogram, electroencephalogram, photoplethysmogram and electromyogram. Compared to the existing review papers, we present the first review that summarizes the recent state-of-the-art deep generative models. By analysing the state-of-the-art research related to deep generative models along with their main applications and challenges, this review contributes to the overall understanding of these models applied to physiological signals. Additionally, by highlighting the employed evaluation protocol and the most used physiological databases, this review facilitates the assessment and benchmarking of deep generative models.

Reflective Hybrid Intelligence for Meaningful Human Control in Decision-Support Systems

  • paper_url: http://arxiv.org/abs/2307.06159
  • repo_url: None
  • paper_authors: Catholijn M. Jonker, Luciano Cavalcante Siebert, Pradeep K. Murukannaiah
  • for: This paper is written for the purpose of exploring the idea of self-reflective AI systems and their potential to increase meaningful human control over AI systems.
  • methods: The paper proposes a framework that integrates knowledge from psychology and philosophy with formal reasoning methods and machine learning approaches to create AI systems responsive to human values and social norms.
  • results: The paper argues that self-reflective AI systems can lead to self-reflective hybrid systems (human + AI), which can increase meaningful human control and empower human moral reasoning by providing comprehensible information and insights on possible human moral blind spots.
    Abstract With the growing capabilities and pervasiveness of AI systems, societies must collectively choose between reduced human autonomy, endangered democracies and limited human rights, and AI that is aligned to human and social values, nurturing collaboration, resilience, knowledge and ethical behaviour. In this chapter, we introduce the notion of self-reflective AI systems for meaningful human control over AI systems. Focusing on decision support systems, we propose a framework that integrates knowledge from psychology and philosophy with formal reasoning methods and machine learning approaches to create AI systems responsive to human values and social norms. We also propose a possible research approach to design and develop self-reflective capability in AI systems. Finally, we argue that self-reflective AI systems can lead to self-reflective hybrid systems (human + AI), thus increasing meaningful human control and empowering human moral reasoning by providing comprehensible information and insights on possible human moral blind spots.

Maneuver Decision-Making Through Automatic Curriculum Reinforcement Learning Without Handcrafted Reward functions

  • paper_url: http://arxiv.org/abs/2307.06152
  • repo_url: None
  • paper_authors: Zhang Hong-Peng
  • for: This work proposes an automatic curriculum reinforcement learning method to help unmanned combat aerial vehicles make effective maneuver decisions in autonomous air combat.
  • methods: Maneuver decision-making is divided into a series of sub-tasks of gradually increasing difficulty, distinguished by the range of initial states, and test results are used to switch between sub-tasks. The agent thus progresses from easy to difficult curricula and learns to make effective decisions without handcrafted reward functions (a simplified sketch of this loop follows this entry).
  • results: Simulation experiments show that, after training, the agent can make effective decisions in different states, including tracking, attacking, and escaping, and that these decisions are rational and interpretable.
    Abstract Maneuver decision-making is the core of unmanned combat aerial vehicle for autonomous air combat. To solve this problem, we propose an automatic curriculum reinforcement learning method, which enables agents to learn effective decisions in air combat from scratch. The range of initial states is used for distinguishing curricula of different difficulty levels, thereby maneuver decision is divided into a series of sub-tasks from easy to difficult, and test results are used to change sub-tasks. As sub-tasks change, agents gradually learn to complete a series of sub-tasks from easy to difficult, enabling them to make effective maneuvering decisions to cope with various states without the need to spend effort designing reward functions. The ablation studies show that the automatic curriculum learning proposed in this article is an essential component for training through reinforcement learning, namely, agents cannot complete effective decisions without curriculum learning. Simulation experiments show that, after training, agents are able to make effective decisions given different states, including tracking, attacking and escaping, which are both rational and interpretable.
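A simplified sketch of the automatic-curriculum loop is given below: sub-tasks are defined by ranges of initial states, and the agent is promoted once its test success rate is high enough. The specific ranges, threshold, and placeholder train/evaluate functions are assumptions, not the paper's settings.

```python
# Curriculum over initial-state ranges; harder ranges unlock as tests pass.
import random

CURRICULA = [  # (max initial distance to target, max initial heading error in degrees)
    (1_000.0, 30.0),
    (5_000.0, 90.0),
    (10_000.0, 180.0),
]

def sample_initial_state(level: int) -> dict:
    max_dist, max_heading = CURRICULA[level]
    return {"distance": random.uniform(0, max_dist),
            "heading_error": random.uniform(-max_heading, max_heading)}

def train_one_iteration(level: int) -> None:
    _ = sample_initial_state(level)   # placeholder for one RL update

def evaluate(level: int) -> float:
    return random.random()            # placeholder for the test success rate

level = 0
while level < len(CURRICULA):
    for _ in range(100):
        train_one_iteration(level)
    if evaluate(level) > 0.8:         # promote to the next, harder sub-task
        level += 1
```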

SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning

  • paper_url: http://arxiv.org/abs/2307.06135
  • repo_url: None
  • paper_authors: Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, Niko Suenderhauf
  • for: This paper develops a scalable large language model (LLM)-based task planner that can ground plans in expansive multi-floor, multi-room environments.
  • methods: The approach represents the environment as a 3D scene graph (3DSG) and lets the LLM perform a semantic search over it. A classical path planner shortens the LLM's planning horizon, and an iterative replanning pipeline refines the initial plan, correcting infeasible actions and avoiding planning failures.
  • results: The approach is evaluated in two large-scale environments spanning up to 3 floors, 36 rooms, and 140 objects, and is shown to ground large-scale, long-horizon task plans into actions that a mobile manipulator robot can execute.
    Abstract Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a semantic search for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner and (3) introduce an iterative replanning pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors, 36 rooms and 140 objects, and show that our approach is capable of grounding large-scale, long-horizon task plans from abstract, and natural language instruction for a mobile manipulator robot to execute.

Guided Bottom-Up Interactive Constraint Acquisition

  • paper_url: http://arxiv.org/abs/2307.06126
  • repo_url: https://github.com/dimosts/activeconlearn
  • paper_authors: Dimos Tsouros, Senne Berden, Tias Guns
  • for: To improve the efficiency of constraint acquisition (CA) systems so that constraint satisfaction problems can be modelled more quickly.
  • methods: Two novel methods are proposed to improve the efficiency of CA. The first is a bottom-up approach named GrowAcq, which reduces the user's waiting time and lets the system handle much larger sets of candidate constraints. The second guides query generation with probabilities, reducing the number of queries required, and a new technique allows openly accessible CP solvers to be used for query generation.
  • results: The proposed methods outperform state-of-the-art CA methods, reducing the number of queries by up to 60%, and they still work well when the candidate constraint set is 50 times larger than those commonly used in the literature.
    Abstract Constraint Acquisition (CA) systems can be used to assist in the modeling of constraint satisfaction problems. In (inter)active CA, the system is given a set of candidate constraints and posts queries to the user with the goal of finding the right constraints among the candidates. Current interactive CA algorithms suffer from at least two major bottlenecks. First, in order to converge, they require a large number of queries to be asked to the user. Second, they cannot handle large sets of candidate constraints, since these lead to large waiting times for the user. For this reason, the user must have fairly precise knowledge about what constraints the system should consider. In this paper, we alleviate these bottlenecks by presenting two novel methods that improve the efficiency of CA. First, we introduce a bottom-up approach named GrowAcq that reduces the maximum waiting time for the user and allows the system to handle much larger sets of candidate constraints. It also reduces the total number of queries for problems in which the target constraint network is not sparse. Second, we propose a probability-based method to guide query generation and show that it can significantly reduce the number of queries required to converge. We also propose a new technique that allows the use of openly accessible CP solvers in query generation, removing the dependency of existing methods on less well-maintained custom solvers that are not publicly available. Experimental results show that our proposed methods outperform state-of-the-art CA methods, reducing the number of queries by up to 60%. Our methods work well even in cases where the set of candidate constraints is 50 times larger than the ones commonly used in the literature.
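The snippet below illustrates one plausible form of probability-guided query selection: ask about the candidate constraint whose probability is most uncertain. This scoring rule and the trivial update are assumptions for illustration and are not GrowAcq or the paper's exact guidance strategy.

```python
# Toy probability-guided selection of the next membership query.
def next_query(candidates: dict[str, float]) -> str:
    """candidates maps a constraint description to P(constraint holds)."""
    # Asking about the most uncertain candidate is expected to be most informative.
    return min(candidates, key=lambda c: abs(candidates[c] - 0.5))

def update(candidates: dict[str, float], asked: str, answer: bool) -> None:
    # Trivial update: fix the asked constraint and leave the others unchanged.
    candidates[asked] = 1.0 if answer else 0.0

candidates = {"x1 != x2": 0.9, "x1 < x3": 0.45, "alldifferent(x1..x4)": 0.2}
q = next_query(candidates)
update(candidates, q, answer=True)
print(q, candidates)
```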

Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation

  • paper_url: http://arxiv.org/abs/2307.06125
  • repo_url: https://github.com/robot-learning-freiburg/HIMOS
  • paper_authors: Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada
  • for: This work develops a robotic system that can search for multiple objects in human-centred environments, which requires combining exploration, navigation, and manipulation skills.
  • methods: The work uses a hierarchical reinforcement learning approach (HIMOS) that learns to compose exploration, navigation, and manipulation skills, enabling interactive multi-object search in unexplored environments.
  • results: Experiments in simulation and the real world show that HIMOS transfers to new environments in a zero-shot manner and is robust to unseen subpolicies, failures in their execution, and different robot kinematics.
    Abstract Existing object-search approaches enable robots to search through free pathways, however, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real-world that demonstrate that HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.

TreeFormer: a Semi-Supervised Transformer-based Framework for Tree Counting from a Single High Resolution Image

  • paper_url: http://arxiv.org/abs/2307.06118
  • repo_url: https://github.com/haaclassic/treeformer
  • paper_authors: Hamed Amini Amirkolaee, Miaojing Shi, Mark Mulligan
  • for: This paper proposes a semi-supervised transformer-based framework for tree counting that estimates tree density and counts automatically, reducing the cost of tree annotation in remote sensing images.
  • methods: A pyramid tree representation module based on transformer blocks extracts multi-scale features; contextual attention-based feature fusion and a tree density regressor then estimate the density map from the encoder features. A pyramid learning strategy with local tree density consistency and local tree count ranking losses allows unlabeled images to be used during training.
  • results: TreeFormer is evaluated on two benchmark tree counting datasets (Jiangsu and Yosemite) and a new dataset created by the authors (KCL-London). It outperforms state-of-the-art semi-supervised methods under the same setting and exceeds fully-supervised methods using the same number of labeled images. Code and data are available at https://github.com/HAAClassic/TreeFormer.
    Abstract Automatic tree density estimation and counting using single aerial and satellite images is a challenging task in photogrammetry and remote sensing, yet has an important role in forest management. In this paper, we propose the first semisupervised transformer-based framework for tree counting which reduces the expensive tree annotations for remote sensing images. Our method, termed as TreeFormer, first develops a pyramid tree representation module based on transformer blocks to extract multi-scale features during the encoding stage. Contextual attention-based feature fusion and tree density regressor modules are further designed to utilize the robust features from the encoder to estimate tree density maps in the decoder. Moreover, we propose a pyramid learning strategy that includes local tree density consistency and local tree count ranking losses to utilize unlabeled images into the training process. Finally, the tree counter token is introduced to regulate the network by computing the global tree counts for both labeled and unlabeled images. Our model was evaluated on two benchmark tree counting datasets, Jiangsu, and Yosemite, as well as a new dataset, KCL-London, created by ourselves. Our TreeFormer outperforms the state of the art semi-supervised methods under the same setting and exceeds the fully-supervised methods using the same number of labeled images. The codes and datasets are available at https://github.com/HAAClassic/TreeFormer.

CLAIMED – the open source framework for building coarse-grained operators for accelerated discovery in science

  • paper_url: http://arxiv.org/abs/2307.06824
  • repo_url: https://github.com/claimed-framework/component-library
  • paper_authors: Romeo Kienzler, Rafflesia Khan, Jerome Nilmeier, Ivan Nesic, Ibrahim Haddad
  • for: This paper addresses the reproducibility and reusability challenges of modern data-driven science, helping scientists rerun and verify experiments.
  • methods: The CLAIMED framework lets scientists re-compose workflows from existing libraries of coarse-grained scientific operators, yielding reusable scientific data-processing code. CLAIMED is programming language, scientific library, and execution environment agnostic.
  • results: The framework enables the re-composition and execution of reusable scientific data-processing code, improving reproducibility and reusability in modern data-driven science.
    Abstract In modern data-driven science, reproducibility and reusability are key challenges. Scientists are well skilled in the process from data to publication. Although some publication channels require source code and data to be made accessible, rerunning and verifying experiments is usually hard due to a lack of standards. Therefore, reusing existing scientific data processing code from state-of-the-art research is hard as well. This is why we introduce CLAIMED, which has a proven track record in scientific research for addressing the repeatability and reusability issues in modern data-driven science. CLAIMED is a framework to build reusable operators and scalable scientific workflows by supporting the scientist to draw from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators. Although various implementations exist, CLAIMED is programming language, scientific library, and execution environment agnostic.

Quantitative CLTs in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.06092
  • repo_url: None
  • paper_authors: Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati
  • for: This paper studies the distribution of fully connected neural networks with random Gaussian weights and biases.
  • methods: The hidden-layer widths are taken proportional to a large constant n, and under mild assumptions on the non-linearity the authors obtain quantitative bounds on normal approximations valid at large but finite n and any fixed network depth.
  • results: The distance between a random fully connected network (and its derivatives) and the corresponding infinite-width Gaussian process scales like n^{-γ} for some γ > 0, with the exponent depending on the metric used to measure the discrepancy; in the one-dimensional case matching lower bounds show the rates are optimal.
    Abstract We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
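The main quantitative statement from the abstract can be summarized schematically as follows; the distance d, the constant C, and the notation are assumptions used only to restate the scaling, not the paper's precise theorem.

```latex
% Schematic form of the bound described in the abstract: for a network of
% fixed depth with hidden-layer widths proportional to $n$, and a distance
% $d$ (e.g.\ a Wasserstein-type metric), the constant $C$ and the exponent
% $\gamma$ depend on the depth, the non-linearity, and the chosen metric.
\[
  d\bigl(\text{finite-width network},\ \text{infinite-width Gaussian process}\bigr)
  \;\le\; C\, n^{-\gamma}, \qquad \gamma > 0 .
\]
```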

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

  • paper_url: http://arxiv.org/abs/2307.06082
  • repo_url: https://github.com/raphael-sch/velma
  • paper_authors: Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang
  • for: This paper addresses incremental decision making in real-world environments, one of the hardest problems in embodied AI, in the form of the Vision and Language Navigation (VLN) task in Street View.
  • methods: VELMA is an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as the contextual prompt for the next action. Visual information is verbalized by a pipeline that extracts landmarks from human-written navigation instructions and uses CLIP to determine their visibility in the current panorama view.
  • results: VELMA successfully follows navigation instructions in Street View with only two in-context examples, and after fine-tuning the LLM agent on a few thousand examples it achieves a 25%-30% relative improvement in task completion over the previous state of the art on two datasets.
    Abstract Incremental decision making in real-world environments is one of the most challenging tasks in embodied artificial intelligence. One particularly demanding scenario is Vision and Language Navigation~(VLN) which requires visual and natural language understanding as well as spatial and temporal reasoning capabilities. The embodied agent needs to ground its understanding of navigation instructions in observations of a real-world environment like Street View. Despite the impressive results of LLMs in other research areas, it is an ongoing problem of how to best connect them with an interactive visual environment. In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action. Visual information is verbalized by a pipeline that extracts landmarks from the human written navigation instructions and uses CLIP to determine their visibility in the current panorama view. We show that VELMA is able to successfully follow navigation instructions in Street View with only two in-context examples. We further finetune the LLM agent on a few thousand examples and achieve 25%-30% relative improvement in task completion over the previous state-of-the-art for two datasets.

Assessment of the suitability of degradation models for the planning of CCTV inspections of sewer pipes

  • paper_url: http://arxiv.org/abs/2307.06341
  • repo_url: https://github.com/fidaeic/sewer-pred
  • paper_authors: Fidae El Morer, Stefan Wittek, Andreas Rausch
  • for: This paper aims to develop a methodology for planning inspections of sewer pipes based on degradation models that consider statistical and machine learning methods.
  • methods: The paper proposes using accuracy metrics, long-term degradation curves, and explainability to evaluate the suitability of different degradation models for inspection planning. The authors use ensemble models, Logistic Regression, and other methods to assess the pipes’ degradation.
  • results: The results show that while ensemble models have high accuracy, they are unable to infer long-term degradation curves. In contrast, the Logistic Regression model provides slightly less accurate results but can produce consistent degradation curves with high explainability. The authors demonstrate the efficiency of their methodology using a real-world use case.
    Abstract The degradation of sewer pipes poses significant economical, environmental and health concerns. The maintenance of such assets requires structured plans to perform inspections, which are more efficient when structural and environmental features are considered along with the results of previous inspection reports. The development of such plans requires degradation models that can be based on statistical and machine learning methods. This work proposes a methodology to assess their suitability to plan inspections considering three dimensions: accuracy metrics, ability to produce long-term degradation curves and explainability. Results suggest that although ensemble models yield the highest accuracy, they are unable to infer the long-term degradation of the pipes, whereas the Logistic Regression offers a slightly less accurate model that is able to produce consistent degradation curves with a high explainability. A use case is presented to demonstrate this methodology and the efficiency of model-based planning compared to the current inspection plan.
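A minimal sketch of how a Logistic Regression model yields a long-term degradation curve is shown below: the fitted probability of a deteriorated condition is evaluated as a function of pipe age. The synthetic data, single feature, and parameter choices are assumptions, not the paper's dataset or feature set.

```python
# Fit a logistic model on synthetic (age, condition) data and read off a
# degradation curve as P(deteriorated | age).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
age = rng.uniform(0, 100, size=500)                      # pipe age in years
deteriorated = (rng.uniform(size=500) < 1 / (1 + np.exp(-(age - 60) / 10))).astype(int)

model = LogisticRegression().fit(age.reshape(-1, 1), deteriorated)

ages = np.arange(0, 101, 10).reshape(-1, 1)
curve = model.predict_proba(ages)[:, 1]                  # long-term degradation curve
for a, p in zip(ages.ravel(), curve):
    print(f"age {a:3d}: P(deteriorated) = {p:.2f}")
```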

Machine Learning for Autonomous Vehicle’s Trajectory Prediction: A comprehensive survey, Challenges, and Future Research Directions

  • paper_url: http://arxiv.org/abs/2307.07527
  • repo_url: None
  • paper_authors: Vibha Bharilya, Neetesh Kumar
  • For: The paper is written to provide a comprehensive review of trajectory prediction methods for autonomous vehicles (AVs), with a focus on machine learning techniques such as deep learning and reinforcement learning.
  • Methods: The paper evaluates several deep learning-based techniques and reinforcement learning-based methods for trajectory prediction in the context of AVs. It also discusses the various datasets and evaluation metrics commonly used in these tasks.
  • Results: The paper provides a detailed analysis of the strengths and weaknesses of each method, and identifies challenges in the existing literature and potential research directions for future study.
    Abstract Autonomous Vehicles (AVs) have emerged as a promising solution by replacing human drivers with advanced computer-aided decision-making systems. However, for AVs to effectively navigate the road, they must possess the capability to predict the future behavior of nearby traffic participants, similar to the predictive driving abilities of human drivers. Building upon existing literature is crucial to advance the field and develop a comprehensive understanding of trajectory prediction methods in the context of automated driving. To address this need, we have undertaken a comprehensive review that focuses on trajectory prediction methods for AVs, with a particular emphasis on machine learning techniques including deep learning and reinforcement learning-based approaches. We have extensively examined over two hundred studies related to trajectory prediction in the context of AVs. The paper begins with an introduction to the general problem of predicting vehicle trajectories and provides an overview of the key concepts and terminology used throughout. After providing a brief overview of conventional methods, this review conducts a comprehensive evaluation of several deep learning-based techniques. Each method is summarized briefly, accompanied by a detailed analysis of its strengths and weaknesses. The discussion further extends to reinforcement learning-based methods. This article also examines the various datasets and evaluation metrics that are commonly used in trajectory prediction tasks. Encouraging an unbiased and objective discussion, we compare two major learning processes, considering specific functional features. By identifying challenges in the existing literature and outlining potential research directions, this review significantly contributes to the advancement of knowledge in the domain of AV trajectory prediction.

Visualization for Multivariate Gaussian Anomaly Detection in Images

  • paper_url: http://arxiv.org/abs/2307.06052
  • repo_url: None
  • paper_authors: Joao P C Bertoldo, David Arrustico
  • for: This paper proposes a simplified variation of the PaDiM method for anomaly detection in images.
  • methods: A single multivariate Gaussian (MVG) distribution is fitted to the feature vectors extracted by a backbone CNN, and their Mahalanobis distance is used as the anomaly score. An intermediate whitening transformation applied to the feature vectors enables heatmaps that visually explain the features learned by the MVG (a numerical sketch follows this entry).
  • results: The method is evaluated on the MVTec-AD dataset; the results show the importance of visual model validation and reveal issues in this framework that would otherwise be invisible.
    Abstract This paper introduces a simplified variation of the PaDiM (Pixel-Wise Anomaly Detection through Instance Modeling) method for anomaly detection in images, fitting a single multivariate Gaussian (MVG) distribution to the feature vectors extracted from a backbone convolutional neural network (CNN) and using their Mahalanobis distance as the anomaly score. We introduce an intermediate step in this framework by applying a whitening transformation to the feature vectors, which enables the generation of heatmaps capable of visually explaining the features learned by the MVG. The proposed technique is evaluated on the MVTec-AD dataset, and the results show the importance of visual model validation, providing insights into issues in this framework that were otherwise invisible. The visualizations generated for this paper are publicly available at https://doi.org/10.5281/zenodo.7937978.
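The scoring pipeline above can be sketched numerically: fit one multivariate Gaussian to nominal feature vectors, whiten new features, and take the norm of the whitened vector, which equals the Mahalanobis distance. Random vectors stand in for CNN backbone features here, and the regularization constant is an assumption.

```python
# MVG fit + whitening + Mahalanobis anomaly scores on synthetic features.
import numpy as np

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(1000, 64))            # nominal feature vectors

mean = train_feats.mean(axis=0)
cov = np.cov(train_feats, rowvar=False) + 1e-6 * np.eye(64)   # regularized covariance
# Whitening transform W = cov^{-1/2}, so W @ (x - mean) has identity covariance.
eigvals, eigvecs = np.linalg.eigh(cov)
whiten = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

test_feats = rng.normal(loc=0.5, size=(10, 64))       # slightly shifted "anomalies"
whitened = (test_feats - mean) @ whiten.T
scores = np.linalg.norm(whitened, axis=1)             # equals the Mahalanobis distance
print(scores)
```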

  • paper_url: http://arxiv.org/abs/2307.06046
  • repo_url: None
  • paper_authors: Jincheng Zhou, Beatrice Bevilacqua, Bruno Ribeiro
  • for: This paper addresses inductive link prediction in new test multigraphs: inferring missing attributed links (relations) between nodes without access to additional information.
  • methods: The work builds on double exchangeability (over nodes and relation types), in contrast to the single, node-only exchangeability used to design graph neural networks (GNNs), and extends it to multi-task double exchangeability so that distinct and potentially conflicting predictive patterns for different sets of relation types can be handled.
  • results: Experiments on real-world datasets show that the approach generalizes effectively to entirely new relation types at test time, without additional information or further training data, yielding significant performance improvements over existing methods.
    Abstract The task of inductive link prediction in (discrete) attributed multigraphs infers missing attributed links (relations) between nodes in new test multigraphs. Traditional relational learning methods face the challenge of limited generalization to OOD test multigraphs containing both novel nodes and novel relation types not seen in training. Recently, under the only assumption that all relation types share the same structural predictive patterns (single task), Gao et al. (2023) proposed an OOD link prediction method using the theoretical concept of double exchangeability (for nodes & relation types), in contrast to the (single) exchangeability (only for nodes) used to design Graph Neural Networks (GNNs). In this work we further extend the double exchangeability concept to multi-task double exchangeability, where we define link prediction in attributed multigraphs that can have distinct and potentially conflicting predictive patterns for different sets of relation types (multiple tasks). Our empirical results on real-world datasets demonstrate that our approach can effectively generalize to entirely new relation types in test, without access to additional information, yielding significant performance improvements over existing methods.

AI-Generated Imagery: A New Era for the `Readymade’

  • paper_url: http://arxiv.org/abs/2307.06033
  • repo_url: None
  • paper_authors: Amy Smith, Michael Cook
  • for: This study examines how digital images produced by generative AI systems have come to be so regularly referred to as art.
  • methods: The paper uses existing philosophical frameworks and theories of language to suggest that some AI-generated images can be presented as 'readymades' for consideration as art.
  • results: The discussion argues that, by virtue of their visual properties within these frameworks, some AI-generated images can be regarded as artworks.
    Abstract While the term `art' defies any concrete definition, this paper aims to examine how digital images produced by generative AI systems, such as Midjourney, have come to be so regularly referred to as such. The discourse around the classification of AI-generated imagery as art is currently somewhat homogeneous, lacking the more nuanced aspects that would apply to more traditional modes of artistic media production. This paper aims to bring important philosophical considerations to the surface of the discussion around AI-generated imagery in the context of art. We employ existing philosophical frameworks and theories of language to suggest that some AI-generated imagery, by virtue of its visual properties within these frameworks, can be presented as `readymades' for consideration as art.

An Effective and Efficient Time-aware Entity Alignment Framework via Two-aspect Three-view Label Propagation

  • paper_url: http://arxiv.org/abs/2307.06013
  • repo_url: None
  • paper_authors: Li Cai, Xin Mao, Youshao Xiao, Changxu Wu, Man Lan
  • for: To promote knowledge fusion by finding equivalent entity pairs between different temporal knowledge graphs (TKGs) both effectively and efficiently.
  • methods: The paper proposes a non-neural, effective and efficient entity alignment framework, LightTEA, with four components: two-aspect three-view label propagation, sparse similarity with temporal constraints, the Sinkhorn operator (sketched after this entry), and temporal iterative learning.
  • results: Extensive experiments on public datasets show that the proposed model significantly outperforms state-of-the-art methods for entity alignment between TKGs, while its running time is at most dozens of seconds, no more than 10% of that of the most efficient existing TEA method.
    Abstract Entity alignment (EA) aims to find the equivalent entity pairs between different knowledge graphs (KGs), which is crucial to promote knowledge fusion. With the wide use of temporal knowledge graphs (TKGs), time-aware EA (TEA) methods appear to enhance EA. Existing TEA models are based on Graph Neural Networks (GNN) and achieve state-of-the-art (SOTA) performance, but it is difficult to transfer them to large-scale TKGs due to the scalability issue of GNN. In this paper, we propose an effective and efficient non-neural EA framework between TKGs, namely LightTEA, which consists of four essential components: (1) Two-aspect Three-view Label Propagation, (2) Sparse Similarity with Temporal Constraints, (3) Sinkhorn Operator, and (4) Temporal Iterative Learning. All of these modules work together to improve the performance of EA while reducing the time consumption of the model. Extensive experiments on public datasets indicate that our proposed model significantly outperforms the SOTA methods for EA between TKGs, and the time consumed by LightTEA is only dozens of seconds at most, no more than 10% of the most efficient TEA method.
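The Sinkhorn operator named among the components above can be illustrated as follows: alternating row and column normalization pushes a similarity matrix towards a doubly stochastic matrix, from which aligned pairs are read off. The temperature, iteration count, and toy matrix are assumptions, and this is not the paper's full alignment pipeline.

```python
# Sinkhorn normalization of a similarity matrix between two entity sets.
import numpy as np

def sinkhorn(sim: np.ndarray, n_iters: int = 20, tau: float = 0.1) -> np.ndarray:
    m = np.exp(sim / tau)                      # positive matrix from similarities
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)   # row normalization
        m = m / m.sum(axis=0, keepdims=True)   # column normalization
    return m

sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.1],
                [0.0, 0.2, 0.7]])
alignment = sinkhorn(sim)
print(alignment.argmax(axis=1))                # predicted match for each entity
```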

Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

  • paper_url: http://arxiv.org/abs/2307.13116
  • repo_url: None
  • paper_authors: Michal Bartoszkiewicz, Jan Chorowski, Adrian Kosowski, Jakub Kowalski, Sergey Kulik, Mateusz Lewandowski, Krzysztof Nowicki, Kamil Piechowiak, Olivier Ruas, Zuzanna Stamirowska, Przemyslaw Uznanski
  • for: This work addresses the challenges of analysing and processing data from the physical economy, including streams generated by IoT and enterprise systems.
  • methods: The paper introduces Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. Pathway offers a Table API tailored for Python and Python/SQL workflows and is powered by a distributed incremental dataflow engine written in Rust.
  • results: Benchmarks show that Pathway surpasses state-of-the-art industry frameworks in both batch and streaming contexts, and that it handles streaming use cases, such as streaming iterative graph algorithms (PageRank, etc.), that existing frameworks cannot easily resolve.
    Abstract We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machinelearning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both scenarios. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.).

Transformers in Reinforcement Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.05979
  • repo_url: None
  • paper_authors: Pranav Agarwal, Aamer Abdul Rahman, Pierre-Luc St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou
  • for: This paper explores the use of transformers in reinforcement learning (RL) to address challenges such as unstable training, credit assignment, lack of interpretability, and partial observability.
  • methods: The paper discusses the properties of transformers and their variants, and how they can be applied to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization.
  • results: The paper presents a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization, and discusses the limitations of using transformers in RL.
    Abstract Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks. This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability. We begin by providing a brief domain overview of RL, followed by a discussion on the challenges of classical RL algorithms. Next, we delve into the properties of the transformer and its variants and discuss the characteristics that make them well-suited to address the challenges inherent in RL. We examine the application of transformers to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization. We also discuss recent research that aims to enhance the interpretability and efficiency of transformers in RL, using visualization techniques and efficient training strategies. Often, the transformer architecture must be tailored to the specific needs of a given application. We present a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization. We conclude by discussing the limitations of using transformers in RL and assess their potential for catalyzing future breakthroughs in this field.

Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.05977
  • repo_url: https://github.com/nannullna/safe-diffusion
  • paper_authors: Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee
  • for: To prevent the generation of harmful or copyrighted content in text-to-image diffusion models.
  • methods: The paper proposes SDD, a self-distillation method in which the diffusion model is guided so that the noise estimate conditioned on the concept to be removed matches the unconditional one, greatly reducing problematic content in the generated images.
  • results: Compared to previous methods, SDD eliminates a much greater proportion of harmful content without degrading overall image quality, and it allows multiple concepts to be removed at once, whereas previous works remove a single concept at a time.
    Abstract Large-scale image generation models, with impressive quality made possible by the vast amount of data available on the Internet, raise social concerns that these models may generate harmful or copyrighted content. The biases and harmfulness arise throughout the entire training process and are hard to completely remove, which have become significant hurdles to the safe deployment of these models. In this paper, we propose a method called SDD to prevent problematic content generation in text-to-image diffusion models. We self-distill the diffusion model to guide the noise estimate conditioned on the target removal concept to match the unconditional one. Compared to the previous methods, our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality. Furthermore, our method allows the removal of multiple concepts at once, whereas previous works are limited to removing a single concept at a time.
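The self-distillation idea described above can be written schematically as the following objective, where the conditional noise estimate for the concept to be removed is pulled towards the (stop-gradient) unconditional estimate; the exact formulation, weighting, and teacher model used in the paper may differ.

```latex
% Schematic self-distillation objective: $c$ is the target-removal concept,
% $\epsilon_\theta$ the noise predictor, and $\mathrm{sg}[\cdot]$ a
% stop-gradient. This restates the abstract, not the paper's verbatim loss.
\[
  \mathcal{L}_{\text{SDD}}
  \;=\;
  \mathbb{E}_{x_t,\, t}\,
  \bigl\lVert\,
    \epsilon_\theta(x_t, c, t)
    \;-\;
    \mathrm{sg}\!\bigl[\epsilon_\theta(x_t, t)\bigr]
  \,\bigr\rVert_2^2 .
\]
```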

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

  • paper_url: http://arxiv.org/abs/2307.05973
  • repo_url: None
  • paper_authors: Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei
  • for: This paper aims to synthesize robot trajectories, i.e., dense sequences of 6-DoF end-effector waypoints, for a large variety of manipulation tasks, enabling physical interaction given an open set of instructions and an open set of objects.
  • methods: The approach combines the reasoning and planning abilities of LLMs with a visual-language model (VLM): the LLM's code-writing capabilities are used to compose 3D value maps that ground the knowledge into the observation space of the agent, and a model-based planning framework then synthesizes closed-loop robot trajectories robust to dynamic perturbations.
  • results: The method is studied at scale in simulated and real-robot environments and can perform a large variety of everyday manipulation tasks specified in free-form natural language.
    Abstract Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to carry out the physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a visual-language model (VLM) to compose 3D value maps to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. Project website: https://voxposer.github.io
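As an illustration of composing value maps and reading off a waypoint, the sketch below combines a toy affordance map and an avoidance map over a voxel grid and picks the highest-value voxel. The maps, grid size, and greedy selection are assumptions used to convey the idea; VoxPoser's LLM/VLM-generated maps and motion planner are far richer.

```python
# Compose simple 3D value maps over a voxel grid and pick the best waypoint.
import numpy as np

grid_shape = (20, 20, 20)                          # workspace voxel grid

affordance = np.zeros(grid_shape)
affordance[15, 10, 5] = 1.0                        # e.g. "near the drawer handle"

avoid = np.zeros(grid_shape)
avoid[12:18, 8:12, 0:4] = 0.5                      # e.g. "stay away from the vase"

value = affordance - avoid                         # composed 3D value map
target_voxel = np.unravel_index(np.argmax(value), value.shape)
print("next waypoint (voxel indices):", target_voxel)
```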

Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

  • paper_url: http://arxiv.org/abs/2307.05959
  • repo_url: None
  • paper_authors: Moo Jin Kim, Jiajun Wu, Chelsea Finn
  • for: This paper uses human video demonstrations to improve the generalization of eye-in-hand robot visuomotor policies.
  • methods: Narrow robot imitation datasets are augmented with broad unlabeled eye-in-hand human video demonstrations; the method exploits the partial observability of eye-in-hand cameras and a simple fixed image masking scheme, without any explicit domain adaptation.
  • results: On a suite of eight real-world tasks involving 3-DoF and 6-DoF arm control, the method improves the success rates of eye-in-hand manipulation policies by 58% (absolute) on average, enabling generalization to new environment configurations and new tasks unseen in the robot demonstration data.
    Abstract Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation. However, for robotic imitation, it is still expensive to have a human teleoperator collect large amounts of expert demonstrations with a real robot. Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation and can be quickly captured in a wide range of scenarios. Therefore, human video demonstrations are a promising data source for learning generalizable robotic manipulation policies at scale. In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies. Although a clear visual domain gap exists between human and robot data, our framework does not need to employ any explicit domain adaptation method, as we leverage the partial observability of eye-in-hand cameras as well as a simple fixed image masking scheme. On a suite of eight real-world tasks involving both 3-DoF and 6-DoF robot arm control, our method improves the success rates of eye-in-hand manipulation policies by 58% (absolute) on average, enabling robots to generalize to both new environment configurations and new tasks that are unseen in the robot demonstration data. See video results at https://giving-robots-a-hand.github.io/ .

Automatically Reconciling the Trade-off between Prediction Accuracy and Earliness in Prescriptive Business Process Monitoring

  • paper_url: http://arxiv.org/abs/2307.05939
  • repo_url: None
  • paper_authors: Andreas Metzger, Tristan Kley, Aristide Rothweiler, Klaus Pohl
  • for: This paper provides decision support to process managers on when and how to adapt an ongoing business process in order to prevent or mitigate an undesired process outcome.
  • methods: The paper performs a comparative evaluation of the main alternative approaches for reconciling the trade-off between prediction accuracy and prediction earliness, using four public real-world event log datasets and two types of prediction models to assess and compare the cost savings of these approaches.
  • results: The results indicate which criteria affect the effectiveness of an approach, show that different approaches pay off in different settings, and support initial recommendations for selecting a concrete approach in practice.
    Abstract Prescriptive business process monitoring provides decision support to process managers on when and how to adapt an ongoing business process to prevent or mitigate an undesired process outcome. We focus on the problem of automatically reconciling the trade-off between prediction accuracy and prediction earliness in determining when to adapt. Adaptations should happen sufficiently early to provide enough lead time for the adaptation to become effective. However, earlier predictions are typically less accurate than later predictions. This means that acting on less accurate predictions may lead to unnecessary adaptations or missed adaptations. Different approaches were presented in the literature to reconcile the trade-off between prediction accuracy and earliness. So far, these approaches were compared with different baselines, and evaluated using different data sets or even confidential data sets. This limits the comparability and replicability of the approaches and makes it difficult to choose a concrete approach in practice. We perform a comparative evaluation of the main alternative approaches for reconciling the trade-off between prediction accuracy and earliness. Using four public real-world event log data sets and two types of prediction models, we assess and compare the cost savings of these approaches. The experimental results indicate which criteria affect the effectiveness of an approach and help us state initial recommendations for the selection of a concrete approach in practice.
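A minimal sketch of the accuracy-versus-earliness trade-off is shown below: adaptation is triggered as soon as the predicted probability of an undesired outcome crosses a threshold, so a higher threshold waits for later (typically more accurate) predictions. The predictions and threshold values are illustrative assumptions, not results from the paper.

```python
# Threshold-based adaptation trigger over a stream of per-event predictions.
def decide(predictions: list[float], threshold: float) -> int | None:
    """predictions[i] = P(undesired outcome) after observing the i-th event."""
    for i, p in enumerate(predictions):
        if p >= threshold:
            return i          # adapt after event i (earlier = more lead time)
    return None               # never adapt

trace = [0.35, 0.48, 0.62, 0.81, 0.93]       # predictions as the case unfolds
for threshold in (0.5, 0.7, 0.9):
    print(threshold, "-> adapt at event", decide(trace, threshold))
```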

BiRP: Learning Robot Generalized Bimanual Coordination using Relative Parameterization Method on Human Demonstration

  • paper_url: http://arxiv.org/abs/2307.05933
  • repo_url: https://github.com/skylark0924/rofunc
  • paper_authors: Junjia Liu, Hengyi Sim, Chenzui Li, Fei Chen
  • for: 本研究旨在提出一种能够从人类示范学习(Learning from Demonstration,LfD)的简单易用的双手协调方法,以便应用于机器人大型抓取模型训练中。
  • methods: 本研究使用了分类型双手任务(leader-follower和synergistic coordination),并提出了一种相对参数化方法来从人类示范中学习这两种协调方式。该方法通过描述人类动作中协调变化的概率分布来表示协调。
  • results: 研究人员使用了人工动作和人类示范数据,并在一个人类形态机器人上进行了一系列实验,以证明该方法可以通过学习人类示范来实现新任务参数下的一致协调。
    Abstract Human bimanual manipulation can perform more complex tasks than a simple combination of two single arms, which is credited to the spatio-temporal coordination between the arms. However, the description of bimanual coordination is still an open topic in robotics. This makes it difficult to give an explainable coordination paradigm, let alone applied to robotics. In this work, we divide the main bimanual tasks in human daily activities into two types: leader-follower and synergistic coordination. Then we propose a relative parameterization method to learn these types of coordination from human demonstration. It represents coordination as Gaussian mixture models from bimanual demonstration to describe the change in the importance of coordination throughout the motions by probability. The learned coordinated representation can be generalized to new task parameters while ensuring spatio-temporal coordination. We demonstrate the method using synthetic motions and human demonstration data and deploy it to a humanoid robot to perform a generalized bimanual coordination motion. We believe that this easy-to-use bimanual learning from demonstration (LfD) method has the potential to be used as a data augmentation plugin for robot large manipulation model training. The corresponding codes are open-sourced in https://github.com/Skylark0924/Rofunc.
    摘要 人类的双手协同操作能够完成比两只手臂简单组合更复杂的任务,这归功于双臂之间的时空协调。然而,如何描述双手协调在机器人学中仍是一个开放问题,这使得给出可解释的协调范式、进而将其应用于机器人变得困难。在这项工作中,我们将人类日常活动中的主要双手任务分为两类:领导-跟随协调与协同协调。随后,我们提出一种相对参数化方法,从人类示范中学习这两类协调:它将协调表示为从双手示范中学得的高斯混合模型,用概率刻画协调重要性在整个运动过程中的变化。学到的协调表示可以泛化到新的任务参数,同时保证时空协调。我们使用合成运动和人类示范数据演示了该方法,并将其部署到一台人形机器人上执行泛化的双手协调运动。我们相信这种易用的双手示范学习(LfD)方法有潜力作为机器人大型操作模型训练的数据增强插件。相关代码已开源于 https://github.com/Skylark0924/Rofunc。
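
As a rough illustration of the relative-parameterization idea summarized above (not the authors' implementation; see https://github.com/Skylark0924/Rofunc for that), coordination can be modeled as a probability distribution over the relative motion between the two arms along a demonstration. The sketch below fits a Gaussian mixture over (time, relative position) features from a synthetic demonstration; the feature choice and component count are assumptions for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical demonstration: T timesteps of left/right end-effector positions.
T = 200
t = np.linspace(0.0, 1.0, T)
left = np.stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t], axis=1)
right = left + np.array([0.3, 0.0, 0.0]) + 0.01 * np.random.randn(T, 3)

relative = right - left                                     # leader-follower style relative motion
features = np.concatenate([t[:, None], relative], axis=1)   # (time, dx, dy, dz)

gmm = GaussianMixture(n_components=4, covariance_type="full", random_state=0)
gmm.fit(features)

# Per-timestep responsibilities indicate which coordination "phase" dominates,
# loosely analogous to coordination importance changing along the motion.
responsibilities = gmm.predict_proba(features)
print(responsibilities.shape)  # (200, 4)
```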

Emotion recognition based on multi-modal electrophysiology multi-head attention Contrastive Learning

  • paper_url: http://arxiv.org/abs/2308.01919
  • repo_url: None
  • paper_authors: Yunfei Guo, Tao Zhang, Wu Huang
  • for: The paper is written for researchers and practitioners in emotion recognition and artificial intelligence, as well as those interested in multimodal electrophysiological signals and their applications.
  • methods: The paper proposes ME-MHACL, a self-supervised contrastive learning-based multimodal emotion recognition method that uses unlabeled electrophysiological signals and multi-head attention mechanisms to learn meaningful feature representations and improve recognition performance.
  • results: The proposed ME-MHACL method outperformed existing benchmark methods in emotion recognition tasks and showed good cross-individual generalization ability, as demonstrated on two public datasets (DEAP and MAHNOB-HCI).
    Abstract Emotion recognition is an important research direction in artificial intelligence, helping machines understand and adapt to human emotional states. Multimodal electrophysiological(ME) signals, such as EEG, GSR, respiration(Resp), and temperature(Temp), are effective biomarkers for reflecting changes in human emotions. However, using electrophysiological signals for emotion recognition faces challenges such as data scarcity, inconsistent labeling, and difficulty in cross-individual generalization. To address these issues, we propose ME-MHACL, a self-supervised contrastive learning-based multimodal emotion recognition method that can learn meaningful feature representations from unlabeled electrophysiological signals and use multi-head attention mechanisms for feature fusion to improve recognition performance. Our method includes two stages: first, we use the Meiosis method to group sample and augment unlabeled electrophysiological signals and design a self-supervised contrastive learning task; second, we apply the trained feature extractor to labeled electrophysiological signals and use multi-head attention mechanisms for feature fusion. We conducted experiments on two public datasets, DEAP and MAHNOB-HCI, and our method outperformed existing benchmark methods in emotion recognition tasks and had good cross-individual generalization ability.
    摘要 情感识别是人工智能的重要研究方向之一,它帮助机器理解并适应人类的情感状态。多模态电生理(ME)信号,如脑电(EEG)、皮肤电(GSR)、呼吸(Resp)和体温(Temp),是反映人类情感变化的有效生物标志。然而,利用电生理信号进行情感识别面临数据稀缺、标注不一致以及难以跨个体泛化等挑战。为解决这些问题,我们提出了 ME-MHACL,一种基于自监督对比学习的多模态情感识别方法,它能够从无标签电生理信号中学习有意义的特征表示,并利用多头注意力机制进行特征融合以提升识别性能。该方法包含两个阶段:第一阶段,我们使用 Meiosis 方法对无标签电生理信号进行分组采样与增强,并设计自监督对比学习任务;第二阶段,我们将训练好的特征提取器应用于有标签电生理信号,并使用多头注意力机制进行特征融合。我们在 DEAP 和 MAHNOB-HCI 两个公开数据集上进行了实验,结果表明我们的方法在情感识别任务上优于现有基准方法,并具有良好的跨个体泛化能力。

A New Dataset and Comparative Study for Aphid Cluster Detection

  • paper_url: http://arxiv.org/abs/2307.05929
  • repo_url: None
  • paper_authors: Tianxiao Zhang, Kaidong Li, Xiangyu Chen, Cuncong Zhong, Bo Luo, Ivan Grijalva Teran, Brian McCornack, Daniel Flippo, Ajay Sharda, Guanghui Wang
  • for: This paper aims to estimate the aphid infestation level in sorghum fields by detecting aphid clusters using machine learning models.
  • methods: The authors use millions of images taken in sorghum fields, manually select images with aphids, and annotate each aphid cluster in the images. They then crop the images into patches and create a labeled dataset with over 151,000 image patches to train and compare the performance of four state-of-the-art object detection models.
  • results: The authors evaluate and compare the performance of the four object detection models, achieving an average precision of 84.1% and an average recall of 81.3%, and demonstrate the effectiveness of their approach in estimating the aphid infestation level in sorghum fields.
    Abstract Aphids are one of the main threats to crops, rural families, and global food security. Chemical pest control is a necessary component of crop production for maximizing yields, however, it is unnecessary to apply the chemical approaches to the entire fields in consideration of the environmental pollution and the cost. Thus, accurately localizing the aphid and estimating the infestation level is crucial to the precise local application of pesticides. Aphid detection is very challenging as each individual aphid is really small and all aphids are crowded together as clusters. In this paper, we propose to estimate the infection level by detecting aphid clusters. We have taken millions of images in the sorghum fields, manually selected 5,447 images that contain aphids, and annotated each aphid cluster in the image. To use these images for machine learning models, we crop the images into patches and created a labeled dataset with over 151,000 image patches. Then, we implement and compare the performance of four state-of-the-art object detection models.
    摘要 蚜虫是农作物、农村家庭以及全球粮食安全面临的主要威胁之一。化学防治是保证产量所必需的生产环节,但考虑到环境污染和成本,没有必要对整片田地施药。因此,准确定位蚜虫并估计虫害程度,对于农药的精准局部施用至关重要。蚜虫检测非常困难,因为单个蚜虫体型极小,且所有蚜虫都密集聚集成群。在本文中,我们提出通过检测蚜虫群体来估计虫害程度。我们在高粱田中拍摄了数百万张图像,人工挑选出 5,447 张含有蚜虫的图像,并对图像中的每个蚜虫群体进行了标注。为了将这些图像用于机器学习模型,我们将图像裁剪为小块,构建了包含超过 151,000 个图像块的标注数据集。随后,我们实现并比较了四种当前最先进的目标检测模型的性能。
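
The patch-extraction step mentioned in the abstract is straightforward to sketch. The snippet below (patch size and stride are hypothetical) crops a large field image into a regular grid of fixed-size patches of the kind used to build the labeled dataset:

```python
import numpy as np

def crop_into_patches(image, patch_size=512, stride=512):
    """Yield (row, col, patch) tuples covering the image on a regular grid."""
    h, w = image.shape[:2]
    for top in range(0, max(h - patch_size + 1, 1), stride):
        for left in range(0, max(w - patch_size + 1, 1), stride):
            yield top, left, image[top:top + patch_size, left:left + patch_size]

# Example with a dummy "field image".
field = np.zeros((2048, 3072, 3), dtype=np.uint8)
patches = list(crop_into_patches(field))
print(len(patches))  # 4 x 6 = 24 patches
```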

Reading Radiology Imaging Like The Radiologist

  • paper_url: http://arxiv.org/abs/2307.05921
  • repo_url: None
  • paper_authors: Yuhao Wang
  • for: 这个论文旨在提高自动放射报告生成的质量,使其包含更多的细节和精细的疾病描述。
  • methods: 该论文提出了一种基于疾病对准检索框架的方法,使用类似报告作为知识参照,并采用了事实一致描述生成器来生成更加准确和事实一致的疾病描述。
  • results: 该论文的实验结果表明,使用该方法可以生成更加精细和准确的放射报告,并且可以减少Visual和文本数据偏见。
    Abstract Automated radiology report generation aims to generate radiology reports that contain rich, fine-grained descriptions of radiology imaging. Compared with image captioning in the natural image domain, medical images are very similar to each other, with only minor differences in the occurrence of diseases. Given the importance of these minor differences in the radiology report, it is crucial to encourage the model to focus more on the subtle regions of disease occurrence. Secondly, the problem of visual and textual data biases is serious. Not only do normal cases make up the majority of the dataset, but sentences describing areas with pathological changes also constitute only a small part of the paragraph. Lastly, generating medical image reports involves the challenge of long text generation, which requires more expertise and empirical training in medical knowledge. As a result, the difficulty of generating such reports is increased. To address these challenges, we propose a disease-oriented retrieval framework that utilizes similar reports as prior knowledge references. We design a factual consistency captioning generator to generate more accurate and factually consistent disease descriptions. Our framework can find most similar reports for a given disease from the CXR database by retrieving a disease-oriented mask consisting of the position and morphological characteristics. By referencing the disease-oriented similar report and the visual features, the factual consistency model can generate a more accurate radiology report.
    摘要 自动化放射学报告生成目标是生成包含详细放射学影像描述的放射学报告。与自然图像领域中的图像描述不同,医疗图像具有只有小差异的疾病出现。由于这些小差异对放射学报告的重要性,因此需要鼓励模型更加注重细微的疾病区域。其次,图像和文本数据偏见问题严重。一般案例占大多数数据集,而描述疾病改变的句子也只占报告中的一小部分。最后,生成医学影像报告需要进行长文本生成,需要更多的专业知识和实践医学知识。因此,生成这些报告的难度更高。为解决这些挑战,我们提议一种疾病启发式检索框架,利用相似报告作为启发知识参考。我们设计了一个精准一致描述生成器,以生成更加准确和精准一致的疾病描述。我们的框架可以从CXR数据库中找到最相似的报告,并通过对疾病启发式掩码进行检索,使用视觉特征和疾病启发式报告进行匹配。通过对疾病启发式报告和视觉特征进行参照,Factual Consistency Model可以生成更加准确的放射学报告。

Close-up View synthesis by Interpolating Optical Flow

  • paper_url: http://arxiv.org/abs/2307.05913
  • repo_url: None
  • paper_authors: Xinyi Bai, Ze Wang, Lu Yang, Hong Cheng
  • for: 本文提出了一种实现近距离虚拟视角的方法,不需要深度信息和摄像头参数。
  • methods: 该方法使用光流来构建假3D投影,并通过反向光流计算获得任意虚拟视角。
  • results: 该方法可以在Google街景视图系统中实现高清晰和视觉准确的虚拟视角变换和放大,并且可以解决视角变换和放大所导致的视觉扭曲和图像模糊。
    Abstract The virtual viewpoint is perceived as a new technique in virtual navigation, as yet not supported due to the lack of depth information and obscure camera parameters. In this paper, a method for achieving close-up virtual view is proposed and it only uses optical flow to build parallax effects to realize pseudo 3D projection without using depth sensor. We develop a bidirectional optical flow method to obtain any virtual viewpoint by proportional interpolation of optical flow. Moreover, with the ingenious application of the optical-flow-value, we achieve clear and visual-fidelity magnified results through lens stretching in any corner, which overcomes the visual distortion and image blur through viewpoint magnification and transition in Google Street View system.
    摘要 虚拟视点被视为虚拟导航中的一项新技术,由于缺乏深度信息且相机参数不明确,目前尚未得到支持。在本文中,我们提出了一种实现近距离虚拟视角的方法,它仅利用光流来构建视差效果,从而在不使用深度传感器的情况下实现伪 3D 投影。我们开发了一种双向光流方法,通过对光流进行按比例插值来获得任意虚拟视点。此外,借助对光流值的巧妙运用,我们通过在任意角落进行镜头拉伸,获得了清晰且视觉保真的放大结果,克服了 Google 街景系统中视点放大与切换所带来的视觉畸变和图像模糊。
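
A rough sketch of the core mechanism described above: estimate dense optical flow between two captured views and interpolate it proportionally to synthesize an intermediate virtual viewpoint. The snippet below uses OpenCV's Farneback flow and simple backward warping as stand-ins; the paper's own bidirectional formulation and lens-stretching step are not reproduced here, and the file paths are hypothetical:

```python
import cv2
import numpy as np

def interpolate_view(img0, img1, alpha=0.5):
    g0 = cv2.cvtColor(img0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    # Flow from img1 back to img0, so img0 can be backward-warped toward img1.
    flow = cv2.calcOpticalFlowFarneback(g1, g0, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Scale the flow by alpha: alpha=0 reproduces img0's viewpoint, alpha=1 approaches img1's.
    map_x = (grid_x + alpha * flow[..., 0]).astype(np.float32)
    map_y = (grid_y + alpha * flow[..., 1]).astype(np.float32)
    return cv2.remap(img0, map_x, map_y, interpolation=cv2.INTER_LINEAR)

# Usage with two neighbouring street-view-like frames:
# view = interpolate_view(cv2.imread("frame0.png"), cv2.imread("frame1.png"), 0.5)
```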

Stability Guarantees for Feature Attributions with Multiplicative Smoothing

  • paper_url: http://arxiv.org/abs/2307.05902
  • repo_url: None
  • paper_authors: Anton Xue, Rajeev Alur, Eric Wong
  • for: 这个论文目的是提出一种稳定的特征归因方法,以确保模型的决策过程是可靠的。
  • methods: 该论文提出了一种名为乘性平滑(Multiplicative Smoothing,MuS)的平滑方法,使模型对特征遮罩具有足够的 Lipschitz 性,从而保证稳定性。论文证明 MuS 克服了标准平滑技术的理论局限,并且可以与任何分类器和特征归因方法结合使用。
  • results: 论文在视觉和语言模型上结合 LIME、SHAP 等多种特征归因方法进行了评估,证明 MuS 能够为特征归因提供非平凡的稳定性保证。
    Abstract Explanation methods for machine learning models tend to not provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. To achieve such a model, we develop a smoothing method called Multiplicative Smoothing (MuS). We show that MuS overcomes theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with a variety of feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
    摘要 机器学习模型的解释方法通常无法提供任何形式化保证,也未必反映模型真实的决策过程。在这项工作中,我们将稳定性作为可靠特征归因方法应具备的属性加以分析。我们证明:只要模型对特征遮罩具有足够的 Lipschitz 性,即可保证稳定性的宽松变体成立。为得到这样的模型,我们开发了一种称为乘性平滑(Multiplicative Smoothing,MuS)的平滑方法。我们证明 MuS 克服了标准平滑技术的理论局限,并且可以与任何分类器和特征归因方法集成。我们在视觉和语言模型上结合多种特征归因方法(如 LIME 和 SHAP)对 MuS 进行了评估,证明它能够为特征归因赋予非平凡的稳定性保证。
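
The kind of smoothing this paper relies on can be illustrated generically: average a classifier's predictions over many random keep/drop maskings of the input features, which makes the smoothed model better behaved under feature masking. The sketch below shows only that general recipe; it is not the exact MuS operator from the paper, and the keep probability, sample count and model are placeholders:

```python
import torch

def masked_smoothed_predict(model, x, keep_prob=0.8, n_samples=64):
    """x: (d,) feature vector. Returns class probabilities averaged over random masks."""
    with torch.no_grad():
        masks = (torch.rand(n_samples, x.numel()) < keep_prob).float()
        batch = masks * x.unsqueeze(0)          # masked copies of the input
        probs = torch.softmax(model(batch), dim=-1)
        return probs.mean(dim=0)                # smoothed prediction

# Example with a toy linear classifier over 10 features and 3 classes.
model = torch.nn.Linear(10, 3)
x = torch.randn(10)
print(masked_smoothed_predict(model, x))
```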

PID-Inspired Inductive Biases for Deep Reinforcement Learning in Partially Observable Control Tasks

  • paper_url: http://arxiv.org/abs/2307.05891
  • repo_url: https://github.com/ianchar/gpide
  • paper_authors: Ian Char, Jeff Schneider
  • for: 这篇论文旨在探讨深度强化学习(RL)如何在数据alone下学习控制系统。
  • methods: 这篇论文使用了PID控制器的思想,提出了两种历史编码方法,其中一种直接使用PID特征,另一种是扩展这些核心思想,可以应用于任何控制任务。
  • results: 与先前的方法相比,这篇论文学到的策略通常更加鲁棒,并在多种跟踪任务上取得更好的性能。此外,在一组高维控制任务上,这些策略的平均性能达到了此前最先进方法的 1.7 倍。
    Abstract Deep reinforcement learning (RL) has shown immense potential for learning to control systems through data alone. However, one challenge deep RL faces is that the full state of the system is often not observable. When this is the case, the policy needs to leverage the history of observations to infer the current state. At the same time, differences between the training and testing environments makes it critical for the policy not to overfit to the sequence of observations it sees at training time. As such, there is an important balancing act between having the history encoder be flexible enough to extract relevant information, yet be robust to changes in the environment. To strike this balance, we look to the PID controller for inspiration. We assert the PID controller's success shows that only summing and differencing are needed to accumulate information over time for many control tasks. Following this principle, we propose two architectures for encoding history: one that directly uses PID features and another that extends these core ideas and can be used in arbitrary control tasks. When compared with prior approaches, our encoders produce policies that are often more robust and achieve better performance on a variety of tracking tasks. Going beyond tracking tasks, our policies achieve 1.7x better performance on average over previous state-of-the-art methods on a suite of high dimensional control tasks.
    摘要 深度强化学习(RL)已展现出仅凭数据学习控制系统的巨大潜力。然而,深度 RL 面临的一个挑战是系统的完整状态往往不可观测,此时策略需要利用观测历史来推断当前状态。与此同时,训练环境与测试环境之间的差异要求策略不能过拟合训练时所见的观测序列。因此,历史编码器需要在两方面取得平衡:既要足够灵活以提取相关信息,又要对环境变化保持鲁棒。为实现这一平衡,我们从 PID 控制器中汲取灵感:PID 控制器的成功表明,对许多控制任务而言,只需对信息进行求和与差分即可随时间积累信息。基于这一原则,我们提出了两种历史编码架构:一种直接使用 PID 特征,另一种对这些核心思想加以扩展,可用于任意控制任务。与先前方法相比,我们的编码器所产生的策略通常更加鲁棒,并在多种跟踪任务上取得更好的性能;在跟踪任务之外,我们的策略在一组高维控制任务上的平均性能达到了此前最先进方法的 1.7 倍。
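
The PID-inspired encoding idea is easy to sketch: summarize the history of tracking errors with the same three quantities a PID controller uses, rather than a learned recurrent state. The snippet below is an illustration under that reading of the abstract; the dimensions and the downstream policy are placeholders, not the paper's architecture (see https://github.com/ianchar/gpide for the real one):

```python
import numpy as np

def pid_features(error_history, dt=1.0):
    """error_history: (T, d) array of tracking errors (target minus observation)."""
    e = np.asarray(error_history, dtype=np.float64)
    proportional = e[-1]                                    # current error
    integral = e.sum(axis=0) * dt                           # summed (accumulated) error
    derivative = (e[-1] - e[-2]) / dt if len(e) > 1 else np.zeros_like(e[-1])
    return np.concatenate([proportional, integral, derivative])

# Example: a 2-D tracking error shrinking over time.
history = [[1.0, -0.5], [0.6, -0.3], [0.3, -0.1]]
features = pid_features(history)
print(features.shape)  # (6,) -> fed to the policy alongside the current observation
```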

Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes

  • paper_url: http://arxiv.org/abs/2307.05862
  • repo_url: None
  • paper_authors: Connor Toups, Rishi Bommasani, Kathleen A. Creel, Sarah H. Bana, Dan Jurafsky, Percy Liang
  • for: 本研究旨在探讨机器学习技术在社会中的影响,尤其是在不同情况下如何导致系统性失败。
  • methods: 本研究使用多种模式(文本、图像、语音)和11个数据集进行了评估,发现在不同的情况下,机器学习模型的部署常常导致系统性失败,即一些用户被所有模型都错误地分类。
  • results: 研究发现,尽管各个模型在人口水平上得到改进,但这些改进很少降低了系统性失败的频率。此外,研究还发现了新的种族差距在模型预测中,这些差距不存在于人类预测中。这些例子表明,生态系统级分析具有描述机器学习技术在社会中的影响的独特优势。
    Abstract Machine learning is traditionally studied at the model level: researchers measure and improve the accuracy, robustness, bias, efficiency, and other dimensions of specific models. In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments. To capture this, we introduce ecosystem-level analysis: rather than analyzing a single model, we consider the collection of models that are deployed in a given context. For example, ecosystem-level analysis in hiring recognizes that a job candidate's outcomes are not only determined by a single hiring algorithm or firm but instead by the collective decisions of all the firms they applied to. Across three modalities (text, images, speech) and 11 datasets, we establish a clear trend: deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available. Even when individual models improve at the population level over time, we find these improvements rarely reduce the prevalence of systemic failure. Instead, the benefits of these improvements predominantly accrue to individuals who are already correctly classified by other models. In light of these trends, we consider medical imaging for dermatology where the costs of systemic failure are especially high. While traditional analyses reveal racial performance disparities for both models and humans, ecosystem-level analysis reveals new forms of racial disparity in model predictions that do not present in human predictions. These examples demonstrate ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
    摘要 机器学习传统上是在模型层面进行研究的:研究者度量并改进特定模型的准确率、鲁棒性、偏差、效率等维度。而在实践中,机器学习的社会影响取决于其部署所处的环境。为刻画这一点,我们引入生态系统层面的分析:不再分析单个模型,而是考察在给定情境中部署的所有模型的集合。例如,在招聘场景下,生态系统层面的分析认识到,求职者的结果并非由某一个招聘算法或某一家公司决定,而是由其投递的所有公司的集体决策决定。在文本、图像、语音三种模态和 11 个数据集上,我们发现了一个清晰的趋势:已部署的机器学习容易出现系统性失败,即某些用户会被所有可用模型同时错误分类。即使单个模型在总体层面随时间不断改进,这些改进也很少降低系统性失败的发生率;相反,改进带来的收益主要流向那些已经被其他模型正确分类的个体。鉴于这些趋势,我们考察了系统性失败代价尤其高昂的皮肤科医学影像场景:传统分析揭示了模型和人类都存在的种族性能差距,而生态系统层面的分析则揭示了模型预测中存在、但人类预测中不存在的新型种族差异。这些例子表明,生态系统层面的分析在刻画机器学习的社会影响方面具有独特的优势。
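
The ecosystem-level quantity at the center of this paper can be stated in a few lines: the systemic-failure rate is the fraction of examples (users) that every deployed model misclassifies, as opposed to any single model's error rate. The sketch below computes it on synthetic predictions; the error rates are made up for illustration:

```python
import numpy as np

def systemic_failure_rate(predictions, labels):
    """predictions: (n_models, n_examples) predicted labels; labels: (n_examples,)."""
    wrong = predictions != labels[None, :]          # (n_models, n_examples)
    failed_by_all = wrong.all(axis=0)               # misclassified by every model
    return failed_by_all.mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=1000)
# Three hypothetical deployed models, each correct ~90% of the time, errors independent.
models = np.stack([np.where(rng.random(1000) < 0.9, labels, 1 - labels) for _ in range(3)])
print("per-model error:", (models != labels).mean(axis=1))
print("systemic failure rate:", systemic_failure_rate(models, labels))
```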

FAIRO: Fairness-aware Adaptation in Sequential-Decision Making for Human-in-the-Loop Systems

  • paper_url: http://arxiv.org/abs/2307.05857
  • repo_url: None
  • paper_authors: Tianyu Zhao, Mojtaba Taherisadr, Salma Elmalaki
  • for: 本研究旨在提高在人类在Loop(HITL)环境中的决策系统中的公平性,特别是当多个人的不同行为和期望被同一个逻辑决策系统影响时。
  • methods: 本文利用 Options 强化学习框架,将复杂的公平性任务分解为基于个体偏好的自适应子任务,同时考虑人类行为差异和随时间变化的人类偏好。
  • results: 对三种不同的 HITL 应用场景进行了广泛的评估,证明 FAIRO 能够有效地促进公平性,同时考虑人类变量和时间变化。比较其他方法,FAIRO 在所有三个应用场景中平均提高公平性约35.36%。
    Abstract Achieving fairness in sequential-decision making systems within Human-in-the-Loop (HITL) environments is a critical concern, especially when multiple humans with different behavior and expectations are affected by the same adaptation decisions in the system. This human variability factor adds more complexity since policies deemed fair at one point in time may become discriminatory over time due to variations in human preferences resulting from inter- and intra-human variability. This paper addresses the fairness problem from an equity lens, considering human behavior variability, and the changes in human preferences over time. We propose FAIRO, a novel algorithm for fairness-aware sequential-decision making in HITL adaptation, which incorporates these notions into the decision-making process. In particular, FAIRO decomposes this complex fairness task into adaptive sub-tasks based on individual human preferences through leveraging the Options reinforcement learning framework. We design FAIRO to generalize to three types of HITL application setups that have the shared adaptation decision problem. Furthermore, we recognize that fairness-aware policies can sometimes conflict with the application's utility. To address this challenge, we provide a fairness-utility tradeoff in FAIRO, allowing system designers to balance the objectives of fairness and utility based on specific application requirements. Extensive evaluations of FAIRO on the three HITL applications demonstrate its generalizability and effectiveness in promoting fairness while accounting for human variability. On average, FAIRO can improve fairness compared with other methods across all three applications by 35.36%.
    摘要 在人在回路(HITL)环境的时序决策系统中实现公平性是一个关键问题,尤其是当多个行为和期望各不相同的人受到同一自适应决策的影响时。这种人类差异性带来了额外的复杂性:由于人际和个体内部偏好的变化,某一时刻被认为公平的策略可能随着时间推移而变得具有歧视性。本文从公平(equity)的视角出发,同时考虑人类行为差异以及人类偏好随时间的变化,提出了 FAIRO——一种面向 HITL 自适应的公平感知时序决策算法,将上述因素纳入决策过程。具体而言,FAIRO 借助 Options 强化学习框架,将这一复杂的公平性任务分解为基于个体偏好的自适应子任务。我们将 FAIRO 设计为可推广到三类具有共同自适应决策问题的 HITL 应用场景。此外,考虑到公平感知策略有时会与应用的效用相冲突,FAIRO 提供了公平性与效用之间的权衡机制,使系统设计者能够根据具体应用需求在两者之间取得平衡。在三个 HITL 应用上的大量评估表明,FAIRO 具有良好的泛化性,并能在考虑人类差异性的同时有效促进公平性;平均而言,FAIRO 相比其他方法在三个应用上将公平性提升了 35.36%。

Influential Simplices Mining via Simplicial Convolutional Network

  • paper_url: http://arxiv.org/abs/2307.05841
  • repo_url: None
  • paper_authors: Yujie Zeng, Yiming Huang, Qiang Wu, Linyuan Lü
  • for: 本研究旨在通过提出“有影响力单纯形挖掘神经网络”(ISMnet)模型,识别单纯复形(simplicial complex)中有影响力的 h-单纯形,从而更好地理解其高阶结构与功能。
  • methods: 本研究提出了一种新的高阶图学习模型 ISMnet,它利用层次二分图和高阶层次(HoH)拉普拉斯算子等新的高阶表示,并在每个 HoH 拉普拉斯域中使用可学习的图卷积算子来捕捉单纯形之间的交互。
  • results: 实验结果表明,ISMnet 在对 0-单纯形(节点)和 2-单纯形的影响力排序上显著优于现有方法,有望成为高阶网络分析中的有力工具。
    Abstract Simplicial complexes have recently been in the limelight of higher-order network analysis, where a minority of simplices play crucial roles in structures and functions due to network heterogeneity. We find a significant inconsistency between identifying influential nodes and simplices. Therefore, it remains elusive how to characterize simplices' influence and identify influential simplices, despite the relative maturity of research on influential nodes (0-simplices) identification. Meanwhile, graph neural networks (GNNs) are potent tools that can exploit network topology and node features simultaneously, but they struggle to tackle higher-order tasks. In this paper, we propose a higher-order graph learning model, named influential simplices mining neural network (ISMnet), to identify vital h-simplices in simplicial complexes. It can tackle higher-order tasks by leveraging novel higher-order presentations: hierarchical bipartite graphs and higher-order hierarchical (HoH) Laplacians, where targeted simplices are grouped into a hub set and can interact with other simplices. Furthermore, ISMnet employs learnable graph convolutional operators in each HoH Laplacian domain to capture interactions among simplices, and it can identify influential simplices of arbitrary order by changing the hub set. Empirical results demonstrate that ISMnet significantly outperforms existing methods in ranking 0-simplices (nodes) and 2-simplices. In general, this novel framework excels in identifying influential simplices and promises to serve as a potent tool in higher-order network analysis.
    摘要 单纯复形(simplicial complex)近来在高阶网络分析中备受关注:由于网络的异质性,少数单纯形在结构与功能中发挥着关键作用。我们发现,识别有影响力的节点与识别有影响力的单纯形之间存在显著的不一致。因此,尽管有影响力节点(0-单纯形)识别的研究已相对成熟,如何刻画单纯形的影响力并识别有影响力的单纯形仍是悬而未决的问题。与此同时,图神经网络(GNN)虽然能够同时利用网络拓扑和节点特征,却难以处理高阶任务。在本文中,我们提出了一种高阶图学习模型——有影响力单纯形挖掘神经网络(ISMnet),用于识别单纯复形中关键的 h-单纯形。它借助新的高阶表示(层次二分图和高阶层次(HoH)拉普拉斯算子)来处理高阶任务,其中目标单纯形被归入一个枢纽集合,并可与其他单纯形交互。此外,ISMnet 在每个 HoH 拉普拉斯域中使用可学习的图卷积算子来捕捉单纯形之间的交互,并可以通过改变枢纽集合来识别任意阶的有影响力单纯形。实验结果表明,ISMnet 在对 0-单纯形(节点)和 2-单纯形进行排序时显著优于现有方法。总体而言,这一新框架在识别有影响力单纯形方面表现出色,有望成为高阶网络分析中的有力工具。

Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

  • paper_url: http://arxiv.org/abs/2307.05834
  • repo_url: None
  • paper_authors: Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang
  • for: 本文旨在研究分布式多任务 reinforcement learning(RL),以帮助分布式学习代理人在面对新挑战时适应。
  • methods: 我们使用 linearly parameterized contextual Markov decision processes(MDPs)来形式化问题,每个任务被表示为一个上下文,该上下文指定了过程动态和奖励。我们提出了一个名为 DistMT-LSVI 的算法,其中每个代理人先标识任务,然后通过中央服务器交换信息,以 derive $\epsilon$-优化策略。
  • results: 我们的研究表明,使用 DistMT-LSVI,每个智能体只需运行至多 $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$ 个回合,即可为所有 $M$ 个任务获得 $\epsilon$-最优策略;相比非分布式设置,样本复杂度降低为原来的 $1/N$。此外,我们在 OpenAI Gym Atari 环境上的数值实验验证了我们的理论结果。
    Abstract Recently, DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents in adapting to new challenges. In this paper, we address this issue by conducting both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks without prior knowledge of their identities. We approach the problem by formulating it as linearly parameterized contextual Markov decision processes (MDPs), where each task is represented by a context that specifies the transition dynamics and rewards. To tackle this problem, we propose an algorithm called DistMT-LSVI. First, the agents identify the tasks, and then they exchange information through a central server to derive $\epsilon$-optimal policies for the tasks. Our research demonstrates that to achieve $\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$, where $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards. Notably, DistMT-LSVI improves the sample complexity of non-distributed settings by a factor of $1/N$, as each agent independently learns $\epsilon$-optimal policies for all $M$ tasks using $\tilde{\mathcal{O}}(d^3H^6M\epsilon^{-2})$ episodes. Additionally, we provide numerical experiments conducted on OpenAI Gym Atari environments that validate our theoretical findings.
    摘要 近期,DARPA 发布了 ShELL 计划,旨在探索经验共享如何帮助分布式终身学习智能体适应新挑战。在本文中,我们通过对分布式多任务强化学习(RL)进行理论与实验研究来回应这一问题:一组 $N$ 个智能体在事先不知道任务身份的情况下协同求解 $M$ 个任务。我们将该问题形式化为线性参数化的上下文马尔可夫决策过程(MDP),每个任务由一个上下文表示,该上下文刻画了转移动态和奖励。为求解该问题,我们提出了名为 DistMT-LSVI 的算法:智能体首先识别任务,然后通过中央服务器交换信息,以推导出各任务的 $\epsilon$-最优策略。我们的研究表明,要为所有 $M$ 个任务获得 $\epsilon$-最优策略,使用 DistMT-LSVI 的单个智能体只需运行至多 $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$ 个回合,其中 $c_{\rm sep}>0$ 是刻画任务可分性的常数,$H$ 是每回合的时域长度,$d$ 是动态与奖励的特征维度。值得注意的是,DistMT-LSVI 将非分布式设置的样本复杂度降低了 $1/N$ 倍:在非分布式设置下,每个智能体需独立使用 $\tilde{\mathcal{O}}(d^3H^6M\epsilon^{-2})$ 个回合来为所有 $M$ 个任务学习 $\epsilon$-最优策略。此外,我们在 OpenAI Gym Atari 环境上进行了数值实验,验证了我们的理论结果。
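
Taking the two bounds quoted above at face value, the per-agent saving can be made explicit: the distributed bound gives $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$ episodes per agent, the non-distributed baseline needs $\tilde{\mathcal{O}}(d^3H^6M\epsilon^{-2})$, and their ratio works out to $\tilde{\mathcal{O}}\big(\tfrac{1}{N}(1+\epsilon^{2}/c_{\rm sep}^{2})\big)$. So whenever the separability term $c_{\rm sep}^{-2}$ is dominated by $\epsilon^{-2}$, each agent runs roughly a $1/N$ fraction of the episodes, which is the factor-of-$1/N$ improvement the abstract states.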

Bag of Views: An Appearance-based Approach to Next-Best-View Planning for 3D Reconstruction

  • paper_url: http://arxiv.org/abs/2307.05832
  • repo_url: https://github.com/acis2021/viewplanningtoolbox
  • paper_authors: Sara Hatami Gazani, Matthew Tucsok, Iraj Mantegh, Homayoun Najjaran
  • for: 这篇论文的目的是提出一种基于UAV的智能数据收集技术,用于3D重建和基础设施监测。
  • methods: 这篇论文使用了图像处理和深度学习技术,并提出了一种基于视图计划的fully appearance-based模型,用于分配视图的有用性。
  • results: 经过实验,这种模型可以减少数据收集的视图数量,提高3D重建的质量。
    Abstract UAV-based intelligent data acquisition for 3D reconstruction and monitoring of infrastructure has been experiencing an increasing surge of interest due to the recent advancements in image processing and deep learning-based techniques. View planning is an essential part of this task that dictates the information capture strategy and heavily impacts the quality of the 3D model generated from the captured data. Recent methods have used prior knowledge or partial reconstruction of the target to accomplish view planning for active reconstruction; the former approach poses a challenge for complex or newly identified targets while the latter is computationally expensive. In this work, we present Bag-of-Views (BoV), a fully appearance-based model used to assign utility to the captured views for both offline dataset refinement and online next-best-view (NBV) planning applications targeting the task of 3D reconstruction. With this contribution, we also developed the View Planning Toolbox (VPT), a lightweight package for training and testing machine learning-based view planning frameworks, custom view dataset generation of arbitrary 3D scenes, and 3D reconstruction. Through experiments which pair a BoV-based reinforcement learning model with VPT, we demonstrate the efficacy of our model in reducing the number of required views for high-quality reconstructions in dataset refinement and NBV planning.
    摘要 随着图像处理与深度学习技术的进步,基于无人机的智能数据采集在基础设施三维重建与监测中受到越来越多的关注。视角规划是其中的关键环节,它决定了信息采集策略,并在很大程度上影响由采集数据重建出的三维模型的质量。在这项工作中,我们提出了一种完全基于外观的模型——视图袋(Bag-of-Views,BoV),用于为采集到的视图分配效用值,服务于面向三维重建任务的离线数据集精炼与在线最优下一视角(NBV)规划。与此同时,我们还开发了视角规划工具箱(View Planning Toolbox,VPT),这是一个轻量级软件包,支持训练与测试基于机器学习的视角规划框架、为任意三维场景生成自定义视图数据集以及进行三维重建。通过将基于 BoV 的强化学习模型与 VPT 相结合的实验,我们证明了该模型能够在数据集精炼和 NBV 规划中减少获得高质量重建所需的视图数量。

Memorization Through the Lens of Curvature of Loss Function Around Samples

  • paper_url: http://arxiv.org/abs/2307.05831
  • repo_url: None
  • paper_authors: Isha Garg, Kaushik Roy
  • for: 该研究旨在探讨神经网络对训练样本的记忆(memorization)与泛化问题。
  • methods: 该研究使用损失函数的曲线性作为评估神经网络的记忆和泛化性的指标,并在各个训练轮数中平均计算。
  • results: 研究发现,在各种图像集中,神经网络可以记忆训练集,并且可以通过对损失函数的曲线性进行分析来找到特别的训练样本。此外,该研究还发现了一种在CIFAR100集中的新的失败模型,即拥有不同标签的图像 duplicates。此外,该研究还通过随机损害一些样本的标签,示出了对损失函数曲线性的排序可以高效地分类出异常标签的样本。
    Abstract Neural networks are overparametrized and easily overfit the datasets they train on. In the extreme case, it is shown that they can memorize a training set with fully randomized labels. We propose using the curvature of loss function around the training sample as a measure of its memorization, averaged over all training epochs. We use this to study the generalization versus memorization properties of different samples in popular image datasets. We visualize samples with the highest curvature of loss around them, and show that these visually correspond to long-tailed, mislabeled or conflicting samples. This analysis helps us find a, to the best of our knowledge, novel failure model on the CIFAR100 dataset, that of duplicated images with different labels. We also synthetically mislabel a proportion of the dataset by randomly corrupting the labels of a few samples, and show that sorting by curvature yields high AUROC values for identifying the mislabeled samples.
    摘要 神经网络具有过参数和易于适应训练集的问题。在极端情况下,它们可以记忆训练集的全部标签。我们提议使用损失函数的曲线在训练样本周围的平均幅度作为记忆度量,并在各训练轮次中计算。我们利用这种方法来研究不同样本的泛化与记忆性质。我们可视化具有最高损失函数曲线幅度的样本,并发现这些样本视觉上对应于长尾、错误标签或冲突样本。这种分析帮助我们发现了,到目前知道的,CIFAR100数据集上的复制图像标签错误模型。我们还随机扰乱了一些样本的标签,并显示了按照曲线排序可以高AUROC值来识别错误标签样本。
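
One common way to measure the curvature of the loss around an individual training sample, in the spirit of the metric described above, is a Hutchinson-style estimate of the input-Hessian trace via finite differences of input gradients along random directions. The sketch below is that generic proxy, not necessarily the paper's exact recipe; the model, step size and probe count are placeholders:

```python
import torch
import torch.nn.functional as F

def input_curvature(model, x, y, n_probes=8, h=1e-2):
    """x: (1, ...) input tensor, y: (1,) label. Returns a scalar curvature proxy."""
    def grad_at(z):
        z = z.detach().requires_grad_(True)
        loss = F.cross_entropy(model(z), y)
        return torch.autograd.grad(loss, z)[0]

    est = 0.0
    for _ in range(n_probes):
        v = torch.randn_like(x)
        v = v / v.norm()
        hvp = (grad_at(x + h * v) - grad_at(x)) / h     # finite-difference H v
        est += (hvp * v).sum().item()                   # v^T H v
    return est / n_probes

# Toy usage: a small classifier on 20-dimensional inputs.
model = torch.nn.Sequential(torch.nn.Linear(20, 32), torch.nn.ReLU(), torch.nn.Linear(32, 5))
x, y = torch.randn(1, 20), torch.tensor([2])
print(input_curvature(model, x, y))
```

Averaging such a score over training epochs and sorting samples by it is what the entry above uses to surface long-tailed, mislabeled, or conflicting examples.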

Relational Extraction on Wikipedia Tables using Convolutional and Memory Networks

  • paper_url: http://arxiv.org/abs/2307.05827
  • repo_url: https://github.com/simpleparadox/re_656
  • paper_authors: Arif Shahriar, Rohan Saha, Denilson Barbosa
  • for: 本研究旨在探讨基于表格数据的关系提取(RE)问题。
  • methods: 我们提出一种新的模型, combining Convolutional Neural Network (CNN) 和 Bidirectional-Long Short Term Memory (BiLSTM) 网络来编码实体和学习实体之间的依赖关系。
  • results: 我们对一个大型和最新的数据集进行了评估,并与之前的神经网络方法进行比较。实验结果表明,我们的模型在关系提取任务中的表格数据上表现出了不败的成绩。我们还进行了全面的错误分析和减少研究,以显示模型的组成部分的贡献。
    Abstract Relation extraction (RE) is the task of extracting relations between entities in text. Most RE methods extract relations from free-form running text and leave out other rich data sources, such as tables. We explore RE from the perspective of applying neural methods on tabularly organized data. We introduce a new model consisting of Convolutional Neural Network (CNN) and Bidirectional-Long Short Term Memory (BiLSTM) network to encode entities and learn dependencies among them, respectively. We evaluate our model on a large and recent dataset and compare results with previous neural methods. Experimental results show that our model consistently outperforms the previous model for the task of relation extraction on tabular data. We perform comprehensive error analyses and ablation study to show the contribution of various components of our model. Finally, we discuss the usefulness and trade-offs of our approach, and provide suggestions for fostering further research.
    摘要 关系提取(RE)是文本中实体之间关系的提取任务。大多数RE方法从自由文本中提取关系,而忽略其他丰富数据源,如表格。我们从表格化数据的视角出发,应用神经网络方法进行关系提取。我们介绍一种新的模型,包括卷积神经网络(CNN)和双向长短期记忆(BiLSTM)网络,用于编码实体和学习实体之间的依赖关系。我们对大量最新数据进行评估,与之前的神经方法进行比较。实验结果表明,我们的模型在关系提取任务中一直表现出色,并且与之前的模型相比,具有更高的性能。我们进行了全面的错误分析和剥离研究,以示模型各部分的贡献。最后,我们讨论了我们的方法的实用性和缺点,并提供了进一步研究的建议。

Neuro-Inspired Efficient Map Building via Fragmentation and Recall

  • paper_url: http://arxiv.org/abs/2307.05793
  • repo_url: https://github.com/fietelab/farmap
  • paper_authors: Jaedong Hwang, Zhang-Wei Hong, Eric Chen, Akhilan Boopathy, Pulkit Agrawal, Ila Fiete
  • for: 该论文旨在提出一种基于神经科学的Fragmentation-and-Recall(FarMap)策略,以帮助动物和机器人在空间中穿梭和探索环境。
  • methods: 该论文使用一种基于预测惊异度(surprisal)的空间聚类方法,将空间划分为多个局部地图,并利用这些局部地图设置空间探索的子目标。当出现高惊异度的分割事件时,当前的局部地图会被截断并存入长期记忆(LTM),而不是被丢弃;若后续观测与 LTM 中某个局部地图匹配,则将其召回复用。
  • results: 论文在复杂的程序化生成空间环境中测试和评估了 FarMap 策略,发现该策略能更快地覆盖环境,并且在不损失性能的前提下更高效地使用活动记忆。
    Abstract Animals and robots navigate through environments by building and refining maps of the space. These maps enable functions including navigating back to home, planning, search, and foraging. In large environments, exploration of the space is a hard problem: agents can become stuck in local regions. Here, we use insights from neuroscience to propose and apply the concept of Fragmentation-and-Recall (FarMap), with agents solving the mapping problem by building local maps via a surprisal-based clustering of space, which they use to set subgoals for spatial exploration. Agents build and use a local map to predict their observations; high surprisal leads to a ``fragmentation event'' that truncates the local map. At these events, the recent local map is placed into long-term memory (LTM), and a different local map is initialized. If observations at a fracture point match observations in one of the stored local maps, that map is recalled (and thus reused) from LTM. The fragmentation points induce a natural online clustering of the larger space, forming a set of intrinsic potential subgoals that are stored in LTM as a topological graph. Agents choose their next subgoal from the set of near and far potential subgoals from within the current local map or LTM, respectively. Thus, local maps guide exploration locally, while LTM promotes global exploration. We evaluate FarMap on complex procedurally-generated spatial environments to demonstrate that this mapping strategy much more rapidly covers the environment (number of agent steps and wall clock time) and is more efficient in active memory usage, without loss of performance.
    摘要 Animals and robots通过建立和改进环境空间的地图来导航。这些地图使得功能包括返回家园、规划、搜索和搜寻。在大型环境中,探索空间是一个困难的问题:代理人可能会被局部区域困住。我们使用 neuroscience 的发现来提出和应用 Fragmentation-and-Recall(FarMap)概念,代理人通过在空间上基于喜 surprisal 的归一化分组来解决地图问题。代理人建立和使用本地地图,预测其观察结果,高 surprisal 会导致一个“分解事件”,截断本地地图。在这些事件中,最近的本地地图被置入长期记忆(LTM),并初始化一个不同的本地地图。如果观察结果与存储在 LTM 中的地图匹配,那么该地图会被回忆(并因此重复使用)。这些分解点引入了自然的在线归一化,形成了一个内在的潜在分子目标集,并被存储在 LTM 中为一个トポлогиカル 图。代理人在当前本地地图或 LTM 中选择下一个目标,从而使得本地地图引导了本地探索,而 LTM 则促进了全局探索。我们在复杂的生成过程空间中评估 FarMap,以示其在环境探索中的更快速、更高效,而无损性能。

Merging multiple input descriptors and supervisors in a deep neural network for tractogram filtering

  • paper_url: http://arxiv.org/abs/2307.05786
  • repo_url: None
  • paper_authors: Daniel Jörgens, Pierre-Marc Jodoin, Maxime Descoteaux, Rodrigo Moreno
  • for: 本研究旨在提高 tractography 方法的准确率,通过训练深度学习模型来筛选 tractography 数据中的假阳性流线。
  • methods: 本研究使用了四种不同的 tractogram 筛选策略作为监督信号:TractQuerier、RecobundlesX、TractSeg 和一种基于解剖学的筛选器。这些筛选器的输出被组合起来,用于得到流线的分类标签。
  • results: 研究发现,流线坐标和 diffusion 数据在本特定的分类任务中是最 relevante 的信息,其次是 T1 束质数据。
    Abstract One of the main issues of the current tractography methods is their high false-positive rate. Tractogram filtering is an option to remove false-positive streamlines from tractography data in a post-processing step. In this paper, we train a deep neural network for filtering tractography data in which every streamline of a tractogram is classified as {\em plausible, implausible}, or {\em inconclusive}. For this, we use four different tractogram filtering strategies as supervisors: TractQuerier, RecobundlesX, TractSeg, and an anatomy-inspired filter. Their outputs are combined to obtain the classification labels for the streamlines. We assessed the importance of different types of information along the streamlines for performing this classification task, including the coordinates of the streamlines, diffusion data, landmarks, T1-weighted information, and a brain parcellation. We found that the streamline coordinates are the most relevant followed by the diffusion data in this particular classification task.
    摘要 一个主要问题是现有的轨迹图方法的假阳性率过高。轨迹图过滤是一种在后处理步骤中除去假阳性流线的方法。在这篇论文中,我们用深度神经网络来筛选轨迹图数据,每个流线都被分类为{\em 可能、不可能}或{\em 不明确}. 我们使用了四种不同的轨迹图筛选策略来作为监管器:TractQuerier、RecobundlesX、TractSeg和一种基于解剖学的筛选器。它们的输出被组合以获得流线的分类标签。我们评估了不同类型的轨迹图信息的重要性来进行这种分类任务,包括流线坐标、扩散数据、标记点、T1强化信息和脑分割。我们发现流线坐标是最重要的,其次是扩散数据。

EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video

  • paper_url: http://arxiv.org/abs/2307.05784
  • repo_url: https://github.com/facebookresearch/egocentricuseradaptation
  • paper_authors: Matthias De Lange, Hamid Eghbalzadeh, Reuben Tan, Michael Iuzzolino, Franziska Meier, Karl Ridgeway
  • for: 本研究旨在提出一种适应型 Egocentric Action Recognition 模型,可以在用户眼镜上运行,并在用户的体验中进行适应。
  • methods: 本研究使用了两个阶段的方法,首先预训练一个人口模型,然后在设备上进行在线适应,以适应用户的体验。
  • results: 研究表明,使用这种适应型模型可以在真实世界应用中提高 Egocentric Action Recognition 的性能,并且可以在长尾动作分布和大规模分类等实际应用中表现出色。
    Abstract In egocentric action recognition a single population model is typically trained and subsequently embodied on a head-mounted device, such as an augmented reality headset. While this model remains static for new users and environments, we introduce an adaptive paradigm of two phases, where after pretraining a population model, the model adapts on-device and online to the user's experience. This setting is highly challenging due to the change from population to user domain and the distribution shifts in the user's data stream. Coping with the latter in-stream distribution shifts is the focus of continual learning, where progress has been rooted in controlled benchmarks but challenges faced in real-world applications often remain unaddressed. We introduce EgoAdapt, a benchmark for real-world egocentric action recognition that facilitates our two-phased adaptive paradigm, and real-world challenges naturally occur in the egocentric video streams from Ego4d, such as long-tailed action distributions and large-scale classification over 2740 actions. We introduce an evaluation framework that directly exploits the user's data stream with new metrics to measure the adaptation gain over the population model, online generalization, and hindsight performance. In contrast to single-stream evaluation in existing works, our framework proposes a meta-evaluation that aggregates the results from 50 independent user streams. We provide an extensive empirical study for finetuning and experience replay.
    摘要 在自我中心(egocentric)动作识别中,通常只训练一个总体模型,然后将其部署到头戴式设备(如增强现实头显)上;该模型对新用户和新环境保持静态不变。我们提出一种两阶段的自适应范式:先预训练一个总体模型,再让模型在设备上随用户的经验进行在线自适应。这一设定极具挑战性,因为既存在从总体域到用户域的转变,又存在用户数据流内部的分布偏移。应对流内分布偏移正是持续学习的研究重点,但该领域的进展大多建立在受控基准之上,真实应用中面临的挑战往往未被解决。我们提出了 EgoAdapt,一个面向真实世界自我中心动作识别的基准,它支持我们的两阶段自适应范式;Ego4d 的自我中心视频流中自然存在各种真实挑战,例如长尾的动作分布以及涵盖 2740 个动作的大规模分类。我们引入了一个直接利用用户数据流的评估框架,并提出了新的指标来度量相对于总体模型的自适应增益、在线泛化能力以及事后(hindsight)性能。与现有工作中的单流评估不同,我们的框架提出了一种元评估,将 50 条独立用户流的结果进行汇总。我们还针对微调和经验回放进行了广泛的实证研究。
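
One way to read the evaluation framework described above is as an aggregation of per-stream metrics: for each user stream, compute an adaptation gain (online accuracy of the adapted model minus that of the frozen population model), then average over all 50 streams. The sketch below illustrates that aggregation on synthetic accuracies; the metric definition here is a simplification for illustration, not the paper's exact formula:

```python
import numpy as np

def adaptation_gain(adapted_correct, population_correct):
    """Both args: boolean arrays over one user's stream (True = correct prediction)."""
    return np.mean(adapted_correct) - np.mean(population_correct)

rng = np.random.default_rng(0)
gains = []
for _ in range(50):                                  # 50 independent user streams
    n = rng.integers(500, 2000)
    population = rng.random(n) < 0.55                # frozen population model (synthetic)
    adapted = rng.random(n) < 0.62                   # user-adapted model (synthetic)
    gains.append(adaptation_gain(adapted, population))
print("mean adaptation gain over 50 streams:", float(np.mean(gains)))
```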

Has China caught up to the US in AI research? An exploration of mimetic isomorphism as a model for late industrializers

  • paper_url: http://arxiv.org/abs/2307.10198
  • repo_url: None
  • paper_authors: Chao Min, Yi Zhao, Yi Bu, Ying Ding, Caroline S. Wagner
  • for: This paper examines China's development of artificial intelligence (AI) and compares it to the USA.
  • methods: The paper uses data on AI-related research papers to analyze the volume and quality of China's AI research, along with a novel measure to gauge China's imitation of US research.
  • results: The paper finds that China has made remarkable progress in AI development, surpassing the USA in the volume of research papers, though the USA retains a slight edge in quality. It also shows that China has effectively bridged a significant knowledge gap and could potentially be setting out on an independent research trajectory.
    摘要 人工智能(AI)是21世纪科技的基石,在中国经历了显著的发展。在本文中,我们考察了中国的 AI 发展历程,指出其以快速学习和差异化为特征,超越了早期亚洲工业化国家由外国直接投资驱动的出口导向型增长模式。我们的数据显示,中国目前在 AI 相关研究论文数量上领先于美国;但若按具体指标衡量论文质量,美国仍保持些许优势。尽管如此,中国 AI 发展的速度与规模仍然值得关注。我们将中国 AI 的加速进步归因于若干因素,包括有利于算法和研究论文开放获取的全球趋势、中国广泛的海外侨民及归国人员的贡献,以及相对宽松的数据保护政策。在本研究中,我们还提出了一种衡量中国对美国研究模仿程度的新指标。我们的分析表明,到 2018 年,中美在 AI 研究议题上的时间差已经消失,这说明中国已经有效弥合了显著的知识差距,并有可能走上独立的研究轨迹。尽管本研究仅比较了中国与美国,但值得注意的是,两国之间的合作研究所产生的论文被引用次数高于任一国独立完成的论文,这凸显了国际合作在推动 AI 科学进步中的力量。

Unsupervised Learning in Complex Systems

  • paper_url: http://arxiv.org/abs/2307.10993
  • repo_url: https://github.com/hugcis/evolving-structures-in-complex-systems
  • paper_authors: Hugo Cisneros
  • for: 这些论文的目的是研究自然和人工系统中的学习和适应。
  • methods: 这篇论文使用复杂系统来研究学习和适应现象,包括发展一个普适的复杂度度量标准,以及应用大规模复杂系统中的减简方法来研究计算。
  • results: 这篇论文的主要结果是开发了一个学习效率度量标准,以及一个大规模复杂系统中的学习算法测试集。这些发现对于理解自然和人工系统中的学习和适应现象具有重要意义,并可能推动未来的学习算法的开发。
    Abstract In this thesis, we explore the use of complex systems to study learning and adaptation in natural and artificial systems. The goal is to develop autonomous systems that can learn without supervision, develop on their own, and become increasingly complex over time. Complex systems are identified as a suitable framework for understanding these phenomena due to their ability to exhibit growth of complexity. Being able to build learning algorithms that require limited to no supervision would enable greater flexibility and adaptability in various applications. By understanding the fundamental principles of learning in complex systems, we hope to advance our ability to design and implement practical learning algorithms in the future. This thesis makes the following key contributions: the development of a general complexity metric that we apply to search for complex systems that exhibit growth of complexity, the introduction of a coarse-graining method to study computations in large-scale complex systems, and the development of a metric for learning efficiency as well as a benchmark dataset for evaluating the speed of learning algorithms. Our findings add substantially to our understanding of learning and adaptation in natural and artificial systems. Moreover, our approach contributes to a promising new direction for research in this area. We hope these findings will inspire the development of more effective and efficient learning algorithms in the future.
    摘要 在这个论文中,我们探索了使用复杂系统来研究学习和适应自然和人工系统中的现象。目标是开发能够自主学习、不需监督、逐渐增加复杂性的自适应系统。由于复杂系统能够展现增长复杂性的特点,因此我们选择使用复杂系统作为研究的理想框架。通过理解复杂系统中学习的基本原理,我们期望能够在未来设计和实现更加有效和高效的学习算法。本论文做出了以下关键贡献:开发了一种通用的复杂度指标,用于搜索展现增长复杂性的复杂系统,引入了大规模复杂系统中计算的粗糙化方法,以及开发了学习效率指标和评估学习算法速度的标准数据集。我们的发现对自然和人工系统中的学习和适应现象做出了重要贡献,同时,我们的方法也对研究这一领域的未来发展做出了重要贡献。我们希望这些发现能够激励未来的研究人员开发更有效和高效的学习算法。

Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting

  • paper_url: http://arxiv.org/abs/2307.05766
  • repo_url: https://github.com/chantalmp/rad-restruct
  • paper_authors: Chantal Pellegrini, Matthias Keicher, Ege Özsoy, Nassir Navab
  • for: 本研究旨在提高放射影像报告的效率和准确性,通过结构化报告缓解放射科医生与其他医疗专业人员之间的沟通问题。
  • methods: 本研究将结构化报告建模为层次化视觉问答(VQA)任务,并提出了 hi-VQA 模型,该模型在填写 X 射线影像的结构化报告时,会将先前已提问的问题及其答案作为上下文加以考虑。
  • results: 实验表明,hi-VQA 在医学 VQA 基准 VQARad 上取得了与最新方法相当的性能,在未使用医学领域视觉-语言预训练的方法中表现最佳,并在 Rad-ReStruct 上提供了一个强有力的基线。本研究是迈向自动填写结构化放射报告的重要一步,并为该领域的后续研究提供了有价值的首个基准。
    Abstract Radiology reporting is a crucial part of the communication between radiologists and other medical professionals, but it can be time-consuming and error-prone. One approach to alleviate this is structured reporting, which saves time and enables a more accurate evaluation than free-text reports. However, there is limited research on automating structured reporting, and no public benchmark is available for evaluating and comparing different methods. To close this gap, we introduce Rad-ReStruct, a new benchmark dataset that provides fine-grained, hierarchically ordered annotations in the form of structured reports for X-Ray images. We model the structured reporting task as hierarchical visual question answering (VQA) and propose hi-VQA, a novel method that considers prior context in the form of previously asked questions and answers for populating a structured radiology report. Our experiments show that hi-VQA achieves competitive performance to the state-of-the-art on the medical VQA benchmark VQARad while performing best among methods without domain-specific vision-language pretraining and provides a strong baseline on Rad-ReStruct. Our work represents a significant step towards the automated population of structured radiology reports and provides a valuable first benchmark for future research in this area. We will make all annotations and our code for annotation generation, model evaluation, and training publicly available upon acceptance. Our dataset and code is available at https://github.com/ChantalMP/Rad-ReStruct.
    摘要 辐射报告是医疗专业人员之间重要的沟通方式,但是它可能占用时间和容易出错。一种解决方案是使用结构化报告,这可以保存时间并且帮助更准确地评估。然而,关于自动化结构化报告的研究很少,而且没有公共的标准对比板准。为了填补这个差距,我们引入Rad-ReStruct,一个新的标准数据集,它提供了高级、层次结构的注释形式的X射线图像报告。我们将报告生成任务模型为层次视Question Answering(VQA),并提出一种新的方法,即hi-VQA,它考虑了上一个问题和答案的先前 контекст,以便填充结构化的医疗报告。我们的实验表明,hi-VQA可以与当前医疗VQA标准 benchmark VQARad 竞争,而且在不含特定视力语言预训练的情况下,hi-VQA 表现最佳。我们的工作代表了自动化结构化医疗报告的重要一步,并提供了未来这一领域的价值先锋。我们将在接受后发布所有注释和代码,包括报告生成、模型评估和训练代码。我们的数据集和代码可以在https://github.com/ChantalMP/Rad-ReStruct 中找到。

Towards A Scalable Solution for Improving Multi-Group Fairness in Compositional Classification

  • paper_url: http://arxiv.org/abs/2307.05728
  • repo_url: None
  • paper_authors: James Atwood, Tina Tian, Ben Packer, Meghana Deodhar, Jilin Chen, Alex Beutel, Flavien Prost, Ahmad Beirami
  • for: 这篇论文旨在解决复杂系统中的机器学习公平问题,其中最终预测结果是多个分类器的组合,并且存在多个群体。
  • methods: 作者首先指出,用于提升机会均等公平性的自然基线方法可扩展性不佳:其开销随被修复群体数量与被修复预测标签数量的乘积线性增长。随后,作者提出了两种简单的技术——任务过条件化(task-overconditioning)与群体交错(group-interleaving),在多群体、多标签的设定下实现常数级的扩展。
  • results: 作者在学术和实际环境中进行了实验,证明了他们的提议可以有效地 mitigate 在这种环境中。
    Abstract Despite the rich literature on machine learning fairness, relatively little attention has been paid to remediating complex systems, where the final prediction is the combination of multiple classifiers and where multiple groups are present. In this paper, we first show that natural baseline approaches for improving equal opportunity fairness scale linearly with the product of the number of remediated groups and the number of remediated prediction labels, rendering them impractical. We then introduce two simple techniques, called {\em task-overconditioning} and {\em group-interleaving}, to achieve a constant scaling in this multi-group multi-label setup. Our experimental results in academic and real-world environments demonstrate the effectiveness of our proposal at mitigation within this environment.
    摘要 尽管机器学习公平related literature已经有很多研究,但对于复杂系统来说,即最终预测是多个分类器的组合,多个群体存在的情况,相对较少获得了关注。在这篇论文中,我们首先表明了自然基线方法来提高equal opportunity fairness的扩展性是线性增长的,这意味着在多个群体多个预测标签的多组合情况下实现不可行。然后,我们介绍了两种简单的技术,即任务过程和群体排序,以实现常数级别的扩展性在多个多标签的设置下。我们在学术和实际环境中进行了实验,并证明了我们的提议的效果。

An Open-Source Knowledge Graph Ecosystem for the Life Sciences

  • paper_url: http://arxiv.org/abs/2307.05727
  • repo_url: None
  • paper_authors: Tiffany J. Callahan, Ignacio J. Tripodi, Adrianne L. Stefanski, Luca Cappelletti, Sanya B. Taneja, Jordan M. Wyrwa, Elena Casiraghi, Nicolas A. Matentzoglu, Justin Reese, Jonathan C. Silverstein, Charles Tapley Hoyt, Richard D. Boyce, Scott A. Malec, Deepak R. Unni, Marcin P. Joachimiak, Peter N. Robinson, Christopher J. Mungall, Emanuele Cavalleri, Tommaso Fontana, Giorgio Valentini, Marco Mesiti, Lucas A. Gillenwater, Brook Santangelo, Nicole A. Vasilevsky, Robert Hoehndorf, Tellen D. Bennett, Patrick B. Ryan, George Hripcsak, Michael G. Kahn, Michael Bada, William A. Baumgartner Jr, Lawrence E. Hunter
  • for: 本研究旨在提高生物层次结构数据的融合,以提高翻译研究的效果。
  • methods: 该研究使用知识图(KG)模型复杂现象,并提供了自动构建KG的方法。
  • results: 研究表明,使用PheKnowLator可以实现自定义知识表示,而不需要固定的知识表示模型。此外,PheKnowLator在构建12个大规模KG时的计算性能也充分表现了其可用性和可靠性。
    Abstract Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to automatically construct them. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluate the ecosystem by surveying open-source KG construction methods and analyzing its computational performance when constructing 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.
    摘要 转化研究需要生物组织多个尺度层次的数据。测序与多组学技术的进步提高了这些数据的可得性,但研究人员仍面临严峻的整合挑战。知识图谱(KG)可用于对复杂现象建模,并且已有方法能够自动构建知识图谱;然而,解决复杂的生物医学整合问题需要在知识建模方式上具备灵活性。此外,现有的 KG 构建方法虽然提供了稳健的工具,却往往只支持固定或有限的知识表示模型。PheKnowLator(Phenotype Knowledge Translator,表型知识翻译器)是一个语义生态系统,用于自动化地、符合 FAIR 原则(可发现、可访问、可互操作、可重用)地构建基于本体的知识图谱,并支持完全可定制的知识表示。该生态系统包括 KG 构建资源(如数据准备 API)、分析工具(如 SPARQL 端点和抽象算法)以及基准资源(如预构建的 KG 和嵌入)。我们通过调研开源 KG 构建方法并分析其在构建 12 个大规模 KG 时的计算性能来评估该生态系统。借助灵活的知识表示,PheKnowLator 能够在不牺牲性能与可用性的前提下支持完全可定制的知识图谱。

A Causal Ordering Prior for Unsupervised Representation Learning

  • paper_url: http://arxiv.org/abs/2307.05704
  • repo_url: None
  • paper_authors: Avinash Kori, Pedro Sanchez, Konstantinos Vilouras, Ben Glocker, Sotirios A. Tsaftaris
  • for: 本研究旨在提出一种完全无监督的表征学习方法,以帮助理解数据中因果关系的潜在结构。
  • methods: 本方法假设数据生成过程服从潜变量加性噪声模型(ANM),并通过基于潜变量分布 Hessian 的损失函数来鼓励潜空间遵循因果顺序。
  • results: 研究人员通过实验表明,该方法可以自动找到 causal 关系,并且可以在不同的数据集上进行适应。
    Abstract Unsupervised representation learning with variational inference relies heavily on independence assumptions over latent variables. Causal representation learning (CRL), however, argues that factors of variation in a dataset are, in fact, causally related. Allowing latent variables to be correlated, as a consequence of causal relationships, is more realistic and generalisable. So far, provably identifiable methods rely on: auxiliary information, weak labels, and interventional or even counterfactual data. Inspired by causal discovery with functional causal models, we propose a fully unsupervised representation learning method that considers a data generation process with a latent additive noise model (ANM). We encourage the latent space to follow a causal ordering via loss function based on the Hessian of the latent distribution.
    摘要 基于变分推断的无监督表示学习严重依赖潜变量之间相互独立的假设。然而,因果表示学习(CRL)认为,数据集中的变化因素实际上存在因果关联;允许潜变量之间由于因果关系而相关,更符合现实,也更具泛化性。迄今为止,可证明可识别的方法依赖于辅助信息、弱标签,乃至干预或反事实数据。受基于函数式因果模型的因果发现启发,我们提出了一种完全无监督的表示学习方法,它假设数据生成过程服从潜变量加性噪声模型(ANM),并通过基于潜变量分布 Hessian 的损失函数来鼓励潜空间遵循因果顺序。

Objaverse-XL: A Universe of 10M+ 3D Objects

  • paper_url: http://arxiv.org/abs/2307.05663
  • repo_url: None
  • paper_authors: Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, Ali Farhadi
  • for: 本研究旨在提供大规模3D数据集,以推动3D视觉任务的进步。
  • methods: 本研究使用了多种来源的3D对象,包括手动设计的对象、光学摄影扫描的景点和日常物品、以及专业扫描的历史和珍贵品。
  • results: 研究表明,利用超过 1 亿张多视角渲染图像训练 Zero123 进行新视角合成,可以获得强大的零样本泛化能力。
    Abstract Natural language processing and 2D vision models have attained remarkable proficiency on many tasks primarily by escalating the scale of training data. However, 3D vision tasks have not seen the same progress, in part due to the challenges of acquiring high-quality 3D data. In this work, we present Objaverse-XL, a dataset of over 10 million 3D objects. Our dataset comprises deduplicated 3D objects from a diverse set of sources, including manually designed objects, photogrammetry scans of landmarks and everyday items, and professional scans of historic and antique artifacts. Representing the largest scale and diversity in the realm of 3D datasets, Objaverse-XL enables significant new possibilities for 3D vision. Our experiments demonstrate the improvements enabled with the scale provided by Objaverse-XL. We show that by training Zero123 on novel view synthesis, utilizing over 100 million multi-view rendered images, we achieve strong zero-shot generalization abilities. We hope that releasing Objaverse-XL will enable further innovations in the field of 3D vision at scale.
    摘要 自然语言处理和二维视觉模型主要通过扩大训练数据规模,在许多任务上取得了显著成绩。然而,三维视觉任务尚未取得同样的进展,部分原因在于高质量三维数据难以获取。本文提出了 Objaverse-XL 数据集,包含超过 1000 万个去重后的三维对象,来源多样,包括手工设计的对象、地标和日常物品的摄影测量扫描,以及历史与古董文物的专业扫描。作为目前规模和多样性最大的三维数据集,Objaverse-XL 为三维视觉带来了重要的新可能。我们的实验表明,利用超过 1 亿张多视角渲染图像训练 Zero123 进行新视角合成,可以获得强大的零样本泛化能力。我们希望 Objaverse-XL 的发布能推动大规模三维视觉领域的进一步创新。

Self-consistency for open-ended generations

  • paper_url: http://arxiv.org/abs/2307.06857
  • repo_url: None
  • paper_authors: Siddhartha Jain, Xiaofei Ma, Anoop Deoras, Bing Xiang
  • for: 这篇论文是关于如何提高语言模型(LLM)生成质量的研究。
  • methods: 该论文提出了一种新的方法来重新排序和选择LLM生成的最佳结果,而不需要额外的推理或训练特殊的reranker。这种方法基于生成之间的对比统计,计算成本很低。
  • results: 论文通过理论分析和仿真实验表明,该方法有助于选出 LLM 生成的最佳 $k$ 个结果,在代码生成、自动形式化(autoformalization)和摘要等任务上都有明显提升;若能获得 token 概率信息,性能还会进一步提高。
    Abstract Large Language Models (LLMs) can exhibit considerable variation in the quality of their sampled outputs. Reranking and selecting the best generation from the sampled set is a popular way of obtaining strong gains in generation quality. In this paper, we present a novel approach for reranking LLM generations. Unlike other techniques that might involve additional inferences or training a specialized reranker, our approach relies on easy to compute pairwise statistics between the generations that have minimal compute overhead. We show that our approach can be formalized as an extension of self-consistency and analyze its performance in that framework, theoretically as well as via simulations. We show strong improvements for selecting the best $k$ generations for code generation tasks as well as robust improvements for best generation for the tasks of autoformalization, and summarization. While our approach only assumes black-box access to LLMs, we show that additional access to token probabilities can improve performance even further.
    摘要 大语言模型(LLM)的采样输出在质量上可能存在相当大的差异。对采样集合重新排序并选出最佳生成,是获得显著质量提升的常用做法。在本文中,我们提出了一种新的 LLM 生成结果重排序方法。与其他可能需要额外推理或训练专用重排序器的技术不同,我们的方法只依赖生成结果之间易于计算的成对统计量,计算开销极小。我们证明该方法可以形式化为自一致性(self-consistency)的扩展,并在该框架下从理论和仿真两方面分析其性能。在代码生成任务中选取最佳 $k$ 个生成结果时,我们获得了显著改进;在自动形式化和摘要任务中选取最佳生成结果时,也获得了稳健提升。虽然我们的方法只假设对 LLM 的黑盒访问,但我们还表明,额外获得 token 概率可以进一步提升性能。
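The reranking step the abstract describes only needs cheap pairwise statistics between the sampled generations. As a rough illustration (the exact statistic used in the paper is not specified here, so unigram Jaccard overlap is an assumed stand-in), one can score each sample by its mean similarity to the others and keep the top-k:

```python
from typing import List

def unigram_jaccard(a: str, b: str) -> float:
    """Cheap pairwise statistic: Jaccard overlap of unigram sets."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def rerank_by_consensus(generations: List[str], k: int = 1) -> List[str]:
    """Rank sampled generations by mean pairwise similarity to the other samples
    (a self-consistency-style consensus score) and return the top k."""
    n = len(generations)
    scores = []
    for i in range(n):
        sims = [unigram_jaccard(generations[i], generations[j])
                for j in range(n) if j != i]
        scores.append(sum(sims) / max(len(sims), 1))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [generations[i] for i in ranked[:k]]

# Toy usage: the sample most similar to the rest of the pool wins.
samples = ["def add(a, b): return a + b",
           "def add(a, b): return a - b",
           "def add(x, y): return x + y"]
print(rerank_by_consensus(samples, k=1))
```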

Bio-Inspired Night Image Enhancement Based on Contrast Enhancement and Denoising

  • paper_url: http://arxiv.org/abs/2307.05447
  • repo_url: None
  • paper_authors: Xinyi Bai, Steffi Agino Priyanka, Hsiao-Jung Tung, Yuankai Wang
  • for: 提高夜间图像质量,以提高智能监测系统的检测和识别精度。
  • methods: 提出了一种基于生物体的图像增强算法,通过提高亮度和对比度,同时降低噪声,将低照度图像转化为更加亮丽和清晰的图像。
  • results: 在真实实验和仿真实验上的测试结果均表明,所提算法优于 contrast pair、Meylan 和 Retinex 等方法。
    Abstract Due to the low accuracy of object detection and recognition in many intelligent surveillance systems at nighttime, the quality of night images is crucial. Compared with the corresponding daytime image, nighttime image is characterized as low brightness, low contrast and high noise. In this paper, a bio-inspired image enhancement algorithm is proposed to convert a low illuminance image to a brighter and clear one. Different from existing bio-inspired algorithm, the proposed method doesn't use any training sequences, we depend on a novel chain of contrast enhancement and denoising algorithms without using any forms of recursive functions. Our method can largely improve the brightness and contrast of night images, besides, suppress noise. Then we implement on real experiment, and simulation experiment to test our algorithms. Both results show the advantages of proposed algorithm over contrast pair, Meylan and Retinex.
    摘要 由于许多智能监控系统在夜间的目标检测与识别精度较低,夜间图像的质量至关重要。与对应的白天图像相比,夜间图像具有亮度低、对比度低、噪声高的特点。本文提出了一种受生物启发的图像增强算法,将低照度图像转换为更亮、更清晰的图像。与现有的生物启发算法不同,所提方法不使用任何训练序列,而是依靠一条新的对比度增强与去噪算法链,且不使用任何形式的递归函数。该方法能显著提升夜间图像的亮度和对比度,同时抑制噪声。我们在真实实验和仿真实验中测试了算法,两类结果均表明所提算法优于 contrast pair、Meylan 和 Retinex 方法。
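The abstract describes a chain of contrast enhancement and denoising with no recursive functions; the concrete steps below (gamma correction, CLAHE on the luminance channel, Gaussian smoothing) are a generic stand-in pipeline of that kind, not the authors' algorithm, and all parameter values are illustrative assumptions.

```python
import cv2
import numpy as np

def enhance_night_image(path_in: str, path_out: str,
                        gamma: float = 0.5, clip_limit: float = 2.0) -> None:
    """Brighten a low-light image (gamma < 1), stretch local contrast on the
    luminance channel (CLAHE), then suppress noise with a mild Gaussian blur."""
    bgr = cv2.imread(path_in)
    if bgr is None:
        raise FileNotFoundError(path_in)

    # 1) Brightness: gamma correction on the whole image.
    brightened = np.power(bgr.astype(np.float32) / 255.0, gamma)
    brightened = (brightened * 255.0).astype(np.uint8)

    # 2) Contrast: CLAHE on the L channel only, to avoid colour shifts.
    lab = cv2.cvtColor(brightened, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    contrasted = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)

    # 3) Denoising: amplified dark pixels are noisy; smooth them slightly.
    denoised = cv2.GaussianBlur(contrasted, (3, 3), 0)
    cv2.imwrite(path_out, denoised)

# enhance_night_image("night.jpg", "night_enhanced.jpg")
```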

ISLTranslate: Dataset for Translating Indian Sign Language

  • paper_url: http://arxiv.org/abs/2307.05440
  • repo_url: https://github.com/exploration-lab/isltranslate
  • paper_authors: Abhinav Joshi, Susmit Agrawal, Ashutosh Modi
  • for: bridging the communication gap between the hard-of-hearing community and the rest of the population
  • methods: using a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs
  • results: providing a detailed analysis of the dataset and benchmarking the performance of existing end-to-end sign language to spoken language translation systems using a transformer-based model for ISL translation.
    Abstract Sign languages are the primary means of communication for many hard-of-hearing people worldwide. Recently, to bridge the communication gap between the hard-of-hearing community and the rest of the population, several sign language translation datasets have been proposed to enable the development of statistical sign language translation systems. However, there is a dearth of sign language resources for the Indian sign language. This resource paper introduces ISLTranslate, a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs. To the best of our knowledge, it is the largest translation dataset for continuous Indian Sign Language. We provide a detailed analysis of the dataset. To validate the performance of existing end-to-end Sign language to spoken language translation systems, we benchmark the created dataset with a transformer-based model for ISL translation.
    摘要 手语是全球许多听障人士的主要交流方式。近年来,为了弥合听障群体与其他人群之间的交流鸿沟,已有若干手语翻译数据集被提出,以支持统计手语翻译系统的开发。然而,印度手语的资源仍然匮乏。本资源论文介绍了 ISLTranslate,一个面向连续印度手语(ISL)的翻译数据集,包含 3.1 万条 ISL-英语句子/短语对。据我们所知,这是目前最大的连续印度手语翻译数据集。我们对该数据集进行了详细分析,并使用基于 transformer 的 ISL 翻译模型,对现有端到端手语到口语翻译系统的性能进行了基准测试。

Named entity recognition using GPT for identifying comparable companies

  • paper_url: http://arxiv.org/abs/2307.07420
  • repo_url: None
  • paper_authors: Eurico Covas
  • for: 本研究旨在提高可比公司方法的精度和成功率,用于私营公司的股权估值。
  • methods: 本研究使用大语言模型(LLM),如 OpenAI 的 GPT,并结合自然语言处理(NLP)技术,对公开的 Wikipedia 公司描述/摘要进行相似性分析。
  • results: 研究表明,使用 LLM 比依赖人工标注的标准命名实体识别(NER)具有更高的精度和成功率,并可用于构建合适的可比公司组,进而用于股权估值。
    Abstract For both public and private firms, comparable companies analysis is widely used as a method for company valuation. In particular, the method is of great value for valuation of private equity companies. The several approaches to the comparable companies method usually rely on a qualitative approach to identifying similar peer companies, which tends to use established industry classification schemes and/or analyst intuition and knowledge. However, more quantitative methods have started being used in the literature and in the private equity industry, in particular, machine learning clustering, and natural language processing (NLP). For NLP methods, the process consists of extracting product entities from e.g., the company's website or company descriptions from some financial database system and then to perform similarity analysis. Here, using companies descriptions/summaries from publicly available companies' Wikipedia websites, we show that using large language models (LLMs), such as GPT from openaAI, has a much higher precision and success rate than using the standard named entity recognition (NER) which uses manual annotation. We demonstrate quantitatively a higher precision rate, and show that, qualitatively, it can be used to create appropriate comparable companies peer groups which can then be used for equity valuation.
    摘要 对于上市公司和私营公司而言,可比公司分析都是广泛使用的公司估值方法,对私募股权公司的估值尤其有价值。可比公司方法的各种做法通常依赖定性方式来确定相似的同类公司,往往借助既有的行业分类体系和/或分析师的直觉与经验。然而,更多的定量方法已开始出现在文献和私募股权行业中,特别是机器学习聚类和自然语言处理(NLP)。对于 NLP 方法,流程是先从公司网站或金融数据库中的公司描述里抽取产品实体,再进行相似性分析。本文使用公开的 Wikipedia 公司描述/摘要,表明使用大语言模型(如 OpenAI 的 GPT)比依赖人工标注的标准命名实体识别(NER)具有更高的精度和成功率。我们定量地展示了更高的精度,并定性地表明该方法可用于构建合适的可比公司组,进而用于股权估值。
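The similarity-analysis step can be sketched independently of the entity-extraction model. The snippet below uses TF-IDF vectors as an assumed stand-in for LLM-derived representations and ranks peers by cosine similarity; company names and descriptions are invented for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def peer_group(descriptions: dict, target: str, k: int = 3) -> list:
    """Rank companies by cosine similarity of their description vectors to the
    target company's description and return the k closest peers."""
    names = list(descriptions)
    vectors = TfidfVectorizer(stop_words="english").fit_transform(
        [descriptions[n] for n in names])
    sims = cosine_similarity(vectors[names.index(target)], vectors).ravel()
    ranked = sorted(zip(names, sims), key=lambda x: x[1], reverse=True)
    return [n for n, _ in ranked if n != target][:k]

companies = {
    "AlphaSoft": "cloud accounting software for small businesses",
    "BetaBooks": "subscription accounting and invoicing platform",
    "GammaMines": "open-pit copper mining and mineral processing",
}
print(peer_group(companies, "AlphaSoft", k=2))
```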

3D detection of roof sections from a single satellite image and application to LOD2-building reconstruction

  • paper_url: http://arxiv.org/abs/2307.05409
  • repo_url: None
  • paper_authors: Johann Lussange, Mulin Yu, Yuliya Tarabalka, Florent Lafarge
  • for: 本论文旨在从单张卫星栅格影像出发,对城市区域进行三维重建。
  • methods: 该方法包含两个新特点:一是基于深度学习的三维屋顶面检测,二是仅需一张(非正射)卫星栅格影像作为模型输入。具体分两步:首先,一个 Mask R-CNN 模型对建筑屋顶面进行二维分割,并将分割出的像素融合回 RGB 卫星影像;然后,另一个结构相同的 Mask R-CNN 模型通过全景分割推断屋顶面角点的离地高度,从而完成建筑和城市的三维重建。
  • results: 该方法的可能性被证明了,通过在几分钟内重建不同的城市区域,Jaccard指数为2D分割个瓦片部分的88.55%和75.21%,以及3D重建中 correctly segmented pixels的高度差的平均错误为1.60米和2.06米。
    Abstract Reconstructing urban areas in 3D out of satellite raster images has been a long-standing and challenging goal of both academical and industrial research. The rare methods today achieving this objective at a Level Of Details $2$ rely on procedural approaches based on geometry, and need stereo images and/or LIDAR data as input. We here propose a method for urban 3D reconstruction named KIBS(\textit{Keypoints Inference By Segmentation}), which comprises two novel features: i) a full deep learning approach for the 3D detection of the roof sections, and ii) only one single (non-orthogonal) satellite raster image as model input. This is achieved in two steps: i) by a Mask R-CNN model performing a 2D segmentation of the buildings' roof sections, and after blending these latter segmented pixels within the RGB satellite raster image, ii) by another identical Mask R-CNN model inferring the heights-to-ground of the roof sections' corners via panoptic segmentation, unto full 3D reconstruction of the buildings and city. We demonstrate the potential of the KIBS method by reconstructing different urban areas in a few minutes, with a Jaccard index for the 2D segmentation of individual roof sections of $88.55\%$ and $75.21\%$ on our two data sets resp., and a height's mean error of such correctly segmented pixels for the 3D reconstruction of $1.60$ m and $2.06$ m on our two data sets resp., hence within the LOD2 precision range.
    摘要 从卫星栅格影像中对城市区域进行三维重建,一直是学术界和工业界长期追求且颇具挑战的目标。目前能在 LOD2 细节层级实现这一目标的少数方法依赖基于几何的程序化方法,并且需要立体影像和/或激光雷达数据作为输入。我们在此提出一种名为 KIBS(Keypoints Inference By Segmentation)的城市三维重建方法,它包含两个新特点:一是完全基于深度学习的屋顶面三维检测;二是仅需一张(非正射)卫星栅格影像作为模型输入。方法分两步:首先,一个 Mask R-CNN 模型对建筑屋顶面进行二维分割,并将分割出的像素融合回 RGB 卫星影像;然后,另一个结构相同的 Mask R-CNN 模型通过全景分割推断屋顶面角点的离地高度,从而完成建筑和城市的三维重建。我们通过在几分钟内重建不同城市区域展示了 KIBS 方法的潜力:在两个数据集上,单个屋顶面二维分割的 Jaccard 指数分别为 88.55% 和 75.21%,正确分割像素的三维重建高度平均误差分别为 1.60 米和 2.06 米,处于 LOD2 精度范围之内。

Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform

  • paper_url: http://arxiv.org/abs/2307.05399
  • repo_url: https://github.com/mateusz-wojcik-97/domain-agnostic-architecture
  • paper_authors: Mateusz Wójcik, Witold Kościukiewicz, Mateusz Baran, Tomasz Kajdanowicz, Adam Gonczarek
  • for: 这个论文主要旨在解决复杂系统中ML体系高效可 reuse的问题,具体是在流式数据下进行分类问题。
  • methods: 该论文提出了一种基于混合专家模型的完全可导的体系,可以在每个类例 separately 进行训练高性能的分类器。
  • results: 经过了大量的实验证明,该方法可以在多个领域中达到最佳性能,并且可以在生产环境中进行在线学习,不需要内存缓存。与参考方法相比,该方法显著超越了其性能。
    Abstract Production deployments in complex systems require ML architectures to be highly efficient and usable against multiple tasks. Particularly demanding are classification problems in which data arrives in a streaming fashion and each class is presented separately. Recent methods with stochastic gradient learning have been shown to struggle in such setups or have limitations like memory buffers, and being restricted to specific domains that disable its usage in real-world scenarios. For this reason, we present a fully differentiable architecture based on the Mixture of Experts model, that enables the training of high-performance classifiers when examples from each class are presented separately. We conducted exhaustive experiments that proved its applicability in various domains and ability to learn online in production environments. The proposed technique achieves SOTA results without a memory buffer and clearly outperforms the reference methods.
    摘要 在复杂系统中进行生产部署,要求机器学习架构高效且能够复用于多个任务。当数据以流式方式到达、且各个类别分别依次出现时,分类问题尤其具有挑战性。近期基于随机梯度学习的方法在这类设定下表现欠佳,或存在诸如需要内存缓冲、仅限特定领域等限制,使其难以用于真实场景。为此,我们提出了一种基于混合专家模型的完全可微架构,能够在各类别样本分别出现的情况下训练高性能分类器。我们进行了广泛的实验,证明了它在多个领域的适用性以及在生产环境中在线学习的能力。所提方法在不使用内存缓冲的情况下达到了最先进的结果,并明显优于参考方法。
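A minimal sketch of the fully differentiable mixture-of-experts idea the entry refers to: a softmax gate weights several small expert classifiers and everything trains end to end. Layer sizes and the gating form are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class MixtureOfExpertsClassifier(nn.Module):
    """Minimal differentiable mixture of experts: a softmax gate weights the
    outputs of several small expert networks; everything trains end to end."""
    def __init__(self, in_dim: int, n_classes: int, n_experts: int = 4, hidden: int = 64):
        super().__init__()
        self.gate = nn.Linear(in_dim, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)            # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], 1)   # (B, E, C)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)      # (B, C)

# Toy usage: one optimisation step on random data.
model = MixtureOfExpertsClassifier(in_dim=32, n_classes=5)
x, y = torch.randn(8, 32), torch.randint(0, 5, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)
loss.backward()
```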

cs.CL - 2023-07-12

Ashaar: Automatic Analysis and Generation of Arabic Poetry Using Deep Learning Approaches

  • paper_url: http://arxiv.org/abs/2307.06218
  • repo_url: https://github.com/arbml/ashaar
  • paper_authors: Zaid Alyafeai, Maged S. Al-Shaibani, Moataz Ahmed
  • for: 本研究旨在开发一个名为\textit{Ashaar}的框架,用于分析和生成阿拉伯诗歌。
  • methods: 该框架包含面向诗歌多个方面的数据集和预训练模型,可以对诗歌进行格律、主题和时代分类,并支持自动为诗歌添加音符号(diacritization),从而实现诸如自动提取阿鲁迪(Arudi)风格等更细致的分析;此外还探索了通过预训练基于字符的 GPT 模型进行条件诗歌生成。
  • results: 借助该框架,可以自动检测并分类诗歌的多个方面,并生成符合给定条件的诗歌;作者还提供了四个数据集,分别用于诗歌生成、音符号标注以及阿鲁迪风格预测(两个),以促进阿拉伯诗歌领域的研究与开发。
    Abstract Poetry holds immense significance within the cultural and traditional fabric of any nation. It serves as a vehicle for poets to articulate their emotions, preserve customs, and convey the essence of their culture. Arabic poetry is no exception, having played a cherished role in the heritage of the Arabic community throughout history and maintaining its relevance in the present era. Typically, comprehending Arabic poetry necessitates the expertise of a linguist who can analyze its content and assess its quality. This paper presents the introduction of a framework called \textit{Ashaar} https://github.com/ARBML/Ashaar, which encompasses a collection of datasets and pre-trained models designed specifically for the analysis and generation of Arabic poetry. The pipeline established within our proposed approach encompasses various aspects of poetry, such as meter, theme, and era classification. It also incorporates automatic poetry diacritization, enabling more intricate analyses like automated extraction of the \textit{Arudi} style. Additionally, we explore the feasibility of generating conditional poetry through the pre-training of a character-based GPT model. Furthermore, as part of this endeavor, we provide four datasets: one for poetry generation, another for diacritization, and two for Arudi-style prediction. These datasets aim to facilitate research and development in the field of Arabic poetry by enabling researchers and enthusiasts to delve into the nuances of this rich literary tradition.
    摘要 诗歌在任何民族的文化与传统肌理中都占有重要地位,是诗人抒发情感、保存习俗、传递文化精髓的载体。阿拉伯诗歌也不例外,它在阿拉伯社会的历史传承中始终享有珍视的地位,并在当代保持着影响力。通常,理解阿拉伯诗歌需要语言学家来分析其内容并评估其质量。本文介绍了名为 Ashaar(https://github.com/ARBML/Ashaar)的框架,其中包含专为阿拉伯诗歌分析与生成设计的数据集和预训练模型。所提出的处理流程覆盖诗歌的多个方面,如格律、主题和时代分类,并支持自动添加音符号,从而实现诸如自动提取阿鲁迪风格等更细致的分析。此外,我们还探索了通过预训练基于字符的 GPT 模型生成条件诗歌的可行性。作为这项工作的一部分,我们提供了四个数据集:一个用于诗歌生成,一个用于音符号标注,另外两个用于阿鲁迪风格预测,旨在帮助研究者和爱好者深入探究这一丰富的文学传统。

Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15072
  • repo_url: None
  • paper_authors: Nicholas Perikli, Srimoy Bhattacharya, Blessing Ogbuokiri, Zahra Movahedi Nia, Benjamin Lieberman, Nidhi Tripathi, Salah-Eddine Dahbi, Finn Stevenson, Nicola Bragazzi, Jude Kong, Bruce Mellado
  • for: 本研究使用情感分析方法,分析南非用户发布的与疫苗犹豫相关的推文,以训练基于 AI 的分类模型并评估其可靠性。
  • methods: 研究使用 LSTM、bi-LSTM、SVM、BERT-base-cased 和 RoBERTa-base 模型,并在 WandB 平台上仔细调整超参数;同时比较了两种不同的预处理方法:一种基于语义,另一种基于语料库。
  • results: 所有模型的 F1 分数普遍较低(45%-55%),只有 BERT 和 RoBERTa 达到明显更高的 F1 分数(分别为 60% 和 61%)。研究还对 RoBERTa 误分类的推文进行了 LDA 主题分析,以了解如何进一步提高模型准确率。
    Abstract Very few social media studies have been done on South African user-generated content during the COVID-19 pandemic and even fewer using hand-labelling over automated methods. Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models and assessing their reliability in categorizing UGC. A dataset of 30000 tweets from South Africa were extracted and hand-labelled into one of three sentiment classes: positive, negative, neutral. The machine learning models used were LSTM, bi-LSTM, SVM, BERT-base-cased and the RoBERTa-base models, whereby their hyperparameters were carefully chosen and tuned using the WandB platform. We used two different approaches when we pre-processed our data for comparison: one was semantics-based, while the other was corpus-based. The pre-processing of the tweets in our dataset was performed using both methods, respectively. All models were found to have low F1-scores within a range of 45$\%$-55$\%$, except for BERT and RoBERTa which both achieved significantly better measures with overall F1-scores of 60$\%$ and 61$\%$, respectively. Topic modelling using an LDA was performed on the miss-classified tweets of the RoBERTa model to gain insight on how to further improve model accuracy.
    摘要 很少有关于南非用户生成内容的社交媒体研究在COVID-19大流行期间进行,而且使用手动标注而不是自动方法更少。疫苗是战胜COVID-19的重要工具,但是疫苗拒绝会对公共健康努力产生威胁。本研究通过对南非推文中的疫苗拒绝 sentiment 分析,以训练 AI 承载分类模型并评估其可靠性。研究采集了30000条南非推文,并 manually 标注为一个sentiment类型:正面、负面或中性。使用的机器学习模型包括 LSTM、bi-LSTM、SVM、BERT-base-cased 和 RoBERTa-base 模型,其中 hyperparameter 通过WandB平台仔细调整。我们使用了两种不同的方法进行数据预处理,以便比较:一种是基于 semantics,另一种是基于 corpus。对于我们的数据集,我们使用了两种预处理方法,分别对应这两种方法。所有模型都显示了45%-55%的低 F1 分数,只有 BERT 和 RoBERTa 两个模型显示了明显更好的性能,其中 F1 分数分别为 60% 和 61%。使用 LDA 进行主题分析,以获取 RoBERTa 模型中错误分类的推文,以了解如何进一步改进模型准确性。

Sumformer: A Linear-Complexity Alternative to Self-Attention for Speech Recognition

  • paper_url: http://arxiv.org/abs/2307.07421
  • repo_url: None
  • paper_authors: Titouan Parcollet, Rogier van Dalen, Shucong Zhang, Sourav Bhattacharya
  • for: 提高Speech recognition系统的效率和可扩展性
  • methods: 提出了一种linear-time的自注意力alternative方法,通过计算整个语音段的均值来概括整个语音段,然后与时间特定信息相结合
  • results: 在state-of-the-art ASR模型中引入Summary Mixing后,可以保持或超越之前的语音识别性能,同时降低训练和推理时间和内存预算,相对降低27%和减少一半
    Abstract Modern speech recognition systems rely on self-attention. Unfortunately, token mixing with self-attention takes quadratic time in the length of the speech utterance, slowing down inference as well as training and increasing memory consumption. Cheaper alternatives to self-attention for ASR have been developed, but fail to consistently reach the same level of accuracy. In practice, however, the self-attention weights of trained speech recognizers take the form of a global average over time. This paper, therefore, proposes a linear-time alternative to self-attention for speech recognition. It summarises a whole utterance with the mean over vectors for all time steps. This single summary is then combined with time-specific information. We call this method ``Summary Mixing''. Introducing Summary Mixing in state-of-the-art ASR models makes it feasible to preserve or exceed previous speech recognition performance while lowering the training and inference times by up to 27% and reducing the memory budget by a factor of two.
    摘要 现代语音识别系统依赖自注意力。不幸的是,自注意力的 token 混合计算量与语音长度成平方关系,既拖慢推理和训练,又增加内存消耗。虽然已经出现了一些比自注意力更廉价的替代方案,但它们难以稳定达到同等准确率。而在实践中,训练好的语音识别模型的自注意力权重往往近似于对整个时间轴的全局平均。因此,本文提出了一种线性时间的自注意力替代方案:用所有时间步向量的均值来概括整句语音,再将这一概括与各时间步的特定信息相结合,我们称之为"概括混合"(Summary Mixing)。在最先进的 ASR 模型中引入概括混合后,可以保持甚至超越原有的语音识别性能,同时将训练和推理时间降低最多 27%,并将内存开销减半。
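A rough PyTorch sketch of the Summary Mixing idea: one utterance-level summary (a mean over time) is combined with a per-frame transform, so mixing cost grows linearly with sequence length. The layer sizes and the way the two branches are combined are assumptions rather than the paper's exact design.

```python
import torch
import torch.nn as nn

class SummaryMixing(nn.Module):
    """Linear-time token mixing: a local (per-frame) transform is combined with
    a single utterance-level summary, the mean over time of a summary transform."""
    def __init__(self, dim: int):
        super().__init__()
        self.local = nn.Linear(dim, dim)      # time-specific information
        self.summary = nn.Linear(dim, dim)    # what gets averaged over time
        self.combine = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        summary = self.summary(x).mean(dim=1, keepdim=True)      # (B, 1, D)
        summary = summary.expand(-1, x.size(1), -1)              # broadcast over time
        mixed = torch.cat([self.local(x), summary], dim=-1)      # (B, T, 2D)
        return self.combine(torch.relu(mixed))                   # (B, T, D), O(T) cost

features = torch.randn(4, 200, 256)     # e.g. 200 acoustic frames
out = SummaryMixing(256)(features)
print(out.shape)                        # torch.Size([4, 200, 256])
```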

Enhancing Portuguese Sign Language Animation with Dynamic Timing and Mouthing

  • paper_url: http://arxiv.org/abs/2307.06124
  • repo_url: None
  • paper_authors: Inês Lacerda, Hugo Nicolau, Luisa Coheur
  • for: 这篇论文的目的是提出一种新的动态方法来处理译注语言手势的过渡动画,尤其是葡萄牙手语的嘴部动画。
  • methods: 这篇论文使用了native signers的口语动画和没有动画的控制组进行比较,以评估动画的影响。
  • results: 研究发现,加入嘴形(mouthing)动画可以提高手语初学者的理解度和感知自然度,但在母语手语者中没有发现显著差异。这些结果对计算语言学、人机交互以及手语虚拟人物的合成动画具有参考意义。
    Abstract Current signing avatars are often described as unnatural as they cannot accurately reproduce all the subtleties of synchronized body behaviors of a human signer. In this paper, we propose a new dynamic approach for transitions between signs, focusing on mouthing animations for Portuguese Sign Language. Although native signers preferred animations with dynamic transitions, we did not find significant differences in comprehension and perceived naturalness scores. On the other hand, we show that including mouthing behaviors improved comprehension and perceived naturalness for novice sign language learners. Results have implications in computational linguistics, human-computer interaction, and synthetic animation of signing avatars.
    摘要 当前的签名人物通常被描述为不自然,因为它们无法准确地复制人类签名者的同步身体行为的细微变化。在这篇论文中,我们提出了一种新的动态方法,专注于葡萄牙手语的嘴部动画。虽然本地签名者喜欢使用动态过渡的动画,但我们没有发现显著的差异在理解和自然感 scores。然而,我们发现包含嘴部行为可以提高理解和自然感 scores for novice 手语学习者。结果有关计算语言学、人机交互和 sintetic 签名人物动画的应用。

Interpreting deep embeddings for disease progression clustering

  • paper_url: http://arxiv.org/abs/2307.06060
  • repo_url: None
  • paper_authors: Anna Munoz-Farre, Antonios Poulakakis-Daktylidis, Dilini Mahesha Kothalawala, Andrea Rodriguez-Martinez
  • for: patient clustering analysis for individuals with type 2 diabetes
  • methods: interpreting deep embeddings, evaluated on the UK Biobank dataset
  • results: clinically meaningful insights into disease progression patterns
    Abstract We propose a novel approach for interpreting deep embeddings in the context of patient clustering. We evaluate our approach on a dataset of participants with type 2 diabetes from the UK Biobank, and demonstrate clinically meaningful insights into disease progression patterns.
    摘要 我们提出了一种在患者聚类场景下解释深度嵌入的新方法。我们在 UK Biobank 的 2 型糖尿病参与者数据集上评估了该方法,并展示了其对疾病进展模式具有临床意义的洞察。

A Study on the Appropriate size of the Mongolian general corpus

  • paper_url: http://arxiv.org/abs/2307.06050
  • repo_url: None
  • paper_authors: Sunsoo Choi, Ganbat Tsend
  • For: This paper aims to determine the appropriate size of the Mongolian general corpus.
  • Methods: The study uses the Heaps function and the Type Token Ratio (TTR) to determine the appropriate size of the corpus.
  • Results: An appropriate size for a Mongolian general corpus is from 39 to 42 million tokens, based on observing how the number of types and the TTR change as the number of tokens increases.
    摘要 本研究旨在确定蒙古语通用语料库的合适规模。研究使用 Heaps 函数和类符/形符比(TTR)来确定语料库的合适规模。样本语料库共 906,064 个词符,文本涵盖 10 个领域:报纸上的政治、经济、社会、文化、体育、国际报道与法律,初高中文学教材,访谈文章,以及播客转写稿。我们首先用该样本语料库估计 Heaps 函数,然后利用估计出的函数,每增加一百万个词符观察类符数量和 TTR 值的变化。观察结果显示,当词符数量超过 3900 万至 4200 万时,TTR 值几乎不再变化。因此,我们认为蒙古语通用语料库的合适规模为 3900 万至 4200 万词符。
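Both quantities used in the study are easy to reproduce on any token stream: the type-token ratio at increasing corpus sizes, and a Heaps-law fit V(n) = k * n^beta to the type-growth curve. The toy corpus and the fitted constants below are illustrative only, not the paper's values.

```python
import numpy as np
from scipy.optimize import curve_fit

def heaps(n, k, beta):
    """Heaps' law: number of distinct types V(n) = k * n**beta."""
    return k * np.power(n, beta)

def growth_curve(tokens, step):
    """Observed (tokens seen, types seen, TTR) at every `step` tokens."""
    seen, rows = set(), []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            rows.append((i, len(seen), len(seen) / i))
    return np.array(rows)

# Toy corpus; a real study would stream millions of tokens instead.
rng = np.random.default_rng(0)
tokens = [f"w{int(z)}" for z in rng.zipf(1.3, size=50_000)]
curve = growth_curve(tokens, step=1_000)

(k, beta), _ = curve_fit(heaps, curve[:, 0], curve[:, 1], p0=(1.0, 0.7))
print(f"fitted Heaps parameters: k={k:.2f}, beta={beta:.2f}")
print("last few TTR values:", np.round(curve[-3:, 2], 4))
```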

Pluggable Neural Machine Translation Models via Memory-augmented Adapters

  • paper_url: http://arxiv.org/abs/2307.06029
  • repo_url: https://github.com/urvashik/knnmt
  • paper_authors: Yuzhuang Xu, Shuo Wang, Peng Li, Xuebo Liu, Xiaolong Wang, Weidong Liu, Yang Liu
  • for: 用于控制神经机器翻译模型的生成行为,满足不同用户需求。
  • methods: 使用内存增强器,可插入预训练的神经机器翻译模型,以便在不同用户需求下进行可控的生成。
  • results: 在风格和领域特定的实验中验证了该方法,其表现优于多个有代表性的可插拔基线。
    Abstract Although neural machine translation (NMT) models perform well in the general domain, it remains rather challenging to control their generation behavior to satisfy the requirement of different users. Given the expensive training cost and the data scarcity challenge of learning a new model from scratch for each user requirement, we propose a memory-augmented adapter to steer pretrained NMT models in a pluggable manner. Specifically, we construct a multi-granular memory based on the user-provided text samples and propose a new adapter architecture to combine the model representations and the retrieved results. We also propose a training strategy using memory dropout to reduce spurious dependencies between the NMT model and the memory. We validate our approach on both style- and domain-specific experiments and the results indicate that our method can outperform several representative pluggable baselines.
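The retrieval-and-combination step behind memory-augmented generation can be sketched in the spirit of kNN-MT (the linked repo): look up nearest stored hidden states, turn them into a token distribution, and interpolate with the model's own distribution. All names, the distance kernel, and the interpolation weight below are assumptions made for illustration, not the paper's adapter.

```python
import numpy as np

class TokenMemory:
    """Toy datastore of (hidden state -> next-token id) pairs, queried with
    nearest-neighbour search, in the spirit of kNN-MT."""
    def __init__(self, keys: np.ndarray, values: np.ndarray):
        self.keys, self.values = keys, values            # (N, d), (N,)

    def retrieve_distribution(self, query: np.ndarray, vocab_size: int,
                              k: int = 8, temperature: float = 10.0) -> np.ndarray:
        dists = np.linalg.norm(self.keys - query, axis=1)      # L2 to every key
        idx = np.argsort(dists)[:k]                            # k nearest entries
        weights = np.exp(-dists[idx] / temperature)
        weights /= weights.sum()
        p = np.zeros(vocab_size)
        np.add.at(p, self.values[idx], weights)                # aggregate per token
        return p

def combine(model_probs: np.ndarray, memory_probs: np.ndarray, lam: float = 0.3):
    """Interpolate the NMT model's distribution with the retrieved one."""
    return (1.0 - lam) * model_probs + lam * memory_probs

# Toy usage with random data standing in for real hidden states.
rng = np.random.default_rng(0)
memory = TokenMemory(rng.normal(size=(1000, 16)), rng.integers(0, 100, size=1000))
model_probs = np.full(100, 1.0 / 100)
query = rng.normal(size=16)
print(combine(model_probs, memory.retrieve_distribution(query, 100)).argmax())
```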

PolyLM: An Open Source Polyglot Large Language Model

  • paper_url: http://arxiv.org/abs/2307.06018
  • repo_url: None
  • paper_authors: Xiangpeng Wei, Haoran Wei, Huan Lin, Tianhao Li, Pei Zhang, Xingzhang Ren, Mei Li, Yu Wan, Zhiwei Cao, Binbin Xie, Tianxiang Hu, Shangjie Li, Binyuan Hui, Bowen Yu, Dayiheng Liu, Baosong Yang, Fei Huang, Jun Xie
  • for: 这个研究旨在提高大型自然语言模型(LLM)的多语言能力,并提供一个多语言模型 PolyLM,可以在640亿个字元的数据上进行训练。
  • methods: 这个研究使用了两种方法来增强多语言能力:1)整合双语数据到训练数据中;2)使用一种课程学习策略,将非英语数据的比例从30%提升到60%。
  • results: 实验结果显示,PolyLM在多语言任务上表现出色,比其他开源模型LLaMA和BLOOM更好,同时在英语任务中也维持相似的表现。
    Abstract Large language models (LLMs) demonstrate remarkable ability to comprehend, reason, and generate following nature language instructions. However, the development of LLMs has been primarily focused on high-resource languages, such as English, thereby limiting their applicability and research in other languages. Consequently, we present PolyLM, a multilingual LLM trained on 640 billion (B) tokens, avaliable in two model sizes: 1.7B and 13B. To enhance its multilingual capabilities, we 1) integrate bilingual data into training data; and 2) adopt a curriculum learning strategy that increases the proportion of non-English data from 30% in the first stage to 60% in the final stage during pre-training. Further, we propose a multilingual self-instruct method which automatically generates 132.7K diverse multilingual instructions for model fine-tuning. To assess the model's performance, we collect several existing multilingual tasks, including multilingual understanding, question answering, generation, and translation. Extensive experiments show that PolyLM surpasses other open-source models such as LLaMA and BLOOM on multilingual tasks while maintaining comparable performance in English. Our models, alone with the instruction data and multilingual benchmark, are available at: \url{https://modelscope.cn/models/damo/nlp_polylm_13b_text_generation}.
    摘要 大型语言模型(LLM)表现出了惊人的理解、思维和生成能力,但是其发展受到了高resource语言,如英语的限制,因此它们的应用和研究在其他语言上受到了限制。为了解决这个问题,我们提出了PolyLM,一个多语言模型,在640亿token的训练数据中获得了两个模型大小:1.7B和13B。为提高其多语言能力,我们采用了以下两种方法:1. 将双语数据integrated到训练数据中;2. 采用一种学习策略,其中在第一个阶段,非英语数据占比为30%,在最后一个阶段升级到60%。此外,我们提出了一种多语言自我指导方法,可以自动生成132.7K多种多语言指导文本,用于模型细化。为评估模型的性能,我们收集了多种现有的多语言任务,包括多语言理解、问答、生成和翻译。广泛的实验表明,PolyLM在多语言任务上表现出优于其他开源模型,如LLaMA和BLOOM,同时在英语任务上保持相似的性能。我们的模型,以及指导数据和多语言准标,可以在以下链接中下载:

DDNAS: Discretized Differentiable Neural Architecture Search for Text Classification

  • paper_url: http://arxiv.org/abs/2307.06005
  • repo_url: https://github.com/ddnas/ddnas
  • paper_authors: Kuan-Chun Chen, Cheng-Te Li, Kuo-Jung Lee
  • for: 这篇论文是针对文本表现学ARNING的Neural Architecture Search(NAS)进行了创新的研究。
  • methods: 这篇论文提出了一种新的NAS方法,即粗粒度可微的神经建筑搜寻(DDNAS),它可以用梯度下降来优化搜寻。此外,论文还提出了一种新的粗粒度层,即互信息最大化层,用于模型文本表现中的隐藏顺序分类。
  • results: 实验结果显示,DDNAS 在八个不同的真实数据集上持续优于现有的 NAS 方法。尽管 DDNAS 仅使用三种基本操作(卷积、池化和 none)作为 NAS 构建块的候选,其表现仍然可观,并且可以通过加入更多不同的操作进一步提升。
    Abstract Neural Architecture Search (NAS) has shown promising capability in learning text representation. However, existing text-based NAS neither performs a learnable fusion of neural operations to optimize the architecture, nor encodes the latent hierarchical categorization behind text input. This paper presents a novel NAS method, Discretized Differentiable Neural Architecture Search (DDNAS), for text representation learning and classification. With the continuous relaxation of architecture representation, DDNAS can use gradient descent to optimize the search. We also propose a novel discretization layer via mutual information maximization, which is imposed on every search node to model the latent hierarchical categorization in text representation. Extensive experiments conducted on eight diverse real datasets exhibit that DDNAS can consistently outperform the state-of-the-art NAS methods. While DDNAS relies on only three basic operations, i.e., convolution, pooling, and none, to be the candidates of NAS building blocks, its promising performance is noticeable and extensible to obtain further improvement by adding more different operations.
    摘要 神经架构搜索(NAS)在文本表示学习中已展现出可观的能力。然而,现有的基于文本的 NAS 既没有对神经操作进行可学习的融合以优化架构,也没有编码文本输入背后潜在的层次化类别结构。本文提出了一种新的 NAS 方法——离散化可微神经架构搜索(DDNAS),用于文本表示学习与分类。通过对架构表示进行连续松弛,DDNAS 可以使用梯度下降来优化搜索。我们还提出了一种基于互信息最大化的新型离散化层,作用于每个搜索节点,用来建模文本表示中潜在的层次化类别结构。在八个不同的真实数据集上的大量实验表明,DDNAS 能持续优于最先进的 NAS 方法。尽管 DDNAS 仅以卷积、池化和 none 三种基本操作作为 NAS 构建块的候选,其表现依然可观,并且可以通过加入更多不同的操作进一步提升。

Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models

  • paper_url: http://arxiv.org/abs/2307.05972
  • repo_url: None
  • paper_authors: James O’ Neill, Sourav Dutta
  • for: 研究对Transformer语言模型的普适性的影响,并提出一种新的自适度量化法(SDQ)来减少积累量化错误。
  • methods: 使用 SDQ 对多语言模型 XLM-R-Base 和 InfoXLM-Base 进行量化,并证明两种模型可以从 32 位浮点权重降至 8 位整数权重,同时在 XGLUE 基准上保持高水平性能。
  • results: 研究结果还凸显了量化多语言模型的挑战:这些模型必须泛化到它们未经微调的语言上。
    Abstract We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models. We present a new method called self-distilled quantization (SDQ) that minimizes accumulative quantization errors and outperforms baselines. We apply SDQ to multilingual models XLM-R-Base and InfoXLM-Base and demonstrate that both models can be reduced from 32-bit floating point weights to 8-bit integer weights while maintaining a high level of performance on the XGLUE benchmark. Our results also highlight the challenges of quantizing multilingual models, which must generalize to languages they were not fine-tuned on.
    摘要 我们研究了培训后量化和量化感知训练对转移语言模型的泛化性能的影响。我们提出了一种新的方法called自适应量化(SDQ),可以减少积累量化错误,并超过基eline。我们对多语言模型XLM-R-Base和InfoXLM-Base进行应用,并证明这两个模型可以从32位浮点数 weights降低到8位整数 weights而保持高水平的性能在XGLUE测试套件中。我们的结果还探讨了量化多语言模型的挑战,它们需要泛化到它们没有精度调整的语言上。
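The building block such methods compress down to is straightforward to show: symmetric per-tensor int8 weight quantization, together with the reconstruction error that quantization-aware schemes such as SDQ try to keep from accumulating. This is a generic sketch, not the SDQ procedure itself.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(scale=0.05, size=(768, 768)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# The per-layer error that quantization-aware schemes try to keep small as it
# accumulates through the network; here we just report it for one weight matrix.
print("max abs error:", np.abs(w - w_hat).max())
print("memory: float32 bytes =", w.nbytes, "-> int8 bytes =", q.nbytes)
```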

Prototypical Contrastive Transfer Learning for Multimodal Language Understanding

  • paper_url: http://arxiv.org/abs/2307.05942
  • repo_url: None
  • paper_authors: Seitaro Otsuki, Shintaro Ishikawa, Komei Sugiura
  • for: 本研究旨在提高家庭服务机器人对自然语言指令的理解,使其能够更好地与人类进行交互。
  • methods: 本研究使用了一种新的传输学习方法,即Prototypical Contrastive Transfer Learning(PCTL),其中使用了一种新的对比损失函数名为双protoNCE。
  • results: 实验表明,PCTL比现有方法高效,其中PCTL的准确率为78.1%,而简单的精度调整只能达到73.4%。
    Abstract Although domestic service robots are expected to assist individuals who require support, they cannot currently interact smoothly with people through natural language. For example, given the instruction "Bring me a bottle from the kitchen," it is difficult for such robots to specify the bottle in an indoor environment. Most conventional models have been trained on real-world datasets that are labor-intensive to collect, and they have not fully leveraged simulation data through a transfer learning framework. In this study, we propose a novel transfer learning approach for multimodal language understanding called Prototypical Contrastive Transfer Learning (PCTL), which uses a new contrastive loss called Dual ProtoNCE. We introduce PCTL to the task of identifying target objects in domestic environments according to free-form natural language instructions. To validate PCTL, we built new real-world and simulation datasets. Our experiment demonstrated that PCTL outperformed existing methods. Specifically, PCTL achieved an accuracy of 78.1%, whereas simple fine-tuning achieved an accuracy of 73.4%.
    摘要 尽管家用服务机器人预期能够为需要支持的个人提供帮助,但目前它们无法通过自然语言与人们互动流畅。例如,接受“帮我取 kitchen 里的瓶子”的指令时,大多数传统模型很难准确指定室内环境中的瓶子。大多数传统模型需要大量劳动集成的实际数据来训练,而没有充分利用通过传输学习框架的 simulated 数据。在这种研究中,我们提出了一种新的转移学习方法,称为 Prototypical Contrastive Transfer Learning(PCTL),它使用了一种新的对比损失函数,称为 Dual ProtoNCE。我们将 PCTL 应用于根据自由形式的自然语言指令在家庭环境中标识目标物品。为验证 PCTL,我们创建了新的实际世界和模拟数据集。我们的实验表明,PCTL 的精度为 78.1%,而简单的练习只达到了 73.4%。
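The abstract does not spell out Dual ProtoNCE, but it belongs to the family of prototypical contrastive losses; a standard prototype-based InfoNCE term is sketched below as an assumed simplification of that family, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def proto_nce_loss(embeddings: torch.Tensor, labels: torch.Tensor,
                   prototypes: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE against class prototypes: each L2-normalised embedding should be
    closest to its own class prototype among all prototypes."""
    z = F.normalize(embeddings, dim=-1)          # (B, D)
    p = F.normalize(prototypes, dim=-1)          # (C, D)
    logits = z @ p.t() / temperature             # (B, C) cosine similarities
    return F.cross_entropy(logits, labels)

# Toy usage: 3 classes, 8 samples, 16-dim embeddings.
torch.manual_seed(0)
emb = torch.randn(8, 16, requires_grad=True)
labels = torch.randint(0, 3, (8,))
protos = torch.randn(3, 16)
loss = proto_nce_loss(emb, labels, protos)
loss.backward()
print(float(loss))
```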

Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding

  • paper_url: http://arxiv.org/abs/2307.05908
  • repo_url: None
  • paper_authors: Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee
  • for: 本文旨在加速大语言模型(LLM)的贪心解码,同时保证输出与原始解码完全一致。
  • methods: 本文利用额外的计算资源,在当前 token 解码的同时并行启动后续 token 的解码,从而降低解码延迟。
  • results: 结果表明,利用额外计算资源有望加速 LLM 的贪心解码,并且可以通过评估匹配率(p_correct)来解析地估计延迟降低的幅度。
    Abstract This paper presents "Predictive Pipelined Decoding (PPD)," an approach that speeds up greedy decoding in Large Language Models (LLMs) while maintaining the exact same output as the original decoding. Unlike conventional strategies, PPD employs additional compute resources to parallelize the initiation of subsequent token decoding during the current token decoding. This innovative method reduces decoding latency and reshapes the understanding of trade-offs in LLM decoding strategies. We have developed a theoretical framework that allows us to analyze the trade-off between computation and latency. Using this framework, we can analytically estimate the potential reduction in latency associated with our proposed method, achieved through the assessment of the match rate, represented as p_correct. The results demonstrate that the use of extra computational resources has the potential to accelerate LLM greedy decoding.
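The abstract notes that the latency reduction can be estimated from the match rate p_correct. Below is a deliberately simplified latency model (an assumption for illustration, not the paper's analysis): each token normally costs one unit, and when the early-started continuation matches, a fixed fraction of the next step has already been overlapped.

```python
def expected_speedup(p_correct: float, overlap: float = 0.5) -> float:
    """Toy latency model: a token normally costs 1 unit; when the early-started
    prediction matches (probability p_correct), a fraction `overlap` of that unit
    has already been computed in parallel. Returns the expected speedup."""
    expected_cost = p_correct * (1.0 - overlap) + (1.0 - p_correct) * 1.0
    return 1.0 / max(expected_cost, 1e-9)

for p in (0.5, 0.7, 0.9):
    print(f"p_correct={p:.1f}  ->  ~{expected_speedup(p):.2f}x speedup")
```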

Exploring the Emotional and Mental Well-Being of Individuals with Long COVID Through Twitter Analysis

  • paper_url: http://arxiv.org/abs/2307.07558
  • repo_url: None
  • paper_authors: Guocheng Feng, Huaiyu Cai, Wei Quan
  • for: 了解长期 covid-19 患者的情绪和心理健康状况,以及他们关注的话题。
  • methods: 分析了 tweets 的内容,检测了六种基本情感的存在,并提取了主导话题。
  • results: 发现 throughout 研究时间段,负面情感占据了主导地位,并在一些关键时间点出现两次峰值,如新 covid 变种爆发时。
    Abstract The COVID-19 pandemic has led to the emergence of Long COVID, a cluster of symptoms that persist after infection. Long COVID patients may also experience mental health challenges, making it essential to understand individuals' emotional and mental well-being. This study aims to gain a deeper understanding of Long COVID individuals' emotional and mental well-being, identify the topics that most concern them, and explore potential correlations between their emotions and social media activity. Specifically, we classify tweets into four categories based on the content, detect the presence of six basic emotions, and extract prevalent topics. Our analyses reveal that negative emotions dominated throughout the study period, with two peaks during critical periods, such as the outbreak of new COVID variants. The findings of this study have implications for policy and measures for addressing the mental health challenges of individuals with Long COVID and provide a foundation for future work.
    摘要 COVID-19 大流行导致长期 COVID 出现,一群表现出持续性症状的患者。长期 COVID 患者可能也会经历心理健康挑战,因此了解个人情感和心理健康状况非常重要。本研究的目的是深入了解长期 COVID 个人情感和心理健康状况,确定他们最关心的话题,并探索他们情绪与社交媒体活动之间的可能相关性。我们将微博分为四类基于内容,检测表示六种基本情感的存在,并提取最常见的话题。我们的分析发现,研究期间全程具有负情感占据优势,有两个关键时期的高峰,如新冠变种爆发。本研究的发现对政策和addressing长期 COVID 患者的心理健康挑战提供了依据,并为未来工作提供了基础。

Improved POS tagging for spontaneous, clinical speech using data augmentation

  • paper_url: http://arxiv.org/abs/2307.05796
  • repo_url: None
  • paper_authors: Seth Kulick, Neville Ryant, David J. Irwin, Naomi Nevler, Sunghye Cho
  • for: 本研究旨在提高临床人群口语讲解词法标注的精度。
  • methods: 我们不使用域内treebank进行训练,而是使用新闻报道的out of domain treebank,并使用数据增强技术来使这些结构更像自然的口语。
  • results: 我们通过使用手动验证的POS标签测试并证实了使用数据增强技术训练的parser的性能提高。
    Abstract This paper addresses the problem of improving POS tagging of transcripts of speech from clinical populations. In contrast to prior work on parsing and POS tagging of transcribed speech, we do not make use of an in domain treebank for training. Instead, we train on an out of domain treebank of newswire using data augmentation techniques to make these structures resemble natural, spontaneous speech. We trained a parser with and without the augmented data and tested its performance using manually validated POS tags in clinical speech produced by patients with various types of neurodegenerative conditions.

Large Language Models

  • paper_url: http://arxiv.org/abs/2307.05782
  • repo_url: https://github.com/lm-sys/FastChat
  • paper_authors: Michael R. Douglas
  • for: 这篇论文是为了介绍大语言模型(LLM)的发展和现状,以及这些模型在完成其他任务时的工作原理。
  • methods: 这篇论文使用了transformer架构,并详细介绍了这种架构的实现。
  • results: 论文介绍了LLM的发展历史和当前状况,以及模型如何在预测下一个单词的基础上完成其他任务。
    Abstract Artificial intelligence is making spectacular progress, and one of the best examples is the development of large language models (LLMs) such as OpenAI's GPT series. In these lectures, written for readers with a background in mathematics or physics, we give a brief history and survey of the state of the art, and describe the underlying transformer architecture in detail. We then explore some current ideas on how LLMs work and how models trained to predict the next word in a text are able to perform other tasks displaying intelligence.
    摘要 人工智能正在做出各种各样的进步,其中一个最出色的例子就是大型语言模型(LLM),如OpenAI的GPT系列。在这些讲座中,我们为有数学或物理背景的读者提供了简短的历史和现状概述,并对transformer架构进行详细介绍。然后,我们会详细介绍一些当前LLM工作原理的想法,以及如何通过预测下一个文本字符来实现其他智能任务的能力。
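Since the lectures centre on the transformer architecture, its core operation is worth writing out: single-head scaled dot-product self-attention with a causal mask, in the standard textbook form (not tied to any particular GPT implementation).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=True):
    """Scaled dot-product self-attention for one head.
    X: (T, d_model); Wq/Wk/Wv: (d_model, d_head)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (T, T) similarities
    if causal:                                       # GPT-style: no peeking ahead
        scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -1e9)
    return softmax(scores) @ V                       # (T, d_head)

rng = np.random.default_rng(0)
T, d_model, d_head = 5, 16, 8
X = rng.normal(size=(T, d_model))
out = self_attention(X, *(rng.normal(size=(d_model, d_head)) for _ in range(3)))
print(out.shape)   # (5, 8)
```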

Neural Machine Translation Data Generation and Augmentation using ChatGPT

  • paper_url: http://arxiv.org/abs/2307.05779
  • repo_url: None
  • paper_authors: Wayne Yang, Garrett Nicolai
  • for: 用于替代手动创建的平行 corpora,以便更快速地和更cost-effectively进行机器翻译模型的训练。
  • methods: 利用生成语言模型创建的幻想平行 corpora,这些模型本身是在平行数据上训练的。
  • results: 实验发现,幻想的数据可以提高翻译信号,即使Domain clashes with the original dataset。
    Abstract Neural models have revolutionized the field of machine translation, but creating parallel corpora is expensive and time-consuming. We investigate an alternative to manual parallel corpora - hallucinated parallel corpora created by generative language models. Although these models are themselves trained on parallel data, they can leverage a multilingual vector space to create data, and may be able to supplement small manually-procured corpora. Our experiments highlight two key findings - despite a lack of diversity in their output, the hallucinated data improves the translation signal, even when the domain clashes with the original dataset.
    摘要 神经模型已经彻底改变了机器翻译领域,但构建平行语料既昂贵又耗时。我们研究了人工平行语料的一种替代方案:由生成式语言模型"幻觉"出来的平行语料。尽管这些模型本身是在平行数据上训练的,但它们可以利用多语言向量空间来生成数据,从而有望补充规模较小的人工语料。我们的实验突出两点发现:尽管其输出缺乏多样性,幻觉数据仍能增强翻译信号,即使其领域与原始数据集相冲突。

Towards Robust and Efficient Continual Language Learning

  • paper_url: http://arxiv.org/abs/2307.05741
  • repo_url: None
  • paper_authors: Adam Fisch, Amal Rannen-Triki, Razvan Pascanu, Jörg Bornschein, Angeliki Lazaridou, Elena Gribovskaya, Marc’Aurelio Ranzato
  • for: 本文旨在研究如何快速适应新任务,通过 continual learning 的视角,即继续使模型在过去任务的基础上进行微调,以 Transfer 有用的知识。
  • methods: 作者提出了一个新的任务序列审核标准,该标准包括了不同的转移enario,如高可能性转移、高可能性逆转移、无预期效果和混合等。理想的学习者应该能够充分利用所有可能带来积极转移的任务中的信息,同时避免任务的干扰。
  • results: 作者提出了一种简单 yet effective 的学习者,通过选择性地使用过去任务的检查点初始化新模型来实现。然而,这些学习者仍然存在限制,希望这个审核标准可以帮助社区建立和分析更好的学习者。
    Abstract As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks. We approach this classic question from a continual learning perspective, in which we aim to continue fine-tuning models trained on past tasks on new tasks, with the goal of "transferring" relevant knowledge. However, this strategy also runs the risk of doing more harm than good, i.e., negative transfer. In this paper, we construct a new benchmark of task sequences that target different possible transfer scenarios one might face, such as a sequence of tasks with high potential of positive transfer, high potential for negative transfer, no expected effect, or a mixture of each. An ideal learner should be able to maximally exploit information from all tasks that have any potential for positive transfer, while also avoiding the negative effects of any distracting tasks that may confuse it. We then propose a simple, yet effective, learner that satisfies many of our desiderata simply by leveraging a selective strategy for initializing new models from past task checkpoints. Still, limitations remain, and we hope this benchmark can help the community to further build and analyze such learners.
    摘要 随着语言模型应用空间的不断演化,一个自然的问题是如何快速让模型适应新任务。我们从持续学习的视角出发研究这一经典问题:继续在新任务上微调此前在旧任务上训练过的模型,以期"迁移"相关知识。然而,这一策略也可能弊大于利,即产生负迁移。在本文中,我们构建了一个新的任务序列基准,针对实践中可能遇到的不同迁移情形:具有高正迁移潜力的任务序列、具有高负迁移风险的序列、预期无影响的序列,以及以上情形的混合。理想的学习器应能最大限度地利用所有可能带来正迁移的任务信息,同时避免任何干扰任务造成的负面影响。随后,我们提出了一个简单而有效的学习器,它仅通过有选择地从既往任务的检查点初始化新模型,就能满足我们的多数要求。当然,局限仍然存在,我们希望这个基准能帮助社区进一步构建和分析此类学习器。

Stack More Layers Differently: High-Rank Training Through Low-Rank Updates

  • paper_url: http://arxiv.org/abs/2307.05695
  • repo_url: https://github.com/guitaricet/peft_pretraining
  • paper_authors: Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky
  • for: 本文旨在探讨底层训练技术的可行性,以减少训练大型神经网络所需的计算资源。
  • methods: 本文提出了一种新的low-rank训练方法,称为ReLoRA,可以高效地训练大型神经网络。ReLoRA使用低级别更新来训练高级别网络。
  • results: 作者将 ReLoRA 应用于最多 3.5 亿参数的 transformer 语言模型预训练,并证明其性能可与常规神经网络训练相当。此外,作者发现 ReLoRA 的效率随模型规模增大而提高,因此是高效训练数十亿参数网络的有前景方法。这些发现有助于理解低秩训练技术的潜力及其对扩展规律(scaling laws)的启示。
    Abstract Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparametrized models remains poorly understood, and alternative approaches do not necessarily make it cheaper to train high-performance models. In this paper, we explore low-rank training techniques as an alternative approach to training large neural networks. We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to pre-training transformer language models with up to 350M parameters and demonstrate comparable performance to regular neural network training. Furthermore, we observe that the efficiency of ReLoRA increases with model size, making it a promising approach for training multi-billion-parameter networks efficiently. Our findings shed light on the potential of low-rank training techniques and their implications for scaling laws.
    摘要 尽管规模扩展(scaling)占据主导地位且行之有效,造就了拥有数千亿参数的大型网络,但为何必须训练过参数化模型仍然缺乏深入理解,而替代方法也未必能降低训练高性能模型的成本。在本文中,我们探索低秩训练技术作为训练大型神经网络的替代途径。我们提出了一种名为 ReLoRA 的新方法,利用低秩更新来训练高秩网络。我们将 ReLoRA 应用于最多 3.5 亿参数的 transformer 语言模型预训练,并取得了与常规神经网络训练相当的性能。此外,我们观察到 ReLoRA 的效率随模型规模增大而提高,使其成为高效训练数十亿参数网络的有前景方法。我们的发现揭示了低秩训练技术的潜力及其对扩展规律的启示。
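The mechanism ReLoRA builds on is easy to illustrate: each phase trains only a rank-r update A @ B, merges it into the frozen weights, and restarts, so several low-rank phases accumulate into a higher-rank change. The dimensions and the "trained" factors below are placeholders, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4                                  # model width vs. rank of each update
W = rng.normal(scale=0.02, size=(d, d))       # base weight, frozen within a phase
accumulated = np.zeros((d, d))

for phase in range(3):                        # several low-rank phases, as in ReLoRA
    A = rng.normal(scale=0.02, size=(d, r))   # in ReLoRA, A and B would be the only
    B = rng.normal(scale=0.02, size=(r, d))   # parameters trained during this phase
    # (here A @ B simply stands in for whatever training would have produced)
    W += A @ B                                # merge the rank-r update into the weights
    accumulated += A @ B                      # then the next phase restarts from scratch

print("rank of a single merged update:", np.linalg.matrix_rank(A @ B))          # <= r
print("rank of the accumulated change:", np.linalg.matrix_rank(accumulated))    # > r
```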

Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

  • paper_url: http://arxiv.org/abs/2307.05454
  • repo_url: https://github.com/google-research/multi-morph-checklist
  • paper_authors: Ester Hlavnova, Sebastian Ruder
  • for: 本研究旨在探讨如何使NLG系统在不同语言类型的语言中进行普适化。
  • methods: 本研究提出了一种基于形态意识的测试框架M2C,用于测试NLG模型在12种语言中的行为。
  • results: 研究发现,现有语言模型在英语等语言中表现良好,但在一些语言特点上存在一些缺陷,如斯瓦希利语的时间表达和芬兰语的复合所有格。这些发现鼓励了开发更加具有抗逆势能力的模型。
    Abstract A challenge towards developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finish. Our findings motivate the development of models that address these blind spots.
    摘要 “世界各语言的自然语言处理(NLP)系统的开发受到了语言特征之间的通用性问题的挑战。为解决这个问题,我们提出了M2C框架,这是一个具有 morphological awareness的行为测试框架。我们使用M2C框架生成了12种语言中的特殊语言特征测试,并评估了现有的语言模型。结果发现,这些模型在英语上表现出色,但在其他语言中存在一些缺乏通用性的特征,例如在斯瓦希利语中的时间表达和在芬兰语中的复合所有格。我们的发现将推动开发更加通用的模型。”

Duncode Characters Shorter

  • paper_url: http://arxiv.org/abs/2307.05414
  • repo_url: https://github.com/laohur/duncode
  • paper_authors: Changshang Xue
  • for: 本研究探讨了文本转换中不同编码器的使用,将字符转换为字节。
  • methods: 本文讨论了本地编码器如ASCII和GB-2312,可以将特定字符转换为更短的字节;以及通用编码器如UTF-8和UTF-16,可以对 Unicode 集合进行更好的编码,但需要更多的空间。此外,文中还介绍了 SCSU、BOCU-1 和 binary 编码器,但它们缺乏自适应同步功能。
  • results: 本文引入了一种新的编码方法 called Duncode,可以高效地编码 Unicode 字符集,与本地编码器相似。它可以将多个字符串中的多个字符编码为 Duncode 单元,使用更少的字节。虽然 Duncode 缺乏自适应同步功能,但它在空间效率方面超过了 UTF8。应用程序可以在 \url{https://github.com/laohur/duncode} 上下载。此外,文中还开发了一个评估不同语言下编码器性能的 benchmark,可以在 \url{https://github.com/laohur/wiki2txt} 上下载。
    Abstract This paper investigates the employment of various encoders in text transformation, converting characters into bytes. It discusses local encoders such as ASCII and GB-2312, which encode specific characters into shorter bytes, and universal encoders like UTF-8 and UTF-16, which can encode the complete Unicode set with greater space requirements and are gaining widespread acceptance. Other encoders, including SCSU, BOCU-1, and binary encoders, however, lack self-synchronizing capabilities. Duncode is introduced as an innovative encoding method that aims to encode the entire Unicode character set with high space efficiency, akin to local encoders. It has the potential to compress multiple characters of a string into a Duncode unit using fewer bytes. Despite offering less self-synchronizing identification information, Duncode surpasses UTF8 in terms of space efficiency. The application is available at \url{https://github.com/laohur/duncode}. Additionally, we have developed a benchmark for evaluating character encoders across different languages. It encompasses 179 languages and can be accessed at \url{https://github.com/laohur/wiki2txt}.
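The space-efficiency comparison between local and universal encoders is easy to reproduce with Python's built-in codecs (Duncode itself is not reproduced here); GB2312 stands in for a local encoder and fails outside its repertoire.

```python
samples = {
    "ASCII-only": "hello world",
    "Chinese": "自然语言处理",
    "Mixed": "Unicode 编码效率",
}

for name, text in samples.items():
    row = [f"{name!r} ({len(text)} chars)"]
    for codec in ("utf-8", "utf-16", "gb2312"):
        try:
            row.append(f"{codec}: {len(text.encode(codec))} bytes")
        except UnicodeEncodeError:
            row.append(f"{codec}: not encodable")   # local encoders cover few scripts
    print("  ".join(row))
```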

BLUEX: A benchmark based on Brazilian Leading Universities Entrance eXams

  • paper_url: http://arxiv.org/abs/2307.05410
  • repo_url: https://github.com/portuguese-benchmark-datasets/bluex
  • paper_authors: Thales Sales Almeida, Thiago Laitz, Giovana K. Bonás, Rodrigo Nogueira
  • for: The paper is written to address the lack of high-quality datasets for evaluating natural language processing (NLP) models in Portuguese, and to provide a new dataset called BLUEX for advancing the state-of-the-art in NLP in Portuguese.
  • methods: The paper introduces the BLUEX dataset, which consists of entrance exams from two leading universities in Brazil, and includes annotated metadata for evaluating NLP models on a variety of subjects. The dataset also includes recently administered exams that are unlikely to be included in the training data of many popular LMs as of 2023.
  • results: The paper establishes a benchmark for NLP models using BLUEX and demonstrates the potential of the dataset for advancing the state of the art in natural language understanding and reasoning in Portuguese. The results show that state-of-the-art LMs can be improved by fine-tuning them on BLUEX.
    Abstract One common trend in recent studies of language models (LMs) is the use of standardized tests for evaluation. However, despite being the fifth most spoken language worldwide, few such evaluations have been conducted in Portuguese. This is mainly due to the lack of high-quality datasets available to the community for carrying out evaluations in Portuguese. To address this gap, we introduce the Brazilian Leading Universities Entrance eXams (BLUEX), a dataset of entrance exams from the two leading universities in Brazil: UNICAMP and USP. The dataset includes annotated metadata for evaluating the performance of NLP models on a variety of subjects. Furthermore, BLUEX includes a collection of recently administered exams that are unlikely to be included in the training data of many popular LMs as of 2023. The dataset is also annotated to indicate the position of images in each question, providing a valuable resource for advancing the state-of-the-art in multimodal language understanding and reasoning. We describe the creation and characteristics of BLUEX and establish a benchmark through experiments with state-of-the-art LMs, demonstrating its potential for advancing the state-of-the-art in natural language understanding and reasoning in Portuguese. The data and relevant code can be found at https://github.com/Portuguese-Benchmark-Datasets/BLUEX
    摘要 一些最新的语言模型(LM)研究中的趋势是使用标准化测试来评估。然而,虽然葡萄牙语是全球第五大语言,但是对于葡萄牙语的评估测试却相对罕见。这主要是因为葡萄牙语的高质量数据集没有被社区广泛使用。为了解决这个问题,我们介绍了巴西领先大学入学考试(BLUEX),这是来自巴西两所领先大学(UNICAMP和USP)的入学考试数据集。该数据集包括评注的metadata,用于评估语言模型在多种主题上的表现。此外,BLUEX还包括最近进行的考试,这些考试可能不会包含在许多流行的LM的训练数据中,特别是在2023年。数据集还被评注,以便确定每个问题中图像的位置,提供了进一步推进多模态语言理解和逻辑的 valuable 资源。我们描述了BLUEX的创建和特点,并通过对当前状态的LM进行实验,确立了它的潜在价值,以提高葡萄牙语自然语言理解和逻辑的状态。数据和相关代码可以在https://github.com/Portuguese-Benchmark-Datasets/BLUEX 找到。

cs.LG - 2023-07-12

Machine learning and Topological data analysis identify unique features of human papillae in 3D scans

  • paper_url: http://arxiv.org/abs/2307.06255
  • repo_url: None
  • paper_authors: Rayna Andreeva, Anwesha Sarkar, Rik Sarkar
  • for: 本论文旨在探究舌面乳头(papillae)是否具有个体独特性,以及这些几何与拓扑特征如何表征味觉与口感感知的结构基础。
  • methods: 论文使用机器学习与拓扑数据分析技术,对 2092 个人类舌乳头的三维显微扫描进行计算分析,提取基于离散微分几何和计算拓扑的几何与拓扑特征。
  • results: 研究发现,乳头的几何与拓扑特征具有个体独特性:基于持续同调特征的模型可将乳头类型分类到 85% 的准确率,并能刻画丝状与菌状乳头的空间分布;仅凭单个乳头即可在 15 名参与者中以 48% 的准确率识别个体。这表明舌乳头可以作为一种独特标识,并为食物偏好与口腔诊断研究开辟新方向。
    Abstract The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals and an individual can be identified with an accuracy of 48% among the 15 participants from a single papillae. Collectively, this is the first unprecedented evidence demonstrating that tongue papillae can serve as a unique identifier inspiring new research direction for food preferences and oral diagnostics.
    摘要 舌头表面分布着多种乳头(papillae),它们在味觉与口感的力学和化学机制中起着关键作用。尽管乳头的味觉功能已得到充分研究,乳头在个体内部及个体之间的独特性仍不清楚。在这里,我们提出了首个针对人类乳头三维显微扫描(n = 2092)的机器学习框架,揭示了乳头几何与拓扑特征的独特性。我们基于离散微分几何和计算拓扑导出的一系列特征,对乳头形状的细微差异进行计算分析。可解释的机器学习技术表明,乳头形状的持续同调(persistent homology)特征在预测生物学变量方面最为有效。基于这些特征、仅用少量数据样本训练的模型,对乳头类型的分类准确率达到 85%,且分类模型能够刻画丝状乳头与菌状乳头在舌面上的空间分布。值得注意的是,乳头在个体之间具有差异性:仅凭单个乳头即可在 15 名参与者中以 48% 的准确率识别出个体。总体而言,这是首次证明舌乳头可以作为一种独特标识,为食物偏好与口腔诊断研究开辟了新方向。

Identifiability Guarantees for Causal Disentanglement from Soft Interventions

  • paper_url: http://arxiv.org/abs/2307.06250
  • repo_url: https://github.com/uhlerlab/discrepancy_vae
  • paper_authors: Jiaqi Zhang, Chandler Squires, Kristjan Greenewald, Akash Srivastava, Karthikeyan Shanmugam, Caroline Uhler
  • for: 本研究旨在探讨如何通过 latent variable 来抽象数据,以实现 causal disentanglement。
  • methods: 本研究使用非配对的观测数据与干预数据(每次干预都会改变某个潜在变量的机制),并采用基于因果模型的方法来识别潜在变量。
  • results: 研究结果表明,在一种广义的忠实性(faithfulness)假设下,即使潜在因果变量不可观测,模型仍然可识别(可恢复到一个等价类),并且在数据趋于无穷时能够预测未见过的干预组合的效应。
    Abstract Causal disentanglement aims to uncover a representation of data using latent variables that are interrelated through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. In this paper, we focus on the scenario where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. We here show that identifiability can still be achieved with unobserved causal variables, given a generalized notion of faithfulness. Our results guarantee that we can recover the latent causal model up to an equivalence class and predict the effect of unseen combinations of interventions, in the limit of infinite data. We implement our causal disentanglement framework by developing an autoencoding variational Bayes algorithm and apply it to the problem of predicting combinatorial perturbation effects in genomics.
    摘要 因果解缠旨在通过由因果模型相互关联的隐变量来揭示数据的表示。当解释数据的隐模型唯一时,这种表示是可识别的。本文关注同时拥有非配对的观测数据与干预数据的情形,其中每次干预都会改变某个隐变量的机制。当因果变量完全可观测时,已有在忠实性(faithfulness)假设下统计一致的因果模型识别算法。我们证明,在因果变量不可观测的情况下,只要采用一种广义的忠实性概念,可识别性依然成立。我们的结果保证,在数据量趋于无穷的极限下,可以在等价类意义下恢复隐因果模型,并预测未见过的干预组合的效应。我们通过开发一种自编码变分贝叶斯算法来实现该因果解缠框架,并将其应用于基因组学中组合扰动效应的预测问题。

Diffusion Based Multi-Agent Adversarial Tracking

  • paper_url: http://arxiv.org/abs/2307.06244
  • repo_url: None
  • paper_authors: Sean Ye, Manisha Natarajan, Zixuan Wu, Matthew Gombolay
  • for: 这篇论文的目的是提高自动追踪系统,以更好地帮助无人飞行、水面和水下车辆对抗走私者使用推测游戏和追踪技术。
  • methods: 这篇论文提出了一种称为Constrained Agent-based Diffusion for Enhanced Multi-Agent Tracking(CADENCE)的方法,它利用过去的稀疏状态信息来生成敌人位置的全面预测。
  • results: 这篇论文的实验结果显示,CADENCE方法的单目标和多目标追踪环境下的预测性能都高于所有基eline方法,尤其是在所有时间检查点上。
    Abstract Target tracking plays a crucial role in real-world scenarios, particularly in drug-trafficking interdiction, where the knowledge of an adversarial target's location is often limited. Improving autonomous tracking systems will enable unmanned aerial, surface, and underwater vehicles to better assist in interdicting smugglers that use manned surface, semi-submersible, and aerial vessels. As unmanned drones proliferate, accurate autonomous target estimation is even more crucial for security and safety. This paper presents Constrained Agent-based Diffusion for Enhanced Multi-Agent Tracking (CADENCE), an approach aimed at generating comprehensive predictions of adversary locations by leveraging past sparse state information. To assess the effectiveness of this approach, we evaluate predictions on single-target and multi-target pursuit environments, employing Monte-Carlo sampling of the diffusion model to estimate the probability associated with each generated trajectory. We propose a novel cross-attention based diffusion model that utilizes constraint-based sampling to generate multimodal track hypotheses. Our single-target model surpasses the performance of all baseline methods on Average Displacement Error (ADE) for predictions across all time horizons.
    摘要 目标跟踪在现实场景中至关重要,尤其是在缉私禁毒行动中,敌对目标的位置信息往往十分有限。改进自主跟踪系统将使无人机、水面和水下航行器更好地协助拦截使用有人水面船、半潜器和飞行器的走私者。随着无人机的普及,准确的自主目标估计对安全而言愈发关键。本文提出了基于约束的多智能体扩散跟踪方法(CADENCE),利用过去稀疏的状态信息生成敌方位置的全面预测。为评估该方法的有效性,我们在单目标与多目标追踪环境中评估预测结果,并对扩散模型进行蒙特卡洛采样,以估计每条生成轨迹对应的概率。我们提出了一种新颖的基于交叉注意力的扩散模型,利用基于约束的采样生成多模态轨迹假设。我们的单目标模型在所有时间范围内的平均位移误差(ADE)上均优于所有基线方法。
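
For reference, Average Displacement Error (ADE) is the mean Euclidean distance between predicted and ground-truth positions over a trajectory; with sampled multimodal predictions a best-of-N variant is commonly reported. The snippet below is a generic sketch with synthetic trajectories, not CADENCE code.

```python
import numpy as np

def min_ade(predicted, ground_truth):
    """predicted: (num_samples, T, 2) candidate trajectories; ground_truth: (T, 2)."""
    errors = np.linalg.norm(predicted - ground_truth[None], axis=-1)  # (num_samples, T)
    per_sample_ade = errors.mean(axis=1)
    return per_sample_ade.min()

rng = np.random.default_rng(0)
pred = rng.normal(size=(5, 20, 2)).cumsum(axis=1)   # 5 sampled tracks of length 20
gt = rng.normal(size=(20, 2)).cumsum(axis=0)        # ground-truth track
print("best-of-5 ADE:", min_ade(pred, gt))
```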

Reconstructing Spatiotemporal Data with C-VAEs

  • paper_url: http://arxiv.org/abs/2307.06243
  • repo_url: https://github.com/ciic-c-t-polytechnic-of-leiria/reconstr_cvae_paper
  • paper_authors: Tiago F. R. Ribeiro, Fernando Silva, Rogério Luís de C. Costa
  • for: 这篇论文的目的是研究如何使用Conditional Variational Autoencoder(C-VAE)模型来生成2D移动区域的平滑和实际的演化表示。
  • methods: 这篇论文使用了C-VAE模型和其他常用的插值算法来生成间隔数据表示。
  • results: 研究结果表明,C-VAE模型与其他方法相比在几何相似度指标上达到竞争水平,并且在时间一致性指标上表现出色,表明C-VAE模型有望成为建模二维移动区域时空演化的可行替代方案。
    Abstract The continuous representation of spatiotemporal data commonly relies on using abstract data types, such as \textit{moving regions}, to represent entities whose shape and position continuously change over time. Creating this representation from discrete snapshots of real-world entities requires using interpolation methods to compute in-between data representations and estimate the position and shape of the object of interest at arbitrary temporal points. Existing region interpolation methods often fail to generate smooth and realistic representations of a region's evolution. However, recent advancements in deep learning techniques have revealed the potential of deep models trained on discrete observations to capture spatiotemporal dependencies through implicit feature learning. In this work, we explore the capabilities of Conditional Variational Autoencoder (C-VAE) models to generate smooth and realistic representations of the spatiotemporal evolution of moving regions. We evaluate our proposed approach on a sparsely annotated dataset on the burnt area of a forest fire. We apply compression operations to sample from the dataset and use the C-VAE model and other commonly used interpolation algorithms to generate in-between region representations. To evaluate the performance of the methods, we compare their interpolation results with manually annotated data and regions generated by a U-Net model. We also assess the quality of generated data considering temporal consistency metrics. The proposed C-VAE-based approach demonstrates competitive results in geometric similarity metrics. It also exhibits superior temporal consistency, suggesting that C-VAE models may be a viable alternative to modelling the spatiotemporal evolution of 2D moving regions.
    摘要 时空数据的连续表示通常依赖于抽象数据类型(如移动区域)来表示形状和位置随时间连续变化的实体。要从真实世界实体的离散快照中创建这种表示,需要使用插值方法来计算中间数据表示,并估计感兴趣对象在任意时间点的位置和形状。现有的区域插值方法往往无法生成平滑且真实的区域演化表示。然而,深度学习技术的最新进展表明,在离散观测上训练的深度模型能够通过隐式特征学习捕捉时空依赖关系。在这项工作中,我们探索使用条件变分自编码器(C-VAE)模型来生成移动区域时空演化的平滑且真实的表示。我们在一个关于森林火灾过火面积的稀疏标注数据集上评估所提出的方法:对数据集进行压缩采样,并使用C-VAE模型和其他常用插值算法生成中间区域表示。为评估各方法的性能,我们将其插值结果与手工标注数据以及U-Net模型生成的区域进行比较,并结合时间一致性指标评估生成数据的质量。基于C-VAE的方法在几何相似度指标上取得了有竞争力的结果,并表现出更优的时间一致性,表明C-VAE模型有望成为建模二维移动区域时空演化的可行替代方案。
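
A minimal sketch of a Conditional VAE of the kind the paper builds on: the encoder and decoder are conditioned on a normalized timestamp, and "interpolation" decodes from the prior at an unseen time. The input size, conditioning variable, and training details are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CVAE(nn.Module):
    def __init__(self, data_dim=64 * 64, cond_dim=1, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(data_dim + cond_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim + cond_dim, 256), nn.ReLU(),
                                 nn.Linear(256, data_dim))

    def forward(self, x, c):
        h = self.enc(torch.cat([x, c], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return self.dec(torch.cat([z, c], dim=-1)), mu, logvar

def cvae_loss(logits, x, mu, logvar):
    rec = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

# interpolation at an unseen normalized time t: decode from the prior conditioned on t
model = CVAE()
t = torch.tensor([[0.5]])
z = torch.randn(1, 16)
mask = torch.sigmoid(model.dec(torch.cat([z, t], dim=-1))).reshape(64, 64)
print(mask.shape)
```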

DSSE: a drone swarm search environment

  • paper_url: http://arxiv.org/abs/2307.06240
  • repo_url: https://github.com/pfe-embraer/drone-swarm-search
  • paper_authors: Manuel Castanares, Luis F. S. Carrete, Enrico F. Damiani, Leonardo D. M. de Abreu, José Fernando B. Brancalion, Fabrício J. Barth
  • for: 这个论文是为了研究基于可变概率输入的多智能体(或单智能体)强化学习算法。
  • methods: 这个项目使用了基于PettingZoo的环境,其中多个智能体(或单个智能体)需要在无知目标位置的情况下找到失事人员。这些智能体不会根据自己与目标的距离获得奖励,但会收到地图中每个单元的目标概率。
  • results: 这个项目的目的是用于研究基于动态概率输入的强化学习算法。
    Abstract The Drone Swarm Search project is an environment, based on PettingZoo, that is to be used in conjunction with multi-agent (or single-agent) reinforcement learning algorithms. It is an environment in which the agents (drones), have to find the targets (shipwrecked people). The agents do not know the position of the target and do not receive rewards related to their own distance to the target(s). However, the agents receive the probabilities of the target(s) being in a certain cell of the map. The aim of this project is to aid in the study of reinforcement learning algorithms that require dynamic probabilities as inputs.
    摘要 “这个Drone Swarm Search项目是一个基于PettingZoo的环境,用于与多智能体(或单智能体)强化学习算法相结合。在这个环境中,代理人(无人机)需要找到目标(船难生还者)。代理人不知道目标的位置,也不会因自己与目标之间的距离而获得奖励。然而,代理人会获得目标可能位于地图中某个单元格的概率。这个项目的目的是帮助研究需要动态概率作为输入的强化学习算法。”
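
For orientation, a generic rollout under the PettingZoo Parallel API is sketched below. The environment instance is assumed to come from the project's repository; its actual constructor, observation contents, and whether it uses the parallel or AEC API should be checked there, and the exact `reset` signature varies across PettingZoo versions.

```python
def random_rollout(env, seed=0):
    """One episode with random actions, assuming the PettingZoo Parallel API."""
    observations, infos = env.reset(seed=seed)   # older PettingZoo versions return only observations
    totals = {agent: 0.0 for agent in env.agents}
    while env.agents:
        # each observation is expected to include the per-cell target probability map;
        # a trained RL policy would replace the random action choice below
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for agent, reward in rewards.items():
            totals[agent] += reward
    env.close()
    return totals

# usage (hypothetical): pass an instance of the drone-swarm-search environment created
# as documented in the project's repository, e.g. `random_rollout(make_dsse_env())`.
```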

Unified Molecular Modeling via Modality Blending

  • paper_url: http://arxiv.org/abs/2307.06235
  • repo_url: None
  • paper_authors: Qiying Yu, Yudi Zhang, Yuyan Ni, Shikun Feng, Yanyan Lan, Hao Zhou, Jingjing Liu
  • for: 本研究旨在提高基于分子的任务,如人工智能药物发现,的自动学习表示。
  • methods: 我们提出了一种新的“混合然后预测”自然学习方法(分子融合),将不同模态的原子关系融合为一个统一的关系矩阵,以便编码。然后,通过回归模态特有的信息,对2D和3D结构进行细致的关系级别的协调。
  • results: 我们的实验表明,MoleBLEND在主要的2D/3D标准测试集上达到了最先进(state-of-the-art)的性能。此外,我们还提供了基于互信息最大化的理论启示,显示了我们的方法可以将对比学习、生成学习(跨模态预测)和掩码预测(模态内预测)的目标统一为一个完整的“混合然后预测”框架。
    Abstract Self-supervised molecular representation learning is critical for molecule-based tasks such as AI-assisted drug discovery. Recent studies consider leveraging both 2D and 3D information for representation learning, with straightforward alignment strategies that treat each modality separately. In this work, we introduce a novel "blend-then-predict" self-supervised learning method (MoleBLEND), which blends atom relations from different modalities into one unified relation matrix for encoding, then recovers modality-specific information for both 2D and 3D structures. By treating atom relationships as anchors, seemingly dissimilar 2D and 3D manifolds are aligned and integrated at fine-grained relation-level organically. Extensive experiments show that MoleBLEND achieves state-of-the-art performance across major 2D/3D benchmarks. We further provide theoretical insights from the perspective of mutual-information maximization, demonstrating that our method unifies contrastive, generative (inter-modal prediction) and mask-then-predict (intra-modal prediction) objectives into a single cohesive blend-then-predict framework.
    摘要 自监督分子表示学习是AI辅助药物发现等基于分子的任务的关键技术。近年研究人员倾向于同时利用2D和3D信息进行表示学习,通常采用简单的对齐策略,将每种模态单独处理。在这项工作中,我们提出了一种新的“混合然后预测”自监督学习方法(MoleBLEND),将不同模态的原子关系混合到一个统一的关系矩阵中进行编码,然后为2D和3D结构恢复模态特有的信息。通过将原子关系作为锚点,看似不同的2D和3D流形在细粒度的关系层面被有机地对齐和融合。实验表明,MoleBLEND在主要的2D/3D标准基准上实现了最先进的性能。我们还从互信息最大化的角度提供了理论启示,表明我们的方法将对比学习、生成学习(跨模态预测)和掩码预测(模态内预测)目标统一到一个完整的“混合然后预测”框架中。

Local Conditional Neural Fields for Versatile and Generalizable Large-Scale Reconstructions in Computational Imaging

  • paper_url: http://arxiv.org/abs/2307.06207
  • repo_url: https://github.com/bu-cisl/LCNF
  • paper_authors: Hao Wang, Jiabei Zhu, Yunzhe Li, QianWan Yang, Lei Tian
  • for: 解决计算成像领域中的大规模逆问题,使用深度学习技术。
  • methods: 使用Local Conditional Neural Fields(LCNF)框架,利用连续卷积神经表示,解决传统像素基的限制,捕捉 объек 的连续、多尺度特征。
  • results: 在快速扫描微型镜中实现了高分辨率相位恢复,使用只有一些多重混合测量数据,并且能够捕捉宽视场、高分辨率的相位图像。LCNF可以学习自然图像数据集上的物理 simulate 中的对象约束,并在实验中成功应用于生物样本测量。
    Abstract Deep learning has transformed computational imaging, but traditional pixel-based representations limit their ability to capture continuous, multiscale details of objects. Here we introduce a novel Local Conditional Neural Fields (LCNF) framework, leveraging a continuous implicit neural representation to address this limitation. LCNF enables flexible object representation and facilitates the reconstruction of multiscale information. We demonstrate the capabilities of LCNF in solving the highly ill-posed inverse problem in Fourier ptychographic microscopy (FPM) with multiplexed measurements, achieving robust, scalable, and generalizable large-scale phase retrieval. Unlike traditional neural fields frameworks, LCNF incorporates a local conditional representation that promotes model generalization, learning multiscale information, and efficient processing of large-scale imaging data. By combining an encoder and a decoder conditioned on a learned latent vector, LCNF achieves versatile continuous-domain super-resolution image reconstruction. We demonstrate accurate reconstruction of wide field-of-view, high-resolution phase images using only a few multiplexed measurements. LCNF robustly captures the continuous object priors and eliminates various phase artifacts, even when it is trained on imperfect datasets. The framework exhibits strong generalization, reconstructing diverse objects even with limited training data. Furthermore, LCNF can be trained on a physics simulator using natural images and successfully applied to experimental measurements on biological samples. Our results highlight the potential of LCNF for solving large-scale inverse problems in computational imaging, with broad applicability in various deep-learning-based techniques.
    摘要 深度学习已经改变计算影像的方式,但传统的像素基于表示限制了它们捕捉对象的连续、多尺度细节的能力。在这里,我们介绍了一种新的本地conditional神经场(LCNF)框架,利用连续假设神经表示来解决这一限制。LCNF允许 flexible对象表示,并且使得多尺度信息的重建。我们在快 Fourierptychographic microscopy(FPM)中解决了高度不稳定的逆问题, achieved robust、可扩展、通用的大规模阶段逆解决方案。与传统神经场框架不同,LCNF包含本地conditional表示,从而促进模型通用、学习多尺度信息和高效处理大规模影像数据。通过将编码器和解码器conditioned on learned latent vector,LCNF实现了 versatile continuous-domain超Resolution image reconstruction。我们示出了使用只有几个多重化测量的宽视场高分辨率相位图像的准确重建。LCNF坚定地捕捉连续对象假设,并消除了各种相位杂质,即使在训练数据不完整的情况下。框架具有强大的通用性,可以在不同的对象和测量数据上重建多样化的图像。此外,LCNF可以在物理模拟器上使用自然图像进行训练,并成功应用于实验室测量。我们的结果表明LCNF可以解决计算影像中的大规模逆问题,并具有广泛的应用前景。
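
The sketch below illustrates the basic ingredient of a conditional neural field: an MLP maps a coordinate (with Fourier features) plus a latent code to a field value, so the reconstruction can be queried at arbitrary continuous positions. For brevity the latent is global rather than local, and all sizes are illustrative assumptions rather than the LCNF architecture.

```python
import torch
import torch.nn as nn

class ConditionalNeuralField(nn.Module):
    def __init__(self, latent_dim=32, hidden=128, n_freqs=6):
        super().__init__()
        self.register_buffer("freqs", torch.pi * 2.0 ** torch.arange(n_freqs, dtype=torch.float32))
        in_dim = 2 * 2 * n_freqs + latent_dim        # Fourier features of (x, y) + latent code
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def fourier(self, xy):
        ang = xy[..., None] * self.freqs              # (..., 2, n_freqs)
        return torch.cat([ang.sin(), ang.cos()], dim=-1).flatten(-2)

    def forward(self, xy, latent):
        return self.net(torch.cat([self.fourier(xy), latent], dim=-1))

field = ConditionalNeuralField()
coords = torch.rand(1024, 2)                          # continuous query points in [0, 1]^2
latent = torch.randn(1, 32).expand(1024, -1)          # code produced by some encoder
values = field(coords, latent)                        # reconstruction queried at any resolution
print(values.shape)
```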

Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior

  • paper_url: http://arxiv.org/abs/2307.06175
  • repo_url: None
  • paper_authors: Kai Cui, Sascha Hauck, Christian Fabian, Heinz Koeppl
  • for: This paper focuses on multi-agent reinforcement learning (MARL) via decentralized partially observable mean field control (Dec-POMFC) to address scalability and partial-observability challenges in collective behavior tasks.
  • methods: The proposed approach builds on mean field control (MFC) with novel models for decentralized partially observable MFC, enabling decentralized behavior of agents under partial information, together with policy gradient methods for MARL via centralized training and decentralized execution, including policy gradient approximation guarantees.
  • results: The approach is evaluated numerically on representative collective behavior tasks such as adapted Kuramoto and Vicsek swarming models and is on par with state-of-the-art MARL; it also improves upon state-of-the-art histogram-based MFC by kernel methods, which is of separate interest for fully observable MFC as well.
    Abstract Recent reinforcement learning (RL) methods have achieved success in various domains. However, multi-agent RL (MARL) remains a challenge in terms of decentralization, partial observability and scalability to many agents. Meanwhile, collective behavior requires resolution of the aforementioned challenges, and remains of importance to many state-of-the-art applications such as active matter physics, self-organizing systems, opinion dynamics, and biological or robotic swarms. Here, MARL via mean field control (MFC) offers a potential solution to scalability, but fails to consider decentralized and partially observable systems. In this paper, we enable decentralized behavior of agents under partial information by proposing novel models for decentralized partially observable MFC (Dec-POMFC), a broad class of problems with permutation-invariant agents allowing for reduction to tractable single-agent Markov decision processes (MDP) with single-agent RL solution. We provide rigorous theoretical results, including a dynamic programming principle, together with optimality guarantees for Dec-POMFC solutions applied to finite swarms of interest. Algorithmically, we propose Dec-POMFC-based policy gradient methods for MARL via centralized training and decentralized execution, together with policy gradient approximation guarantees. In addition, we improve upon state-of-the-art histogram-based MFC by kernel methods, which is of separate interest also for fully observable MFC. We evaluate numerically on representative collective behavior tasks such as adapted Kuramoto and Vicsek swarming models, being on par with state-of-the-art MARL. Overall, our framework takes a step towards RL-based engineering of artificial collective behavior via MFC.
    摘要 近期的强化学习(RL)方法在不同领域中已经取得了成功。然而,多智能RL(MARL)仍然面临了分布式、偏见性和可扩展性等挑战。同时,集体行为需要解决这些挑战,并对许多现代应用程序如活跃物理、自组织系统、意见动力学和生物或机器群体等产生了重要性。在这篇论文中,我们通过提出新的均场控制(MFC)模型来解决分布式和偏见性的问题,并实现了可扩展的集体行为。我们提供了准确的理论结果,包括动态程序理论,以及对均场控制解决方案的可行性保证。从算法角度来看,我们提出了基于均场控制的策略梯度法,并提供了策略梯度预测 guarantees。此外,我们提高了现有的频谱矩阵控制方法,这也是一个独立的研究兴趣。我们在代表性的收集行为任务上进行了数值计算,与当前的MARL技术一样。总的来说,我们的框架向RL基于MFC的人工集体行为工程做出了一步进展。

Auxiliary-Tasks Learning for Physics-Informed Neural Network-Based Partial Differential Equations Solving

  • paper_url: http://arxiv.org/abs/2307.06167
  • repo_url: https://github.com/junjun-yan/atl-pinn
  • paper_authors: Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhou, Jie Liu
  • for: 偏微分方程(PDEs)的数值求解问题
  • methods: 利用物理学信息学习(PINNs)和辅助任务学习(ATL)两种方法
  • results: 在不同领域和场景中对三个PDE问题进行了实验,发现辅助任务学习模式可以显著提高求解精度,相比单任务PINN最高提升96.62%(平均提升28.23%)。
    Abstract Physics-informed neural networks (PINNs) have emerged as promising surrogate modes for solving partial differential equations (PDEs). Their effectiveness lies in the ability to capture solution-related features through neural networks. However, original PINNs often suffer from bottlenecks, such as low accuracy and non-convergence, limiting their applicability in complex physical contexts. To alleviate these issues, we proposed auxiliary-task learning-based physics-informed neural networks (ATL-PINNs), which provide four different auxiliary-task learning modes and investigate their performance compared with original PINNs. We also employ the gradient cosine similarity algorithm to integrate auxiliary problem loss with the primary problem loss in ATL-PINNs, which aims to enhance the effectiveness of the auxiliary-task learning modes. To the best of our knowledge, this is the first study to introduce auxiliary-task learning modes in the context of physics-informed learning. We conduct experiments on three PDE problems across different fields and scenarios. Our findings demonstrate that the proposed auxiliary-task learning modes can significantly improve solution accuracy, achieving a maximum performance boost of 96.62% (averaging 28.23%) compared to the original single-task PINNs. The code and dataset are open source at https://github.com/junjun-yan/ATL-PINN.
    摘要 physics-informed neural networks (PINNs) 已经出现为解决部分微分方程(PDEs)的可靠供应方法。它们的有效性来自于通过神经网络捕捉解决方案相关的特征。然而,原始的PINNs经常受到瓶颈,如精度低下和不收敛,限制它们在复杂的物理上下文中的应用。为了解决这些问题,我们提出了auxiliary-task learning-based physics-informed neural networks(ATL-PINNs),它们提供了四种不同的auxiliary-task学习模式,并对其性能与原始PINNs进行了比较。此外,我们采用了梯度cosine相似性算法将auxiliary问题损失与主问题损失集成在ATL-PINNs中,以提高auxiliary-task学习模式的效果。在我们所知道的范围内,这是首次在物理学习中引入auxiliary-task学习模式的研究。我们在不同的领域和场景中进行了三个PDE问题的实验。我们的发现表明,我们提出的auxiliary-task学习模式可以显著提高解决精度,最高提高96.62%(平均提高28.23%)相比原始单任务PINNs。代码和数据集可以在https://github.com/junjun-yan/ATL-PINN上获取。
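
One way to integrate an auxiliary loss via gradient cosine similarity, as mentioned above, is to let the auxiliary gradient contribute only when it is directionally consistent with the primary PDE loss. The sketch below is a generic illustration of that idea with a dummy model and losses, not the ATL-PINN repository's code.

```python
import torch

def combined_gradient(model, primary_loss, aux_loss):
    """Gate the auxiliary-task gradient by its cosine similarity with the primary gradient."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_main = torch.autograd.grad(primary_loss, params, retain_graph=True)
    g_aux = torch.autograd.grad(aux_loss, params, retain_graph=True)
    flat_main = torch.cat([g.flatten() for g in g_main])
    flat_aux = torch.cat([g.flatten() for g in g_aux])
    cos = torch.nn.functional.cosine_similarity(flat_main, flat_aux, dim=0)
    weight = torch.clamp(cos, min=0.0)                # drop the auxiliary signal when it conflicts
    return [gm + weight * ga for gm, ga in zip(g_main, g_aux)]

# tiny demonstration with a dummy network and two quadratic losses
model = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)
loss_main = (model(x) ** 2).mean()                    # stands in for the PDE residual loss
loss_aux = ((model(x) - 1.0) ** 2).mean()             # stands in for the auxiliary-task loss
for p, g in zip(model.parameters(), combined_gradient(model, loss_main, loss_aux)):
    p.grad = g
torch.optim.SGD(model.parameters(), lr=0.1).step()
```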

Deep Generative Models for Physiological Signals: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2307.06162
  • repo_url: None
  • paper_authors: Nour Neifar, Afef Mdhaffar, Achraf Ben-Hamadou, Mohamed Jmaiel
  • for: 本文是一篇系统性的文献综述,探讨深度生成模型在生理信号方面的应用,具体是电心电征、电 Encyclopaedia、光谱 Plethysmogram 和电动肌征。
  • methods: 本文分析了深度生成模型的现状,包括其主要应用和挑战,同时也详细介绍了 employed evaluation protocol 和主要使用的生理数据库。
  • results: 本文对深度生成模型的应用进行了系统性的梳理和分析,并对这些模型的评价和比较提供了参考。
    Abstract In this paper, we present a systematic literature review on deep generative models for physiological signals, particularly electrocardiogram, electroencephalogram, photoplethysmogram and electromyogram. Compared to the existing review papers, we present the first review that summarizes the recent state-of-the-art deep generative models. By analysing the state-of-the-art research related to deep generative models along with their main applications and challenges, this review contributes to the overall understanding of these models applied to physiological signals. Additionally, by highlighting the employed evaluation protocol and the most used physiological databases, this review facilitates the assessment and benchmarking of deep generative models.
    摘要 在这篇论文中,我们对应用于生理信号(特别是心电图、脑电图、光电容积脉搏波和肌电图)的深度生成模型进行了系统性的文献综述。与现有综述论文相比,我们首次总结了最新的深度生成模型研究进展。通过分析与深度生成模型相关的最新研究及其主要应用与挑战,本综述有助于全面理解这些模型在生理信号上的应用。此外,通过重点介绍所采用的评估协议和最常用的生理信号数据库,本综述也便于对深度生成模型进行评估与基准比较。

Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15072
  • repo_url: None
  • paper_authors: Nicholas Perikli, Srimoy Bhattacharya, Blessing Ogbuokiri, Zahra Movahedi Nia, Benjamin Lieberman, Nidhi Tripathi, Salah-Eddine Dahbi, Finn Stevenson, Nicola Bragazzi, Jude Kong, Bruce Mellado
  • for: 这个研究的目的是使用情感分析(sentiment analysis)分析南非用户生成内容(UGC)中的疫苗犹豫情绪,并训练 AI 模型对 UGC 进行分类。
  • methods: 该研究使用了 LSTM、bi-LSTM、SVM、BERT-base-cased 和 RoBERTa-base 机器学习模型,其超参数通过 WandB 平台进行了精心调整;同时比较了两种不同的数据预处理方法(基于语义与基于语料库)。
  • results: 所有模型的 F1 分数都较低,处于 45%-55% 的范围内,只有 BERT 和 RoBERTa 取得了显著更好的结果,总 F1 分数分别为 60% 和 61%;对 RoBERTa 模型误分类的推文进行 LDA 主题分析,可为进一步提升模型准确性提供思路。
    Abstract Very few social media studies have been done on South African user-generated content during the COVID-19 pandemic and even fewer using hand-labelling over automated methods. Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models and assessing their reliability in categorizing UGC. A dataset of 30000 tweets from South Africa were extracted and hand-labelled into one of three sentiment classes: positive, negative, neutral. The machine learning models used were LSTM, bi-LSTM, SVM, BERT-base-cased and the RoBERTa-base models, whereby their hyperparameters were carefully chosen and tuned using the WandB platform. We used two different approaches when we pre-processed our data for comparison: one was semantics-based, while the other was corpus-based. The pre-processing of the tweets in our dataset was performed using both methods, respectively. All models were found to have low F1-scores within a range of 45$\%$-55$\%$, except for BERT and RoBERTa which both achieved significantly better measures with overall F1-scores of 60$\%$ and 61$\%$, respectively. Topic modelling using an LDA was performed on the miss-classified tweets of the RoBERTa model to gain insight on how to further improve model accuracy.
    摘要 很少有关于南非用户生成内容 durign COVID-19 大流行的社交媒体研究,而且使用手动标注而不是自动方法更加罕见。疫苗是战胜 COVID-19 的重要工具,但是疫苗不信任会对公共卫生产生威胁。本研究通过对南非推特上关于疫苗不信任的 sentiment 分析,以训练 AI 媒介分类模型并评估其可靠性。我们收集了30000个推特消息,并 manually 标注为一个sentiment类型:正面、负面或中性。我们使用的机器学习模型包括 LSTM、bi-LSTM、SVM、BERT-base-cased 和 RoBERTa-base 模型,其中每个模型的 гиперparameters 都经过精心选择和调整使用 WandB 平台。我们使用了两种不同的方法来处理我们的数据,以便进行比较:一种是 semantics-based,另一种是 corpus-based。对于我们的数据集,我们使用了这两种方法进行预处理。所有模型的 F1 分数都在45%-55%之间,只有 BERT 和 RoBERTa 两个模型显示出了明显更好的表现,它们的总 F1 分数分别为 60% 和 61%。为了提高模型准确性,我们使用 LDA 进行主题分析对 RoBERTa 模型中的误分类消息。

Sequential Experimental Design for X-Ray CT Using Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.06343
  • repo_url: https://github.com/tianyuan1wang/seqanglerl
  • paper_authors: Tianyuan Wang, Felix Lucka, Tristan van Leeuwen
  • for: 这个研究旨在使 X 射线计算机断层扫描(CT)技术适用于生产线上的质量控制,在减少投影角度数量的同时维持三维重建的品质。
  • methods: 该研究将稀疏角度断层扫描的角度选择表述为最优实验设计(OED)问题,并将其建模为贝叶斯框架下的部分可观测马尔可夫决策过程,通过深度强化学习来求解。
  • results: 研究发现,训练得到的策略可以在线选取最具信息量的投影角度,从而提升 CT 技术的质量控制能力。
    Abstract In X-ray Computed Tomography (CT), projections from many angles are acquired and used for 3D reconstruction. To make CT suitable for in-line quality control, reducing the number of angles while maintaining reconstruction quality is necessary. Sparse-angle tomography is a popular approach for obtaining 3D reconstructions from limited data. To optimize its performance, one can adapt scan angles sequentially to select the most informative angles for each scanned object. Mathematically, this corresponds to solving and optimal experimental design (OED) problem. OED problems are high-dimensional, non-convex, bi-level optimization problems that cannot be solved online, i.e., during the scan. To address these challenges, we pose the OED problem as a partially observable Markov decision process in a Bayesian framework, and solve it through deep reinforcement learning. The approach learns efficient non-greedy policies to solve a given class of OED problems through extensive offline training rather than solving a given OED problem directly via numerical optimization. As such, the trained policy can successfully find the most informative scan angles online. We use a policy training method based on the Actor-Critic approach and evaluate its performance on 2D tomography with synthetic data.
    摘要 在X射 Computed Tomography(CT)中,从多个角度获取投影,并用于3D重建。以使CT适用于直接质控,减少投影角度而保持重建质量是必要的。稀疏角度 computed tomography 是一种常用的方法,以获取3D重建从有限数据。为了优化其性能,可以逐渐更新扫描角度,以选择每个扫描对象中最有用的角度。这种问题可以表示为一个优化实验设计(OED)问题。OED问题是高维度、非凸、双级优化问题,无法在扫描过程中解决。为了解决这些挑战,我们将OED问题 posed 为一个部分可见 Markov 决策过程在 bayesian 框架中,并通过深度强化学习解决。这种方法可以学习高效的非准确策略,并在大量的 offline 训练中学习,而不是直接解决一个给定的 OED 问题。因此,训练好的策略可以成功地在线找到最有用的扫描角度。我们使用一种基于actor-critic方法的策略训练方法,并对2D tomography 的 sintetic 数据进行评估。

Maneuver Decision-Making Through Automatic Curriculum Reinforcement Learning Without Handcrafted Reward functions

  • paper_url: http://arxiv.org/abs/2307.06152
  • repo_url: None
  • paper_authors: Zhang Hong-Peng
  • for: 本文旨在解决无人战斗飞机 autonomous 空中作战中的决策问题,提出一种自动课程强化学习方法,使代理人可以从零开始学习有效的决策。
  • methods: 本文使用自动课程强化学习方法,将决策分解为一系列不同Difficulty Level的子任务,并通过测试结果来调整子任务。代理人逐渐学习完成不同Difficulty Level的子任务,从而学习有效地做出决策。
  • results: 实验表明,无人战斗飞机使用自动课程强化学习方法可以在不同状态下做出有效的决策,包括跟踪、攻击和逃脱等,这些决策都是合理且可解释的。
    Abstract Maneuver decision-making is the core of unmanned combat aerial vehicle for autonomous air combat. To solve this problem, we propose an automatic curriculum reinforcement learning method, which enables agents to learn effective decisions in air combat from scratch. The range of initial states are used for distinguishing curricula of different difficulty levels, thereby maneuver decision is divided into a series of sub-tasks from easy to difficult, and test results are used to change sub-tasks. As sub-tasks change, agents gradually learn to complete a series of sub-tasks from easy to difficult, enabling them to make effective maneuvering decisions to cope with various states without the need to spend effort designing reward functions. The ablation studied show that the automatic curriculum learning proposed in this article is an essential component for training through reinforcement learning, namely, agents cannot complete effective decisions without curriculum learning. Simulation experiments show that, after training, agents are able to make effective decisions given different states, including tracking, attacking and escaping, which are both rational and interpretable.
    摘要 机动决策是无人作战飞行器实现自主空战的核心。为解决这一问题,我们提出了一种自动课程强化学习方法,使智能体能够从零开始学习空战中的有效决策。利用初始状态的范围来区分不同难度等级的课程,从而将机动决策分解为由易到难的一系列子任务,并根据测试结果调整子任务。随着子任务的变化,智能体逐步学会完成由易到难的一系列子任务,从而无需费力设计奖励函数,即可针对各种状态做出有效的机动决策。消融实验表明,本文提出的自动课程学习是强化学习训练中不可或缺的组成部分,即没有课程学习智能体无法完成有效决策。仿真实验表明,经过训练后,智能体能够在不同状态下做出包括跟踪、攻击和逃脱在内的有效决策,这些决策既合理又可解释。
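
The automatic-curriculum idea can be made concrete with a small control loop: the difficulty level (the range of initial states) is promoted or demoted based on the measured success rate. The thresholds, the stub policy, and the update rule below are assumptions for illustration, not the paper's exact schedule.

```python
import random

def evaluate(policy, difficulty, episodes=20):
    # placeholder evaluation: in the paper this would be rollouts in the air-combat simulator
    return sum(policy(difficulty) > random.random() for _ in range(episodes)) / episodes

def train_with_curriculum(policy, train_step, difficulties, promote_at=0.8, demote_at=0.3,
                          iterations=200):
    level = 0
    for _ in range(iterations):
        d = difficulties[level]
        train_step(policy, d)                         # one RL update at the current difficulty
        success = evaluate(policy, d)
        if success >= promote_at and level < len(difficulties) - 1:
            level += 1                                # widen the range of initial states
        elif success <= demote_at and level > 0:
            level -= 1                                # step back if the sub-task is too hard
    return level

# stub policy whose success probability at a given difficulty improves with training
skill = {"value": 0.2}
policy = lambda d: max(0.0, skill["value"] - 0.1 * d)
train_step = lambda p, d: skill.update(value=min(1.0, skill["value"] + 0.005))
print("final curriculum level:", train_with_curriculum(policy, train_step, [0, 1, 2, 3]))
```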

NetGPT: A Native-AI Network Architecture Beyond Provisioning Personalized Generative Services

  • paper_url: http://arxiv.org/abs/2307.06148
  • repo_url: None
  • paper_authors: Yuxuan Chen, Rongpeng Li, Zhifeng Zhao, Chenghui Peng, Jianjun Wu, Ekram Hossain, Honggang Zhang
  • for: 提供个性化生成服务,使大语言模型(LLM)更加适应人类意图。
  • methods: collaborative cloud-edge方法,可以有效地协调多种不同的分布式通信和计算资源。
  • results: 提出了NetGPT,可以在云和边缘部署适当的LLM,并使用地址基本信息进行个性化提示完成。
    Abstract Large language models (LLMs) have triggered tremendous success to empower daily life by generative information, and the personalization of LLMs could further contribute to their applications due to better alignment with human intents. Towards personalized generative services, a collaborative cloud-edge methodology sounds promising, as it facilitates the effective orchestration of heterogeneous distributed communication and computing resources. In this article, after discussing the pros and cons of several candidate cloud-edge collaboration techniques, we put forward NetGPT to capably deploy appropriate LLMs at the edge and the cloud in accordance with their computing capacity. In addition, edge LLMs could efficiently leverage location-based information for personalized prompt completion, thus benefiting the interaction with cloud LLMs. After deploying representative open-source LLMs (e.g., GPT-2-base and LLaMA model) at the edge and the cloud, we present the feasibility of NetGPT on the basis of low-rank adaptation-based light-weight fine-tuning. Subsequently, we highlight substantial essential changes required for a native artificial intelligence (AI) network architecture towards NetGPT, with special emphasis on deeper integration of communications and computing resources and careful calibration of logical AI workflow. Furthermore, we demonstrate several by-product benefits of NetGPT, given edge LLM's astonishing capability to predict trends and infer intents, which possibly leads to a unified solution for intelligent network management \& orchestration. In a nutshell, we argue that NetGPT is a promising native-AI network architecture beyond provisioning personalized generative services.
    摘要 大型语言模型(LLM)已经带来了巨大的成功,并且可以强化日常生活中的信息生成,而个人化的LLM可能会进一步应用 Due to better alignment with human intents。面对个人生成服务的个人化,一种 cloud-edge 方法论是可行的,这种方法论可以协调跨多种不同的分布式通信和计算资源。在这篇文章中,我们首先讨论了几种候选的 cloud-edge 合作技术,然后我们提出了 NetGPT,可以将适当的 LLM 部署到云和edge 之间,并且考虑到它们的计算能力。此外,edge LLM 可以充分利用位置基本信息来进行个人化的提示完成,以便与云上的 LLM 进行更好的互动。我们在部署了一些代表性的开源 LLM (例如 GPT-2-base 和 LLaMA 模型)在云和edge 之间时,显示了 NetGPT 的可行性,基于低维度适应的轻量化微调。接下来,我们强调了 NetGPT 的Native AI 网络架构中的重要更改,包括对应用程序的更深入的融合,以及当地的适应和精确的逻辑 AI 工作流程。此外,我们还详细介绍了 NetGPT 的一些副产品优点,例如edge LLM 的惊人的趋势预测和推论能力,这可能导致一个统一的智能网络管理和协调解决方案。简而言之,我们认为 NetGPT 是一个可行的 Native AI 网络架构,不仅提供个人化的生成服务,更重要的是它的副产品优点。

Enhancing ECG Analysis of Implantable Cardiac Monitor Data: An Efficient Pipeline for Multi-Label Classification

  • paper_url: http://arxiv.org/abs/2307.07423
  • repo_url: None
  • paper_authors: Amnon Bleich, Antje Linnemann, Benjamin Jaidi, Björn H Diem, Tim OF Conrad
  • for: 这个研究是为了解决嵌入式心脏监测器(ICM)数据自动分析中的挑战。
  • methods: 这个研究使用了一种新的分类方法,该方法可以在ICM数据上自动分类并提高分类精度。
  • results: 研究发现,新的分类方法可以在ICM数据上提高分类精度,并且比现有的方法更好地处理ICM数据的特殊特征。
    Abstract Implantable Cardiac Monitor (ICM) devices are demonstrating as of today, the fastest-growing market for implantable cardiac devices. As such, they are becoming increasingly common in patients for measuring heart electrical activity. ICMs constantly monitor and record a patient's heart rhythm and when triggered - send it to a secure server where health care professionals (denote HCPs from here on) can review it. These devices employ a relatively simplistic rule-based algorithm (due to energy consumption constraints) to alert for abnormal heart rhythms. This algorithm is usually parameterized to an over-sensitive mode in order to not miss a case (resulting in relatively high false-positive rate) and this, combined with the device's nature of constantly monitoring the heart rhythm and its growing popularity, results in HCPs having to analyze and diagnose an increasingly growing amount of data. In order to reduce the load on the latter, automated methods for ECG analysis are nowadays becoming a great tool to assist HCPs in their analysis. While state-of-the-art algorithms are data-driven rather than rule-based, training data for ICMs often consist of specific characteristics which make its analysis unique and particularly challenging. This study presents the challenges and solutions in automatically analyzing ICM data and introduces a method for its classification that outperforms existing methods on such data. As such, it could be used in numerous ways such as aiding HCPs in the analysis of ECGs originating from ICMs by e.g. suggesting a rhythm type.
    摘要 植入式心脏监测器(ICM)是目前增长最快的植入式心脏设备市场。因此,它们在患者中变得越来越普遍,用于测量心电活动。ICM设备不断监测并记录患者的心律,并在触发时将其传输到一个安全的服务器上,由医疗专业人员(以下简称HCP)进行查看和诊断。这些设备受能耗限制,使用一种相对简单的基于规则的算法来警示异常心律。该算法通常被参数化为过度敏感模式,以确保不漏掉任何病例(从而导致相对较高的假阳性率)。再加上设备持续监测心律的特性及其日益普及,HCP需要分析和诊断的数据量不断增长。为了减轻他们的负担,自动化ECG分析方法正成为协助HCP分析的重要工具。尽管最先进的算法是数据驱动而非基于规则的,但ICM设备的训练数据常常具有特殊性,使其分析独特且具有挑战性。本研究阐述了自动分析ICM数据的挑战与解决方案,并提出了一种在此类数据上优于现有方法的分类方法。因此,它可用于多种用途,例如通过建议心律类型来协助HCP分析来自ICM的ECG。

Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation

  • paper_url: http://arxiv.org/abs/2307.06125
  • repo_url: https://github.com/robot-learning-freiburg/HIMOS
  • paper_authors: Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada
  • for: 这篇论文的目的是解决机器人在无结构化人类中心环境中实现多对象搜索任务。
  • methods: 这篇论文使用了归纳学习方法,把探索、导航和操作技能组合在一起,以解决在未经探索的环境中实现多对象搜索任务。
  • results: 实验和实际应用中的结果表明,HIMOS可以在零个shot情况下在新环境中转移,并能够承受未看过的子策略、执行失败和不同的机器人姿态。这些能力开启了许多下渠任务和实际应用场景。
    Abstract Existing object-search approaches enable robots to search through free pathways, however, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real-world that demonstrate that HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.
    摘要 现有的对象搜索方法可以让机器人在开放的路径上搜索,但是在人类中心环境中,机器人经常需要 manipulate 环境以满足自己的需求。在这种工作中,我们引入了一种新的多对象搜索任务,其中机器人需要打开门来探索房间,并在柜子和抽屉中搜索目标对象。这些新挑战需要机器人结合探索、导航和操作技能。我们提出了一种层次学习策略,称为HIMOS,它可以学习搜索、导航和操作技能的组合。为了实现这一点,我们设计了一个抽象的高级动作空间,基于 semantic map 的快照和已经探索的环境,并利用这些环境作为实例导航点。我们在实验中进行了广泛的 simulate 和实际应用,并证明了 HIMOS 可以在零shot 情况下转移到新环境,并且具有对不见的互聪策略、操作失败和不同机器人骨干的Robustness。这些能力开启了许多下游任务和实际应用场景。

SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark

  • paper_url: http://arxiv.org/abs/2307.06123
  • repo_url: https://github.com/mibench/mibench.github.io
  • paper_authors: Jun Niu, Xiaoyan Zhu, Moxuan Zeng, Ge Zhang, Qingyang Zhao, Chunhui Huang, Yangming Zhang, Suyu An, Yangzhong Wang, Xinghui Yue, Zhipeng He, Weihao Guo, Kuo Shen, Peng Liu, Yulong Shen, Xiaohong Jiang, Jianfeng Ma, Yuqing Zhang
  • for: 本研究旨在提供一个完整的比较不同密盟攻击方法的 benchmark,以帮助研究人员更好地了解不同密盟攻击方法的表现。
  • methods: 本研究使用了15种现状最佳的密盟攻击算法,并在7种广泛使用的数据集和7种常见的模型上进行了784个评估场景的比较。
  • results: 根据我们的评估结果,存在一些已有的比较结果在 литераature中报道的是有误的,我们提出了三个比较方法的原则,并在84个评估场景中测试了这些原则。
    Abstract Membership inference (MI) attacks threaten user privacy through determining if a given data example has been used to train a target model. However, it has been increasingly recognized that the "comparing different MI attacks" methodology used in the existing works has serious limitations. Due to these limitations, we found (through the experiments in this work) that some comparison results reported in the literature are quite misleading. In this paper, we seek to develop a comprehensive benchmark for comparing different MI attacks, called MIBench, which consists not only the evaluation metrics, but also the evaluation scenarios. And we design the evaluation scenarios from four perspectives: the distance distribution of data samples in the target dataset, the distance between data samples of the target dataset, the differential distance between two datasets (i.e., the target dataset and a generated dataset with only nonmembers), and the ratio of the samples that are made no inferences by an MI attack. The evaluation metrics consist of ten typical evaluation metrics. We have identified three principles for the proposed "comparing different MI attacks" methodology, and we have designed and implemented the MIBench benchmark with 84 evaluation scenarios for each dataset. In total, we have used our benchmark to fairly and systematically compare 15 state-of-the-art MI attack algorithms across 588 evaluation scenarios, and these evaluation scenarios cover 7 widely used datasets and 7 representative types of models. All codes and evaluations of MIBench are publicly available at https://github.com/MIBench/MIBench.github.io/blob/main/README.md.
    摘要 成员推断(MI)攻击通过判断某一数据样本是否被用于训练目标模型,从而威胁用户隐私。然而,现有工作所采用的“比较不同MI攻击”的方法论存在严重局限。通过本文的实验,我们发现文献中报告的一些比较结果具有相当的误导性。为此,我们构建了一个用于比较不同MI攻击的综合基准MIBench,它不仅包含评估指标,还包含评估场景。我们从四个角度设计评估场景:目标数据集中数据样本的距离分布、数据样本之间的距离、两个数据集(目标数据集与仅含非成员的生成数据集)之间的差异距离,以及MI攻击不作推断的样本比例;评估指标包含十种典型指标。我们为所提出的“比较不同MI攻击”方法论确立了三条原则,并为每个数据集设计实现了包含84个评估场景的MIBench基准。总计,我们使用该基准在588个评估场景中公平、系统地比较了15种最新的MI攻击算法,覆盖7个广泛使用的数据集和7类代表性模型。MIBench的全部代码与评估结果公开于 https://github.com/MIBench/MIBench.github.io/blob/main/README.md 。
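
To make the attack setting concrete, the snippet below implements the classic loss-threshold membership-inference baseline on synthetic losses; MIBench covers 15 far more sophisticated attacks, and this is not its code.

```python
import numpy as np

def loss_threshold_attack(per_example_losses, threshold):
    """Predict 'member' (1) when the target model's loss on an example is low."""
    return (per_example_losses < threshold).astype(int)

# toy illustration: members tend to have lower loss than non-members
rng = np.random.default_rng(0)
member_losses = rng.exponential(scale=0.2, size=1000)
nonmember_losses = rng.exponential(scale=1.0, size=1000)
losses = np.concatenate([member_losses, nonmember_losses])
truth = np.concatenate([np.ones(1000), np.zeros(1000)])

preds = loss_threshold_attack(losses, threshold=np.median(losses))
print(f"attack accuracy: {(preds == truth).mean():.2f}")
```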

Deep learning for dynamic graphs: models and benchmarks

  • paper_url: http://arxiv.org/abs/2307.06104
  • repo_url: https://github.com/gravins/dynamic_graph_benchmark
  • paper_authors: Alessio Gravina, Davide Bacciu
  • for: 这种研究旨在为实际世界中的变化连接体系进行预测任务准备深度图网络(DGNs)。
  • methods: 这种研究使用了最新的优势来学习时间和空间信息,提供了现有的领域状态对话的全面回顾。
  • results: 该研究对最受欢迎的提议方法进行了公平的性能比较,通过严格的模型选择和评估方法,建立了可靠的基准点 для评估新的架构和方法。
    Abstract Recent progress in research on Deep Graph Networks (DGNs) has led to a maturation of the domain of learning on graphs. Despite the growth of this research field, there are still important challenges that are yet unsolved. Specifically, there is an urge of making DGNs suitable for predictive tasks on realworld systems of interconnected entities, which evolve over time. With the aim of fostering research in the domain of dynamic graphs, at first, we survey recent advantages in learning both temporal and spatial information, providing a comprehensive overview of the current state-of-the-art in the domain of representation learning for dynamic graphs. Secondly, we conduct a fair performance comparison among the most popular proposed approaches, leveraging rigorous model selection and assessment for all the methods, thus establishing a sound baseline for evaluating new architectures and approaches
    摘要 近期深度图网络(DGN)研究的进展,使图上学习这一领域日趋成熟。尽管该研究领域不断发展,但仍有许多重要挑战尚未解决。具体来说,亟需让DGN适用于真实世界中随时间演变的互联实体系统的预测任务。为促进动态图学习的研究,我们首先回顾了近期在学习时间与空间信息方面的进展,提供了动态图表示学习领域最新技术的全面概述。其次,我们对最受欢迎的提议方法进行了公平的性能比较,对所有方法采用严格的模型选择和评估流程,从而为评估新的架构和方法建立了坚实的基准。

CLAIMED – the open source framework for building coarse-grained operators for accelerated discovery in science

  • paper_url: http://arxiv.org/abs/2307.06824
  • repo_url: https://github.com/claimed-framework/component-library
  • paper_authors: Romeo Kienzler, Rafflesia Khan, Jerome Nilmeier, Ivan Nesic, Ibrahim Haddad
  • for: Addressing the repeatability and reusability issues in modern data-driven science.
  • methods: Introducing CLAIMED, a framework for building reusable operators and scalable scientific workflows by supporting the scientist to draw from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators.
  • results: CLAIMED is programming language, scientific library, and execution environment agnostic, and has a proven track record in scientific research.
    Abstract In modern data-driven science, reproducibility and reusability are key challenges. Scientists are well skilled in the process from data to publication. Although some publication channels require source code and data to be made accessible, rerunning and verifying experiments is usually hard due to a lack of standards. Therefore, reusing existing scientific data processing code from state-of-the-art research is hard as well. This is why we introduce CLAIMED, which has a proven track record in scientific research for addressing the repeatability and reusability issues in modern data-driven science. CLAIMED is a framework to build reusable operators and scalable scientific workflows by supporting the scientist to draw from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators. Although various implementations exist, CLAIMED is programming language, scientific library, and execution environment agnostic.
    摘要 在现代数据驱动科学中,可重复性和可重用性是关键挑战。科学家熟悉从数据到发表的整个流程;尽管部分发表渠道要求公开源代码和数据,但由于缺乏标准,重新运行和验证实验通常十分困难,复用最新研究中已有的科学数据处理代码同样不易。为此,我们提出了 CLAIMED,它在解决现代数据驱动科学中的可重复性与可重用性问题方面已在科学研究中得到验证。CLAIMED 是一个用于构建可重用算子和可扩展科学工作流的框架,支持科学家通过重组既有粗粒度科学算子库中的工作流来借鉴先前的工作。尽管存在多种实现,CLAIMED 与编程语言、科学库和执行环境无关。

Learning Stochastic Dynamical Systems as an Implicit Regularization with Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.06097
  • repo_url: None
  • paper_authors: Jin Guo, Ting Gao, Yufu Lan, Peng Zhang, Sikun Yang, Jinqiao Duan
  • for: 学习高维时序数据,即观察维度具有空间相关性的数据。
  • methods: 使用泊松戈姆比例矩阵嵌入,学习扩散和涨落过程中的噪声效应。
  • results: 提供了一种理论保证,并通过库拉摩托模型生成数据,实验结果表明S-GGNs在比较难以学习的高维时序数据上具有更好的收敛、稳定性和泛化能力。
    Abstract Stochastic Gumbel graph networks are proposed to learn high-dimensional time series, where the observed dimensions are often spatially correlated. To that end, the observed randomness and spatial-correlations are captured by learning the drift and diffusion terms of the stochastic differential equation with a Gumble matrix embedding, respectively. In particular, this novel framework enables us to investigate the implicit regularization effect of the noise terms in S-GGNs. We provide a theoretical guarantee for the proposed S-GGNs by deriving the difference between the two corresponding loss functions in a small neighborhood of weight. Then, we employ Kuramoto's model to generate data for comparing the spectral density from the Hessian Matrix of the two loss functions. Experimental results on real-world data, demonstrate that S-GGNs exhibit superior convergence, robustness, and generalization, compared with state-of-the-arts.
    摘要 本文提出随机 Gumbel 图网络(S-GGN)来学习高维时间序列,其观测维度往往存在空间相关性。为此,分别通过学习带 Gumbel 矩阵嵌入的随机微分方程的漂移项与扩散项,来刻画观测到的随机性与空间相关性。特别地,这一新框架使我们能够研究 S-GGN 中噪声项的隐式正则化效应。我们通过推导两个对应损失函数在权重小邻域内的差异,为所提出的 S-GGN 提供了理论保证。随后,我们利用 Kuramoto 模型生成数据,比较两个损失函数的 Hessian 矩阵谱密度。在真实数据上的实验结果表明,与现有最先进方法相比,S-GGN 具有更优的收敛性、鲁棒性和泛化能力。
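
A generic sketch of learning the drift and diffusion networks of an SDE from one-step transitions with an Euler-Maruyama Gaussian likelihood is shown below; the Gumbel-matrix graph embedding that gives S-GGNs their name is omitted, and the synthetic data is only a stand-in.

```python
import torch
import torch.nn as nn

class NeuralSDE(nn.Module):
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.drift = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.diffusion = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                       nn.Linear(hidden, dim), nn.Softplus())

    def one_step_nll(self, x_t, x_next, dt):
        """Gaussian negative log-likelihood of an Euler-Maruyama transition."""
        mean = x_t + self.drift(x_t) * dt
        std = self.diffusion(x_t) * dt ** 0.5 + 1e-6
        return 0.5 * (((x_next - mean) / std) ** 2 + 2 * torch.log(std)).sum(-1).mean()

dim, dt = 3, 0.01
model = NeuralSDE(dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(128, dim)                                             # stand-in for observed x_t
x_next = x + 0.1 * x * dt + 0.05 * dt ** 0.5 * torch.randn_like(x)    # stand-in for x_{t+dt}
for _ in range(200):
    loss = model.one_step_nll(x, x_next, dt)
    opt.zero_grad(); loss.backward(); opt.step()
print("final one-step NLL:", float(loss))
```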

Online Laplace Model Selection Revisited

  • paper_url: http://arxiv.org/abs/2307.06093
  • repo_url: None
  • paper_authors: Jihao Andreas Lin, Javier Antorán, José Miguel Hernández-Lobato
  • for: 这个论文是为了提出一种关于神经网络(NN)的闭合形模型选择目标函数的方法,以及一种在线变体,即同时调整 NN 参数和 гипер参数。
  • methods: 这个论文使用了 Laplace 方法,并将其修改为一种可以在线进行的方法,以及一种基于模型修正的变体。
  • results: 该论文显示了在实际应用中,使用 full-batch gradient descent 算法和 online Laplace 方法可以减少过拟合和提高模型的性能。
    Abstract The Laplace approximation provides a closed-form model selection objective for neural networks (NN). Online variants, which optimise NN parameters jointly with hyperparameters, like weight decay strength, have seen renewed interest in the Bayesian deep learning community. However, these methods violate Laplace's method's critical assumption that the approximation is performed around a mode of the loss, calling into question their soundness. This work re-derives online Laplace methods, showing them to target a variational bound on a mode-corrected variant of the Laplace evidence which does not make stationarity assumptions. Online Laplace and its mode-corrected counterpart share stationary points where 1. the NN parameters are a maximum a posteriori, satisfying the Laplace method's assumption, and 2. the hyperparameters maximise the Laplace evidence, motivating online methods. We demonstrate that these optima are roughly attained in practise by online algorithms using full-batch gradient descent on UCI regression datasets. The optimised hyperparameters prevent overfitting and outperform validation-based early stopping.
    摘要 laplace approximation提供了一个关闭式的神经网络(NN)选择目标函数,而在线变体,即同时调整NN参数和 гипер参数,在 bayesian deep learning 社区中又受到了 renovated 的关注。然而,这些方法违背了laplace 方法的核心假设,即在损失函数的拟合中进行假设,这就引入了其准确性的问题。这项工作重新 derivation 了在线 laplace 方法,显示它们是targeting 一种修正后的 laplace 证据 bound,不假设站点性,并且与模式 corrected 的 laplace 证据相关。在线 laplace 和其修正版之间共享站点点,其中1. NN 参数是最大 posteriori,满足 laplace 方法的假设,2. гипер参数最大化 laplace 证据,这种 motivation 是在线方法的。我们示出了在实践中,使用 full-batch 梯度下降,在 UCI 回归数据集上,online 算法可以很好地实现这些最优点。修正的 гипер参数避免过拟合,并且超越验证基于 early stopping 的性能。
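
For reference, the standard Laplace approximation to the log-evidence that such methods optimize over hyperparameters is shown below; the paper's online and mode-corrected variants change where and how this objective is evaluated during training.

```latex
% Laplace approximation to the log marginal likelihood (evidence), optimized over
% hyperparameters \eta (e.g. weight-decay strength); \theta_* is a mode of the
% regularized loss, P the number of parameters, and H the Hessian of the negative
% log-joint at \theta_*.
\log p(\mathcal{D} \mid \eta) \;\approx\;
\log p(\mathcal{D} \mid \theta_*) + \log p(\theta_* \mid \eta)
+ \tfrac{P}{2}\log(2\pi) - \tfrac{1}{2}\log\det H(\theta_*, \eta)
```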

Quantitative CLTs in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.06092
  • repo_url: None
  • paper_authors: Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati
  • for: 这个论文是为了研究一个全连接神经网络的分布而写的。
  • methods: 这个论文使用了随机 Gaussian 权重和偏置来研究神经网络的分布。
  • results: 论文给出了正态近似的量化界,表明在大而有限的 $n$ 下以及任意固定网络深度上,神经网络(及其导数)的分布与对应的无限宽高斯过程之间的距离按 $n^{-\gamma}$($\gamma > 0$)的速度缩放,其中指数取决于所用的度量。这些界在对网络宽度的依赖上比已有文献中的结果更紧;在一维情形下,还证明了它们是最优的,即给出了匹配的下界。
    Abstract We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
    摘要 我们研究了一个完全连接的神经网络,其Random Gaussian weights和biases的分布。我们的假设是宽度是一个大常数n的隐藏层。我们获得了在大于0的γ的强制下的量化上限,这些上限适用于任何固定的网络深度。我们的定理表明,在Random fully connected network和其导数的距离与相应的无限宽 Gaussian process之间的距离成正比于n^-γ,其中γ>0,这取决于用来衡量偏差的度量。我们的上限是文献中已知的宽度依赖性更加严格,在一维情况下,我们还证明了它们是优化的,即我们设置了匹配的下界。

Assessment of the suitability of degradation models for the planning of CCTV inspections of sewer pipes

  • paper_url: http://arxiv.org/abs/2307.06341
  • repo_url: https://github.com/fidaeic/sewer-pred
  • paper_authors: Fidae El Morer, Stefan Wittek, Andreas Rausch
  • for: 这种研究的目的是为了制定更有效的管道维护计划,以避免管道腐蚀的经济、环境和健康问题。
  • methods: 这种方法使用了统计学和机器学习方法来评估管道腐蚀的模型,包括精度指标、长期腐蚀曲线的生成能力和可解释性。
  • results: 结果表明, ensemble 模型具有最高精度,但无法推断管道的长期腐蚀趋势,而логистиック回归模型具有轻度下降的精度,但能够生成高可解释性的长期腐蚀曲线。
    Abstract The degradation of sewer pipes poses significant economical, environmental and health concerns. The maintenance of such assets requires structured plans to perform inspections, which are more efficient when structural and environmental features are considered along with the results of previous inspection reports. The development of such plans requires degradation models that can be based on statistical and machine learning methods. This work proposes a methodology to assess their suitability to plan inspections considering three dimensions: accuracy metrics, ability to produce long-term degradation curves and explainability. Results suggest that although ensemble models yield the highest accuracy, they are unable to infer the long-term degradation of the pipes, whereas the Logistic Regression offers a slightly less accurate model that is able to produce consistent degradation curves with a high explainability. A use case is presented to demonstrate this methodology and the efficiency of model-based planning compared to the current inspection plan.
    摘要 排水管道的衰化带来经济、环境和健康问题。维护这些资产需要结构化的检测计划,并且在结合结构与环境特征以及既往检测报告结果时效率更高。制定这类计划需要衰化模型,这些模型可以基于统计学和机器学习方法。本研究提出一种方法,从三个维度评估这些模型用于规划检测的适用性:准确度指标、生成长期衰化曲线的能力和可解释性。结果表明,集成(ensemble)模型具有最高准确度,但无法推断管道的长期衰化趋势;而逻辑回归模型准确度略低,却能够生成一致的长期衰化曲线且具有高度可解释性。文中给出了一个用例,展示了该方法以及基于模型的检测规划相对于现行检测计划的效率。
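
A minimal sketch of how a logistic-regression degradation curve can be produced, i.e. the probability of a pipe being in poor condition as a function of age; the features, condition encoding, and synthetic data are invented for illustration, whereas the paper uses real inspection records.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
age = rng.uniform(0, 80, size=500)                          # pipe age in years
poor = (rng.random(500) < 1 / (1 + np.exp(-(age - 45) / 8))).astype(int)  # synthetic condition

model = LogisticRegression().fit(age.reshape(-1, 1), poor)

ages = np.linspace(0, 80, 9).reshape(-1, 1)
curve = model.predict_proba(ages)[:, 1]                     # degradation curve P(poor | age)
for a, p in zip(ages.ravel(), curve):
    print(f"age {a:4.0f} yr -> P(poor condition) = {p:.2f}")
```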

Efficient and Joint Hyperparameter and Architecture Search for Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2307.11004
  • repo_url: https://github.com/overwenyan/joint-search
  • paper_authors: Yan Wen, Chen Gao, Lingling Yi, Liwei Qiu, Yaqing Wang, Yong Li
  • for: 本研究旨在透过自动机器学习(AutoML)技术设计协同推荐(CF)模型,并将架构和参数搜索视为一体化进行。
  • methods: 本研究提出了一个两阶段搜索算法,首先利用子集数据来范畴搜索空间,然后使用整体理解个别参数的知识来范畴搜索。
  • results: 实验结果显示, compared with手动设计和先前搜索的模型,本研究的搜索架构可以更好地运行,并且可以更好地适应实际应用中的挑战。
    Abstract Automated Machine Learning (AutoML) techniques have recently been introduced to design Collaborative Filtering (CF) models in a data-specific manner. However, existing works either search architectures or hyperparameters while ignoring the fact they are intrinsically related and should be considered together. This motivates us to consider a joint hyperparameter and architecture search method to design CF models. However, this is not easy because of the large search space and high evaluation cost. To solve these challenges, we reduce the space by screening out usefulness yperparameter choices through a comprehensive understanding of individual hyperparameters. Next, we propose a two-stage search algorithm to find proper configurations from the reduced space. In the first stage, we leverage knowledge from subsampled datasets to reduce evaluation costs; in the second stage, we efficiently fine-tune top candidate models on the whole dataset. Extensive experiments on real-world datasets show better performance can be achieved compared with both hand-designed and previous searched models. Besides, ablation and case studies demonstrate the effectiveness of our search framework.
    摘要 自动机器学习(AutoML)技术最近被引入设计共同推荐(CF)模型的设计中。然而,现有的工作都是搜索 архитектуры或超参数而忽略了它们是内在相关的,应该一起考虑。这种情况引发我们考虑一种共同搜索超参数和architecture的方法来设计CF模型。然而,这并不容易,因为搜索空间很大,评估成本高。为解决这些挑战,我们将搜索空间减少,通过对各个超参数的含义进行全面理解,排除无用的超参数选择。然后,我们提议一种两阶段搜索算法,在第一阶段,利用子样本数据来减少评估成本,在第二阶段,高效地精化top候选模型。广泛的实验表明,我们的搜索框架可以比手动设计和先前搜索的模型表现更好。此外,剖除和案例研究表明我们的搜索框架的有效性。
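
The two-stage idea can be sketched as follows: candidate configurations are first screened cheaply on subsampled data, and only the top few are fine-tuned on the full dataset. The configuration space and the train/evaluate stubs are placeholders, not the paper's search space.

```python
import random

def sample_config():
    return {"embed_dim": random.choice([16, 32, 64]),
            "lr": random.choice([1e-3, 1e-2]),
            "interaction": random.choice(["dot", "mlp"])}

def evaluate(config, data_fraction):
    # placeholder: train a CF model on a fraction of the data and return a validation score
    return random.random()

def two_stage_search(n_candidates=50, top_k=5):
    candidates = [sample_config() for _ in range(n_candidates)]
    # stage 1: cheap screening on subsampled data
    scored = sorted(candidates, key=lambda c: evaluate(c, data_fraction=0.1), reverse=True)
    # stage 2: full-data fine-tuning of the best few
    finalists = [(evaluate(c, data_fraction=1.0), c) for c in scored[:top_k]]
    return max(finalists, key=lambda t: t[0])[1]

print(two_stage_search())
```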

Interpreting deep embeddings for disease progression clustering

  • paper_url: http://arxiv.org/abs/2307.06060
  • repo_url: None
  • paper_authors: Anna Munoz-Farre, Antonios Poulakakis-Daktylidis, Dilini Mahesha Kothalawala, Andrea Rodriguez-Martinez
  • for: 这个论文是为了解释深度嵌入的应用在患者划分中。
  • methods: 这个论文使用了一种新的方法来解释深度嵌入,并在UK Biobank数据集上进行了评估。
  • results: 研究发现,使用这种方法可以提供有价值的医学意义的疾病进程 patrern。
    Abstract We propose a novel approach for interpreting deep embeddings in the context of patient clustering. We evaluate our approach on a dataset of participants with type 2 diabetes from the UK Biobank, and demonstrate clinically meaningful insights into disease progression patterns.
    摘要 我们提出了一种新的方法,用于在患者聚类的背景下解释深度嵌入。我们在 UK Biobank 的 2 型糖尿病参与者数据集上对该方法进行了评估,并展示了具有临床意义的疾病进展模式洞察。

Machine Learning for Autonomous Vehicle’s Trajectory Prediction: A comprehensive survey, Challenges, and Future Research Directions

  • paper_url: http://arxiv.org/abs/2307.07527
  • repo_url: None
  • paper_authors: Vibha Bharilya, Neetesh Kumar
  • for: 本文旨在探讨自动驾驶车辆(AV)的轨迹预测方法,尤其是基于机器学习技术的深度学习和奖励学习方法。
  • methods: 本文综述了许多关于AV轨迹预测的研究,包括深度学习和奖励学习等机器学习技术。
  • results: 本文对许多研究进行了详细的分析和评价,并提出了未来研究方向。
    Abstract Autonomous Vehicles (AVs) have emerged as a promising solution by replacing human drivers with advanced computer-aided decision-making systems. However, for AVs to effectively navigate the road, they must possess the capability to predict the future behavior of nearby traffic participants, similar to the predictive driving abilities of human drivers. Building upon existing literature is crucial to advance the field and develop a comprehensive understanding of trajectory prediction methods in the context of automated driving. To address this need, we have undertaken a comprehensive review that focuses on trajectory prediction methods for AVs, with a particular emphasis on machine learning techniques including deep learning and reinforcement learning-based approaches. We have extensively examined over two hundred studies related to trajectory prediction in the context of AVs. The paper begins with an introduction to the general problem of predicting vehicle trajectories and provides an overview of the key concepts and terminology used throughout. After providing a brief overview of conventional methods, this review conducts a comprehensive evaluation of several deep learning-based techniques. Each method is summarized briefly, accompanied by a detailed analysis of its strengths and weaknesses. The discussion further extends to reinforcement learning-based methods. This article also examines the various datasets and evaluation metrics that are commonly used in trajectory prediction tasks. Encouraging an unbiased and objective discussion, we compare two major learning processes, considering specific functional features. By identifying challenges in the existing literature and outlining potential research directions, this review significantly contributes to the advancement of knowledge in the domain of AV trajectory prediction.
    摘要 自动驾驶车辆(AV)通过先进的计算机辅助决策系统取代人类驾驶员,已成为一种有前景的解决方案。然而,AV 要在道路上有效行驶,必须像人类驾驶员一样具备预测周围交通参与者未来行为的能力。在已有文献的基础上继续推进,对发展该领域并全面理解自动驾驶场景下的轨迹预测方法至关重要。为此,我们对 AV 轨迹预测方法进行了全面综述,重点关注包括深度学习和强化学习在内的机器学习技术,共考察了两百余篇相关研究。论文首先介绍车辆轨迹预测这一一般问题,并概述全文使用的关键概念与术语;在简要回顾传统方法之后,对多种基于深度学习的技术进行了全面评估,逐一简述各方法并详细分析其优缺点,随后将讨论扩展到基于强化学习的方法。文章还考察了轨迹预测任务中常用的数据集与评价指标,并在具体功能特性的基础上客观比较了两大类学习范式。通过指出现有文献中的挑战并勾勒潜在研究方向,本综述为 AV 轨迹预测领域的知识发展作出了重要贡献。

Function-Space Regularization for Deep Bayesian Classification

  • paper_url: http://arxiv.org/abs/2307.06055
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Jihao Andreas Lin, Joe Watson, Pascal Klink, Jan Peters
  • for: 这个研究旨在提高深度学习模型的不确定性量化与可信度,并避免过度自信和不可预测的行为。
  • methods: 这个研究在预测空间中使用 Dirichlet 先验,并进行近似的函数空间变分推断,可与不同的深度学习模型搭配使用而无需改动模型结构。
  • results: 这个研究的结果显示,该方法能够在大规模图像分类任务上提升不确定性量化与对抗鲁棒性,并具有良好的可扩展性。
    Abstract Bayesian deep learning approaches assume model parameters to be latent random variables and infer posterior distributions to quantify uncertainty, increase safety and trust, and prevent overconfident and unpredictable behavior. However, weight-space priors are model-specific, can be difficult to interpret and are hard to specify. Instead, we apply a Dirichlet prior in predictive space and perform approximate function-space variational inference. To this end, we interpret conventional categorical predictions from stochastic neural network classifiers as samples from an implicit Dirichlet distribution. By adapting the inference, the same function-space prior can be combined with different models without affecting model architecture or size. We illustrate the flexibility and efficacy of such a prior with toy experiments and demonstrate scalability, improved uncertainty quantification and adversarial robustness with large-scale image classification experiments.
    摘要 贝叶斯深度学习方法将模型参数视为潜在随机变量,并推断其后验分布,以量化不确定性、提高安全性与可信度,并避免过度自信和不可预测的行为。然而,权重空间先验依赖于具体模型,难以解释也难以指定。为此,我们在预测空间中采用 Dirichlet 先验,并进行近似的函数空间变分推断:将随机神经网络分类器的常规类别预测解释为来自隐式 Dirichlet 分布的样本。通过调整推断过程,同一函数空间先验可以与不同模型结合,而无需改动模型结构或规模。我们通过示例实验展示了该先验的灵活性与有效性,并在大规模图像分类实验中验证了其可扩展性、更好的不确定性量化以及对抗鲁棒性。
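To make the function-space idea concrete, here is a minimal PyTorch sketch (not the paper's exact formulation): a classifier head outputs Dirichlet concentration parameters, and a KL term to a uniform Dirichlet prior acts as a regularizer in predictive space. The head design, `prior_concentration`, and `beta` are illustrative assumptions.

```python
# Sketch only: Dirichlet-in-predictive-space regularization for a classifier head.
import torch
import torch.nn as nn
from torch.distributions import Dirichlet, kl_divergence

class DirichletHead(nn.Module):
    def __init__(self, in_features: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_features, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # softplus keeps concentrations positive; +1 avoids degenerate Dirichlets
        return nn.functional.softplus(self.fc(features)) + 1.0

def function_space_loss(alpha, targets, prior_concentration=1.0, beta=0.1):
    """Cross-entropy on the Dirichlet mean + KL to a uniform Dirichlet prior."""
    probs = alpha / alpha.sum(dim=-1, keepdim=True)            # predictive mean
    nll = nn.functional.nll_loss(torch.log(probs + 1e-8), targets)
    prior = Dirichlet(torch.full_like(alpha, prior_concentration))
    kl = kl_divergence(Dirichlet(alpha), prior).mean()
    return nll + beta * kl

# usage with hypothetical shapes
head = DirichletHead(in_features=128, num_classes=10)
feats, targets = torch.randn(32, 128), torch.randint(0, 10, (32,))
loss = function_space_loss(head(feats), targets)
loss.backward()
```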

Online Inventory Problems: Beyond the i.i.d. Setting with Online Convex Optimization

  • paper_url: http://arxiv.org/abs/2307.06048
  • repo_url: None
  • paper_authors: Massil Hihat, Stéphane Gaïffas, Guillaume Garrigos, Simon Bussy
  • for: 管理者面临多产品库存控制问题,需要在只掌握部分历史信息的情况下做出序贯补货决策,以尽量减少累积损失。
  • methods: 提出 MaxCOSD 在线算法,具有可证明的保证,能够应对非报童型损失、有状态的动态(如易腐性)以及非独立同分布(非 i.i.d.)的需求。
  • results: 提出并论证了需求过程的非退化(non-degeneracy)假设,指出其对可学习性是必要的。
    Abstract We study multi-product inventory control problems where a manager makes sequential replenishment decisions based on partial historical information in order to minimize its cumulative losses. Our motivation is to consider general demands, losses and dynamics to go beyond standard models which usually rely on newsvendor-type losses, fixed dynamics, and unrealistic i.i.d. demand assumptions. We propose MaxCOSD, an online algorithm that has provable guarantees even for problems with non-i.i.d. demands and stateful dynamics, including for instance perishability. We consider what we call non-degeneracy assumptions on the demand process, and argue that they are necessary to allow learning.
    摘要 我们研究多产品库存控制问题:管理者根据部分历史信息做出序贯补货决策,以最小化累积损失。我们的动机是考虑更一般的需求、损失和动态,从而超越通常依赖报童型损失、固定动态和不现实的 i.i.d. 需求假设的标准模型。我们提出 MaxCOSD,一种具有可证明保证的在线算法,即使面对非 i.i.d. 需求和有状态动态(例如易腐性)也依然成立。我们还提出了所谓的需求过程非退化假设,并论证它们是实现学习所必需的。
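For intuition about the online-convex-optimization view of inventory control, the toy sketch below runs projected online subgradient descent on a single-product newsvendor-style loss under a non-i.i.d. demand stream. It is not the MaxCOSD algorithm; cost coefficients, step sizes, and the demand process are arbitrary assumptions.

```python
# Toy illustration: projected online subgradient descent for inventory ordering.
import numpy as np

def newsvendor_loss(order, demand, c_hold=1.0, c_short=4.0):
    return c_hold * max(order - demand, 0.0) + c_short * max(demand - order, 0.0)

def newsvendor_subgradient(order, demand, c_hold=1.0, c_short=4.0):
    return c_hold if order >= demand else -c_short

def run_ogd(demands, order_max=100.0, step=1.0):
    order, total = 50.0, 0.0
    for t, d in enumerate(demands, start=1):
        total += newsvendor_loss(order, d)
        g = newsvendor_subgradient(order, d)
        # decreasing step size, projection onto the feasible order range
        order = float(np.clip(order - (step / np.sqrt(t)) * g, 0.0, order_max))
    return total

rng = np.random.default_rng(0)
# a non-stationary (non-i.i.d.) demand stream
demands = 40 + 10 * np.sin(np.arange(500) / 20) + rng.normal(0, 5, 500)
print("cumulative loss:", run_ogd(demands))
```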

  • paper_url: http://arxiv.org/abs/2307.06046
  • repo_url: None
  • paper_authors: Jincheng Zhou, Beatrice Bevilacqua, Bruno Ribeiro
  • for: 在新的测试多重图中预测缺失的带属性链接(关系),尤其是面对包含全新节点和全新关系类型的分布外(OOD)数据时。
  • methods: 基于双交换性(节点与关系类型)的理论概念,与传统的关系学习方法不同,我们提出了一种OOD链接预测方法。
  • results: 我们的方法可以有效地泛化到完全新的关系类型,无需访问额外信息,在实际数据集上实现了显著的性能提升。
    Abstract The task of inductive link prediction in (discrete) attributed multigraphs infers missing attributed links (relations) between nodes in new test multigraphs. Traditional relational learning methods face the challenge of limited generalization to OOD test multigraphs containing both novel nodes and novel relation types not seen in training. Recently, under the only assumption that all relation types share the same structural predictive patterns (single task), Gao et al. (2023) proposed an OOD link prediction method using the theoretical concept of double exchangeability (for nodes & relation types), in contrast to the (single) exchangeability (only for nodes) used to design Graph Neural Networks (GNNs). In this work we further extend the double exchangeability concept to multi-task double exchangeability, where we define link prediction in attributed multigraphs that can have distinct and potentially conflicting predictive patterns for different sets of relation types (multiple tasks). Our empirical results on real-world datasets demonstrate that our approach can effectively generalize to entirely new relation types in test, without access to additional information, yielding significant performance improvements over existing methods.
    摘要 (离散)带属性多重图中的归纳式链接预测任务,旨在推断新测试多重图中节点之间缺失的带属性链接(关系)。传统的关系学习方法难以泛化到同时包含全新节点和训练中未见过的全新关系类型的分布外(OOD)测试多重图。最近,Gao 等人(2023)在所有关系类型共享相同结构预测模式(单任务)这一假设下,利用节点与关系类型的双重可交换性(区别于图神经网络所依赖的仅针对节点的可交换性)提出了一种 OOD 链接预测方法。在本工作中,我们进一步将双重可交换性扩展为多任务双重可交换性,允许不同关系类型集合具有各自、甚至相互冲突的预测模式(多任务)。在真实数据集上的实验表明,我们的方法无需额外信息即可有效泛化到全新的关系类型,并较现有方法取得显著的性能提升。

Rhythm Modeling for Voice Conversion

  • paper_url: http://arxiv.org/abs/2307.06040
  • repo_url: https://github.com/bshall/urhythmic
  • paper_authors: Benjamin van Niekerk, Marc-André Carbonneau, Herman Kamper
  • for: 这篇论文提出一种不需要平行数据或文本转写的无监督节奏转换方法,以改善语音转换中说话人身份的感知。
  • methods: 该方法首先将源语音划分为近似响音、阻音和静音的片段,然后利用自监督表示、通过估计说话速率或各类片段的时长分布来建模节奏,并通过时间伸缩使源语音的节奏与目标说话人匹配。
  • results: 实验结果表明,Urhythmic 在质量和韵律方面优于现有的无监督方法。代码和检查点:https://github.com/bshall/urhythmic。音频 demo 页面:https://ubisoft-laforge.github.io/speech/urhythmic。
    Abstract Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic-an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representations, we first divide source audio into segments approximating sonorants, obstruents, and silences. Then we model rhythm by estimating speaking rate or the duration distribution of each segment type. Finally, we match the target speaking rate or rhythm by time-stretching the speech segments. Experiments show that Urhythmic outperforms existing unsupervised methods in terms of quality and prosody. Code and checkpoints: https://github.com/bshall/urhythmic. Audio demo page: https://ubisoft-laforge.github.io/speech/urhythmic.
    摘要 语音转换的目标是将源语音转换为另一目标说话人的声音。然而,常见的语音转换系统不考虑节奏,而节奏是感知说话人身份的重要因素。为弥补这一差距,我们提出 Urhythmic:一种不需要平行数据或文本转写的无监督节奏转换方法。我们首先利用自监督表示将源音频划分为近似响音、阻音和静音的片段,然后通过估计说话速率或各类片段的时长分布来建模节奏,最后通过对语音片段进行时间伸缩来匹配目标说话速率或节奏。实验表明,Urhythmic 在质量和韵律方面优于现有的无监督方法。代码和检查点:https://github.com/bshall/urhythmic。音频 demo 页面:https://ubisoft-laforge.github.io/speech/urhythmic。
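The rhythm-matching step can be pictured with the rough sketch below: given segment boundaries from an upstream segmenter (assumed here), each segment is time-stretched so the overall speaking rate moves toward a target rate. It uses librosa's phase-vocoder time stretch as a stand-in for the paper's actual procedure; segment labels and rates are placeholders.

```python
# Sketch: stretch pre-segmented audio so its speaking rate matches a target rate.
import numpy as np
import librosa

def match_rate(wav, sr, segments, source_rate, target_rate):
    """wav: float waveform; segments: list of (start_sec, end_sec);
    rates measured in segments per second (illustrative units)."""
    rate = target_rate / source_rate        # librosa: rate < 1 lengthens audio
    out = []
    for start, end in segments:
        chunk = wav[int(start * sr): int(end * sr)]
        if len(chunk) == 0:
            continue
        out.append(librosa.effects.time_stretch(chunk, rate=rate))
    return np.concatenate(out)
```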

Learning from Exemplary Explanations

  • paper_url: http://arxiv.org/abs/2307.06026
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Misgina Tsighe Hagos, Kathleen M. Curran, Brian Mac Namee
  • for: 这篇论文旨在提高交互式机器学习(IML)中基于解释的学习方法的效率,使模型更加透明和可解释。
  • methods: 该论文使用两个输入实例及其对应的 Gradient Weighted Class Activation Mapping(GradCAM)模型解释作为示例解释来实现 XBL。
  • results: 在医学图像分类任务上,该方法仅需极少的人工输入即可得到更好的解释(+0.02,+3%);与不使用交互训练的模型相比,分类性能略有下降(-0.04,-4%)。
    Abstract eXplanation Based Learning (XBL) is a form of Interactive Machine Learning (IML) that provides a model refining approach via user feedback collected on model explanations. Although the interactivity of XBL promotes model transparency, XBL requires a huge amount of user interaction and can become expensive as feedback is in the form of detailed annotation rather than simple category labelling which is more common in IML. This expense is exacerbated in high stakes domains such as medical image classification. To reduce the effort and expense of XBL we introduce a new approach that uses two input instances and their corresponding Gradient Weighted Class Activation Mapping (GradCAM) model explanations as exemplary explanations to implement XBL. Using a medical image classification task, we demonstrate that, using minimal human input, our approach produces improved explanations (+0.02, +3%) and achieves reduced classification performance (-0.04, -4%) when compared against a model trained without interactions.
    摘要 基于解释的学习(XBL)是一种交互式机器学习(IML)方法,通过收集用户对模型解释的反馈来改进模型。尽管 XBL 的交互性提升了模型透明度,但它需要大量用户交互,且反馈形式是细致的标注而非 IML 中更常见的简单类别标签,成本十分高昂;在医学图像分类等高风险领域,这一成本更为突出。为降低 XBL 的工作量和成本,我们提出一种新方法:利用两个输入实例及其对应的 Gradient Weighted Class Activation Mapping(GradCAM)模型解释作为示例解释来实现 XBL。在医学图像分类任务上,我们证明该方法仅需极少的人工输入即可产生更好的解释(+0.02,+3%);与不使用交互训练的模型相比,分类性能略有下降(-0.04,-4%)。

An Effective and Efficient Time-aware Entity Alignment Framework via Two-aspect Three-view Label Propagation

  • paper_url: http://arxiv.org/abs/2307.06013
  • repo_url: None
  • paper_authors: Li Cai, Xin Mao, Youshao Xiao, Changxu Wu, Man Lan
  • for: 寻找不同知识图谱之间等价的实体对(实体对齐),以促进知识融合
  • methods: 非神经网络方法,包括双方面三视图标签传播、带时间约束的稀疏相似度、Sinkhorn 算子以及时间迭代学习
  • results: 在时间知识图谱的实体对齐上显著优于最先进方法,且耗时最多仅几十秒,不超过最高效 TEA 方法的 10%
    Abstract Entity alignment (EA) aims to find the equivalent entity pairs between different knowledge graphs (KGs), which is crucial to promote knowledge fusion. With the wide use of temporal knowledge graphs (TKGs), time-aware EA (TEA) methods appear to enhance EA. Existing TEA models are based on Graph Neural Networks (GNN) and achieve state-of-the-art (SOTA) performance, but it is difficult to transfer them to large-scale TKGs due to the scalability issue of GNN. In this paper, we propose an effective and efficient non-neural EA framework between TKGs, namely LightTEA, which consists of four essential components: (1) Two-aspect Three-view Label Propagation, (2) Sparse Similarity with Temporal Constraints, (3) Sinkhorn Operator, and (4) Temporal Iterative Learning. All of these modules work together to improve the performance of EA while reducing the time consumption of the model. Extensive experiments on public datasets indicate that our proposed model significantly outperforms the SOTA methods for EA between TKGs, and the time consumed by LightTEA is only dozens of seconds at most, no more than 10% of the most efficient TEA method.
    摘要 实体对齐(EA)旨在找到不同知识图谱(KG)之间等价的实体对,这对促进知识融合至关重要。随着时间知识图谱(TKG)的广泛使用,时间感知的实体对齐(TEA)方法被提出以增强 EA。现有的 TEA 模型基于图神经网络(GNN),取得了最先进(SOTA)的性能,但由于 GNN 的可扩展性问题,难以推广到大规模 TKG。本文提出一种高效且有效的非神经网络 EA 框架 LightTEA,用于 TKG 之间的实体对齐,包含四个基本组件:(1)双方面三视图标签传播;(2)带时间约束的稀疏相似度;(3)Sinkhorn 算子;(4)时间迭代学习。这些模块协同工作,在降低模型耗时的同时提升 EA 性能。在公开数据集上的大量实验表明,所提模型在 TKG 实体对齐上显著优于 SOTA 方法,且 LightTEA 的耗时最多仅几十秒,不超过最高效 TEA 方法的 10%。
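The Sinkhorn operator named as component (3) can be illustrated with a generic sketch: iterative row/column normalization turns a similarity matrix into an approximately doubly-stochastic soft alignment. LightTEA's actual similarity construction is not reproduced; the temperature and matrix below are placeholders.

```python
# Generic Sinkhorn normalization over an entity-similarity matrix.
import numpy as np

def sinkhorn(sim, n_iters=20, tau=0.05, eps=1e-9):
    """sim: (n, m) similarity matrix between entities of two (temporal) KGs."""
    p = np.exp(sim / tau)                               # temperature-scaled scores
    for _ in range(n_iters):
        p = p / (p.sum(axis=1, keepdims=True) + eps)    # row normalization
        p = p / (p.sum(axis=0, keepdims=True) + eps)    # column normalization
    return p

sim = np.random.rand(5, 5)
alignment = sinkhorn(sim)
print(alignment.argmax(axis=1))                         # predicted counterpart per entity
```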

What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation

  • paper_url: http://arxiv.org/abs/2307.06006
  • repo_url: None
  • paper_authors: Gabriele Merlin, Vedant Nanda, Ruchit Rawal, Mariya Toneva
  • for: 这篇论文探讨了预训练-微调范式为何能提升模型性能,并提出新的度量来评估预训练模型所学到的不变性在微调过程中是被保留还是被遗忘。
  • methods: 作者在多个基准数据集和任务上研究预训练视觉 Transformer 与其微调版本之间的关系,并提出了专门度量不变性保留程度的新指标。
  • results: 研究发现,预训练会在浅层诱导出可迁移的不变性,而较深预训练层中的不变性会在微调过程中被压缩到较浅的层。这些发现有助于理解预训练模型成功的部分原因,以及预训练模型在下游微调中发生的变化。
    Abstract The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task, becoming commonplace across many areas of machine learning. While pretraining is empirically observed to be beneficial for a range of tasks, there is not a clear understanding yet of the reasons for this effect. In this work, we examine the relationship between pretrained vision transformers and the corresponding finetuned versions on several benchmark datasets and tasks. We present new metrics that specifically investigate the degree to which invariances learned by a pretrained model are retained or forgotten during finetuning. Using these metrics, we present a suite of empirical findings, including that pretraining induces transferable invariances in shallow layers and that invariances from deeper pretrained layers are compressed towards shallower layers during finetuning. Together, these findings contribute to understanding some of the reasons for the successes of pretrained models and the changes that a pretrained model undergoes when finetuned on a downstream task.
    摘要 预训练-微调范式通常比从零开始训练同一任务的模型取得更好的下游性能,并已在机器学习的诸多领域成为常规做法。尽管预训练在众多任务上被实证证明是有益的,但其原因尚不清楚。在本工作中,我们在多个基准数据集和任务上研究了预训练视觉 Transformer 与其微调版本之间的关系。我们提出了新的度量,专门考察预训练模型所学到的不变性在微调过程中被保留或遗忘的程度。基于这些度量,我们给出了一系列实证发现,其中包括:预训练会在浅层诱导出可迁移的不变性;较深预训练层中的不变性在微调过程中被压缩到较浅的层。总体而言,这些发现有助于理解预训练模型取得成功的部分原因,以及预训练模型在下游任务微调时所经历的变化。

DDNAS: Discretized Differentiable Neural Architecture Search for Text Classification

  • paper_url: http://arxiv.org/abs/2307.06005
  • repo_url: https://github.com/ddnas/ddnas
  • paper_authors: Kuan-Chun Chen, Cheng-Te Li, Kuo-Jung Lee
  • for: 文本表示学习中的Neural Architecture Search(NAS)方法可以提供更好的表示能力。
  • methods: 本文提出了一种新的 NAS 方法——离散化可微神经架构搜索(DDNAS),用于文本表示学习和分类。DDNAS 对架构表示进行连续松弛,从而可以用梯度下降来优化搜索;同时通过最大化互信息,在每个搜索节点上施加一个新的离散化层,以建模文本输入背后的潜在层次分类结构。
  • results: 在八个多样的真实数据集上的大量实验表明,DDNAS 能够一致地优于现有的 NAS 方法。尽管 DDNAS 仅使用卷积、池化和 none 三种基本操作作为 NAS 构建块的候选,其表现已十分可观,并且可以通过加入更多不同的操作进一步提升。
    Abstract Neural Architecture Search (NAS) has shown promising capability in learning text representation. However, existing text-based NAS neither performs a learnable fusion of neural operations to optimize the architecture, nor encodes the latent hierarchical categorization behind text input. This paper presents a novel NAS method, Discretized Differentiable Neural Architecture Search (DDNAS), for text representation learning and classification. With the continuous relaxation of architecture representation, DDNAS can use gradient descent to optimize the search. We also propose a novel discretization layer via mutual information maximization, which is imposed on every search node to model the latent hierarchical categorization in text representation. Extensive experiments conducted on eight diverse real datasets exhibit that DDNAS can consistently outperform the state-of-the-art NAS methods. While DDNAS relies on only three basic operations, i.e., convolution, pooling, and none, to be the candidates of NAS building blocks, its promising performance is noticeable and extensible to obtain further improvement by adding more different operations.
    摘要 神经架构搜索(NAS)在文本表示学习中展现出可观的能力。然而,现有的基于文本的 NAS 既没有对神经操作进行可学习的融合以优化架构,也没有对文本输入背后的潜在层次分类进行编码。本文提出一种新的 NAS 方法——离散化可微神经架构搜索(DDNAS),用于文本表示学习与分类。通过对架构表示进行连续松弛,DDNAS 可以使用梯度下降来优化搜索。我们还提出了一种基于互信息最大化的新型离散化层,施加在每个搜索节点上,以建模文本表示中的潜在层次分类。在八个多样的真实数据集上的大量实验表明,DDNAS 能够一致地优于最先进的 NAS 方法。尽管 DDNAS 仅以卷积、池化和 none 三种基本操作作为 NAS 构建块的候选,其表现已十分可观,并可通过加入更多不同的操作进一步提升。
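The "continuous relaxation optimized by gradient descent" idea can be pictured with a DARTS-style mixed operation over the three candidate operations named in the abstract (convolution, pooling, none); this is a generic sketch, and DDNAS's mutual-information-based discretization layer is not shown.

```python
# Sketch: a softmax-weighted mixture of candidate operations on token features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),   # convolution
            nn.MaxPool1d(kernel_size=3, stride=1, padding=1),          # pooling
        ])
        self.alpha = nn.Parameter(torch.zeros(3))   # architecture weights: conv, pool, none

    def forward(self, x):                            # x: (batch, channels, length)
        w = F.softmax(self.alpha, dim=0)
        out = w[0] * self.ops[0](x) + w[1] * self.ops[1](x)
        # "none" contributes a zero map, so its weight only down-weights this edge
        return out + w[2] * torch.zeros_like(x)

op = MixedOp(channels=64)
tokens = torch.randn(8, 64, 32)      # e.g. 32-token sequences with 64-dim embeddings
print(op(tokens).shape)              # torch.Size([8, 64, 32])
```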

Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

  • paper_url: http://arxiv.org/abs/2307.13116
  • repo_url: None
  • paper_authors: Michal Bartoszkiewicz, Jan Chorowski, Adrian Kosowski, Jakub Kowalski, Sergey Kulik, Mateusz Lewandowski, Krzysztof Nowicki, Kamil Piechowiak, Olivier Ruas, Zuzanna Stamirowska, Przemyslaw Uznanski
  • for: 这篇论文旨在解决分析与处理实体经济数据流(包括物联网和企业系统产生的数据流)时面临的挑战。
  • methods: 论文提出了一个新的统一数据处理框架 Pathway,可以在有界和无界数据流上运行工作负载。Pathway 提供面向 Python 与 Python/SQL 工作流的 Table API,底层由 Rust 实现的分布式增量数据流驱动。
  • results: 作者给出了 Pathway 的系统描述与基准测试结果,表明它在批处理和流处理两种场景下都能超越现有的行业框架。此外,Pathway 还能处理一些现有框架难以解决的流处理用例,如流式迭代图算法(PageRank 等)。
    Abstract We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machinelearning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both scenarios. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.).
    摘要 我们介绍 Pathway,一个新的统一数据处理框架,能够在有界和无界数据流上运行工作负载。该框架的最初动机是解决分析和处理实体经济数据(包括物联网与企业系统产生的数据流)时所面临的挑战:这些场景既要求快速响应,又需要应用高级计算范式(机器学习驱动的分析、上下文分析以及复杂事件处理的其他要素)。Pathway 提供面向 Python 与 Python/SQL 工作流的 Table API,底层由 Rust 实现的分布式增量数据流驱动。我们描述了该系统,并给出基准测试结果,表明其在批处理和流处理两种场景下都能超越最先进的行业框架。我们还讨论了 Pathway 能够处理、而现有行业框架难以解决的流处理用例,例如流式迭代图算法(PageRank 等)。

A Comprehensive Review of Automated Data Annotation Techniques in Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2307.05988
  • repo_url: None
  • paper_authors: Florenc Demrozi, Cristian Turetta, Fadi Al Machot, Graziano Pravadelli, Philipp H. Kindt
  • for: 这篇论文旨在对人体活动识别(HAR)中的数据标注技术进行系统性综述。
  • methods: 论文将现有方法划分为不同类别并给出一个分类法,以帮助在给定场景中选择合适的技术。
  • results: 论文提供了 HAR 数据标注技术的系统性综述,并将现有方法归入不同类别,便于在不同场景下选择适用的技术。
    Abstract Human Activity Recognition (HAR) has become one of the leading research topics of the last decade. As sensing technologies have matured and their economic costs have declined, a host of novel applications, e.g., in healthcare, industry, sports, and daily life activities have become popular. The design of HAR systems requires different time-consuming processing steps, such as data collection, annotation, and model training and optimization. In particular, data annotation represents the most labor-intensive and cumbersome step in HAR, since it requires extensive and detailed manual work from human annotators. Therefore, different methodologies concerning the automation of the annotation procedure in HAR have been proposed. The annotation problem occurs in different notions and scenarios, which all require individual solutions. In this paper, we provide the first systematic review on data annotation techniques for HAR. By grouping existing approaches into classes and providing a taxonomy, our goal is to support the decision on which techniques can be beneficially used in a given scenario.
    摘要 人类活动识别(HAR)在过去一个 décennial 内成为了研究领域的主导话题之一。随着感知技术的成熔和经济成本的下降,一系列的新应用,如医疗、工业、运动和日常生活活动,在人类活动识别领域得到了广泛的应用。人类活动识别系统的设计需要不同的时间consuming 的处理步骤,如数据收集、注释、模型训练和优化。特别是数据注释是人类活动识别中最劳力占用和繁琐的步骤,因为它需要大量的人工注释员进行详细的手动工作。因此,不同的方法和技术在人类活动识别中自动注释的问题上提出了多种方法。这些问题在不同的概念和场景下都需要具体的解决方案。本文是人类活动识别领域的首次系统性的文献评论。我们将现有的方法分类并提供了一个分类法,以支持在给定的场景下选择合适的技术。

Transformers in Reinforcement Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.05979
  • repo_url: None
  • paper_authors: Pranav Agarwal, Aamer Abdul Rahman, Pierre-Luc St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou
  • for: 这篇论文探讨了如何使用 transformers 来解决 reinforcement learning 中的挑战,包括不稳定的训练、归因问题、不可解释性和部分可见性。
  • methods: 这篇论文详细介绍了 transformers 的性质和其变体,并讲解了它们在 reinforcement learning 中的应用,包括表示学习、过程和奖励函数模型化以及策略优化。
  • results: 这篇论文总结了在不同应用中使用 transformers 的研究,包括机器人、医学、自然语言处理和云计算等。它们还讨论了如何使用可视化技术和高效的训练策略来提高 transformers 的可解释性和效率。
    Abstract Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks. This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability. We begin by providing a brief domain overview of RL, followed by a discussion on the challenges of classical RL algorithms. Next, we delve into the properties of the transformer and its variants and discuss the characteristics that make them well-suited to address the challenges inherent in RL. We examine the application of transformers to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization. We also discuss recent research that aims to enhance the interpretability and efficiency of transformers in RL, using visualization techniques and efficient training strategies. Often, the transformer architecture must be tailored to the specific needs of a given application. We present a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization. We conclude by discussing the limitations of using transformers in RL and assess their potential for catalyzing future breakthroughs in this field.
    摘要 Transformer 在自然语言处理、计算机视觉和机器人等领域显著提升了性能,优于其他神经网络。本综述探讨 Transformer 在强化学习(RL)中的应用:它被视为解决训练不稳定、信用分配、缺乏可解释性以及部分可观测性等挑战的有前景方案。我们首先简要介绍 RL 领域,随后讨论经典 RL 算法面临的挑战;接着介绍 Transformer 及其变体的性质,并讨论使其适合应对 RL 固有挑战的特点。我们考察了 Transformer 在 RL 各个方面的应用,包括表示学习、转移与奖励函数建模以及策略优化,并讨论了近期旨在通过可视化技术和高效训练策略提升其可解释性与效率的研究。由于 Transformer 架构通常需要针对具体应用进行定制,我们还概述了它在机器人、医学、语言建模、云计算和组合优化等多个应用中的适配方式。最后,我们讨论了在 RL 中使用 Transformer 的局限性,并评估其推动该领域未来突破的潜力。

Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.05977
  • repo_url: https://github.com/nannullna/safe-diffusion
  • paper_authors: Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee
  • for: 防止文本到图像扩散模型中的危险或版权内容生成
  • methods: 提出了一种名为 SDD 的方法:通过自蒸馏,引导以目标去除概念为条件的噪声估计,使其与无条件的噪声估计相匹配
  • results: 与先前方法相比,在不降低整体图像质量的前提下去除了更大比例的有害内容;同时支持一次去除多个概念,而此前的工作一次只能去除一个概念
    Abstract Large-scale image generation models, with impressive quality made possible by the vast amount of data available on the Internet, raise social concerns that these models may generate harmful or copyrighted content. The biases and harmfulness arise throughout the entire training process and are hard to completely remove, which have become significant hurdles to the safe deployment of these models. In this paper, we propose a method called SDD to prevent problematic content generation in text-to-image diffusion models. We self-distill the diffusion model to guide the noise estimate conditioned on the target removal concept to match the unconditional one. Compared to the previous methods, our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality. Furthermore, our method allows the removal of multiple concepts at once, whereas previous works are limited to removing a single concept at a time.
    摘要 大规模图像生成模型凭借互联网上的海量数据取得了令人印象深刻的生成质量,但也引发了社会担忧:这些模型可能生成有害或受版权保护的内容。偏见与有害性贯穿整个训练过程,难以完全消除,已成为安全部署这类模型的重要障碍。本文提出一种名为 SDD 的方法,用于防止文本到图像扩散模型生成有问题的内容。我们对扩散模型进行自蒸馏,引导以目标去除概念为条件的噪声估计,使其与无条件的噪声估计相匹配。与此前的方法相比,我们的方法能在不降低整体图像质量的前提下,去除生成图像中更大比例的有害内容;此外,我们的方法支持一次去除多个概念,而此前的工作一次只能去除一个概念。
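A hedged sketch of the self-distillation objective described in the abstract is given below: the model's noise estimate conditioned on the concept to remove is pulled toward the frozen unconditional estimate. `model`, `noise_schedule`, and the embeddings are placeholders for whatever diffusion backbone is used; this is not the paper's exact training loop.

```python
# Sketch: self-distillation loss that erases a concept from a diffusion model.
import copy
import torch
import torch.nn.functional as F

def make_teacher(model):
    """Frozen copy of the model used as the self-distillation teacher."""
    teacher = copy.deepcopy(model).eval()
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

def sdd_loss(model, teacher, x_t, t, concept_emb, null_emb):
    with torch.no_grad():
        eps_uncond = teacher(x_t, t, null_emb)      # unconditional noise estimate (frozen)
    eps_cond = model(x_t, t, concept_emb)           # estimate conditioned on the concept to remove
    return F.mse_loss(eps_cond, eps_uncond)         # pull the conditional estimate toward it
```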

Outlier detection in regression: conic quadratic formulations

  • paper_url: http://arxiv.org/abs/2307.05975
  • repo_url: None
  • paper_authors: Andrés Gómez, José Neto
  • for: Linear regression model building with outlier detection
  • methods: Second-order conic relaxations without big-M constraints
  • results: Faster computational performance compared to existing big-M formulations
    Abstract In many applications, when building linear regression models, it is important to account for the presence of outliers, i.e., corrupted input data points. Such problems can be formulated as mixed-integer optimization problems involving cubic terms, each given by the product of a binary variable and a quadratic term of the continuous variables. Existing approaches in the literature, typically relying on the linearization of the cubic terms using big-M constraints, suffer from weak relaxation and poor performance in practice. In this work we derive stronger second-order conic relaxations that do not involve big-M constraints. Our computational experiments indicate that the proposed formulations are several orders-of-magnitude faster than existing big-M formulations in the literature for this problem.
    摘要 在许多应用中,构建线性回归模型时需要考虑离群点(即被污染的输入数据点)的存在。此类问题可以表述为混合整数优化问题,其中包含三次项,每个三次项由一个二进制变量与连续变量的二次项相乘得到。文献中的现有方法通常借助 big-M 约束对三次项进行线性化,但其松弛较弱,实际表现不佳。在本工作中,我们推导出不依赖 big-M 约束、更强的二阶锥松弛。计算实验表明,所提的形式化比文献中现有的 big-M 形式化快若干个数量级。
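For concreteness, one standard way to write the outlier-aware least-squares problem with binary outlier indicators, consistent with the abstract's description (the paper's conic reformulation is not reproduced), is:

```latex
\min_{\beta,\; z \in \{0,1\}^n} \;\; \sum_{i=1}^{n} (1 - z_i)\,\bigl(y_i - x_i^\top \beta\bigr)^2
\qquad \text{s.t.} \qquad \sum_{i=1}^{n} z_i \le k
```

Here z_i = 1 marks observation i as an outlier and removes it from the fit; expanding the objective produces the cubic terms z_i * (x_i^T beta)^2, i.e. a binary variable multiplying a quadratic term of the continuous variables, which is exactly the structure the abstract refers to.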

Contrastive Learning for Conversion Rate Prediction

  • paper_url: http://arxiv.org/abs/2307.05974
  • repo_url: https://github.com/dongruihust/cl4cvr
  • paper_authors: Wentao Ouyang, Rui Dong, Xiuwu Zhang, Chaofeng Guo, Jinmei Luo, Xiangzheng Liu, Yanlong Du
  • for: 转化率(CVR)预测在广告系统中扮演着重要角色。现有的深度神经网络模型在 CVR 预测上已展现出可观的表现,但这些深度模型需要海量数据进行训练;在在线广告系统中,尽管广告数量达数百万乃至数十亿,用户往往只点击其中一小部分,而发生转化的更是少之又少。这种数据稀疏问题限制了深度模型的能力。
  • methods: 本文提出了名为 CL4CVR(Contrastive Learning for CVR prediction)的框架,将 CVR 预测任务与对比学习任务相结合,利用丰富的无标注数据学习更好的数据表示,从而提升 CVR 预测性能。为使对比学习任务适配 CVR 预测问题,我们提出用嵌入掩码(EM)而非特征掩码来构造两个增强视图;还提出假负例剔除(FNE)组件,剔除与锚样本特征相同的样本,以契合用户行为数据的固有性质;并提出有监督正例纳入(SPI)组件,为每个锚样本引入额外的正样本,以充分利用稀疏而宝贵的用户转化事件。
  • results: 实验结果表明,CL4CVR 在两个真实的转化数据集上显示出了更高的性能。源代码可以在 https://github.com/DongRuiHust/CL4CVR 上获取。
    Abstract Conversion rate (CVR) prediction plays an important role in advertising systems. Recently, supervised deep neural network-based models have shown promising performance in CVR prediction. However, they are data hungry and require an enormous amount of training data. In online advertising systems, although there are millions to billions of ads, users tend to click only a small set of them and to convert on an even smaller set. This data sparsity issue restricts the power of these deep models. In this paper, we propose the Contrastive Learning for CVR prediction (CL4CVR) framework. It associates the supervised CVR prediction task with a contrastive learning task, which can learn better data representations exploiting abundant unlabeled data and improve the CVR prediction performance. To tailor the contrastive learning task to the CVR prediction problem, we propose embedding masking (EM), rather than feature masking, to create two views of augmented samples. We also propose a false negative elimination (FNE) component to eliminate samples with the same feature as the anchor sample, to account for the natural property in user behavior data. We further propose a supervised positive inclusion (SPI) component to include additional positive samples for each anchor sample, in order to make full use of sparse but precious user conversion events. Experimental results on two real-world conversion datasets demonstrate the superior performance of CL4CVR. The source code is available at https://github.com/DongRuiHust/CL4CVR.
    摘要 转化率(CVR)预测在广告系统中扮演着重要角色。近年来,有监督的深度神经网络模型在 CVR 预测上展现出可观的表现,但它们对数据需求很大,需要海量训练数据。在在线广告系统中,尽管广告数量达数百万乃至数十亿,用户往往只点击其中一小部分,而发生转化的更是少之又少,这一数据稀疏问题限制了深度模型的能力。本文提出 CL4CVR 框架,将有监督的 CVR 预测任务与对比学习任务相结合,借助丰富的无标注数据学习更好的数据表示,从而提升 CVR 预测性能。为使对比学习任务适配 CVR 预测问题,我们提出用嵌入掩码(EM)而非特征掩码来构造两个增强视图;提出假负例剔除(FNE)组件,剔除与锚样本特征相同的样本,以契合用户行为数据的固有性质;并提出有监督正例纳入(SPI)组件,为每个锚样本引入额外的正样本,以充分利用稀疏而宝贵的用户转化事件。在两个真实转化数据集上的实验结果表明 CL4CVR 性能更优。源代码见 https://github.com/DongRuiHust/CL4CVR。

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

  • paper_url: http://arxiv.org/abs/2307.05973
  • repo_url: None
  • paper_authors: Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei
  • for: This paper aims to synthesize robot trajectories for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects.
  • methods: The proposed method leverages large language models (LLMs) to infer affordances and constraints from free-form language instructions, and then composes 3D value maps with a visual-language model (VLM) to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations.
  • results: The proposed method is demonstrated to be effective in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. The method also benefits from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions.
    Abstract Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to carry out the physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a visual-language model (VLM) to compose 3D value maps to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. Project website: https://voxposer.github.io
    摘要 大型语言模型(LLM)已被证明蕴含大量可用于机器人操作的可执行知识,能够以推理和规划的形式提取出来。尽管如此,现有工作大多仍依赖预先定义的运动原语来与环境进行物理交互,这仍是主要瓶颈。在本工作中,我们旨在针对开放的指令集与开放的物体集,为多种操作任务合成机器人轨迹,即末端执行器 6 自由度路径点的密集序列。我们首先观察到,LLM 擅长从自由形式的语言指令中推断可供性与约束;更重要的是,借助其代码生成能力,LLM 可以与视觉-语言模型(VLM)交互,组合出 3D 价值图,将知识落地到智能体的观测空间中。随后,这些组合得到的价值图被用于基于模型的规划框架,以零样本方式合成对动态扰动具有鲁棒性的闭环机器人轨迹。我们还展示了该框架如何通过高效学习涉及丰富接触交互场景的动力学模型,从在线经验中获益。我们在仿真和真实机器人环境中对所提方法进行了大规模研究,展示了其执行以自由形式自然语言指定的多种日常操作任务的能力。项目网站:https://voxposer.github.io

Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models

  • paper_url: http://arxiv.org/abs/2307.05972
  • repo_url: None
  • paper_authors: James O’ Neill, Sourav Dutta
  • for: investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models, and propose a new method called self-distilled quantization (SDQ) to minimize accumulative quantization errors.
  • methods: post-training quantization, quantization-aware training, self-distilled quantization (SDQ)
  • results: both multilingual models XLM-R-Base and InfoXLM-Base can be reduced from 32-bit floating point weights to 8-bit integer weights while maintaining a high level of performance on the XGLUE benchmark, but multilingual models have challenges in generalizing to languages they were not fine-tuned on.
    Abstract We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models. We present a new method called self-distilled quantization (SDQ) that minimizes accumulative quantization errors and outperforms baselines. We apply SDQ to multilingual models XLM-R-Base and InfoXLM-Base and demonstrate that both models can be reduced from 32-bit floating point weights to 8-bit integer weights while maintaining a high level of performance on the XGLUE benchmark. Our results also highlight the challenges of quantizing multilingual models, which must generalize to languages they were not fine-tuned on.
    摘要 我们研究训练后量化与量化感知训练对 Transformer 语言模型泛化能力的影响,并提出一种新方法——自蒸馏量化(SDQ),用于最小化累积量化误差,其表现优于基线方法。我们将 SDQ 应用于多语言模型 XLM-R-Base 和 InfoXLM-Base,证明这两个模型的权重可以从 32 位浮点数压缩到 8 位整数,同时在 XGLUE 基准上保持较高的性能。我们的结果还凸显了量化多语言模型的挑战:它们必须泛化到未经微调的语言上。
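As a rough picture of what quantization-aware training with self-distillation can look like (an approximation, not the paper's exact SDQ procedure), the sketch below combines fake 8-bit quantization via a straight-through estimator with a distillation loss from the full-precision model's own logits.

```python
# Sketch: fake 8-bit quantization + distillation from the full-precision teacher.
import torch
import torch.nn.functional as F

def fake_quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max().clamp(min=1e-8) / qmax
    w_q = torch.round(w / scale).clamp(-qmax, qmax) * scale
    return w + (w_q - w).detach()            # straight-through gradient

def sdq_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1 - alpha) * kd
```

During training, `fake_quantize` would be applied to the weights on the forward pass while an unquantized copy of the model supplies `teacher_logits`; both of these wiring details are assumptions for illustration.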

Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

  • paper_url: http://arxiv.org/abs/2307.05959
  • repo_url: None
  • paper_authors: Moo Jin Kim, Jiajun Wu, Chelsea Finn
  • for: 本研究旨在增强视觉控制策略的通用性,使用人类视频示例来增强眼手控制策略的泛化能力。
  • methods: 我们利用人类视频演示与眼在手(eye-in-hand)相机来增强视觉运动策略的泛化能力。我们无需使用显式的域自适应方法,而是利用眼在手相机的部分可观测性以及一种简单的固定图像掩码方案。
  • results: 在八个真实世界任务(涵盖 3 自由度与 6 自由度机械臂控制)中,我们的方法将眼在手操作策略的成功率平均提升 58%(绝对值),使机器人能够泛化到新的环境配置以及机器人演示数据中未出现过的新任务。视频结果见:https://giving-robots-a-hand.github.io/。
    Abstract Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation. However, for robotic imitation, it is still expensive to have a human teleoperator collect large amounts of expert demonstrations with a real robot. Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation and can be quickly captured in a wide range of scenarios. Therefore, human video demonstrations are a promising data source for learning generalizable robotic manipulation policies at scale. In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies. Although a clear visual domain gap exists between human and robot data, our framework does not need to employ any explicit domain adaptation method, as we leverage the partial observability of eye-in-hand cameras as well as a simple fixed image masking scheme. On a suite of eight real-world tasks involving both 3-DoF and 6-DoF robot arm control, our method improves the success rates of eye-in-hand manipulation policies by 58% (absolute) on average, enabling robots to generalize to both new environment configurations and new tasks that are unseen in the robot demonstration data. See video results at https://giving-robots-a-hand.github.io/ .
    摘要 眼在手(eye-in-hand)相机在基于视觉的机器人操作中展现出提升样本效率与泛化能力的潜力。然而,对于机器人模仿学习而言,让人类遥操作员在真实机器人上收集大量专家演示仍然十分昂贵;相比之下,人类完成任务的视频收集成本低得多,因为它无需机器人遥操作方面的专业知识,并且可以在各种场景中快速采集。因此,人类视频演示是大规模学习可泛化机器人操作策略的有前景的数据来源。在本工作中,我们用大量无标注的人类视频演示来扩充规模有限的机器人模仿数据集,从而显著提升眼在手视觉运动策略的泛化能力。尽管人类数据与机器人数据之间存在明显的视觉域差异,我们的框架无需采用任何显式的域自适应方法,而是利用眼在手相机的部分可观测性以及一种简单的固定图像掩码方案。在涵盖 3 自由度与 6 自由度机械臂控制的八个真实世界任务中,我们的方法将眼在手操作策略的成功率平均提升 58%(绝对值),使机器人能够泛化到新的环境配置以及机器人演示数据中未出现过的新任务。视频结果见 https://giving-robots-a-hand.github.io/ 。
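The "simple fixed image masking scheme" can be pictured with the snippet below, which grays out a fixed region of every eye-in-hand frame; the choice of region (here the bottom strip, where a hand or gripper might appear) and the fill value are assumptions for illustration only.

```python
# Sketch: apply a fixed mask to eye-in-hand frames from both human and robot data.
import numpy as np

def mask_lower_strip(frame: np.ndarray, fraction: float = 0.25, value: int = 127):
    """frame: (H, W, 3) uint8 image; masks the bottom `fraction` of the image."""
    out = frame.copy()
    h = frame.shape[0]
    out[int((1 - fraction) * h):, :, :] = value
    return out
```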

Newell’s theory based feature transformations for spatio-temporal traffic prediction

  • paper_url: http://arxiv.org/abs/2307.05949
  • repo_url: None
  • paper_authors: Agnimitra Sengupta, S. Ilgin Guler
  • for: 本研究旨在提升深度学习模型在时空交通流预测中的表现,并使这些模型更容易迁移到新的地点。
  • methods: 这类模型通常使用卷积或图卷积滤波器并结合循环神经网络来捕捉时空相关性;本文在此基础上提出基于 Newell 交通流理论的特征变换。
  • results: 研究表明,基于交通流物理的特征变换能够提升深度学习模型在不同预测时域和不同地点上的表现,并使模型更易迁移到新的地点。
    Abstract Deep learning (DL) models for spatio-temporal traffic flow forecasting employ convolutional or graph-convolutional filters along with recurrent neural networks to capture spatial and temporal dependencies in traffic data. These models, such as CNN-LSTM, utilize traffic flows from neighboring detector stations to predict flows at a specific location of interest. However, these models are limited in their ability to capture the broader dynamics of the traffic system, as they primarily learn features specific to the detector configuration and traffic characteristics at the target location. Hence, the transferability of these models to different locations becomes challenging, particularly when data is unavailable at the new location for model training. To address this limitation, we propose a traffic flow physics-based feature transformation for spatio-temporal DL models. This transformation incorporates Newell's uncongested and congested-state estimators of traffic flows at the target locations, enabling the models to learn broader dynamics of the system. Our methodology is empirically validated using traffic data from two different locations. The results demonstrate that the proposed feature transformation improves the models' performance in predicting traffic flows over different prediction horizons, as indicated by better goodness-of-fit statistics. An important advantage of our framework is its ability to be transferred to new locations where data is unavailable. This is achieved by appropriately accounting for spatial dependencies based on station distances and various traffic parameters. In contrast, regular DL models are not easily transferable as their inputs remain fixed. It should be noted that due to data limitations, we were unable to perform spatial sensitivity analysis, which calls for further research using simulated data.
    摘要 用于时空交通流预测的深度学习(DL)模型通常利用卷积或图卷积滤波器以及循环神经网络来捕捉交通数据中的时空相关性。这类模型(如 CNN-LSTM)利用邻近检测站的流量来预测目标位置的流量。然而,它们主要学习与目标位置的检测器配置和交通特征相关的特征,难以刻画交通系统更宏观的动态,因此向其他位置迁移较为困难,尤其当新位置没有可用于训练的数据时。为解决这一局限,我们为时空 DL 模型提出一种基于交通流物理的特征变换:将 Newell 的未拥堵与拥堵状态流量估计量纳入目标位置的输入,使模型能够学习系统更宏观的动态。我们用两个不同地点的交通数据对该方法进行了实证检验,结果显示拟合优度指标更好,说明所提特征变换提升了模型在不同预测时域上的流量预测性能。我们框架的一个重要优点是可以迁移到没有数据的新位置,这通过基于站点间距及各类交通参数恰当地刻画空间依赖关系来实现;相比之下,常规 DL 模型的输入固定,不易迁移。需要说明的是,受数据限制,我们未能开展空间敏感性分析,这有待后续利用仿真数据进一步研究。
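Newell's uncongested and congested-state estimators can be sketched via the three-detector principle on cumulative counts; the snippet below is illustrative, with assumed free-flow speed, wave speed, and jam density, and may differ from the paper's exact feature transformation.

```python
# Sketch: Newell-style cumulative-count estimate at an intermediate location x.
import numpy as np

def newell_count(t, t_up, n_up, t_dn, n_dn, d_up, d_dn,
                 v_free=30.0, w_wave=5.0, k_jam=0.15):
    """Cumulative count N(x, t): min of the uncongested estimate (from the upstream
    detector, shifted by free-flow travel time) and the congested estimate (from the
    downstream detector, shifted by the backward wave travel time plus jam storage).
    Distances in m, speeds in m/s, k_jam in veh/m; all values are assumptions."""
    n_uncongested = np.interp(t - d_up / v_free, t_up, n_up)
    n_congested = np.interp(t - d_dn / w_wave, t_dn, n_dn) + k_jam * d_dn
    return np.minimum(n_uncongested, n_congested)

# Flows at x can then be obtained by differencing the estimated counts over time
# and fed as physics-based input features to the deep learning model.
```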

Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation

  • paper_url: http://arxiv.org/abs/2307.05948
  • repo_url: None
  • paper_authors: Ruijiang Dong, Feng Liu, Haoang Chi, Tongliang Liu, Mingming Gong, Gang Niu, Masashi Sugiyama, Bo Han
  • for: addressing the few-shot hypothesis adaptation (FHA) problem
  • methods: 使用多样性增强生成网络(DEG-Net),借助核独立性度量 Hilbert-Schmidt independence criterion(HSIC),通过最小化生成数据语义特征之间的 HSIC 值(即最大化独立性)来生成多样化的无标注数据
  • results: 优于现有的 FHA 基线方法,并进一步验证了生成多样化数据对解决 FHA 问题的关键作用
    Abstract Generating unlabeled data has been recently shown to help address the few-shot hypothesis adaptation (FHA) problem, where we aim to train a classifier for the target domain with a few labeled target-domain data and a well-trained source-domain classifier (i.e., a source hypothesis), for the additional information of the highly-compatible unlabeled data. However, the generated data of the existing methods are extremely similar or even the same. The strong dependency among the generated data will lead the learning to fail. In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC). Specifically, DEG-Net will generate data via minimizing the HSIC value (i.e., maximizing the independence) among the semantic features of the generated data. By DEG-Net, the generated unlabeled data are more diverse and more effective for addressing the FHA problem. Experimental results show that the DEG-Net outperforms existing FHA baselines and further verifies that generating diverse data plays a vital role in addressing the FHA problem
    摘要 近期研究表明,生成无标注数据有助于解决小样本假设适应(FHA)问题:即利用少量带标注的目标域数据和一个训练良好的源域分类器(源假设),借助高度兼容的无标注数据所提供的额外信息,为目标域训练分类器。然而,现有方法生成的数据彼此极为相似甚至完全相同,生成数据之间的强依赖会导致学习失败。本文针对 FHA 问题提出一种多样性增强生成网络(DEG-Net),借助核独立性度量——Hilbert-Schmidt 独立性准则(HSIC)——生成多样化的无标注数据。具体而言,DEG-Net 通过最小化生成数据语义特征之间的 HSIC 值(即最大化其独立性)来生成数据。借助 DEG-Net,生成的无标注数据更加多样,也更有助于解决 FHA 问题。实验结果表明,DEG-Net 优于现有的 FHA 基线方法,并进一步验证了生成多样化数据在解决 FHA 问题中的关键作用。
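The HSIC quantity that DEG-Net minimizes can be computed with the standard biased estimator; the self-contained sketch below uses Gaussian kernels and placeholder data, independent of the paper's network architecture.

```python
# Sketch: biased HSIC estimator with Gaussian (RBF) kernels.
import numpy as np

def rbf_kernel(x, sigma=1.0):
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * x @ x.T
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """x: (n, dx), y: (n, dy); returns the biased HSIC estimate."""
    n = x.shape[0]
    k, l = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    h = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2

a = np.random.randn(64, 16)
print(hsic(a, a), hsic(a, np.random.randn(64, 16)))   # dependent vs ~independent features
```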

A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models

  • paper_url: http://arxiv.org/abs/2307.05946
  • repo_url: None
  • paper_authors: Agnimitra Sengupta, Sudeepta Mondal, Adway Das, S. Ilgin Guler
  • for: 预测交通数据的深度学习模型可以提供更高的性能,但是它们通常不提供不确定性估计,这是交通运营和控制中不可或缺的。
  • methods: 我们提出了一种用于交通预测不确定性量化的贝叶斯循环神经网络框架,并在其隐藏层中引入谱归一化(spectral normalization)来控制网络复杂度,从而提升模型的泛化性能。
  • results: 结果表明,谱归一化能更好地约束扰动下的数据特征空间,并在单步预测时域上显著优于层归一化以及不使用归一化的模型,说明它能更好地捕捉交通数据的内在模式。
    Abstract Deep-learning models for traffic data prediction can have superior performance in modeling complex functions using a multi-layer architecture. However, a major drawback of these approaches is that most of these approaches do not offer forecasts with uncertainty estimates, which are essential for traffic operations and control. Without uncertainty estimates, it is difficult to place any level of trust to the model predictions, and operational strategies relying on overconfident predictions can lead to worsening traffic conditions. In this study, we propose a Bayesian recurrent neural network framework for uncertainty quantification in traffic prediction with higher generalizability by introducing spectral normalization to its hidden layers. In our paper, we have shown that normalization alters the training process of deep neural networks by controlling the model's complexity and reducing the risk of overfitting to the training data. This, in turn, helps improve the generalization performance of the model on out-of-distribution datasets. Results demonstrate that spectral normalization improves uncertainty estimates and significantly outperforms both the layer normalization and model without normalization in single-step prediction horizons. This improved performance can be attributed to the ability of spectral normalization to better localize the feature space of the data under perturbations. Our findings are especially relevant to traffic management applications, where predicting traffic conditions across multiple locations is the goal, but the availability of training data from multiple locations is limited. Spectral normalization, therefore, provides a more generalizable approach that can effectively capture the underlying patterns in traffic data without requiring location-specific models.
    摘要 交通数据预测的深度学习模型凭借多层结构能够很好地拟合复杂函数,但其主要缺点是大多不提供预测的不确定性估计,而这对交通运营与控制至关重要。缺少不确定性估计,就难以对模型预测给予任何程度的信任,而依赖过度自信预测的运营策略可能导致交通状况恶化。在本研究中,我们提出一种用于交通预测不确定性量化的贝叶斯循环神经网络框架,并通过在隐藏层引入谱归一化来提升其泛化能力。我们在论文中表明,归一化通过控制模型复杂度、降低对训练数据过拟合的风险来改变深度神经网络的训练过程,进而提升模型在分布外数据集上的泛化性能。结果显示,谱归一化改善了不确定性估计,并在单步预测时域上显著优于层归一化以及不使用归一化的模型。这一性能提升可归因于谱归一化能更好地约束扰动下数据的特征空间。我们的发现对交通管理应用尤其有意义:其目标是预测多个地点的交通状况,但来自多个地点的训练数据往往有限。因此,谱归一化提供了一种更具泛化性的方案,无需针对具体地点建立模型即可有效捕捉交通数据的内在模式。
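Spectral normalization itself is easy to apply with PyTorch's built-in utility; the sketch below wraps the weight matrices of a simple hand-rolled recurrent cell, which constrains each weight's largest singular value (the complexity-control effect discussed above). The cell design is illustrative, not the paper's architecture.

```python
# Sketch: a recurrent cell whose linear maps are spectrally normalized.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class SNRecurrentCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.in2h = spectral_norm(nn.Linear(input_size, hidden_size))
        self.h2h = spectral_norm(nn.Linear(hidden_size, hidden_size))

    def forward(self, x_seq):                 # x_seq: (batch, time, input_size)
        h = x_seq.new_zeros(x_seq.size(0), self.h2h.out_features)
        for t in range(x_seq.size(1)):
            h = torch.tanh(self.in2h(x_seq[:, t]) + self.h2h(h))
        return h

cell = SNRecurrentCell(input_size=4, hidden_size=32)
print(cell(torch.randn(8, 12, 4)).shape)      # torch.Size([8, 32])
```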

YOGA: Deep Object Detection in the Wild with Lightweight Feature Learning and Multiscale Attention

  • paper_url: http://arxiv.org/abs/2307.05945
  • repo_url: https://github.com/LabSAINT/YOGA
  • paper_authors: Raja Sunkara, Tie Luo
  • for: 这个论文是为了开发一种基于深度学习的轻量级物体检测模型,可以在低端边缘设备上运行,并且可以达到竞争性的准确率。
  • methods: 该模型采用两阶段特征学习管线,其中包含一个廉价的线性变换,仅需常规卷积神经网络一半数量的卷积核即可学习特征图;此外,它在 neck 部分使用注意力机制进行多尺度特征融合,而非常规检测器中的朴素拼接。
  • results: 我们在 COCO-val 和 COCO-testdev 数据集上将 YOGA 与其他 10 余个最先进检测器进行了比较。结果表明,YOGA 在模型大小与准确率之间取得了最佳平衡(AP 最多提升 22%,参数量和 FLOPs 减少 23-34%),因此非常适合部署在低端边缘设备上。我们在 NVIDIA Jetson Nano 上的硬件实现与评估进一步印证了这一点。
    Abstract We introduce YOGA, a deep learning based yet lightweight object detection model that can operate on low-end edge devices while still achieving competitive accuracy. The YOGA architecture consists of a two-phase feature learning pipeline with a cheap linear transformation, which learns feature maps using only half of the convolution filters required by conventional convolutional neural networks. In addition, it performs multi-scale feature fusion in its neck using an attention mechanism instead of the naive concatenation used by conventional detectors. YOGA is a flexible model that can be easily scaled up or down by several orders of magnitude to fit a broad range of hardware constraints. We evaluate YOGA on COCO-val and COCO-testdev datasets with other over 10 state-of-the-art object detectors. The results show that YOGA strikes the best trade-off between model size and accuracy (up to 22% increase of AP and 23-34% reduction of parameters and FLOPs), making it an ideal choice for deployment in the wild on low-end edge devices. This is further affirmed by our hardware implementation and evaluation on NVIDIA Jetson Nano.
    摘要 我们介绍 YOGA,一种基于深度学习的轻量级目标检测模型,能够在低端边缘设备上运行,同时保持有竞争力的准确率。YOGA 架构包含一个两阶段特征学习管线,利用廉价的线性变换,仅需常规卷积神经网络一半数量的卷积核即可学习特征图;此外,它在 neck 部分使用注意力机制进行多尺度特征融合,而非常规检测器中的朴素拼接。YOGA 是一个灵活的模型,可以按数量级轻松放大或缩小,以适应各种硬件约束。我们在 COCO-val 和 COCO-testdev 数据集上将 YOGA 与 10 余个最先进目标检测器进行了比较。结果表明,YOGA 在模型大小与准确率之间取得了最佳平衡(AP 最多提升 22%,参数量和 FLOPs 减少 23-34%),是部署在低端边缘设备上的理想选择。我们在 NVIDIA Jetson Nano 上的硬件实现与评估进一步印证了这一点。
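The "cheap linear transformation" that learns feature maps with half the usual number of convolution filters can be pictured with a Ghost-style block as below; whether YOGA uses exactly this layout is an assumption.

```python
# Sketch: half of the output channels come from a regular conv, the other half
# from an inexpensive depthwise conv applied to those maps.
import torch
import torch.nn as nn

class CheapFeatures(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        half = out_ch // 2
        self.primary = nn.Conv2d(in_ch, half, kernel_size=3, padding=1)             # full conv, half filters
        self.cheap = nn.Conv2d(half, half, kernel_size=3, padding=1, groups=half)   # depthwise "cheap" op

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

block = CheapFeatures(in_ch=3, out_ch=32)
print(block(torch.randn(1, 3, 64, 64)).shape)   # torch.Size([1, 32, 64, 64])
```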

Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking

  • paper_url: http://arxiv.org/abs/2307.11096
  • repo_url: None
  • paper_authors: Xuewei Wang, Qiang Jin, Shengyu Huang, Min Zhang, Xi Liu, Zhengli Zhao, Yukun Chen, Zhengyu Zhang, Jiyan Yang, Ellie Wen, Sagar Chordia, Wenlin Chen, Qin Huang
  • for: 在大规模广告推荐中,将广告排序系统划分为召回、前期与终排三个阶段是一种常见做法,用以平衡效率和准确性。
  • methods: 我们提出了一种用于前期排序的多任务学习框架,以刻画多个终排组件(例如广告点击与广告质量事件)及其任务间的关系。
  • results: 在大规模工业广告排序系统的在线 A/B 测试中,我们的框架取得了显著更高的点击率(CTR)、转化率(CVR)、总价值以及更好的广告质量(例如降低了广告被用户关闭(cross-out)的比率)。
    Abstract Dividing ads ranking system into retrieval, early, and final stages is a common practice in large scale ads recommendation to balance the efficiency and accuracy. The early stage ranking often uses efficient models to generate candidates out of a set of retrieved ads. The candidates are then fed into a more computationally intensive but accurate final stage ranking system to produce the final ads recommendation. As the early and final stage ranking use different features and model architectures because of system constraints, a serious ranking consistency issue arises where the early stage has a low ads recall, i.e., top ads in the final stage are ranked low in the early stage. In order to pass better ads from the early to the final stage ranking, we propose a multi-task learning framework for early stage ranking to capture multiple final stage ranking components (i.e. ads clicks and ads quality events) and their task relations. With our multi-task learning framework, we can not only achieve serving cost saving from the model consolidation, but also improve the ads recall and ranking consistency. In the online A/B testing, our framework achieves significantly higher click-through rate (CTR), conversion rate (CVR), total value and better ads-quality (e.g. reduced ads cross-out rate) in a large scale industrial ads ranking system.
    摘要 在大规模广告推荐中,将广告排序系统划分为召回、前期与终排三个阶段是一种常见做法,以平衡效率和准确性。前期排序通常使用高效模型从召回的广告集合中产生候选,再交由计算量更大但更准确的终排系统产生最终的广告推荐。由于系统约束,前期与终排使用不同的特征和模型结构,因而产生严重的排序一致性问题:前期排序的广告召回率较低,即终排中排名靠前的广告在前期排序中排名靠后。为了让更优质的广告从前期进入终排,我们提出了一种用于前期排序的多任务学习框架,以刻画多个终排组件(即广告点击与广告质量事件)及其任务间关系。借助该多任务学习框架,我们不仅通过模型整合节省了服务成本,还提升了广告召回率与排序一致性。在大规模工业广告排序系统的在线 A/B 测试中,我们的框架取得了显著更高的点击率(CTR)、转化率(CVR)、总价值以及更好的广告质量(例如降低了广告被用户关闭的比率)。

Filling time-series gaps using image techniques: Multidimensional context autoencoder approach for building energy data imputation

  • paper_url: http://arxiv.org/abs/2307.05926
  • repo_url: None
  • paper_authors: Chun Fu, Matias Quintana, Zoltan Nagy, Clayton Miller
  • for: 本研究旨在借助物联网(IoT)设备以及日益丰富的能源数据,提升建筑能耗预测与管理的精度。然而,能源数据往往来自多个来源,可能不完整或不一致,从而妨碍精确的预测与管理。为解决这一问题,既有研究主要集中于填补能源数据中的缺失,包括随机缺失和连续缺失。
  • methods: 本研究使用了现代深度学习方法,包括Partial Convolution(PConv),以填充缺失的能源数据。PConv是在计算机视觉领域广泛应用的图像填充方法,可以处理复杂的缺失模式。
  • results: 研究结果表明,与原始时间序列(1D-CNN)和周持续性(weekly persistence)方法相比,使用二维重构数据的神经网络模型可将均方误差(MSE)降低 10% 到 30%;而部分卷积(PConv)方法进一步降低 MSE,优于 2D-CNN 及其他模型。
    Abstract Building energy prediction and management has become increasingly important in recent decades, driven by the growth of Internet of Things (IoT) devices and the availability of more energy data. However, energy data is often collected from multiple sources and can be incomplete or inconsistent, which can hinder accurate predictions and management of energy systems and limit the usefulness of the data for decision-making and research. To address this issue, past studies have focused on imputing missing gaps in energy data, including random and continuous gaps. One of the main challenges in this area is the lack of validation on a benchmark dataset with various building and meter types, making it difficult to accurately evaluate the performance of different imputation methods. Another challenge is the lack of application of state-of-the-art imputation methods for missing gaps in energy data. Contemporary image-inpainting methods, such as Partial Convolution (PConv), have been widely used in the computer vision domain and have demonstrated their effectiveness in dealing with complex missing patterns. To study whether energy data imputation can benefit from the image-based deep learning method, this study compared PConv, Convolutional neural networks (CNNs), and weekly persistence method using one of the biggest publicly available whole building energy datasets, consisting of 1479 power meters worldwide, as the benchmark. The results show that, compared to the CNN with the raw time series (1D-CNN) and the weekly persistence method, neural network models with reshaped energy data with two dimensions reduced the Mean Squared Error (MSE) by 10% to 30%. The advanced deep learning method, Partial convolution (PConv), has further reduced the MSE by 20-30% than 2D-CNN and stands out among all models.
    摘要 近几十年来,随着物联网(IoT)设备的发展和能源数据的日益丰富,建筑能耗预测与管理变得愈发重要。然而,能源数据往往来自多个来源,可能不完整或不一致,这会妨碍对能源系统的精确预测与管理,并限制数据在决策和研究中的价值。为解决这一问题,既有研究主要集中于填补能源数据中的缺失,包括随机缺失和连续缺失。该领域的一个主要挑战是缺乏涵盖多种建筑与计量类型的基准数据集,难以准确评估不同填补方法的性能;另一个挑战是最先进的填补方法尚未被应用于能源数据缺失填补。Partial Convolution(PConv)等当代图像修复方法已在计算机视觉领域广泛应用,并在处理复杂缺失模式方面表现出色。为研究能源数据填补能否受益于基于图像的深度学习方法,本研究以全球 1479 个电表构成的最大公开整栋建筑能耗数据集之一为基准,比较了 PConv、卷积神经网络(CNN)和周持续性方法。结果表明,与使用原始时间序列的 1D-CNN 和周持续性方法相比,基于二维重构能源数据的神经网络模型将均方误差(MSE)降低了 10% 至 30%;更先进的 Partial Convolution(PConv)又比 2D-CNN 进一步降低了 20%-30% 的 MSE,在所有模型中表现最佳。
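The data preparation implied here, turning a meter's time series into a 2-D "image" plus a missing-value mask for an inpainting model such as PConv, can be sketched as follows; the days-by-hours layout is an assumption.

```python
# Sketch: reshape hourly meter readings into a (days x 24) image and a binary mask.
import numpy as np

def to_image(hourly: np.ndarray):
    """hourly: 1-D array of hourly readings, length a multiple of 24 (NaN = missing)."""
    img = hourly.reshape(-1, 24)              # rows = days, columns = hours of day
    mask = (~np.isnan(img)).astype(np.float32)
    img = np.nan_to_num(img, nan=0.0)
    return img, mask

readings = np.random.rand(24 * 365)
readings[100:130] = np.nan                    # a continuous gap
image, mask = to_image(readings)
print(image.shape, mask.mean())               # (365, 24), fraction of observed values
```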

Unified Medical Image-Text-Label Contrastive Learning With Continuous Prompt

  • paper_url: http://arxiv.org/abs/2307.05920
  • repo_url: None
  • paper_authors: Yuhao Wang
  • for: The paper is written for the task of medical image-text pre-training, specifically addressing the challenges of using large-scale medical image and radiology report datasets.
  • methods: The proposed method uses a unified Image-Text-Label contrastive learning framework based on continuous prompts, which includes three main contributions: unifying image, text, and label data, introducing continuous implicit prompts, and proposing an ImageText-Label contrastive training method to mitigate the problem of too many false-negative samples.
  • results: The proposed UMCL framework exhibits excellent performance on several downstream tasks, demonstrating the effectiveness of the unified Image-Text-Label contrastive learning framework and the benefits of using continuous prompts.
    Abstract Contrastive language-image Pre-training (CLIP) [13] can leverage large datasets of unlabeled Image-Text pairs, which have demonstrated impressive performance in various downstream tasks. Given that annotating medical data is time-consuming and laborious, Image-Text Pre-training has promising applications in exploiting large-scale medical image and radiology report datasets. However, medical Image-Text Pre-training faces several challenges, as follows: (1) Due to privacy concerns, the amount of available medical data is relatively small compared to natural data, leading to weaker generalization ability of the model. (2) Medical images are highly similar with only fine-grained differences in subtleties, resulting in a large number of false-negative sample pairs in comparison learning. (3) The hand-crafted Prompt usually differs from the natural medical image report, Subtle changes in wording can lead to significant differences in performance. In this paper, we propose a unified Image-Text-Label contrastive learning framework based on continuous prompts, with three main contributions. First, We unified the data of images, text, and labels, which greatly expanded the training data that the model could utilize. Second, we address the issue of data diversity and the impact of hand-crafted prompts on model performance by introducing continuous implicit prompts. Lastly, we propose a ImageText-Label contrastive Training to mitigate the problem of too many false-negative samples. We demonstrate through sufficient experiments that the Unified Medical Contrastive Learning (UMCL) framework exhibits excellent performance on several downstream tasks.
    摘要 对比语言-图像预训练(CLIP)[13] 能够利用大规模无标注的图像-文本对,并已在多种下游任务中表现出色。由于医疗数据的标注既耗时又费力,图像-文本预训练在利用大规模医疗图像与放射报告数据方面具有广阔前景。然而,医疗图像-文本预训练面临如下挑战:(1)出于隐私考虑,可用的医疗数据量相对自然数据较少,导致模型的泛化能力较弱;(2)医疗图像之间高度相似、仅存在细微差别,导致对比学习中出现大量假阴性样本对;(3)手工设计的提示通常与真实的医疗影像报告存在差异,措辞的细微变化就可能带来显著的性能差异。在这篇论文中,我们提出一种基于连续提示的统一图像-文本-标签对比学习框架,主要贡献有三点:第一,我们统一了图像、文本和标签三类数据,大幅扩展了模型可利用的训练数据;第二,我们引入连续的隐式提示,以应对数据多样性问题以及手工提示对模型性能的影响;第三,我们提出图像-文本-标签对比训练,以缓解假阴性样本过多的问题。充分的实验表明,统一医疗对比学习(UMCL)框架在多种下游任务中表现出色。

Prompt Generate Train (PGT): Few-shot Domain Adaption of Retrieval Augmented Generation Models for Open Book Question-Answering

  • paper_url: http://arxiv.org/abs/2307.05915
  • repo_url: None
  • paper_authors: C. S. Krishna
  • for: 这个论文是为了提出一种框架(PGT),用于快速开发一个生成式问答模型,以便在一个专有文档集上进行开卷问答(open-book QA)。
  • methods: 这个框架采用检索增强生成(RAG)模型,通过监督微调和带合成反馈的强化学习,在少样本(few-shot)设置下实现对目标领域的适配。
  • results: 这个框架可以生成高度相关、不确定性经过校准的答案,并且能在更低的服务成本下与基于上下文检索增强生成的 GPT-4 相竞争。
    Abstract We propose a framework - Prompt, Generate, Train (PGT) - to efficiently develop a generative question-answering model for open-book question-answering over a proprietary collection of text documents. The framework adapts a retriever augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning with synthetic feedback in a few-shot setting. This, we hypothesize, will yield an aligned, uncertainty calibrated model that is competitive with GPT-4 based in-context retrieval augmented generation in generating relevant answers at lower serving costs. The framework's synthetic generation pipeline will generate synthetic training data comprising tuples using an open-source LLM and a novel consistency filtering scheme. The pipeline will be designed to generate both abstractive and extractive questions that span the entire corpus. The framework proposes to fine-tune a smaller RAG model comprising a dense retriever (ColBERTv2) and a smaller sized LLM on the synthetic dataset. In parallel, the framework will train a Reward model to score domain grounded answers higher than hallucinated answers using an a priori relevance ordering of synthetically assembled samples. In the next phase, the framework will align the RAG model with the target domain using reinforcement learning (Proximal Policy Optimization). This step may improve the RAG model's ability to generate grounded answers and ignore out of domain questions. In the final phase, the framework will calibrate the model's uncertainty for extractive question-answers.
    摘要 我们提出了"提示、生成、训练"(Prompt, Generate, Train,PGT)框架,以高效地开发用于专有文本文档集上开卷问答的生成式问答模型。该框架通过监督微调和带合成反馈的强化学习,在少样本设置下将检索增强生成(RAG)模型适配到目标领域。我们假设这将得到一个对齐且不确定性经过校准的模型,能够以更低的服务成本生成相关答案,与基于上下文检索增强生成的 GPT-4 相竞争。框架中的合成数据生成管道将使用一个开源大语言模型和一种新的一致性过滤方案,生成由〈文章、问题、答案〉三元组构成的合成训练数据,并设计为能够生成覆盖整个语料库的摘要式与抽取式问题。框架将在合成数据集上微调一个较小的 RAG 模型,其中包含稠密检索器(ColBERTv2)和一个较小的语言模型;同时,利用对合成样本的先验相关性排序训练一个奖励模型,使基于领域内容的答案得分高于幻觉答案。在下一阶段,框架使用强化学习(近端策略优化)将 RAG 模型与目标领域对齐,这一步有望提升模型生成有依据答案、并忽略领域外问题的能力。在最后阶段,框架对抽取式问答的模型不确定性进行校准。

FIS-ONE: Floor Identification System with One Label for Crowdsourced RF Signals

  • paper_url: http://arxiv.org/abs/2307.05914
  • repo_url: https://github.com/stevezhuo/fis-one
  • paper_authors: Weipeng Zhuo, Ka Ho Chiu, Jierun Chen, Ziqi Zhao, S. -H. Gary Chan, Sangtae Ha, Chul-Ho Lee
  • for: 这个论文主要是为了提出一种只需单个标注样本的楼层识别方法,以便在智能城市应用中实现多楼层室内定位、地理围栏(geofencing)和机器人监测等功能。
  • methods: 该论文提出了一种基于注意力图神经网络的信号聚类与簇索引方法:首先构建一个二分图来建模 RF 信号样本,再用基于注意力的图神经网络为每个信号样本学习潜在表示,从而更准确地对信号样本进行聚类;随后利用信号在相邻楼层间的溢出现象,将簇索引问题形式化为组合优化问题(等价于旅行商问题),为各簇赋予楼层标签。
  • results: 该论文的实验结果表明,只用一个带楼层标签的信号样本即可实现高效且准确的楼层识别;与其他基线算法相比,在调整兰德指数(adjusted Rand index)上最高提升 23%,在标准化互信息(NMI)上最高提升 25%。
    Abstract Floor labels of crowdsourced RF signals are crucial for many smart-city applications, such as multi-floor indoor localization, geofencing, and robot surveillance. To build a prediction model to identify the floor number of a new RF signal upon its measurement, conventional approaches using the crowdsourced RF signals assume that at least few labeled signal samples are available on each floor. In this work, we push the envelope further and demonstrate that it is technically feasible to enable such floor identification with only one floor-labeled signal sample on the bottom floor while having the rest of signal samples unlabeled. We propose FIS-ONE, a novel floor identification system with only one labeled sample. FIS-ONE consists of two steps, namely signal clustering and cluster indexing. We first build a bipartite graph to model the RF signal samples and obtain a latent representation of each node (each signal sample) using our attention-based graph neural network model so that the RF signal samples can be clustered more accurately. Then, we tackle the problem of indexing the clusters with proper floor labels, by leveraging the observation that signals from an access point can be detected on different floors, i.e., signal spillover. Specifically, we formulate a cluster indexing problem as a combinatorial optimization problem and show that it is equivalent to solving a traveling salesman problem, whose (near-)optimal solution can be found efficiently. We have implemented FIS-ONE and validated its effectiveness on the Microsoft dataset and in three large shopping malls. Our results show that FIS-ONE outperforms other baseline algorithms significantly, with up to 23% improvement in adjusted rand index and 25% improvement in normalized mutual information using only one floor-labeled signal sample.
    摘要
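The cluster-indexing step described in the abstract, ordering signal clusters into floors by exploiting spillover and anchoring the order at the single labeled bottom floor, is framed as a small traveling-salesman-style search. The sketch below is a hedged illustration, not the released FIS-ONE implementation; the similarity matrix and the brute-force search over permutations are assumptions that only make sense for a handful of floors.

```python
from itertools import permutations
import numpy as np

def index_clusters(similarity: np.ndarray, bottom: int) -> list[int]:
    """Order signal clusters into floors, bottom-up, by maximizing total
    spillover similarity between consecutive floors (a tiny Hamiltonian-path
    search; fine for the handful of floors in a building)."""
    k = similarity.shape[0]
    others = [c for c in range(k) if c != bottom]
    best_order, best_score = None, -np.inf
    for perm in permutations(others):
        order = [bottom, *perm]
        score = sum(similarity[order[i], order[i + 1]] for i in range(k - 1))
        if score > best_score:
            best_order, best_score = order, score
    return best_order

# Toy example: 4 clusters; cluster 2 is the labeled ground floor.
sim = np.array([[0.0, 0.6, 0.1, 0.3],
                [0.6, 0.0, 0.2, 0.7],
                [0.1, 0.2, 0.0, 0.5],
                [0.3, 0.7, 0.5, 0.0]])
print(index_clusters(sim, bottom=2))  # e.g. [2, 3, 1, 0] -> floors 1..4
```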

Grain and Grain Boundary Segmentation using Machine Learning with Real and Generated Datasets

  • paper_url: http://arxiv.org/abs/2307.05911
  • repo_url: None
  • paper_authors: Peter Warren, Nandhini Raju, Abhilash Prasad, Shajahan Hossain, Ramesh Subramanian, Jayanta Kapat, Navin Manjooran, Ranajay Ghosh
  • for: This paper aims to improve the accuracy of grain boundary segmentation in stainless steel microstructure images using Convolutional Neural Networks (CNN) trained on a combination of real and generated data.
  • methods: The paper uses a combination of real and generated data to train a CNN model for grain boundary segmentation, and employs a novel artificial grain image fabrication method based on Voronoi tessellation patterns and random synthetic noise.
  • results: The paper reports significantly improved accuracy of grain boundary segmentation using the proposed method, with the CNN model achieving an accuracy of 95.6% on a test set of images. The results also show that the proposed method outperforms existing computational methods and manual segmentation in terms of accuracy and efficiency.
    Abstract We report significantly improved accuracy of grain boundary segmentation using Convolutional Neural Networks (CNN) trained on a combination of real and generated data. Manual segmentation is accurate but time-consuming, and existing computational methods are faster but often inaccurate. To combat this dilemma, machine learning models can be used to achieve the accuracy of manual segmentation and have the efficiency of a computational method. An extensive dataset of from 316L stainless steel samples is additively manufactured, prepared, polished, etched, and then microstructure grain images were systematically collected. Grain segmentation via existing computational methods and manual (by-hand) were conducted, to create "real" training data. A Voronoi tessellation pattern combined with random synthetic noise and simulated defects, is developed to create a novel artificial grain image fabrication method. This provided training data supplementation for data-intensive machine learning methods. The accuracy of the grain measurements from microstructure images segmented via computational methods and machine learning methods proposed in this work are calculated and compared to provide much benchmarks in grain segmentation. Over 400 images of the microstructure of stainless steel samples were manually segmented for machine learning training applications. This data and the artificial data is available on Kaggle.
    摘要 我们报告:在真实数据与生成数据的组合上训练卷积神经网络(CNN),显著提高了晶界分割的精度。人工分割准确但耗时,而现有计算方法速度快却往往不够准确;机器学习模型可以兼顾人工分割的精度与计算方法的效率。我们通过增材制造得到 316L 不锈钢样品,经制备、抛光、蚀刻后系统地采集了微观组织晶粒图像,并分别用现有计算方法和人工方式进行晶粒分割,构建"真实"训练数据。我们还开发了一种新的人工晶粒图像生成方法,将 Voronoi 镶嵌图案与随机合成噪声及模拟缺陷相结合,为数据需求量大的机器学习方法补充训练数据。我们计算并比较了计算方法与本文提出的机器学习方法在微观组织图像晶粒测量上的精度,为晶粒分割提供了基准。我们为机器学习训练手动分割了 400 多张不锈钢样品微观组织图像,这些数据与人工生成的数据均可在 Kaggle 获取。

Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding

  • paper_url: http://arxiv.org/abs/2307.05908
  • repo_url: None
  • paper_authors: Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee
  • for: 本文提出了一种名为"预测管道化解码(PPD)"的方法,用于加速大语言模型(LLM)的贪婪解码,而不会改变输出结果。
  • methods: PPD 使用额外的计算资源,在当前词元解码期间并行启动后续词元的解码。
  • results: 结果表明,使用更多的计算资源有潜力减少解码延迟,并重塑对 LLM 解码策略中权衡关系的理解。我们还提出了一个理论框架,用于分析计算与延迟之间的权衡;借助该框架,可以通过匹配率 p_correct 解析地估计该方法可能带来的延迟降低。
    Abstract This paper presents "Predictive Pipelined Decoding (PPD)," an approach that speeds up greedy decoding in Large Language Models (LLMs) while maintaining the exact same output as the original decoding. Unlike conventional strategies, PPD employs additional compute resources to parallelize the initiation of subsequent token decoding during the current token decoding. This innovative method reduces decoding latency and reshapes the understanding of trade-offs in LLM decoding strategies. We have developed a theoretical framework that allows us to analyze the trade-off between computation and latency. Using this framework, we can analytically estimate the potential reduction in latency associated with our proposed method, achieved through the assessment of the match rate, represented as p_correct. The results demonstrate that the use of extra computational resources has the potential to accelerate LLM greedy decoding.
    摘要
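As a rough intuition for the compute-latency trade-off PPD targets, the toy model below (my own simplification, not the paper's analysis) assumes each speculative continuation matches the greedy token independently with probability p_correct and counts how many tokens are finalized per sequential step.

```python
def expected_speedup(p_correct: float, lookahead: int = 1) -> float:
    """Expected tokens finalized per sequential decoding step when `lookahead`
    speculative continuations are launched in parallel and each additional prefix
    survives independently with probability p_correct (a simplified geometric model)."""
    tokens = 1.0
    survive = 1.0
    for _ in range(lookahead):
        survive *= p_correct
        tokens += survive
    return tokens

for p in (0.5, 0.7, 0.9):
    print(f"p_correct={p}: ~{expected_speedup(p):.2f} tokens/step")
```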

Mini-Batch Optimization of Contrastive Loss

  • paper_url: http://arxiv.org/abs/2307.05906
  • repo_url: https://github.com/krafton-ai/mini-batch-cl
  • paper_authors: Jaewoong Cho, Kartik Sreenivasan, Keon Lee, Kyunghoo Mun, Soheun Yi, Jeong-Gwan Lee, Anna Lee, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee
  • for: 本文研究了对比学习中的小批量优化问题,尤其是在实际应用中的内存限制下。
  • methods: 本文从理论上分析小批量优化与全批量优化之间的关系,并提出了一种基于谱聚类的方法来识别高损失的小批量,从而加速 SGD 收敛。
  • results: 实验结果表明,提出的方法可以在实际应用中提高对比学习的效率,并且在不同的数据集上都能够达到更好的性能。
    Abstract Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views of the same object) are similar, while embeddings of negative pairs are dissimilar. Practical constraints such as large memory requirements make it challenging to consider all possible positive and negative pairs, leading to the use of mini-batch optimization. In this paper, we investigate the theoretical aspects of mini-batch optimization in contrastive learning. We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches are selected, while sub-optimality may arise when examining only a subset. We then demonstrate that utilizing high-loss mini-batches can speed up SGD convergence and propose a spectral clustering-based approach for identifying these high-loss mini-batches. Our experimental results validate our theoretical findings and demonstrate that our proposed algorithm outperforms vanilla SGD in practically relevant settings, providing a better understanding of mini-batch optimization in contrastive learning.
    摘要 对比学习作为一种自监督学习方法受到了广泛关注。对比损失函数要求正样本对(例如同一类别的不同样本或同一对象的不同视角)的嵌入相似,而负样本对的嵌入彼此远离。由于考虑所有可能的正负样本对会带来巨大的内存开销等实际限制,通常采用小批量优化。在这篇论文中,我们研究了对比学习中小批量优化的理论性质。我们证明,当且仅当选取了全部 $\binom{N}{B}$ 个小批量时,小批量优化才与全批量优化等价,而只考察其中一部分小批量可能导致次优解。我们进一步证明利用高损失的小批量可以加速 SGD 收敛,并提出了一种基于谱聚类的方法来识别这些高损失小批量。实验结果验证了我们的理论发现,并表明所提出的算法在实际相关的设置中优于普通的 SGD,从而加深了对对比学习中小批量优化的理解。
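For reference, the mini-batch contrastive (InfoNCE-style) objective analyzed in this paper can be written in a few lines; the sketch below uses NumPy and illustrative names, and does not include the paper's spectral-clustering batch selection.

```python
import numpy as np

def minibatch_contrastive_loss(z_a: np.ndarray, z_b: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE-style loss for one mini-batch of B positive pairs.
    z_a[i] and z_b[i] are embeddings of two views of the same sample;
    all other rows in the batch act as negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                      # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))      # positives sit on the diagonal

rng = np.random.default_rng(0)
za, zb = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(minibatch_contrastive_loss(za, zb))
```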

Stability Guarantees for Feature Attributions with Multiplicative Smoothing

  • paper_url: http://arxiv.org/abs/2307.05902
  • repo_url: None
  • paper_authors: Anton Xue, Rajeev Alur, Eric Wong
  • for: 这篇论文旨在提供可靠的特征归因方法,以确保模型的决策过程是可靠的。
  • methods: 该论文使用一种称为乘法平滑(Multiplicative Smoothing,MuS)的技术,使模型对特征掩蔽具有足够的 Lipschitz 性从而获得稳定性,并且该技术可以与任何分类器和特征归因方法结合使用。
  • results: 研究人员通过对视觉和语言模型进行测试,证明了 MuS 可以为特征归因方法提供非平凡的稳定性保证。
    Abstract Explanation methods for machine learning models tend to not provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. To achieve such a model, we develop a smoothing method called Multiplicative Smoothing (MuS). We show that MuS overcomes theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with a variety of feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
    摘要 机器学习模型的解释方法通常缺乏形式化保证,并且可能无法反映真实的决策过程。在这项工作中,我们研究稳定性作为可靠特征归因方法应具备的性质。我们证明,只要模型对特征掩蔽具有足够的 Lipschitz 性,就可以保证一类放宽的稳定性。为了得到这样的模型,我们开发了一种称为乘法平滑(Multiplicative Smoothing,MuS)的平滑方法。我们表明 MuS 克服了标准平滑技术的理论局限,并且可以与任何分类器和特征归因方法结合使用。我们在视觉和语言模型上对 LIME、SHAP 等多种特征归因方法进行了评估,证明 MuS 能为特征归因提供非平凡的稳定性保证。

Deep Unrolling for Nonconvex Robust Principal Component Analysis

  • paper_url: http://arxiv.org/abs/2307.05893
  • repo_url: None
  • paper_authors: Elizabeth Z. C. Tan, Caroline Chaux, Emmanuel Soubies, Vincent Y. F. Tan
  • for: 该研究是为了提出一种基于深度学习的鲁棒主成分分析(Robust Principal Component Analysis,RPCA)算法,用于将矩阵分解为低秩矩阵和稀疏矩阵之和。
  • methods: 该算法对加速交替投影算法进行深度展开,使得 RPCA 可以在非凸形式下求解。该方法结合了深度神经网络的优点和原始算法的可解释性,并自动学习超参数。
  • results: 在synthetic数据集和一个面部模型问题中,该算法实现了更好的数值和视觉性能。
    Abstract We design algorithms for Robust Principal Component Analysis (RPCA) which consists in decomposing a matrix into the sum of a low rank matrix and a sparse matrix. We propose a deep unrolled algorithm based on an accelerated alternating projection algorithm which aims to solve RPCA in its nonconvex form. The proposed procedure combines benefits of deep neural networks and the interpretability of the original algorithm and it automatically learns hyperparameters. We demonstrate the unrolled algorithm's effectiveness on synthetic datasets and also on a face modeling problem, where it leads to both better numerical and visual performances.
    摘要 我们为鲁棒主成分分析(Robust Principal Component Analysis,RPCA)设计算法,该问题将一个矩阵分解为低秩矩阵与稀疏矩阵之和。我们提出了一种基于加速交替投影算法的深度展开(deep unrolled)算法,用于求解非凸形式的 RPCA。该方法兼具深度神经网络的优点和原始算法的可解释性,并能自动学习超参数。我们在合成数据集和一个人脸建模问题上验证了展开算法的有效性,其数值与视觉表现均更为出色。
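The classical alternating-projection iteration that the proposed network unrolls can be sketched as follows. This is a plain, non-accelerated and non-learned variant, shown only to make the low-rank-plus-sparse structure concrete; the rank, threshold and iteration count are illustrative.

```python
import numpy as np

def rpca_altproj(M: np.ndarray, rank: int, thresh: float, n_iter: int = 50):
    """Decompose M ≈ L + S with L low-rank and S sparse by alternating projections:
    truncated SVD for the low-rank step, hard thresholding for the sparse step."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * sig[:rank]) @ Vt[:rank]    # project onto rank-r matrices
        R = M - L
        S = np.where(np.abs(R) > thresh, R, 0.0)      # keep only large-magnitude residuals
    return L, S

rng = np.random.default_rng(1)
low = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 30))
sparse = (rng.random((40, 30)) < 0.05) * 5.0
L_hat, S_hat = rpca_altproj(low + sparse, rank=3, thresh=1.0)
print(np.linalg.norm(L_hat - low) / np.linalg.norm(low))  # relative recovery error
```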

PID-Inspired Inductive Biases for Deep Reinforcement Learning in Partially Observable Control Tasks

  • paper_url: http://arxiv.org/abs/2307.05891
  • repo_url: https://github.com/ianchar/gpide
  • paper_authors: Ian Char, Jeff Schneider
  • for: 这篇论文旨在探讨深度强化学习(RL)如何仅凭数据学习控制系统。
  • methods: 论文使用 PID 控制器的成功原理,提出了一种基于 summing 和 differencing 的历史编码方法,以及一种可以应用于任何控制任务的扩展方法。
  • results: 与先前的方法相比,论文的编码器可以生成更加Robust 和更高性能的政策,并在一系列高维控制任务上 achieve 1.7 倍的性能提升。
    Abstract Deep reinforcement learning (RL) has shown immense potential for learning to control systems through data alone. However, one challenge deep RL faces is that the full state of the system is often not observable. When this is the case, the policy needs to leverage the history of observations to infer the current state. At the same time, differences between the training and testing environments makes it critical for the policy not to overfit to the sequence of observations it sees at training time. As such, there is an important balancing act between having the history encoder be flexible enough to extract relevant information, yet be robust to changes in the environment. To strike this balance, we look to the PID controller for inspiration. We assert the PID controller's success shows that only summing and differencing are needed to accumulate information over time for many control tasks. Following this principle, we propose two architectures for encoding history: one that directly uses PID features and another that extends these core ideas and can be used in arbitrary control tasks. When compared with prior approaches, our encoders produce policies that are often more robust and achieve better performance on a variety of tracking tasks. Going beyond tracking tasks, our policies achieve 1.7x better performance on average over previous state-of-the-art methods on a suite of high dimensional control tasks.
    摘要
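The PID-inspired idea, that summing and differencing the observed errors is often enough to summarize history, can be illustrated with a tiny feature encoder. This is a hedged sketch with assumed shapes and names, not the released gpide code.

```python
import numpy as np

def pid_history_features(errors: np.ndarray, dt: float = 1.0) -> np.ndarray:
    """Encode a history of tracking errors (T, d) into PID-style features:
    P = current error, I = running sum (integral), D = last difference (derivative)."""
    p = errors[-1]
    i = errors.sum(axis=0) * dt
    d = (errors[-1] - errors[-2]) / dt if len(errors) > 1 else np.zeros_like(p)
    return np.concatenate([p, i, d])  # shape (3d,), fed to the policy network

hist = np.array([[0.5], [0.3], [0.1]])  # error on a 1-D tracking target over 3 steps
print(pid_history_features(hist))       # -> [ 0.1  0.9 -0.2]
```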

Efficient Task Offloading Algorithm for Digital Twin in Edge/Cloud Computing Environment

  • paper_url: http://arxiv.org/abs/2307.05888
  • repo_url: None
  • paper_authors: Ziru Zhang, Xuling Zhang, Guangzhi Zhu, Yuyang Wang, Pan Hui
  • for: 本研究旨在提出一种基于多种数据资源的数字孪生(DT)系统模型,以及一种基于分布式深度学习(DDL)的卸载决策算法,以提高DT系统的响应速度和能效性。
  • methods: 本研究使用虚拟化和模拟技术,并采用移动云计算(MCC)和边缘计算(MEC)等技术来实现DT系统中的多功能化。而且,本研究还提出了一种基于DDL的卸载决策算法,以解决DT系统中数据卸载的问题。
  • results: 根据实验结果,本研究的提出的算法可以有效地降低DT系统的平均延迟和能 consumption。与基eline相比,本研究的方法在动态环境下得到了显著的提高。
    Abstract In the era of Internet of Things (IoT), Digital Twin (DT) is envisioned to empower various areas as a bridge between physical objects and the digital world. Through virtualization and simulation techniques, multiple functions can be achieved by leveraging computing resources. In this process, Mobile Cloud Computing (MCC) and Mobile Edge Computing (MEC) have become two of the key factors to achieve real-time feedback. However, current works only considered edge servers or cloud servers in the DT system models. Besides, The models ignore the DT with not only one data resource. In this paper, we propose a new DT system model considering a heterogeneous MEC/MCC environment. Each DT in the model is maintained in one of the servers via multiple data collection devices. The offloading decision-making problem is also considered and a new offloading scheme is proposed based on Distributed Deep Learning (DDL). Simulation results demonstrate that our proposed algorithm can effectively and efficiently decrease the system's average latency and energy consumption. Significant improvement is achieved compared with the baselines under the dynamic environment of DTs.
    摘要 在物联网(IoT)时代,数字孪生(DT)被视为连接物理对象与数字世界的桥梁,有望赋能各个领域。借助虚拟化与仿真技术,并利用计算资源,可以实现多种功能。在这一过程中,移动云计算(MCC)与移动边缘计算(MEC)成为实现实时反馈的两个关键因素。然而,现有工作在 DT 系统模型中仅考虑边缘服务器或云服务器,并且忽略了 DT 拥有多个数据来源的情形。本文提出一种考虑异构 MEC/MCC 环境的新 DT 系统模型,模型中的每个 DT 通过多个数据采集设备维护在某一台服务器上。我们同时考虑了卸载决策问题,并提出了一种基于分布式深度学习(DDL)的卸载方案。仿真结果表明,所提算法能够有效且高效地降低系统的平均时延和能耗,在 DT 的动态环境下相较基线有显著改进。

Dynamic Prediction using Time-Dependent Cox Survival Neural Network

  • paper_url: http://arxiv.org/abs/2307.05881
  • repo_url: None
  • paper_authors: Lang Zeng, Jipeng Zhang, Wei Chen, Ying Ding
  • for: 对年龄相关性黄斑变性(AMD)进展时间进行个体化风险预测,且预测可随新数据的到来而更新。
  • methods: 基于时间依赖 Cox 模型(tdCox model)与卷积神经网络(CNN),提出一种时间依赖的生存神经网络模型(tdCoxSNN),利用纵向眼底图像在连续时间尺度上预测 AMD 的进展;tdCoxSNN 能够建模时间依赖协变量的非线性效应。
  • results: 通过对两个实际数据集的分析,包括一个大型 AMD 研究(Age-Related Eye Disease Study,AREDS)和一个原发性胆汁性肝硬化(PBC)的公开数据集,我们的方法取得了令人满意的预测性能。
    Abstract The target of dynamic prediction is to provide individualized risk predictions over time which can be updated as new data become available. Motivated by establishing a dynamic prediction model for the progressive eye disease, age-related macular degeneration (AMD), we proposed a time-dependent Cox model-based survival neural network (tdCoxSNN) to predict its progression on a continuous time scale using longitudinal fundus images. tdCoxSNN extends the time-dependent Cox model by utilizing a neural network to model the non-linear effect of the time-dependent covariates on the survival outcome. Additionally, by incorporating the convolutional neural network (CNN), tdCoxSNN can take the longitudinal raw images as input. We evaluate and compare our proposed method with joint modeling and landmarking approaches through comprehensive simulations using two time-dependent accuracy metrics, the Brier Score and dynamic AUC. We applied the proposed approach to two real datasets. One is a large AMD study, the Age-Related Eye Disease Study (AREDS), in which more than 50,000 fundus images were captured over a period of 12 years for more than 4,000 participants. Another is a public dataset of the primary biliary cirrhosis (PBC) disease, in which multiple lab tests were longitudinally collected to predict the time-to-liver transplant. Our approach achieves satisfactory prediction performance in both simulation studies and the two real data analyses. tdCoxSNN was implemented in PyTorch, Tensorflow, and R-Tensorflow.
    摘要 动态预测的目标是提供可随新数据不断更新的个体化风险预测。为了建立年龄相关性黄斑变性(AMD)进展的动态预测模型,我们提出了一种基于时间依赖 Cox 模型的生存神经网络(tdCoxSNN),利用纵向眼底图像在连续时间尺度上预测其进展。tdCoxSNN 通过神经网络对时间依赖协变量的非线性效应建模,从而扩展了时间依赖 Cox 模型;同时结合卷积神经网络(CNN),可以直接以纵向原始图像作为输入。我们通过全面的模拟,使用 Brier 分数和动态 AUC 两种时间依赖的精度指标,评估并比较了所提方法与联合建模和地标(landmarking)方法。我们将该方法应用于两个真实数据集:一个是大型 AMD 研究 Age-Related Eye Disease Study(AREDS),其中 4,000 多名参与者在 12 年间采集了 5 万余张眼底图像;另一个是原发性胆汁性肝硬化(PBC)的公开数据集,其中纵向收集了多项实验室检查以预测至肝移植的时间。我们的方法在模拟研究和两项真实数据分析中均取得了令人满意的预测性能。tdCoxSNN 已在 PyTorch、TensorFlow 和 R-Tensorflow 中实现。
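For context, the (time-independent) Cox negative partial log-likelihood that survival networks such as tdCoxSNN build on can be computed as below. Risk scores would normally come from the network; here they are plain numbers, ties are ignored, and the time-dependent extension is not shown.

```python
import numpy as np

def cox_neg_partial_loglik(risk: np.ndarray, time: np.ndarray, event: np.ndarray) -> float:
    """Negative Cox partial log-likelihood.
    risk:  model outputs eta_i (higher = higher hazard)
    time:  observed follow-up times
    event: 1 if the event occurred, 0 if censored
    For each event, the likelihood compares its risk against everyone still at risk."""
    order = np.argsort(-time)                # sort by descending time
    risk, event = risk[order], event[order]
    log_cum = np.logaddexp.accumulate(risk)  # log-sum-exp of risks over the at-risk set
    return float(-np.sum((risk - log_cum)[event.astype(bool)]))

eta = np.array([0.2, 1.5, -0.3, 0.8])
t = np.array([5.0, 2.0, 8.0, 3.0])
e = np.array([1, 1, 0, 1])
print(cox_neg_partial_loglik(eta, t, e))
```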

Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes

  • paper_url: http://arxiv.org/abs/2307.05862
  • repo_url: None
  • paper_authors: Connor Toups, Rishi Bommasani, Kathleen A. Creel, Sarah H. Bana, Dan Jurafsky, Percy Liang
  • for: 本研究旨在探讨机器学习技术在社会中的影响,以及它们在不同上下文中的应用。
  • methods: 本研究采用了生态系统水平的分析方法,而不是单独分析具体的模型。研究者们对11个数据集进行了分析,并发现了一个普遍存在的趋势:已部署的机器学习系统具有系统性的失败现象,即某些用户被所有模型都错误地分类。
  • results: 研究发现,尽管具体的模型在人口级别上的表现得到改进,但这些改进很少降低了系统性失败的频率。此外,研究者们发现了新的种族差距现象,即模型的预测与人类预测之间存在差异。这些例子表明,生态系统水平的分析具有描述机器学习技术在社会中的社会影响的独特优势。
    Abstract Machine learning is traditionally studied at the model level: researchers measure and improve the accuracy, robustness, bias, efficiency, and other dimensions of specific models. In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments. To capture this, we introduce ecosystem-level analysis: rather than analyzing a single model, we consider the collection of models that are deployed in a given context. For example, ecosystem-level analysis in hiring recognizes that a job candidate's outcomes are not only determined by a single hiring algorithm or firm but instead by the collective decisions of all the firms they applied to. Across three modalities (text, images, speech) and 11 datasets, we establish a clear trend: deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available. Even when individual models improve at the population level over time, we find these improvements rarely reduce the prevalence of systemic failure. Instead, the benefits of these improvements predominantly accrue to individuals who are already correctly classified by other models. In light of these trends, we consider medical imaging for dermatology where the costs of systemic failure are especially high. While traditional analyses reveal racial performance disparities for both models and humans, ecosystem-level analysis reveals new forms of racial disparity in model predictions that do not present in human predictions. These examples demonstrate ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
    摘要 机器学习传统上是在模型层面进行研究的:研究人员针对具体模型测量并改进其精度、鲁棒性、偏见、效率等维度。而在实践中,机器学习的社会影响取决于其部署所处的整体环境。为刻画这一点,我们引入生态系统级分析:不再分析单一模型,而是考察在给定情境中部署的一组模型。例如,在招聘场景下,生态系统级分析认识到求职者的结果并非由单个招聘算法或单家公司决定,而是由其申请的所有公司的集体决策共同决定。在文本、图像、语音三种模态和 11 个数据集上,我们发现了一个明确的趋势:已部署的机器学习容易出现系统性失败,即某些用户会被所有可用模型一致地错误分类。即使单个模型在总体层面随时间改进,这些改进也很少降低系统性失败的发生率;改进带来的收益主要流向那些已被其他模型正确分类的个体。鉴于这些趋势,我们进一步考察了系统性失败代价尤其高昂的皮肤科医学影像场景:传统分析揭示了模型和人类都存在的种族性能差异,而生态系统级分析则揭示了仅出现在模型预测中、而人类预测中不存在的新型种族差异。这些例子表明,生态系统级分析在刻画机器学习的社会影响方面具有独特优势。
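The ecosystem-level quantity discussed above, the share of users misclassified by every deployed model, is easy to state in code. The sketch below uses synthetic predictions and is only meant to make the definition of systemic failure concrete.

```python
import numpy as np

def systemic_failure_rate(preds: np.ndarray, labels: np.ndarray) -> float:
    """preds: (n_models, n_users) predicted labels from every deployed model.
    A user is a systemic failure if *all* models get them wrong."""
    wrong = preds != labels[None, :]          # (n_models, n_users) error indicator
    return float(wrong.all(axis=0).mean())    # fraction failed by the whole ecosystem

labels = np.array([0, 1, 1, 0, 1])
preds = np.array([[0, 1, 0, 0, 0],            # model A
                  [0, 1, 0, 1, 1],            # model B
                  [0, 0, 0, 0, 1]])           # model C
print(systemic_failure_rate(preds, labels))   # the third user is wrong everywhere -> 0.2
```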

FAIRO: Fairness-aware Adaptation in Sequential-Decision Making for Human-in-the-Loop Systems

  • paper_url: http://arxiv.org/abs/2307.05857
  • repo_url: None
  • paper_authors: Tianyu Zhao, Mojtaba Taherisadr, Salma Elmalaki
  • for: 本文旨在提出一种基于人类行为变化的循环决策系统中的公平性问题的解决方案,尤其是在多个人类具有不同行为和期望的情况下。
  • methods: 本文提出了一种名为FAIRO的新算法,用于在人类在Loop(HITL)环境中实现公平性。FAIRO将这个复杂的公平性问题分解成个人人类偏好的适应任务,通过利用Options reinforcement learning框架。
  • results: 评估表明,FAIRO可以在三种不同的HITL应用场景中实现公平性,同时考虑人类行为变化。FAIRO比其他方法在所有三个应用场景中平均提高公平性水平35.36%。
    Abstract Achieving fairness in sequential-decision making systems within Human-in-the-Loop (HITL) environments is a critical concern, especially when multiple humans with different behavior and expectations are affected by the same adaptation decisions in the system. This human variability factor adds more complexity since policies deemed fair at one point in time may become discriminatory over time due to variations in human preferences resulting from inter- and intra-human variability. This paper addresses the fairness problem from an equity lens, considering human behavior variability, and the changes in human preferences over time. We propose FAIRO, a novel algorithm for fairness-aware sequential-decision making in HITL adaptation, which incorporates these notions into the decision-making process. In particular, FAIRO decomposes this complex fairness task into adaptive sub-tasks based on individual human preferences through leveraging the Options reinforcement learning framework. We design FAIRO to generalize to three types of HITL application setups that have the shared adaptation decision problem. Furthermore, we recognize that fairness-aware policies can sometimes conflict with the application's utility. To address this challenge, we provide a fairness-utility tradeoff in FAIRO, allowing system designers to balance the objectives of fairness and utility based on specific application requirements. Extensive evaluations of FAIRO on the three HITL applications demonstrate its generalizability and effectiveness in promoting fairness while accounting for human variability. On average, FAIRO can improve fairness compared with other methods across all three applications by 35.36%.
    摘要

PIGEON: Predicting Image Geolocations

  • paper_url: http://arxiv.org/abs/2307.05845
  • repo_url: None
  • paper_authors: Lukas Haas, Michal Skreta, Silas Alberti
  • for: 这个论文是为了提出一种多任务端到端系统,以实现地球规模的图像地理位置确定。
  • methods: 该论文使用了semantic geocell创建和分割算法、图像地理信息预训练和ProtoNets进行位置预测精度提高。
  • results: 该论文在外部基准和人工评估中均达到了最先进的表现,并公开了可用于相邻领域的预训练 CLIP 变换器模型 StreetCLIP。
    Abstract We introduce PIGEON, a multi-task end-to-end system for planet-scale image geolocalization that achieves state-of-the-art performance on both external benchmarks and in human evaluation. Our work incorporates semantic geocell creation with label smoothing, conducts pretraining of a vision transformer on images with geographic information, and refines location predictions with ProtoNets across a candidate set of geocells. The contributions of PIGEON are three-fold: first, we design a semantic geocells creation and splitting algorithm based on open-source data which can be adapted to any geospatial dataset. Second, we show the effectiveness of intra-geocell refinement and the applicability of unsupervised clustering and ProtNets to the task. Finally, we make our pre-trained CLIP transformer model, StreetCLIP, publicly available for use in adjacent domains with applications to fighting climate change and urban and rural scene understanding.
    摘要 我们介绍 PIGEON,一个用于全球尺度图像地理定位的多任务端到端系统,在外部基准和人工评估中均达到了最先进的性能。我们的工作包括结合标签平滑的语义 geocell 创建、在带有地理信息的图像上预训练视觉 Transformer,以及利用 ProtoNets 在候选 geocell 集合上精化位置预测。PIGEON 的贡献有三个方面:1. 我们设计了基于开源数据的语义 geocell 创建与分割算法,可适配任何地理空间数据集;2. 我们展示了 geocell 内部精化的有效性,以及无监督聚类与 ProtoNets 在该任务中的适用性;3. 我们公开发布了预训练的 CLIP Transformer 模型 StreetCLIP,可用于相邻领域,例如应对气候变化以及城市与乡村场景理解。

Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

  • paper_url: http://arxiv.org/abs/2307.05834
  • repo_url: None
  • paper_authors: Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang
  • for: Distributed multi-task reinforcement learning (RL) is explored to benefit distributed lifelong learning agents in adapting to new challenges, specifically in the context of the ShELL program launched by DARPA.
  • methods: The paper uses both theoretical and empirical research to address the problem of distributed multi-task RL, where a group of $N$ agents collaboratively solve $M$ tasks without prior knowledge of their identities. The problem is formulated as linearly parameterized contextual Markov decision processes (MDPs), and the proposed algorithm is called DistMT-LSVI.
  • results: The paper shows that a single agent using DistMT-LSVI needs to run a total of at most $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$ episodes to achieve $\epsilon$-optimal policies for all $M$ tasks, improving the sample complexity of non-distributed settings by a factor of $1/N$. Numerical experiments conducted on OpenAI Gym Atari environments validate the theoretical findings.
    Abstract Recently, DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents in adapting to new challenges. In this paper, we address this issue by conducting both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks without prior knowledge of their identities. We approach the problem by formulating it as linearly parameterized contextual Markov decision processes (MDPs), where each task is represented by a context that specifies the transition dynamics and rewards. To tackle this problem, we propose an algorithm called DistMT-LSVI. First, the agents identify the tasks, and then they exchange information through a central server to derive $\epsilon$-optimal policies for the tasks. Our research demonstrates that to achieve $\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$, where $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards. Notably, DistMT-LSVI improves the sample complexity of non-distributed settings by a factor of $1/N$, as each agent independently learns $\epsilon$-optimal policies for all $M$ tasks using $\tilde{\mathcal{O}}(d^3H^6M\epsilon^{-2})$ episodes. Additionally, we provide numerical experiments conducted on OpenAI Gym Atari environments that validate our theoretical findings.
    摘要 最近,DARPA 启动了 ShELL 计划,旨在探索经验共享如何帮助分布式终身学习智能体适应新挑战。在本文中,我们从理论和实验两方面研究分布式多任务强化学习(RL):一组 $N$ 个智能体在不知道任务身份的情况下协作求解 $M$ 个任务。我们将该问题形式化为线性参数化的上下文马尔可夫决策过程(MDP),其中每个任务由一个决定转移动态和奖励的上下文表示。为求解该问题,我们提出了名为 DistMT-LSVI 的算法:智能体先识别任务,再通过中央服务器交换信息,以获得各任务的 $\epsilon$-最优策略。我们的研究表明,使用 DistMT-LSVI 时,单个智能体最多只需运行 $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$ 个回合即可为全部 $M$ 个任务获得 $\epsilon$-最优策略,其中 $c_{\rm sep}>0$ 是表示任务可分性的常数,$H$ 是每个回合的时间范围,$d$ 是动态与奖励的特征维度。值得注意的是,DistMT-LSVI 将非分布式设置的样本复杂度降低为原来的 $1/N$:在非分布式设置中,每个智能体需独立使用 $\tilde{\mathcal{O}}(d^3H^6M\epsilon^{-2})$ 个回合学习全部 $M$ 个任务的 $\epsilon$-最优策略。此外,我们在 OpenAI Gym Atari 环境上进行了数值实验,验证了理论结果。

Memorization Through the Lens of Curvature of Loss Function Around Samples

  • paper_url: http://arxiv.org/abs/2307.05831
  • repo_url: None
  • paper_authors: Isha Garg, Kaushik Roy
  • for: 这篇论文旨在探讨神经网络对训练样本的记忆化(memorization)与泛化问题。
  • methods: 论文使用训练样本附近损失函数的曲率来度量神经网络对该样本的记忆程度,并在所有训练轮上对曲率取平均。
  • results: 研究发现,损失曲率最高的样本在视觉上对应长尾、错误标注或相互冲突的样本;由此在 CIFAR100 上发现了一种新的失败模式,即带有不同标签的重复图像。此外,通过随机污染部分样本的标签,发现按曲率排序可以以较高的 AUROC 识别出被污染标签的样本。
    Abstract Neural networks are overparametrized and easily overfit the datasets they train on. In the extreme case, it is shown that they can memorize a training set with fully randomized labels. We propose using the curvature of loss function around the training sample as a measure of its memorization, averaged over all training epochs. We use this to study the generalization versus memorization properties of different samples in popular image datasets. We visualize samples with the highest curvature of loss around them, and show that these visually correspond to long-tailed, mislabeled or conflicting samples. This analysis helps us find a, to the best of our knowledge, novel failure model on the CIFAR100 dataset, that of duplicated images with different labels. We also synthetically mislabel a proportion of the dataset by randomly corrupting the labels of a few samples, and show that sorting by curvature yields high AUROC values for identifying the mislabeled samples.
    摘要 神经网络往往过参数化,容易过拟合训练数据;在极端情况下,它们甚至可以记住标签被完全随机化的训练集。我们提议使用训练样本附近损失函数的曲率(对所有训练轮取平均)作为该样本被记忆程度的度量,并用它来研究常用图像数据集中不同样本的泛化与记忆性质。我们对损失曲率最高的样本进行了可视化,发现它们对应长尾、错误标注或相互冲突的样本。借助这一分析,我们在 CIFAR100 数据集上发现了一种据我们所知全新的失败模式:同一图像重复出现却带有不同标签。我们还通过随机破坏部分样本的标签来人为引入错误标注,结果表明按曲率排序能够以很高的 AUROC 识别出这些被错误标注的样本。
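One simple way to probe the curvature of the loss around a single sample is a random-direction second-order finite difference, shown below as a hedged proxy; the paper's exact estimator and its averaging over training epochs are not reproduced here, and the function and parameter names are illustrative.

```python
import numpy as np

def curvature_proxy(loss_fn, params: np.ndarray, n_dirs: int = 10, h: float = 1e-3, seed: int = 0) -> float:
    """Average second-order finite difference of the loss along random unit directions.
    Large values flag samples whose loss surface is sharply curved (memorization candidates)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_dirs):
        v = rng.normal(size=params.shape)
        v /= np.linalg.norm(v)
        total += abs(loss_fn(params + h * v) - 2 * loss_fn(params) + loss_fn(params - h * v)) / h**2
    return total / n_dirs

# Toy check on a quadratic loss 0.5 * c * ||w||^2, whose curvature along any unit direction is c.
print(curvature_proxy(lambda w: 0.5 * 4.0 * np.dot(w, w), params=np.ones(5)))  # ~4.0
```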

Relational Extraction on Wikipedia Tables using Convolutional and Memory Networks

  • paper_url: http://arxiv.org/abs/2307.05827
  • repo_url: https://github.com/simpleparadox/re_656
  • paper_authors: Arif Shahriar, Rohan Saha, Denilson Barbosa
  • for: 这篇论文主要是为了提出一种基于表格数据的关系提取方法。
  • methods: 该方法使用卷积神经网络(CNN)编码实体,并使用双向长短期记忆网络(BiLSTM)学习实体之间的依赖关系。
  • results: 实验结果显示,该模型在一个大规模的最新数据集上的关系提取任务中持续超过了之前的神经方法。
    Abstract Relation extraction (RE) is the task of extracting relations between entities in text. Most RE methods extract relations from free-form running text and leave out other rich data sources, such as tables. We explore RE from the perspective of applying neural methods on tabularly organized data. We introduce a new model consisting of Convolutional Neural Network (CNN) and Bidirectional-Long Short Term Memory (BiLSTM) network to encode entities and learn dependencies among them, respectively. We evaluate our model on a large and recent dataset and compare results with previous neural methods. Experimental results show that our model consistently outperforms the previous model for the task of relation extraction on tabular data. We perform comprehensive error analyses and ablation study to show the contribution of various components of our model. Finally, we discuss the usefulness and trade-offs of our approach, and provide suggestions for fostering further research.
    摘要 关系抽取(Relation Extraction,RE)是一项从文本中提取实体之间关系的任务。大多数RE方法都是从自由文本中提取关系,而忽略其他丰富数据源,如表格。我们从表格化数据的角度来探讨RE。我们介绍了一种新的模型,该模型包括卷积神经网络(CNN)和双向长短期记忆网络(BiLSTM),分别用于编码实体和学习实体之间的依赖关系。我们在一个大型且较新的数据集上对该模型进行了评估,并与先前的神经网络方法进行比较。实验结果表明,我们的模型在表格数据上的关系抽取任务中一直表现出色,超越了先前的神经网络方法。我们进行了完整的错误分析和消融研究,以展示模型各个组件的贡献。最后,我们讨论了该方法的实用性与权衡,并提出了进一步研究的建议。

AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring

  • paper_url: http://arxiv.org/abs/2307.06860
  • repo_url: https://github.com/soundclim/anuraset
  • paper_authors: Juan Sebastián Cañas, Maria Paula Toro-Gómez, Larissa Sayuri Moreira Sugai, Hernán Darío Benítez Restrepo, Jorge Rudas, Breyner Posso Bautista, Luís Felipe Toledo, Simone Dena, Adão Henrique Rosa Domingos, Franco Leandro de Souza, Selvino Neckel-Oliveira, Anderson da Rosa, Vítor Carvalho-Rocha, José Vinícius Bernardy, José Luiz Massao Moreira Sugai, Carolina Emília dos Santos, Rogério Pereira Bastos, Diego Llusia, Juan Sebastián Ulloa
  • for: 这个论文的目的是研究无尾两栖类(蛙类)的鸣叫行为,以便通过被动声学监测(passive acoustic monitoring,PAM)了解全球变化对它们的影响。
  • methods: 这篇论文构建了一个大规模多物种的蛙类鸣叫数据集,包含来自巴西两个生物群系、42 个不同物种、共 27 小时的专家标注录音。
  • results: 论文提供了一个开放的数据集,包括原始录音、实验设置代码和一个细粒度分类基线模型的评估。同时,论文还向机器学习研究人员提出了蛙类鸣叫识别的挑战,以便为保护政策提供技术支持。
    Abstract Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibians calls recorded by PAM, that comprises 27 hours of expert annotations for 42 different species from two Brazilian biomes. We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model of the fine-grained categorization problem. Additionally, we highlight the challenges of the dataset to encourage machine learning researchers to solve the problem of anuran call identification towards conservation policy. All our experiments and resources can be found on our GitHub repository https://github.com/soundclim/anuraset.
    摘要

Bayesian taut splines for estimating the number of modes

  • paper_url: http://arxiv.org/abs/2307.05825
  • repo_url: None
  • paper_authors: José E. Chacón, Javier Fernández Serrano
  • for: 本研究的目标是估计概率密度函数中的模式(峰)数量,它代表模型的复杂度,也可视为现存子群体的数量。
  • methods: 我们提出了一种新方法,其灵感来自该问题中一些被忽视的方面。方法结合了灵活的核估计器与简约的组合样条,并在贝叶斯推断框架下实现特征探索、模型选择与模式检验。
  • results: 我们在一个体育分析的实际应用中展示了该方法及多种配套可视化工具,并通过了全面的模拟研究;传统的以模态为中心的方法反而难以给出准确结果,而我们的方法成为一种顶级替代方案,为分析人员提供了创新的解决思路。
    Abstract The number of modes in a probability density function is representative of the model's complexity and can also be viewed as the number of existing subpopulations. Despite its relevance, little research has been devoted to its estimation. Focusing on the univariate setting, we propose a novel approach targeting prediction accuracy inspired by some overlooked aspects of the problem. We argue for the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view blending global and local density properties. Our method builds upon a combination of flexible kernel estimators and parsimonious compositional splines. Feature exploration, model selection and mode testing are implemented in the Bayesian inference paradigm, providing soft solutions and allowing to incorporate expert judgement in the process. The usefulness of our proposal is illustrated through a case study in sports analytics, showcasing multiple companion visualisation tools. A thorough simulation study demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, our method emerges as a top-tier alternative offering innovative solutions for analysts.
    摘要 一个概率密度函数中的模式数量代表模型的复杂度,也可视为现存子群体的数量。尽管其重要性,相关研究仍然很少。在单变量设置下,我们提出了一种以预测精度为目标的新方法,其灵感来自该问题中一些被忽视的方面。我们主张解中应当具有结构、承认模式本身的主观性与不确定性,并采用融合全局与局部密度性质的整体视角。我们的方法建立在灵活的核估计器与简约的组合样条的结合之上;在贝叶斯推断框架下实现了特征探索、模型选择和模式检验,给出软性的解,并允许在过程中纳入专家判断。我们通过一个体育分析的案例研究展示了该方法的实用性,并配以多种可视化工具。全面的模拟研究表明,传统的以模态为中心的方法反而难以给出准确结果;在这种情形下,我们的方法成为一种顶级替代方案,为分析人员提供了创新的解决思路。

Safe Reinforcement Learning for Strategic Bidding of Virtual Power Plants in Day-Ahead Markets

  • paper_url: http://arxiv.org/abs/2307.05812
  • repo_url: None
  • paper_authors: Ognjen Stanojev, Lesia Mitridati, Riccardo de Nardis di Prata, Gabriela Hug
  • for: 这篇论文旨在提出一种安全强化学习算法,用于虚拟发电厂(VPP)在日前电力市场中的竞投策略选择。
  • methods: 该算法使用深度确定性策略梯度(DDPG)方法学习竞投策略,不需要精准的市场模型。此外,为了考虑分布式能源资源的复杂内部物理约束,我们提出了两个改进。首先,基于投影的安全屏障,将代理人的行为限制在由非线性电力流方程和运行约束定义的可行空间内。其次,在奖励函数中对安全屏障的触发增加惩罚项,以鼓励代理人学习更安全的策略。
  • results: 一个基于IEEE 13-bus网络的案例研究表明,提出的方法可以帮助代理人学习一种非常竞争力强、安全的策略。
    Abstract This paper presents a novel safe reinforcement learning algorithm for strategic bidding of Virtual Power Plants (VPPs) in day-ahead electricity markets. The proposed algorithm utilizes the Deep Deterministic Policy Gradient (DDPG) method to learn competitive bidding policies without requiring an accurate market model. Furthermore, to account for the complex internal physical constraints of VPPs we introduce two enhancements to the DDPG method. Firstly, a projection-based safety shield that restricts the agent's actions to the feasible space defined by the non-linear power flow equations and operating constraints of distributed energy resources is derived. Secondly, a penalty for the shield activation in the reward function that incentivizes the agent to learn a safer policy is introduced. A case study based on the IEEE 13-bus network demonstrates the effectiveness of the proposed approach in enabling the agent to learn a highly competitive, safe strategic policy.
    摘要 这篇论文提出了一种新的安全强化学习算法,用于虚拟发电厂(VPP)在日前电力市场中的投标策略。该算法使用深度确定性策略梯度(DDPG)方法学习竞争性的投标策略,无需准确的市场模型。此外,为了考虑 VPP 内部的复杂物理约束,我们对 DDPG 方法引入了两项改进:其一,基于投影的安全盾,将智能体的动作限制在由非线性潮流方程和分布式能源资源运行约束定义的可行空间内;其二,在奖励函数中对安全盾的触发施加惩罚,以鼓励智能体学习更安全的策略。基于 IEEE 13 节点网络的案例研究表明,所提方法能够帮助智能体学习出一个竞争力很强且安全的策略。

Differentiable Forward Projector for X-ray Computed Tomography

  • paper_url: http://arxiv.org/abs/2307.05801
  • repo_url: https://github.com/llnl/leap
  • paper_authors: Hyojin Kim, Kyle Champley
  • for: 这篇论文是为了解决计算机断层扫描(CT)重建问题而写的。
  • methods: 这篇论文采用数据驱动的深度学习方法,可以超越现有的解析和迭代算法,尤其是在不适定的 CT 重建问题中。
  • results: 这篇论文提出了一个准确的可微分前向和反向投影软件库,以确保预测图像与原始测量数据之间的一致性。该软件库支持多种投影几何类型,同时尽量减少 GPU 内存占用量,以便与现有的深度学习训练和推理管道无缝集成。
    Abstract Data-driven deep learning has been successfully applied to various computed tomographic reconstruction problems. The deep inference models may outperform existing analytical and iterative algorithms, especially in ill-posed CT reconstruction. However, those methods often predict images that do not agree with the measured projection data. This paper presents an accurate differentiable forward and back projection software library to ensure the consistency between the predicted images and the original measurements. The software library efficiently supports various projection geometry types while minimizing the GPU memory footprint requirement, which facilitates seamless integration with existing deep learning training and inference pipelines. The proposed software is available as open source: https://github.com/LLNL/LEAP.
    摘要 将数据驱动的深度学习应用于多种计算机断层扫描(CT)重建问题已经取得成功。深度推断模型有可能超越现有的解析和迭代算法,尤其是在不适定的 CT 重建问题中。然而,这些方法经常预测出与测量投影数据不一致的图像。本文介绍了一个准确的可微分前向与反向投影软件库,以确保预测图像与原始测量数据之间的一致性。该软件库高效支持多种投影几何类型,同时尽量减少 GPU 内存占用,便于与现有深度学习训练和推理管道无缝集成。该软件已开源:https://github.com/LLNL/LEAP。
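The data-consistency idea behind a differentiable forward projector can be illustrated with a toy linear system matrix. This is not the LEAP API; A, x_pred and y_meas are hypothetical names. The penalty couples the predicted image to the measured sinogram, and its gradient is the back projection of the residual, which is what an autodiff framework would propagate into the reconstruction network.

```python
import numpy as np

def consistency_loss_and_grad(A: np.ndarray, x_pred: np.ndarray, y_meas: np.ndarray):
    """Data-fidelity term for CT: loss = ||A @ x - y||^2 / 2.
    A       : (n_rays, n_pixels) discrete forward projector (system matrix)
    x_pred  : flattened predicted image
    y_meas  : measured projections (sinogram)
    The gradient w.r.t. the image is the back projection of the residual, A.T @ r."""
    r = A @ x_pred - y_meas
    loss = 0.5 * float(r @ r)
    grad = A.T @ r
    return loss, grad

rng = np.random.default_rng(2)
A = rng.random((20, 16))            # toy 20-ray projector over a 4x4 image
x_true = rng.random(16)
y = A @ x_true
loss, g = consistency_loss_and_grad(A, x_pred=np.zeros(16), y_meas=y)
print(loss, g.shape)
```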

  • paper_url: http://arxiv.org/abs/2307.05794
  • repo_url: https://github.com/weilabmsu/oud-ppi
  • paper_authors: Long Chen, Jian Jiang, Bozheng Dou, Hongsong Feng, Jie Liu, Yueying Zhu, Bengong Zhang, Tianshou Zhou, Guo-Wei Wei
  • for: 这个研究旨在发展新的痛症处理方法,以提高现有的痛症治疗选择,并实现更好的效果和更少的副作用。
  • methods: 这个研究基于痛症相关的电压门控钠通道 Nav1.3、Nav1.7、Nav1.8 和 Nav1.9 构建蛋白质-蛋白质互作网络(PPI),并建立相应的药物-标靶互作网络(DTI),以找出可能的先导化合物。
  • results: 这个研究通过系统性的筛选流程,评估了 150,000 多个靶向 Nav1.7 与 Nav1.8 的候选药物的副作用与再利用潜力,并评估其 ADMET 特性,以找出性质接近最优的先导化合物。
    Abstract Pain is a significant global health issue, and the current treatment options for pain management have limitations in terms of effectiveness, side effects, and potential for addiction. There is a pressing need for improved pain treatments and the development of new drugs. Voltage-gated sodium channels, particularly Nav1.3, Nav1.7, Nav1.8, and Nav1.9, play a crucial role in neuronal excitability and are predominantly expressed in the peripheral nervous system. Targeting these channels may provide a means to treat pain while minimizing central and cardiac adverse effects. In this study, we construct protein-protein interaction (PPI) networks based on pain-related sodium channels and develop a corresponding drug-target interaction (DTI) network to identify potential lead compounds for pain management. To ensure reliable machine learning predictions, we carefully select 111 inhibitor datasets from a pool of over 1,000 targets in the PPI network. We employ three distinct machine learning algorithms combined with advanced natural language processing (NLP)-based embeddings, specifically pre-trained transformer and autoencoder representations. Through a systematic screening process, we evaluate the side effects and repurposing potential of over 150,000 drug candidates targeting Nav1.7 and Nav1.8 sodium channels. Additionally, we assess the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of these candidates to identify leads with near-optimal characteristics. Our strategy provides an innovative platform for the pharmacological development of pain treatments, offering the potential for improved efficacy and reduced side effects.
    摘要 疼痛是一个重要的全球健康问题,而现有的疼痛管理手段在疗效、副作用和成瘾风险方面都存在局限,因此迫切需要更好的疼痛治疗方案和新药研发。电压门控钠通道,尤其是 Nav1.3、Nav1.7、Nav1.8 和 Nav1.9,在神经元兴奋性中起关键作用,并且主要表达于外周神经系统;以这些通道为靶点有望在尽量减少中枢和心脏不良反应的前提下治疗疼痛。在本研究中,我们基于疼痛相关的钠通道构建了蛋白质-蛋白质相互作用(PPI)网络,并建立相应的药物-靶点相互作用(DTI)网络,以识别潜在的疼痛管理先导化合物。为确保机器学习预测的可靠性,我们从 PPI 网络中 1,000 多个靶点里精心挑选了 111 个抑制剂数据集,并采用三种不同的机器学习算法,结合基于自然语言处理(NLP)的先进嵌入(预训练 Transformer 与自编码器表示)。通过系统性的筛选流程,我们评估了 150,000 多个靶向 Nav1.7 和 Nav1.8 钠通道的候选药物的副作用和再利用潜力,并进一步评估其 ADMET(吸收、分布、代谢、排泄与毒性)性质,以筛选出性质接近最优的先导化合物。我们的策略为疼痛治疗的药物开发提供了一个创新平台,有望带来更好的疗效和更少的副作用。

Implicit regularisation in stochastic gradient descent: from single-objective to two-player games

  • paper_url: http://arxiv.org/abs/2307.05789
  • repo_url: None
  • paper_authors: Mihaela Rosca, Marc Peter Deisenroth
  • for: 这个论文的目的是研究深度学习优化中的隐式正则化效果,以及如何使用这些效果来改进性能和稳定性。
  • methods: 这个论文使用后向误差分析(backward error analysis,BEA)来量化离散优化器的离散化误差,并借助连续时间流(continuous-time flows)来寻找隐式正则化效应。
  • results: 这个论文发现了此前未知的隐式正则化效应,包括在考虑每次更新所用具体数据批次的情况下、由多步随机梯度下降引起的正则化效应,以及一般可微双人博弈中的正则化效应。
    Abstract Recent years have seen many insights on deep learning optimisation being brought forward by finding implicit regularisation effects of commonly used gradient-based optimisers. Understanding implicit regularisation can not only shed light on optimisation dynamics, but it can also be used to improve performance and stability across problem domains, from supervised learning to two-player games such as Generative Adversarial Networks. An avenue for finding such implicit regularisation effects has been quantifying the discretisation errors of discrete optimisers via continuous-time flows constructed by backward error analysis (BEA). The current usage of BEA is not without limitations, since not all the vector fields of continuous-time flows obtained using BEA can be written as a gradient, hindering the construction of modified losses revealing implicit regularisers. In this work, we provide a novel approach to use BEA, and show how our approach can be used to construct continuous-time flows with vector fields that can be written as gradients. We then use this to find previously unknown implicit regularisation effects, such as those induced by multiple stochastic gradient descent steps while accounting for the exact data batches used in the updates, and in generally differentiable two-player games.
    摘要 One avenue for finding implicit regularization effects has been to quantify the discretization errors of discrete optimizers using continuous-time flows constructed by backward error analysis (BEA). However, the current usage of BEA is not without limitations, as not all the vector fields of continuous-time flows obtained using BEA can be written as gradients, hindering the construction of modified losses revealing implicit regularizers.In this work, we propose a novel approach to using BEA, which allows us to construct continuous-time flows with vector fields that can be written as gradients. We then use this approach to find previously unknown implicit regularization effects, such as those induced by multiple stochastic gradient descent steps while accounting for the exact data batches used in the updates, and in generally differentiable two-player games.
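As a concrete reference point for this BEA-style analysis, the known first-order result for plain gradient descent with step size $h$ (established in earlier work on implicit gradient regularization, which this paper extends to multiple stochastic steps and to two-player games) is a modified loss of the form

$\tilde{L}(\theta) = L(\theta) + \frac{h}{4}\,\lVert \nabla L(\theta) \rVert^{2},$

i.e., the discrete updates implicitly penalize large gradient norms.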

Making the Nyström method highly accurate for low-rank approximations

  • paper_url: http://arxiv.org/abs/2307.05785
  • repo_url: None
  • paper_authors: Jianlin Xia
  • for: 本研究提出一系列启发式策略,以提高 Nyström 方法在非对称和/或矩形矩阵上的低秩近似精度。
  • methods: 将 Nyström 方法与瘦长的秩揭示分解作为快速选主元策略,嵌入渐进的交替方向精化过程;采用两种精化机制:从少量随机列出发交替进行行与列选主元,以及自适应增加采样数直至达到所需的秩或精度。
  • results: 实验表明,高精度 Nyström 方法仅用少量渐进采样步骤即可快速达到预先设定的高精度,在一些情况下质量接近 SVD。
    Abstract The Nystr\"om method is a convenient heuristic method to obtain low-rank approximations to kernel matrices in nearly linear complexity. Existing studies typically use the method to approximate positive semidefinite matrices with low or modest accuracies. In this work, we propose a series of heuristic strategies to make the Nystr\"om method reach high accuracies for nonsymmetric and/or rectangular matrices. The resulting methods (called high-accuracy Nystr\"om methods) treat the Nystr\"om method and a skinny rank-revealing factorization as a fast pivoting strategy in a progressive alternating direction refinement process. Two refinement mechanisms are used: alternating the row and column pivoting starting from a small set of randomly chosen columns, and adaptively increasing the number of samples until a desired rank or accuracy is reached. A fast subset update strategy based on the progressive sampling of Schur complements is further proposed to accelerate the refinement process. Efficient randomized accuracy control is also provided. Relevant accuracy and singular value analysis is given to support some of the heuristics. Extensive tests with various kernel functions and data sets show how the methods can quickly reach prespecified high accuracies in practice, sometimes with quality close to SVDs, using only small numbers of progressive sampling steps.
    摘要 Nyström 方法是一种便利的启发式方法,能在近线性复杂度下得到核矩阵的低秩近似。现有研究通常只用它以较低或中等的精度近似半正定矩阵。在这项工作中,我们提出一系列启发式策略,使 Nyström 方法在非对称和/或矩形矩阵上也能达到高精度。由此得到的方法(称为高精度 Nyström 方法)把 Nyström 方法和瘦长的秩揭示分解视为一种快速选主元策略,嵌入到一个渐进的交替方向精化过程中。该过程使用两种精化机制:从一小组随机选取的列出发,交替进行行与列的选主元;以及自适应地增加采样数,直到达到所需的秩或精度。我们还提出了一种基于 Schur 补渐进采样的快速子集更新策略来加速精化过程,并提供了高效的随机化精度控制。文中给出的精度与奇异值分析为部分启发式策略提供了支撑。在多种核函数和数据集上的大量测试表明,这些方法在实践中只需少量渐进采样步骤即可快速达到预先指定的高精度,有时质量接近 SVD。
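The baseline column-sampling Nyström approximation that these high-accuracy variants start from can be written in a few lines for a symmetric positive semidefinite kernel matrix; landmark selection below is plain uniform sampling, and none of the paper's pivoting or refinement strategies are included.

```python
import numpy as np

def nystrom_approx(K: np.ndarray, m: int, seed: int = 0) -> np.ndarray:
    """Rank-m Nyström approximation of an SPSD kernel matrix:
    K ≈ C @ pinv(W) @ C.T with C = K[:, idx] and W = K[idx][:, idx]."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(K.shape[0], size=m, replace=False)
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

# Toy RBF kernel on random points.
X = np.random.default_rng(3).normal(size=(200, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2)
K_hat = nystrom_approx(K, m=30)
print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))  # relative approximation error
```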

Weisfeiler and Lehman Go Measurement Modeling: Probing the Validity of the WL Test

  • paper_url: http://arxiv.org/abs/2307.05775
  • repo_url: https://github.com/arjunsubramonian/wl-test-exploration
  • paper_authors: Arjun Subramonian, Adina Williams, Maximilian Nickel, Yizhou Sun, Levent Sagun
  • for: This work examines how the expressive power of graph neural networks is measured and whether the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test is a reliable and valid way to assess it.
  • methods: A systematic analysis of the reliability and validity of the $k$-WL test, together with a survey of practitioners ($n = 18$) probing their conceptualizations of expressive power and their assumptions about $k$-WL.
  • results: The analysis finds that $k$-WL does not guarantee isometry, can be irrelevant to real-world graph tasks, and may not promote generalization or trustworthiness; the authors argue for extensional, benchmark-based definitions and measurement of expressive power and contribute guiding questions for constructing such benchmarks, which is critical for progress in graph machine learning.
    Abstract The expressive power of graph neural networks is usually measured by comparing how many pairs of graphs or nodes an architecture can possibly distinguish as non-isomorphic to those distinguishable by the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test. In this paper, we uncover misalignments between practitioners' conceptualizations of expressive power and $k$-WL through a systematic analysis of the reliability and validity of $k$-WL. We further conduct a survey ($n = 18$) of practitioners to surface their conceptualizations of expressive power and their assumptions about $k$-WL. In contrast to practitioners' opinions, our analysis (which draws from graph theory and benchmark auditing) reveals that $k$-WL does not guarantee isometry, can be irrelevant to real-world graph tasks, and may not promote generalization or trustworthiness. We argue for extensional definitions and measurement of expressive power based on benchmarks; we further contribute guiding questions for constructing such benchmarks, which is critical for progress in graph machine learning.
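
As background for the discussion above, the classic 1-WL colour-refinement heuristic (the base of the k-WL hierarchy) can be implemented in a few lines; two graphs whose stable colour histograms differ are certainly non-isomorphic, while equal histograms are inconclusive. This is a standard textbook sketch, not the paper's measurement-modelling analysis.

```python
# Minimal 1-WL colour refinement (illustrative background for the k-WL hierarchy).
from collections import Counter

def wl_1_histogram(adj, iterations=3):
    """adj: dict node -> list of neighbours. Returns the multiset of final colours."""
    colours = {v: 0 for v in adj}                        # uniform initial colouring
    for _ in range(iterations):
        signatures = {
            v: (colours[v], tuple(sorted(colours[u] for u in adj[v])))
            for v in adj
        }
        # relabel every distinct signature with a fresh integer colour
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colours = {v: palette[signatures[v]] for v in adj}
    return Counter(colours.values())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
# different histograms -> the graphs are provably non-isomorphic
print(wl_1_histogram(triangle) == wl_1_histogram(path))   # False
```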

Random-Set Convolutional Neural Network (RS-CNN) for Epistemic Deep Learning

  • paper_url: http://arxiv.org/abs/2307.05772
  • repo_url: None
  • paper_authors: Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang, Keivan Shariatmadar, Fabio Cuzzolin
  • for: To provide a Random-Set Convolutional Neural Network (RS-CNN) for classification that can quantify the confidence and epistemic uncertainty of its predictions.
  • methods: Random-set models predict belief functions rather than probability vectors, using distributions over the power set of the sample space, and estimate epistemic uncertainty by approximating the size of the credal sets associated with the predicted belief functions.
  • results: RS-CNN outperforms competing uncertainty-aware approaches, and on out-of-distribution samples it captures the true prediction where standard CNNs fail.
    Abstract Machine learning is increasingly deployed in safety-critical domains where robustness against adversarial attacks is crucial and erroneous predictions could lead to potentially catastrophic consequences. This highlights the need for learning systems to be equipped with the means to determine a model's confidence in its prediction and the epistemic uncertainty associated with it, 'to know when a model does not know'. In this paper, we propose a novel Random-Set Convolutional Neural Network (RS-CNN) for classification which predicts belief functions rather than probability vectors over the set of classes, using the mathematics of random sets, i.e., distributions over the power set of the sample space. Based on the epistemic deep learning approach, random-set models are capable of representing the 'epistemic' uncertainty induced in machine learning by limited training sets. We estimate epistemic uncertainty by approximating the size of credal sets associated with the predicted belief functions, and experimentally demonstrate how our approach outperforms competing uncertainty-aware approaches in a classical evaluation setting. The performance of RS-CNN is best demonstrated on OOD samples where it manages to capture the true prediction while standard CNNs fail.

Unsupervised Learning in Complex Systems

  • paper_url: http://arxiv.org/abs/2307.10993
  • repo_url: https://github.com/hugcis/evolving-structures-in-complex-systems
  • paper_authors: Hugo Cisneros
  • For: Studies learning and adaptation in natural and artificial systems through the lens of complex systems, aiming to develop unsupervised learning algorithms that make applications more flexible and adaptive.
  • Methods: Develops a general complexity metric used to search for complex systems that exhibit growth of complexity, introduces a coarse-graining method to study computation in large-scale complex systems, and contributes a learning-efficiency metric together with a benchmark dataset for evaluating the speed of learning algorithms.
  • Results: The work adds substantially to the understanding of learning and adaptation in natural and artificial systems and contributes a promising new direction for research in unsupervised learning.
    Abstract In this thesis, we explore the use of complex systems to study learning and adaptation in natural and artificial systems. The goal is to develop autonomous systems that can learn without supervision, develop on their own, and become increasingly complex over time. Complex systems are identified as a suitable framework for understanding these phenomena due to their ability to exhibit growth of complexity. Being able to build learning algorithms that require limited to no supervision would enable greater flexibility and adaptability in various applications. By understanding the fundamental principles of learning in complex systems, we hope to advance our ability to design and implement practical learning algorithms in the future. This thesis makes the following key contributions: the development of a general complexity metric that we apply to search for complex systems that exhibit growth of complexity, the introduction of a coarse-graining method to study computations in large-scale complex systems, and the development of a metric for learning efficiency as well as a benchmark dataset for evaluating the speed of learning algorithms. Our findings add substantially to our understanding of learning and adaptation in natural and artificial systems. Moreover, our approach contributes to a promising new direction for research in this area. We hope these findings will inspire the development of more effective and efficient learning algorithms in the future.

Realtime Spectrum Monitoring via Reinforcement Learning – A Comparison Between Q-Learning and Heuristic Methods

  • paper_url: http://arxiv.org/abs/2307.05763
  • repo_url: None
  • paper_authors: Tobias Braun, Tobias Korzyzkowske, Larissa Putzar, Jan Mietzner, Peter A. Hoeher
  • for: To compare the performance of two different approaches for managing the available receiver resources in spectrum monitoring.
  • methods: A Q-learning algorithm from the field of reinforcement learning and a heuristic linear frequency-tuning approach are used for resource management and evaluated against each other.
  • results: The Q-learning algorithm achieves a significantly higher detection rate than the heuristic approach at the expense of a smaller exploration rate, and it can be parameterized to trade off detection against exploration.
    Abstract Due to technological advances in the field of radio technology and its availability, the number of interference signals in the radio spectrum is continuously increasing. Interference signals must be detected in a timely fashion, in order to maintain standards and keep emergency frequencies open. To this end, specialized (multi-channel) receivers are used for spectrum monitoring. In this paper, the performances of two different approaches for controlling the available receiver resources are compared. The methods used for resource management (ReMa) are linear frequency tuning as a heuristic approach and a Q-learning algorithm from the field of reinforcement learning. To test the methods to be investigated, a simplified scenario was designed with two receiver channels monitoring ten non-overlapping frequency bands with non-uniform signal activity. For this setting, it is shown that the Q-learning algorithm used has a significantly higher detection rate than the heuristic approach at the expense of a smaller exploration rate. In particular, the Q-learning approach can be parameterized to allow for a suitable trade-off between detection and exploration rate.
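
A toy version of the reinforcement-learning side of this comparison is sketched below: a stateless, epsilon-greedy tabular Q-learner decides which bands two receiver channels tune to each step. The simulator, reward design, and hyperparameters are illustrative assumptions rather than the paper's exact setup, but they show the detection/exploration trade-off the authors parameterize.

```python
# Minimal epsilon-greedy Q-learning sketch for receiver resource management
# (toy simulator and hyperparameters are assumptions, not the paper's setup).
import numpy as np

rng = np.random.default_rng(0)
n_bands, n_receivers, steps = 10, 2, 20_000
activity = rng.uniform(0.05, 0.6, size=n_bands)   # per-step probability a band carries a signal
Q = np.zeros(n_bands)                              # stateless value estimate per band
alpha, epsilon = 0.05, 0.1                         # learning rate, exploration rate

detections = 0
for _ in range(steps):
    if rng.random() < epsilon:                     # explore: tune to random bands
        chosen = rng.choice(n_bands, size=n_receivers, replace=False)
    else:                                          # exploit: tune to the most promising bands
        chosen = np.argsort(Q)[-n_receivers:]
    for band in chosen:
        reward = float(rng.random() < activity[band])   # 1 if an interference signal was present
        detections += reward
        Q[band] += alpha * (reward - Q[band])           # incremental value update

print("learned values per band:", np.round(Q, 2))
print("detection rate:", detections / (steps * n_receivers))
```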

GOKU-UI: Ubiquitous Inference through Attention and Multiple Shooting for Continuous-time Generative Models

  • paper_url: http://arxiv.org/abs/2307.05735
  • repo_url: None
  • paper_authors: Germán Abrevaya, Mahta Ramezanian-Panahi, Jean-Christophe Gagnon-Audet, Irina Rish, Pablo Polosecki, Silvina Ponce Dawson, Guillermo Cecchi, Guillaume Dumas
  • for: To advance Scientific Machine Learning by combining domain-aware, interpretable models with agnostic machine learning techniques.
  • methods: GOKU-UI, a generative model built on GOKU-nets, broadens the model class to Stochastic Differential Equations (SDEs) and integrates distributed ("ubiquitous") inference through attention mechanisms and a novel multiple-shooting training strategy in the latent space.
  • results: GOKU-UI outperforms all baseline models on synthetic data even with a 32-fold smaller training set; applied to empirical human brain data with stochastic Stuart-Landau oscillators in its dynamical core, it surpasses state-of-the-art baselines on reconstruction and better predicts future brain activity up to 12 seconds ahead.
    Abstract Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnostic machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. The GOKU-UI broadens the original model's spectrum to incorporate other classes of differential equations, such as Stochastic Differential Equations (SDEs), and integrates a distributed, i.e. ubiquitous, inference through attention mechanisms and a novel multiple shooting training strategy in the latent space. These enhancements have led to a significant increase in its performance in both reconstruction and forecast tasks, as demonstrated by our evaluation of simulated and empirical data. Specifically, GOKU-UI outperformed all baseline models on synthetic datasets even with a training set 32-fold smaller, underscoring its remarkable data efficiency. Furthermore, when applied to empirical human brain data, while incorporating stochastic Stuart-Landau oscillators into its dynamical core, it not only surpassed state-of-the-art baseline methods in the reconstruction task, but also demonstrated better prediction of future brain activity up to 12 seconds ahead. By training GOKU-UI on resting-state fMRI data, we encoded whole-brain dynamics into a latent representation, learning an effective low-dimensional dynamical system model that could offer insights into brain functionality and open avenues for practical applications such as mental state or psychiatric condition classification. Ultimately, our research provides further impetus for the field of Scientific Machine Learning, showcasing the potential for advancements when established scientific insights are interwoven with modern machine learning.

Towards A Scalable Solution for Improving Multi-Group Fairness in Compositional Classification

  • paper_url: http://arxiv.org/abs/2307.05728
  • repo_url: None
  • paper_authors: James Atwood, Tina Tian, Ben Packer, Meghana Deodhar, Jilin Chen, Alex Beutel, Flavien Prost, Ahmad Beirami
  • for: Improving machine-learning fairness in compositional systems, where the final prediction combines multiple classifiers and multiple groups are present.
  • methods: Two simple techniques, task-overconditioning and group-interleaving, which achieve constant scaling in the multi-group multi-label setup.
  • results: Experimental results in academic and real-world environments demonstrate the effectiveness of the proposal.
    Abstract Despite the rich literature on machine learning fairness, relatively little attention has been paid to remediating complex systems, where the final prediction is the combination of multiple classifiers and where multiple groups are present. In this paper, we first show that natural baseline approaches for improving equal opportunity fairness scale linearly with the product of the number of remediated groups and the number of remediated prediction labels, rendering them impractical. We then introduce two simple techniques, called {\em task-overconditioning} and {\em group-interleaving}, to achieve a constant scaling in this multi-group multi-label setup. Our experimental results in academic and real-world environments demonstrate the effectiveness of our proposal at mitigation within this environment.

MoP-CLIP: A Mixture of Prompt-Tuned CLIP Models for Domain Incremental Learning

  • paper_url: http://arxiv.org/abs/2307.05707
  • repo_url: None
  • paper_authors: Julien Nicolas, Florent Chiaroni, Imtiaz Ziko, Ola Ahmad, Christian Desrosiers, Jose Dolz
  • for: Addressing catastrophic forgetting under distributional drift in domain incremental learning (DIL).
  • methods: A mixture of prompt-tuned CLIP models (MoP-CLIP): during training, the feature distribution of every class in each domain is modelled and individual text and visual prompts are learned per domain; at inference, the learned distributions identify whether a test sample belongs to a known domain (selecting the correct prompt) or to an unseen domain (leveraging a mixture of the prompt-tuned CLIP models).
  • results: MoP-CLIP performs competitively in the standard DIL setting and outperforms state-of-the-art methods in OOD scenarios, offering a robust and general solution to domain incremental learning.
    Abstract Despite the recent progress in incremental learning, addressing catastrophic forgetting under distributional drift is still an open and important problem. Indeed, while state-of-the-art domain incremental learning (DIL) methods perform satisfactorily within known domains, their performance largely degrades in the presence of novel domains. This limitation hampers their generalizability, and restricts their scalability to more realistic settings where train and test data are drawn from different distributions. To address these limitations, we present a novel DIL approach based on a mixture of prompt-tuned CLIP models (MoP-CLIP), which generalizes the paradigm of S-Prompting to handle both in-distribution and out-of-distribution data at inference. In particular, at the training stage we model the features distribution of every class in each domain, learning individual text and visual prompts to adapt to a given domain. At inference, the learned distributions allow us to identify whether a given test sample belongs to a known domain, selecting the correct prompt for the classification task, or from an unseen domain, leveraging a mixture of the prompt-tuned CLIP models. Our empirical evaluation reveals the poor performance of existing DIL methods under domain shift, and suggests that the proposed MoP-CLIP performs competitively in the standard DIL settings while outperforming state-of-the-art methods in OOD scenarios. These results demonstrate the superiority of MoP-CLIP, offering a robust and general solution to the problem of domain incremental learning.

A Causal Ordering Prior for Unsupervised Representation Learning

  • paper_url: http://arxiv.org/abs/2307.05704
  • repo_url: None
  • paper_authors: Avinash Kori, Pedro Sanchez, Konstantinos Vilouras, Ben Glocker, Sotirios A. Tsaftaris
  • for: To relax the independence assumptions over latent variables that unsupervised representation learning typically relies on.
  • methods: A fully unsupervised representation learning method that assumes a data-generation process with a latent additive noise model (ANM) and encourages the latent space to follow a causal ordering via a loss function based on the Hessian of the latent distribution.
  • results: The method learns representations without supervision or auxiliary information by accounting for the causal, additive-noise structure of the data-generating process.
    Abstract Unsupervised representation learning with variational inference relies heavily on independence assumptions over latent variables. Causal representation learning (CRL), however, argues that factors of variation in a dataset are, in fact, causally related. Allowing latent variables to be correlated, as a consequence of causal relationships, is more realistic and generalisable. So far, provably identifiable methods rely on: auxiliary information, weak labels, and interventional or even counterfactual data. Inspired by causal discovery with functional causal models, we propose a fully unsupervised representation learning method that considers a data generation process with a latent additive noise model (ANM). We encourage the latent space to follow a causal ordering via loss function based on the Hessian of the latent distribution.

Stack More Layers Differently: High-Rank Training Through Low-Rank Updates

  • paper_url: http://arxiv.org/abs/2307.05695
  • repo_url: https://github.com/guitaricet/peft_pretraining
  • paper_authors: Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky
  • for: To explore low-rank training techniques as an alternative approach to training large neural networks.
  • methods: ReLoRA, a novel method that uses low-rank updates to train high-rank networks.
  • results: Applied to pre-training transformer language models with up to 350M parameters, ReLoRA achieves performance comparable to regular neural network training, and its efficiency increases with model size, making it a promising approach for training multi-billion-parameter networks.
    Abstract Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparametrized models remains poorly understood, and alternative approaches do not necessarily make it cheaper to train high-performance models. In this paper, we explore low-rank training techniques as an alternative approach to training large neural networks. We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to pre-training transformer language models with up to 350M parameters and demonstrate comparable performance to regular neural network training. Furthermore, we observe that the efficiency of ReLoRA increases with model size, making it a promising approach for training multi-billion-parameter networks efficiently. Our findings shed light on the potential of low-rank training techniques and their implications for scaling laws.
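
The core mechanic described in the abstract, accumulating a high-rank update out of a sequence of low-rank ones, can be sketched with plain numpy: train low-rank factors on top of a frozen weight matrix, periodically merge them into the full matrix, and restart the factors. Optimizer resets, warm restarts, and the transformer context are omitted; everything below is an illustrative assumption, not the authors' implementation.

```python
# Minimal merge-and-restart sketch of training with low-rank updates (illustrative).
import numpy as np

rng = np.random.default_rng(0)
d, r = 16, 2                        # layer width, rank of each low-rank update
W = rng.normal(size=(d, d)) * 0.1   # "pretrained" weight, frozen between merges
A = rng.normal(size=(d, r)) * 0.1
B = np.zeros((r, d))                # zero init so the adapter starts as a no-op

X = rng.normal(size=(128, d))
Y = rng.normal(size=(128, d))       # toy regression target

def forward(X, W, A, B):
    return X @ (W + A @ B)

lr, merge_every = 0.05, 500
print("initial MSE:", np.mean((forward(X, W, A, B) - Y) ** 2))
for step in range(1, 2001):
    err = forward(X, W, A, B) - Y
    grad_M = X.T @ err / len(X)                 # gradient w.r.t. the effective update W + A @ B
    A -= lr * grad_M @ B.T                      # chain rule: dL/dA = dL/dM @ B^T
    B -= lr * A.T @ grad_M                      # chain rule: dL/dB = A^T @ dL/dM
    if step % merge_every == 0:                 # merge the low-rank update and restart it
        W = W + A @ B
        A = rng.normal(size=(d, r)) * 0.1
        B = np.zeros((r, d))

print("final MSE:  ", np.mean((forward(X, W, A, B) - Y) ** 2))
```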

Self-consistency for open-ended generations

  • paper_url: http://arxiv.org/abs/2307.06857
  • repo_url: None
  • paper_authors: Siddhartha Jain, Xiaofei Ma, Anoop Deoras, Bing Xiang
  • for: Improving the quality of language-model generations.
  • methods: Reranks sampled generations using easy-to-compute pairwise statistics between them, with minimal compute overhead, and selects the best generation.
  • results: Strong improvements for selecting the best $k$ generations on code generation tasks, and robust improvements for selecting the best generation on autoformalization and summarization.
    Abstract Large Language Models (LLMs) can exhibit considerable variation in the quality of their sampled outputs. Reranking and selecting the best generation from the sampled set is a popular way of obtaining strong gains in generation quality. In this paper, we present a novel approach for reranking LLM generations. Unlike other techniques that might involve additional inferences or training a specialized reranker, our approach relies on easy to compute pairwise statistics between the generations that have minimal compute overhead. We show that our approach can be formalized as an extension of self-consistency and analyze its performance in that framework, theoretically as well as via simulations. We show strong improvements for selecting the best $k$ generations for code generation tasks as well as robust improvements for best generation for the tasks of autoformalization, and summarization. While our approach only assumes black-box access to LLMs, we show that additional access to token probabilities can improve performance even further.
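
The abstract leaves the exact pairwise statistic open, so the sketch below uses token-level Jaccard similarity as a stand-in: the generation with the highest average similarity to all other samples is returned. When generations are exact duplicates this reduces to majority voting, i.e. ordinary self-consistency; the statistic and the toy samples are assumptions for illustration.

```python
# Minimal consensus-reranking sketch over sampled generations (the pairwise
# statistic, Jaccard over tokens, is an illustrative assumption).
from itertools import combinations

def jaccard(a, b):
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / max(1, len(sa | sb))

def rerank_by_consensus(generations):
    n = len(generations)
    scores = [0.0] * n
    for i, j in combinations(range(n), 2):      # one cheap statistic per pair
        s = jaccard(generations[i], generations[j])
        scores[i] += s
        scores[j] += s
    mean_scores = [s / max(1, n - 1) for s in scores]
    best = max(range(n), key=lambda i: mean_scores[i])
    return generations[best], mean_scores

samples = [
    "def add(a, b): return a + b",
    "def add(a, b): return a - b",
    "def add(x, y): return x + y",
]
best, scores = rerank_by_consensus(samples)
print(best)
print([round(s, 2) for s in scores])
```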

Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

  • paper_url: http://arxiv.org/abs/2307.05454
  • repo_url: https://github.com/google-research/multi-morph-checklist
  • paper_authors: Ester Hlavnova, Sebastian Ruder
  • for: To understand how NLP systems generalize to typological differences across the world's languages.
  • methods: M2C, a morphologically-aware framework for behavioral testing, is used to generate tests that probe model behavior in light of specific linguistic features in 12 typologically diverse languages.
  • results: State-of-the-art language models excel at most tests in English but fail to generalize to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finnish, motivating the development of models that address these blind spots.
    Abstract A challenge towards developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finnish. Our findings motivate the development of models that address these blind spots.

ISLTranslate: Dataset for Translating Indian Sign Language

  • paper_url: http://arxiv.org/abs/2307.05440
  • repo_url: https://github.com/exploration-lab/isltranslate
  • paper_authors: Abhinav Joshi, Susmit Agrawal, Ashutosh Modi
  • for: bridge the communication gap between the hard-of-hearing community and the rest of the population
  • methods: using a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs
  • results: the largest translation dataset for continuous Indian Sign Language, and a detailed analysis of the dataset
    Abstract Sign languages are the primary means of communication for many hard-of-hearing people worldwide. Recently, to bridge the communication gap between the hard-of-hearing community and the rest of the population, several sign language translation datasets have been proposed to enable the development of statistical sign language translation systems. However, there is a dearth of sign language resources for the Indian sign language. This resource paper introduces ISLTranslate, a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs. To the best of our knowledge, it is the largest translation dataset for continuous Indian Sign Language. We provide a detailed analysis of the dataset. To validate the performance of existing end-to-end Sign language to spoken language translation systems, we benchmark the created dataset with a transformer-based model for ISL translation.

Metropolis Sampling for Constrained Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.05439
  • repo_url: None
  • paper_authors: Nic Fishman, Leo Klarner, Emile Mathieu, Michael Hutchinson, Valentin de Bortoli
  • for: To propose a new noising scheme that improves the computational efficiency and empirical performance of generative diffusion models on constrained manifolds.
  • methods: A simple Metropolis-sampling-based noising scheme, which the authors prove corresponds to a valid discretization of the reflected Brownian motion.
  • results: The approach demonstrates scalability and flexibility on a range of problems with convex and non-convex constraints, including applications from geospatial modelling, robotics, and protein design, with substantial gains in computational efficiency over earlier samplers.
    Abstract Denoising diffusion models have recently emerged as the predominant paradigm for generative modelling. Their extension to Riemannian manifolds has facilitated their application to an array of problems in the natural sciences. Yet, in many practical settings, such manifolds are defined by a set of constraints and are not covered by the existing (Riemannian) diffusion model methodology. Recent work has attempted to address this issue by employing novel noising processes based on logarithmic barrier methods or reflected Brownian motions. However, the associated samplers are computationally burdensome as the complexity of the constraints increases. In this paper, we introduce an alternative simple noising scheme based on Metropolis sampling that affords substantial gains in computational efficiency and empirical performance compared to the earlier samplers. Of independent interest, we prove that this new process corresponds to a valid discretisation of the reflected Brownian motion. We demonstrate the scalability and flexibility of our approach on a range of problem settings with convex and non-convex constraints, including applications from geospatial modelling, robotics and protein design.
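
A cartoon of the kind of constrained noising step discussed above: with a symmetric Gaussian proposal and a uniform target on the constraint set, the Metropolis acceptance rule reduces to simply rejecting proposals that leave the set. The unit disc, step size, and uniform target are assumptions chosen for illustration; the paper's actual scheme targets diffusion-model noising distributions and is proven to discretise reflected Brownian motion.

```python
# Minimal Metropolis random walk confined to a constraint set (illustrative).
import numpy as np

def in_unit_disc(x):
    return float(np.dot(x, x)) <= 1.0

def metropolis_constrained_walk(x0, n_steps, step_size, rng):
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        proposal = x + step_size * rng.normal(size=x.shape)
        # symmetric proposal + uniform target on the set => accept iff feasible
        if in_unit_disc(proposal):
            x = proposal
        path.append(x.copy())
    return np.stack(path)

rng = np.random.default_rng(0)
path = metropolis_constrained_walk(x0=[0.0, 0.0], n_steps=5_000, step_size=0.05, rng=rng)
print("all iterates feasible:", bool(np.all(np.sum(path ** 2, axis=1) <= 1.0)))
print("final point:", np.round(path[-1], 3))
```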

Improving the Security of Smartwatch Payment with Deep Learning

  • paper_url: http://arxiv.org/abs/2307.05437
  • repo_url: None
  • paper_authors: George Webber
  • for: To improve the security of smartwatch payment by using deep learning to authenticate users from fewer enrolment gestures.
  • methods: A deep-learned authentication system based on the reach-to-pay gesture, together with a regularised autoencoder that generates synthetic user-specific gestures for training.
  • results: Training with the synthetic gestures improves the classifier, so the number of gestures required to enrol a user into a WatchAuth-like system can be reduced without negatively impacting its error rates.
    Abstract Making contactless payments using a smartwatch is increasingly popular, but this payment medium lacks traditional biometric security measures such as facial or fingerprint recognition. In 2022, Sturgess et al. proposed WatchAuth, a system for authenticating smartwatch payments using the physical gesture of reaching towards a payment terminal. While effective, the system requires the user to undergo a burdensome enrolment period to achieve acceptable error levels. In this dissertation, we explore whether applications of deep learning can reduce the number of gestures a user must provide to enrol into an authentication system for smartwatch payment. We firstly construct a deep-learned authentication system that outperforms the current state-of-the-art, including in a scenario where the target user has provided a limited number of gestures. We then develop a regularised autoencoder model for generating synthetic user-specific gestures. We show that using these gestures in training improves classification ability for an authentication system. Through this technique we can reduce the number of gestures required to enrol a user into a WatchAuth-like system without negatively impacting its error rates.

One-Versus-Others Attention: Scalable Multimodal Integration

  • paper_url: http://arxiv.org/abs/2307.05435
  • repo_url: https://github.com/rsinghlab/ovo
  • paper_authors: Michal Golovanevsky, Eva Schiller, Akira Nair, Ritambhara Singh, Carsten Eickhoff
  • for: To propose an attention mechanism for multimodal learning models that scales to many modalities, avoiding the complexity of attending over every pair of modalities.
  • methods: One-Versus-Others (OvO) attention, a domain-neutral mechanism that requires only $n$ attention operations for $n$ modalities and thus scales linearly, in contrast to pairwise cross-modal attention.
  • results: On three diverse real-world datasets and an additional simulation experiment, the method improves performance over popular fusion techniques while decreasing computation costs.
    Abstract Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.
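
The scaling argument in the abstract is easy to see in code: with n modalities, each modality runs a single attention against the pooled tokens of all the other modalities, giving n attention operations instead of n·(n-1)/2 pairwise ones. The single-head scaled dot-product form and the concatenation of "others" are assumptions for illustration, not the paper's exact layer.

```python
# Minimal one-versus-others attention sketch: n attention ops for n modalities.
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

def one_versus_others(modalities):
    """modalities: list of (tokens_i, d) arrays sharing the feature dimension d."""
    fused = []
    for i, Xi in enumerate(modalities):
        others = np.concatenate([X for j, X in enumerate(modalities) if j != i], axis=0)
        fused.append(attention(Xi, others, others))   # exactly one attention op per modality
    return fused

rng = np.random.default_rng(0)
mods = [rng.normal(size=(t, 8)) for t in (4, 6, 3, 5)]   # four modalities, shared dim 8
out = one_versus_others(mods)
print([o.shape for o in out])   # each modality keeps its token count and feature dim
```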

Self-Supervised Learning with Lie Symmetries for Partial Differential Equations

  • paper_url: http://arxiv.org/abs/2307.05432
  • repo_url: None
  • paper_authors: Grégoire Mialon, Quentin Garrido, Hannah Lawrence, Danyal Rehman, Yann LeCun, Bobak T. Kiani
  • for: To learn general-purpose representations of partial differential equations (PDEs), working towards computationally efficient alternatives to numerical solvers with potentially broad impact in science and engineering.
  • methods: Joint-embedding methods for self-supervised learning (SSL) are used to learn general-purpose representations of PDEs from heterogeneous data sources.
  • results: The learned representations outperform baseline approaches on invariant tasks such as regressing the coefficients of a PDE, and also improve the time-stepping performance of neural solvers.
    Abstract Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs.

Geometric Neural Diffusion Processes

  • paper_url: http://arxiv.org/abs/2307.05431
  • repo_url: https://github.com/cambridge-mlg/neural_diffusion_processes
  • paper_authors: Emile Mathieu, Vincent Dutordoir, Michael J. Hutchinson, Valentin De Bortoli, Yee Whye Teh, Richard E. Turner
  • for: Modelling problems in the natural sciences that involve symmetries and data living in non-Euclidean spaces.
  • methods: Denoising diffusion models are extended with geometric priors in infinite dimensions: a noising process whose limiting distribution is a geometric Gaussian process that transforms under the symmetry group of interest, and a score approximated by a neural network that is equivariant with respect to that group.
  • results: Under these conditions the generative functional model admits the same symmetry; with a novel Langevin-based conditional sampler, the model fits complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.
    Abstract Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling. We do so by a) constructing a noising process which admits, as limiting distribution, a geometric Gaussian process that transforms under the symmetry group of interest, and b) approximating the score with a neural network that is equivariant w.r.t. this group. We show that with these conditions, the generative functional model admits the same symmetry. We demonstrate scalability and capacity of the model, using a novel Langevin-based conditional sampler, to fit complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.

Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection

  • paper_url: http://arxiv.org/abs/2307.05422
  • repo_url: https://github.com/fu1001hao/five-metrics-detector
  • paper_authors: Hao Fu, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami
  • for: This paper proposes a data-efficient detection method for backdoor attacks on deep neural networks under a black-box scenario.
  • methods: Based on the intuition that trigger features have a higher influence on the backdoored network's output than any benign feature, five metrics quantify that influence; they are computed by injecting an input's partial contents into clean validation samples and using the output labels of the resulting synthetic samples, and five novelty detectors plus a meta novelty detector are trained from a tiny clean validation dataset.
  • results: The method performs well across a broad range of backdoor attacks, including ablation studies and comparisons to existing approaches, and can identify poisoned samples online via their meta confidence scores.
    Abstract This paper proposes a data-efficient detection method for deep neural networks against backdoor attacks under a black-box scenario. The proposed approach is motivated by the intuition that features corresponding to triggers have a higher influence in determining the backdoored network output than any other benign features. To quantitatively measure the effects of triggers and benign features on determining the backdoored network output, we introduce five metrics. To calculate the five-metric values for a given input, we first generate several synthetic samples by injecting the input's partial contents into clean validation samples. Then, the five metrics are computed by using the output labels of the corresponding synthetic samples. One contribution of this work is the use of a tiny clean validation dataset. Having the computed five metrics, five novelty detectors are trained from the validation dataset. A meta novelty detector fuses the output of the five trained novelty detectors to generate a meta confidence score. During online testing, our method determines if online samples are poisoned or not via assessing their meta confidence scores output by the meta novelty detector. We show the efficacy of our methodology through a broad range of backdoor attacks, including ablation studies and comparison to existing approaches. Our methodology is promising since the proposed five metrics quantify the inherent differences between clean and poisoned samples. Additionally, our detection method can be incrementally improved by appending more metrics that may be proposed to address future advanced attacks.

Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

  • paper_url: http://arxiv.org/abs/2307.05405
  • repo_url: https://github.com/sskkai/interactive-scoring-irl
  • paper_authors: Shukai Liu, Chenming Wu, Ying Li, Liangjun Zhang
  • for: The paper is written to improve the feedback efficiency of interactive reinforcement learning by using scores provided by humans instead of pairwise preferences.
  • methods: The paper proposes an adaptive learning scheme that uses scores to train a behavioral policy in a sparse reward environment, and it is insensitive to imperfect or unreliable scores.
  • results: The proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback compared to pairwise preference learning methods.
    Abstract Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the requirement of a large amount of interactive feedback. This paper presents a new method that uses scores provided by humans instead of pairwise preferences to improve the feedback efficiency of interactive reinforcement learning. Our key insight is that scores can yield significantly more data than pairwise preferences. Specifically, we require a teacher to interactively score the full trajectories of an agent to train a behavioral policy in a sparse reward environment. To avoid unstable scores given by humans negatively impacting the training process, we propose an adaptive learning scheme. This enables the learning paradigm to be insensitive to imperfect or unreliable scores. We extensively evaluate our method for robotic locomotion and manipulation tasks. The results show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback compared to pairwise preference learning methods. The source codes are publicly available at https://github.com/SSKKai/Interactive-Scoring-IRL.

Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform

  • paper_url: http://arxiv.org/abs/2307.05399
  • repo_url: https://github.com/mateusz-wojcik-97/domain-agnostic-architecture
  • paper_authors: Mateusz Wójcik, Witold Kościukiewicz, Mateusz Baran, Tomasz Kajdanowicz, Adam Gonczarek
  • for: This paper describes a fully differentiable architecture based on the Mixture of Experts model for classification problems on streaming data in which each class is presented separately.
  • methods: A differentiable, trainable Mixture of Experts model that does not require a memory buffer.
  • results: Experiments show state-of-the-art performance across multiple domains and the ability to learn online in production environments, clearly outperforming the reference methods.
    Abstract Production deployments in complex systems require ML architectures to be highly efficient and usable against multiple tasks. Particularly demanding are classification problems in which data arrives in a streaming fashion and each class is presented separately. Recent methods with stochastic gradient learning have been shown to struggle in such setups or have limitations like memory buffers, and being restricted to specific domains that disable its usage in real-world scenarios. For this reason, we present a fully differentiable architecture based on the Mixture of Experts model, that enables the training of high-performance classifiers when examples from each class are presented separately. We conducted exhaustive experiments that proved its applicability in various domains and ability to learn online in production environments. The proposed technique achieves SOTA results without a memory buffer and clearly outperforms the reference methods.
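
A dense, fully differentiable Mixture-of-Experts layer of the kind such an architecture builds on can be written compactly: a softmax gate weights the outputs of several small experts, so the whole layer trains end-to-end with ordinary backpropagation. Expert count, sizes, and the dense (non-sparse) gating are illustrative assumptions, not the paper's configuration.

```python
# Minimal dense Mixture-of-Experts forward pass (illustrative assumptions only).
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MixtureOfExperts:
    def __init__(self, d_in, d_out, n_experts, rng):
        self.W_gate = rng.normal(size=(d_in, n_experts)) * 0.1      # gating network
        self.experts = [rng.normal(size=(d_in, d_out)) * 0.1        # one linear expert each
                        for _ in range(n_experts)]

    def __call__(self, X):
        gates = softmax(X @ self.W_gate)                              # (batch, n_experts)
        expert_out = np.stack([X @ W for W in self.experts], axis=1)  # (batch, n_experts, d_out)
        return np.einsum("be,bed->bd", gates, expert_out)             # gate-weighted combination

rng = np.random.default_rng(0)
moe = MixtureOfExperts(d_in=20, d_out=5, n_experts=4, rng=rng)
print(moe(rng.normal(size=(8, 20))).shape)   # (8, 5)
```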

eess.IV - 2023-07-12

On the Importance of Denoising when Learning to Compress Images

  • paper_url: http://arxiv.org/abs/2307.06233
  • repo_url: https://github.com/trougnouf/compression
  • paper_authors: Benoit Brummer, Christophe De Vleeschouwer
  • for: This work aims to improve image compression and denoising jointly.
  • methods: The codec is trained on noisy-clean image pairs from a training set that mixes images with different noise levels, so that denoising is learned explicitly alongside compression.
  • results: A single model trained on this mixture achieves better rate-distortion and higher image quality than a compression-only model, and even than a denoise-then-compress pipeline, while using almost an order of magnitude fewer GMac operations than the latter.
    Abstract Image noise is ubiquitous in photography. However, image noise is not compressible nor desirable, thus attempting to convey the noise in compressed image bitstreams yields sub-par results in both rate and distortion. We propose to explicitly learn the image denoising task when training a codec. Therefore, we leverage the Natural Image Noise Dataset, which offers a wide variety of scenes captured with various ISO numbers, leading to different noise levels, including insignificant ones. Given this training set, we supervise the codec with noisy-clean image pairs, and show that a single model trained based on a mixture of images with variable noise levels appears to yield best-in-class results with both noisy and clean images, achieving better rate-distortion than a compression-only model or even than a pair of denoising-then-compression models with almost one order of magnitude fewer GMac operations.

CellGAN: Conditional Cervical Cell Synthesis for Augmenting Cytopathological Image Classification

  • paper_url: http://arxiv.org/abs/2307.06182
  • repo_url: https://github.com/zhenrongshen/cellgan
  • paper_authors: Zhenrong Shen, Maosong Cao, Sheng Wang, Lichi Zhang, Qian Wang
  • for: To help pathologists detect cervical abnormality more accurately and efficiently in cancer screening.
  • methods: CellGAN synthesizes cytopathological images of various cervical cell types to augment patch-level cell classification.
  • results: Experiments show that CellGAN produces highly plausible TCT cytopathological images and substantially improves patch-level cell classification performance.
    Abstract Automatic examination of thin-prep cytologic test (TCT) slides can assist pathologists in finding cervical abnormality for accurate and efficient cancer screening. Current solutions mostly need to localize suspicious cells and classify abnormality based on local patches, concerning the fact that whole slide images of TCT are extremely large. It thus requires many annotations of normal and abnormal cervical cells, to supervise the training of the patch-level classifier for promising performance. In this paper, we propose CellGAN to synthesize cytopathological images of various cervical cell types for augmenting patch-level cell classification. Built upon a lightweight backbone, CellGAN is equipped with a non-linear class mapping network to effectively incorporate cell type information into image generation. We also propose the Skip-layer Global Context module to model the complex spatial relationship of the cells, and attain high fidelity of the synthesized images through adversarial learning. Our experiments demonstrate that CellGAN can produce visually plausible TCT cytopathological images for different cell types. We also validate the effectiveness of using CellGAN to greatly augment patch-level cell classification performance.

Learning Kernel-Modulated Neural Representation for Efficient Light Field Compression

  • paper_url: http://arxiv.org/abs/2307.06143
  • repo_url: None
  • paper_authors: Jinglei Shi, Yihong Xu, Christine Guillemot
  • for: This paper is written for the purpose of compressing light field data, which is a type of image data that captures 3D scene information.
  • methods: The paper proposes a compact neural network representation for light field compression, which is inspired by the visual characteristics of Sub-Aperture Images (SAIs) of light fields. The network is composed of two types of kernels: descriptive kernels that store scene description information, and modulatory kernels that control the rendering of different SAIs from the queried perspectives.
  • results: The paper demonstrates that the proposed method outperforms other state-of-the-art methods by a significant margin in the light field compression task. Additionally, the modulators learned from one light field can be transferred to new light fields for rendering dense views, indicating a potential solution for the view synthesis task.
    Abstract Light field is a type of image data that captures the 3D scene information by recording light rays emitted from a scene at various orientations. It offers a more immersive perception than classic 2D images but at the cost of huge data volume. In this paper, we draw inspiration from the visual characteristics of Sub-Aperture Images (SAIs) of light field and design a compact neural network representation for the light field compression task. The network backbone takes randomly initialized noise as input and is supervised on the SAIs of the target light field. It is composed of two types of complementary kernels: descriptive kernels (descriptors) that store scene description information learned during training, and modulatory kernels (modulators) that control the rendering of different SAIs from the queried perspectives. To further enhance compactness of the network meanwhile retain high quality of the decoded light field, we accordingly introduce modulator allocation and kernel tensor decomposition mechanisms, followed by non-uniform quantization and lossless entropy coding techniques, to finally form an efficient compression pipeline. Extensive experiments demonstrate that our method outperforms other state-of-the-art (SOTA) methods by a significant margin in the light field compression task. Moreover, after aligning descriptors, the modulators learned from one light field can be transferred to new light fields for rendering dense views, indicating a potential solution for view synthesis task.

Spatially-Adaptive Learning-Based Image Compression with Hierarchical Multi-Scale Latent Spaces

  • paper_url: http://arxiv.org/abs/2307.06102
  • repo_url: None
  • paper_authors: Fabian Brand, Alexander Kopte, Kristian Fischer, André Kaup
  • for: Improving the efficiency of image and video compression systems.
  • methods: A hierarchical multi-scale latent space, combined with a gain unit for rate variability, enables multi-scale processing in a state-of-the-art learned compression network.
  • results: Achieves 7% rate savings over an equivalent traditional autoencoder, with only marginally increased complexity and potentially even reduced decoding time.
    Abstract Adaptive block partitioning is responsible for large gains in current image and video compression systems. This method is able to compress large stationary image areas with only a few symbols, while maintaining a high level of quality in more detailed areas. Current state-of-the-art neural-network-based image compression systems however use only one scale to transmit the latent space. In previous publications, we proposed RDONet, a scheme to transmit the latent space in multiple spatial resolutions. Following this principle, we extend a state-of-the-art compression network by a second hierarchical latent-space level to enable multi-scale processing. We extend the existing rate variability capabilities of RDONet by a gain unit. With that we are able to outperform an equivalent traditional autoencoder by 7% rate savings. Furthermore, we show that even though we add an additional latent space, the complexity only increases marginally and the decoding time can potentially even be decreased.

ConvNeXt-ChARM: ConvNeXt-based Transform for Efficient Neural Image Compression

  • paper_url: http://arxiv.org/abs/2307.06342
  • repo_url: None
  • paper_authors: Ahmed Ghorbel, Wassim Hamidouche, Luce Morin
  • for: To propose an efficient ConvNeXt-based transform-coding architecture that improves the compression performance and reconstruction fidelity of neural image codecs.
  • methods: A ConvNeXt-based transform-coding framework paired with a compute-efficient channel-wise auto-regressive prior (ChARM) that captures both global and local context from the hyper and quantized latent representations.
  • results: On four widely used datasets, ConvNeXt-ChARM brings average BD-rate (PSNR) reductions of 5.24% and 1.22% over the VVC reference encoder (VTM-18.0) and SwinT-ChARM respectively; model-scaling studies and objective and subjective analyses illustrate its computational efficiency and the performance gap between ConvNeXt and Swin Transformer.
    Abstract Over the last few years, neural image compression has gained wide attention from research and industry, yielding promising end-to-end deep neural codecs outperforming their conventional counterparts in rate-distortion performance. Despite significant advancement, current methods, including attention-based transform coding, still need to be improved in reducing the coding rate while preserving the reconstruction fidelity, especially in non-homogeneous textured image areas. Those models also require more parameters and a higher decoding time. To tackle the above challenges, we propose ConvNeXt-ChARM, an efficient ConvNeXt-based transform coding framework, paired with a compute-efficient channel-wise auto-regressive prior to capturing both global and local contexts from the hyper and quantized latent representations. The proposed architecture can be optimized end-to-end to fully exploit the context information and extract compact latent representation while reconstructing higher-quality images. Experimental results on four widely-used datasets showed that ConvNeXt-ChARM brings consistent and significant BD-rate (PSNR) reductions estimated on average to 5.24% and 1.22% over the versatile video coding (VVC) reference encoder (VTM-18.0) and the state-of-the-art learned image compression method SwinT-ChARM, respectively. Moreover, we provide model scaling studies to verify the computational efficiency of our approach and conduct several objective and subjective analyses to bring to the fore the performance gap between the next generation ConvNet, namely ConvNeXt, and Swin Transformer.

AICT: An Adaptive Image Compression Transformer

  • paper_url: http://arxiv.org/abs/2307.06091
  • repo_url: None
  • paper_authors: Ahmed Ghorbel, Wassim Hamidouche, Luce Morin
  • for: Improving the efficiency of the SwinT-ChARM transform-coding framework.
  • methods: A more straightforward yet effective Transformer-based channel-wise auto-regressive prior model, combined with a learnable scaling module and a ConvNeXt-based pre/post-processor to extract a more compact latent representation while reconstructing higher-quality images.
  • results: The proposed Adaptive Image Compression Transformer (AICT) framework significantly improves the trade-off between coding efficiency and decoder complexity over the VVC reference encoder (VTM-18.0) and the neural codec SwinT-ChARM.
    Abstract Motivated by the efficiency investigation of the Tranformer-based transform coding framework, namely SwinT-ChARM, we propose to enhance the latter, as first, with a more straightforward yet effective Tranformer-based channel-wise auto-regressive prior model, resulting in an absolute image compression transformer (ICT). Current methods that still rely on ConvNet-based entropy coding are limited in long-range modeling dependencies due to their local connectivity and an increasing number of architectural biases and priors. On the contrary, the proposed ICT can capture both global and local contexts from the latent representations and better parameterize the distribution of the quantized latents. Further, we leverage a learnable scaling module with a sandwich ConvNeXt-based pre/post-processor to accurately extract more compact latent representation while reconstructing higher-quality images. Extensive experimental results on benchmark datasets showed that the proposed adaptive image compression transformer (AICT) framework significantly improves the trade-off between coding efficiency and decoder complexity over the versatile video coding (VVC) reference encoder (VTM-18.0) and the neural codec SwinT-ChARM.

Flexible and Fully Quantized Ultra-Lightweight TinyissimoYOLO for Ultra-Low-Power Edge Systems

  • paper_url: http://arxiv.org/abs/2307.05999
  • repo_url: None
  • paper_authors: Julian Moosmann, Hanna Mueller, Nicky Zimmerman, Georg Rutishauser, Luca Benini, Michele Magno
  • for: This paper deploys and explores variants of TinyissimoYOLO, an ultra-lightweight object detection network, on edge systems with a power envelope of a few milliwatts.
  • methods: Experimental measurements comprehensively characterize detection performance, exploring the impact of input resolution, number of object classes, and hidden-layer adjustments.
  • results: Comparing latency and energy efficiency across state-of-the-art ultra-low-power edge platforms (GAP9 from Greenwaves, STM32H7 from ST Microelectronics, STM32L4 from STM, Apollo4b from Ambiq, and the MAX78000 from Analog Devices), the GAP9 hardware accelerator achieves the lowest inference latency and energy at 2.12 ms and 150 uJ, around 2x faster and 20% more efficient than the next best platform, the MAX78000.
    Abstract This paper deploys and explores variants of TinyissimoYOLO, a highly flexible and fully quantized ultra-lightweight object detection network designed for edge systems with a power envelope of a few milliwatts. With experimental measurements, we present a comprehensive characterization of the network's detection performance, exploring the impact of various parameters, including input resolution, number of object classes, and hidden layer adjustments. We deploy variants of TinyissimoYOLO on state-of-the-art ultra-low-power extreme edge platforms, presenting an in-depth a comparison on latency, energy efficiency, and their ability to efficiently parallelize the workload. In particular, the paper presents a comparison between a novel parallel RISC-V processor (GAP9 from Greenwaves) with and without use of its on-chip hardware accelerator, an ARM Cortex-M7 core (STM32H7 from ST Microelectronics), two ARM Cortex-M4 cores (STM32L4 from STM and Apollo4b from Ambiq), and a multi-core platform with a CNN hardware accelerator (Analog Devices MAX78000). Experimental results show that the GAP9's hardware accelerator achieves the lowest inference latency and energy at 2.12ms and 150uJ respectively, which is around 2x faster and 20% more efficient than the next best platform, the MAX78000. The hardware accelerator of GAP9 can even run an increased resolution version of TinyissimoYOLO with 112x112 pixels and 10 detection classes within 3.2ms, consuming 245uJ. To showcase the competitiveness of a versatile general-purpose system we also deployed and profiled a multi-core implementation on GAP9 at different operating points, achieving 11.3ms with the lowest-latency and 490uJ with the most energy-efficient configuration. With this paper, we demonstrate the suitability and flexibility of TinyissimoYOLO on state-of-the-art detection datasets for real-time ultra-low-power edge inference.
    摘要

FreeSeed: Frequency-band-aware and Self-guided Network for Sparse-view CT Reconstruction

  • paper_url: http://arxiv.org/abs/2307.05890
  • repo_url: https://github.com/masaaki-75/freeseed
  • paper_authors: Chenglong Ma, Zilong Li, Junping Zhang, Yi Zhang, Hongming Shan
  • for: 提高简单视图计算机 tomography(CT)图像的速度和辐射暴露减少,但重建图像仍然受到严重的扭曲痕迹的影响,这些痕迹会影响后续的检查和诊断。
  • methods: 我们提出了一种基于深度学习的图像后处理方法,以及其双域对应的方法,可以显著提高图像质量。
  • results: 我们的方法可以有效地除除扭曲痕迹和损失细节,并且在简单视图CT图像重建方法中表现出色,比前者更高效。
    Abstract Sparse-view computed tomography (CT) is a promising solution for expediting the scanning process and mitigating radiation exposure to patients; the reconstructed images, however, contain severe streak artifacts, compromising subsequent screening and diagnosis. Recently, deep learning-based image post-processing methods along with their dual-domain counterparts have shown promising results. However, existing methods usually produce over-smoothed images with loss of details due to (1) the difficulty in accurately modeling the artifact patterns in the image domain, and (2) the equal treatment of each pixel in the loss function. To address these issues, we concentrate on the image post-processing and propose a simple yet effective FREquency-band-awarE and SElf-guidED network, termed FreeSeed, which can effectively remove artifact and recover missing detail from the contaminated sparse-view CT images. Specifically, we first propose a frequency-band-aware artifact modeling network (FreeNet), which learns artifact-related frequency-band attention in Fourier domain for better modeling the globally distributed streak artifact on the sparse-view CT images. We then introduce a self-guided artifact refinement network (SeedNet), which leverages the predicted artifact to assist FreeNet in continuing to refine the severely corrupted details. Extensive experiments demonstrate the superior performance of FreeSeed and its dual-domain counterpart over the state-of-the-art sparse-view CT reconstruction methods. Source code is made available at https://github.com/Masaaki-75/freeseed.
    摘要 稀疏视图计算机断层成像(CT)是加快扫描过程并减少患者辐射剂量的一种有前景的方案,但重建图像中仍存在严重的条纹伪影,会干扰后续的筛查和诊断。近年来,基于深度学习的图像后处理方法及其双域对应方法已经展现出良好的效果。然而,由于(1)难以在图像域中准确建模伪影模式,以及(2)损失函数对每个像素一视同仁,现有方法通常生成过度平滑、细节丢失的图像。为了解决这些问题,我们聚焦于图像后处理,提出了一种简单而有效的频带感知自引导网络 FreeSeed,可以有效去除伪影并恢复受污染的稀疏视图 CT 图像中缺失的细节。具体而言,我们首先提出了频带感知伪影建模网络(FreeNet),它在傅里叶域中学习与伪影相关的频带注意力,以更好地建模稀疏视图 CT 图像上全局分布的条纹伪影。随后,我们引入了自引导伪影精修网络(SeedNet),利用预测出的伪影帮助 FreeNet 继续修复严重受损的细节。大量实验表明,FreeSeed 及其双域对应方法优于现有最先进的稀疏视图 CT 重建方法。源代码可在 https://github.com/Masaaki-75/freeseed 获取。
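The frequency-band-aware attention idea can be illustrated with a small PyTorch sketch: split the 2D Fourier spectrum of an image into radial bands and learn one attention weight per band. The module name, the number of bands, and the sigmoid gating below are illustrative assumptions, not the released FreeSeed/FreeNet code.

```python
import torch
import torch.nn as nn

class BandAttention(nn.Module):
    """Toy frequency-band attention: split the 2D spectrum of an image into
    radial bands and learn one attention weight per band (illustrative only)."""
    def __init__(self, n_bands: int = 8):
        super().__init__()
        self.n_bands = n_bands
        self.band_logits = nn.Parameter(torch.zeros(n_bands))  # learnable per-band weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) image or feature map
        B, C, H, W = x.shape
        spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
        # Radial frequency index of every pixel, normalized to [0, 1]
        yy, xx = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device),
            indexing="ij")
        radius = torch.sqrt(xx ** 2 + yy ** 2) / (2 ** 0.5)
        band_idx = torch.clamp((radius * self.n_bands).long(), max=self.n_bands - 1)
        weights = torch.sigmoid(self.band_logits)[band_idx]          # (H, W) gate per pixel
        spec = spec * weights                                        # re-weight each band
        out = torch.fft.ifft2(torch.fft.ifftshift(spec, dim=(-2, -1))).real
        return out

if __name__ == "__main__":
    x = torch.randn(2, 1, 64, 64)          # stand-in for sparse-view CT slices
    print(BandAttention()(x).shape)        # torch.Size([2, 1, 64, 64])
```

Operating in the Fourier domain is what makes globally distributed streaks easy to address: a single band weight touches every pixel of the image at once, which a local convolution cannot do.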

Denoising Simulated Low-Field MRI (70mT) using Denoising Autoencoders (DAE) and Cycle-Consistent Generative Adversarial Networks (Cycle-GAN)

  • paper_url: http://arxiv.org/abs/2307.06338
  • repo_url: None
  • paper_authors: Fernando Vega, Abdoljalil Addeh, M. Ethan MacDonald
  • for: 提高低场磁共振成像(MRI)图像质量
  • methods: 使用循环一致生成对抗网络(Cycle-GAN)和去噪自编码器(DAE)进行去噪与重建
  • results: 在模拟实验中,Cycle-GAN 能够从低场、低分辨率、低信噪比(SNR)的 MRI 图像生成高场、高分辨率、高信噪比的图像,且不需要成对图像。
    Abstract In this work, a denoising Cycle-GAN (Cycle Consistent Generative Adversarial Network) is implemented to yield high-field, high resolution, high signal-to-noise ratio (SNR) Magnetic Resonance Imaging (MRI) images from simulated low-field, low resolution, low SNR MRI images. Resampling and additive Rician noise were used to simulate low-field MRI. Images were utilized to train a Denoising Autoencoder (DAE) and a Cycle-GAN, with paired and unpaired cases. Both networks were evaluated using SSIM and PSNR image quality metrics. This work demonstrates the use of a generative deep learning model that can outperform classical DAEs to improve low-field MRI images and does not require image pairs.
    摘要 在这项工作中,我们实现了一种去噪循环一致生成对抗网络(Cycle-GAN),用于从模拟的低场、低分辨率、低信噪比(SNR)磁共振成像(MRI)图像生成高场、高分辨率、高信噪比的 MRI 图像。我们通过重采样和加性莱斯(Rician)噪声来模拟低场 MRI。这些图像被用于训练一个去噪自编码器(DAE)和一个 Cycle-GAN,分别考虑成对与非成对两种情况。两个网络均使用 SSIM 和 PSNR 图像质量指标进行评估。这项工作表明,生成式深度学习模型可以超越经典 DAE 来改善低场 MRI 图像,且不需要成对图像。
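The simulation step described in the abstract (resampling plus additive Rician noise) is straightforward to reproduce in NumPy/SciPy; the downsampling factor and noise level below are arbitrary choices for illustration, not the values used in the paper.

```python
import numpy as np
from scipy.ndimage import zoom

def simulate_low_field(img: np.ndarray, factor: float = 0.5, sigma: float = 0.05) -> np.ndarray:
    """Roughly mimic a low-field acquisition from a high-field magnitude image:
    downsample/upsample to lose resolution, then add Rician-distributed noise."""
    # Resample: lose high-frequency detail
    low_res = zoom(zoom(img, factor, order=3), 1.0 / factor, order=3)
    low_res = low_res[: img.shape[0], : img.shape[1]]
    # Rician noise: magnitude of a complex signal with independent Gaussian noise
    real = low_res + np.random.normal(0, sigma, low_res.shape)
    imag = np.random.normal(0, sigma, low_res.shape)
    return np.sqrt(real ** 2 + imag ** 2)

if __name__ == "__main__":
    high_field = np.random.rand(128, 128)       # stand-in for a high-field slice
    low_field = simulate_low_field(high_field)
    print(low_field.shape, low_field.dtype)
```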

Improving Segmentation and Detection of Lesions in CT Scans Using Intensity Distribution Supervision

  • paper_url: http://arxiv.org/abs/2307.05804
  • repo_url: https://github.com/rsummers11/CADLab
  • paper_authors: Seung Yeon Shin, Thomas C. Shen, Ronald M. Summers
  • for: 用于提高 segmentation 和检测网络的训练
  • methods: 使用intensity histogram建立 lesion probability 函数,并将其作为额外监督信息提供给网络训练
  • results: 对小肠类癌瘤、肾肿瘤和肺结节的分割与检测效果均有改进,其中肾肿瘤检测的平均精度从 64.6% 提高到 75.5%。
    Abstract We propose a method to incorporate the intensity information of a target lesion on CT scans in training segmentation and detection networks. We first build an intensity-based lesion probability (ILP) function from an intensity histogram of the target lesion. It is used to compute the probability of being the lesion for each voxel based on its intensity. Finally, the computed ILP map of each input CT scan is provided as additional supervision for network training, which aims to inform the network about possible lesion locations in terms of intensity values at no additional labeling cost. The method was applied to improve the segmentation of three different lesion types, namely, small bowel carcinoid tumor, kidney tumor, and lung nodule. The effectiveness of the proposed method on a detection task was also investigated. We observed improvements of 41.3% -> 47.8%, 74.2% -> 76.0%, and 26.4% -> 32.7% in segmenting small bowel carcinoid tumor, kidney tumor, and lung nodule, respectively, in terms of per case Dice scores. An improvement of 64.6% -> 75.5% was achieved in detecting kidney tumors in terms of average precision. The results of different usages of the ILP map and the effect of varied amount of training data are also presented.
    摘要 We applied this method to improve the segmentation of three lesion types: small bowel carcinoid tumor, kidney tumor, and lung nodule. Our results show improvements of 41.3% to 47.8%, 74.2% to 76.0%, and 26.4% to 32.7% in segmenting these lesions, respectively, in terms of per case Dice scores. We also achieved an improvement of 64.6% to 75.5% in detecting kidney tumors in terms of average precision.We also explored the effect of using the ILP map in different ways and the impact of varying amounts of training data. Our results show that the ILP map can be used effectively to improve the accuracy of lesion segmentation and detection, and that more training data can lead to better performance.
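One way to read the intensity-based lesion probability (ILP) idea is as a normalized histogram lookup: estimate how likely each intensity value is to belong to a lesion, then map every voxel of a CT volume through that function and use the resulting map as extra supervision. The sketch below is a simplified interpretation with invented Hounsfield ranges, not the released CADLab implementation.

```python
import numpy as np

def build_ilp_function(lesion_intensities: np.ndarray, bins: int = 256,
                       hu_range=(-200, 400)):
    """Estimate P(lesion | intensity) as a normalized histogram of intensities
    sampled from annotated lesions (simplified interpretation)."""
    hist, edges = np.histogram(lesion_intensities, bins=bins, range=hu_range, density=True)
    hist = hist / (hist.max() + 1e-8)               # scale to [0, 1]

    def ilp(volume: np.ndarray) -> np.ndarray:
        idx = np.clip(np.digitize(volume, edges) - 1, 0, bins - 1)
        return hist[idx]                            # per-voxel lesion probability

    return ilp

if __name__ == "__main__":
    lesion_samples = np.random.normal(60, 20, size=5000)    # fake lesion HU values
    ilp = build_ilp_function(lesion_samples)
    ct = np.random.normal(0, 100, size=(4, 64, 64))          # fake CT volume
    ilp_map = ilp(ct)                                        # extra supervision target
    print(ilp_map.shape, float(ilp_map.min()), float(ilp_map.max()))
```

The appeal of this kind of supervision is that it comes at no additional labeling cost: the histogram is built once from already-annotated lesions.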

A Hierarchical Transformer Encoder to Improve Entire Neoplasm Segmentation on Whole Slide Image of Hepatocellular Carcinoma

  • paper_url: http://arxiv.org/abs/2307.05800
  • repo_url: None
  • paper_authors: Zhuxian Guo, Qitong Wang, Henning Müller, Themis Palpanas, Nicolas Loménie, Camille Kurtz
  • for: This paper is written for the purpose of proposing a novel deep learning architecture for entire neoplasm segmentation on Whole Slide Image (WSI) of Hepatocellular Carcinoma (HCC).
  • methods: The paper uses a hierarchical Transformer encoder, called HiTrans, to learn global dependencies within expanded 4096x4096 WSI patches.
  • results: The proposed method leads to better segmentation performance by taking into account regional and global dependency information.
    Abstract In digital histopathology, entire neoplasm segmentation on Whole Slide Image (WSI) of Hepatocellular Carcinoma (HCC) plays an important role, especially as a preprocessing filter to automatically exclude healthy tissue, in histological molecular correlations mining and other downstream histopathological tasks. The segmentation task remains challenging due to HCC's inherent high-heterogeneity and the lack of dependency learning in large field of view. In this article, we propose a novel deep learning architecture with a hierarchical Transformer encoder, HiTrans, to learn the global dependencies within expanded 4096$\times$4096 WSI patches. HiTrans is designed to encode and decode the patches with larger reception fields and the learned global dependencies, compared to the state-of-the-art Fully Convolutional Neural networks (FCNN). Empirical evaluations verified that HiTrans leads to better segmentation performance by taking into account regional and global dependency information.
    摘要 在数字病理学中,对肝细胞癌(HCC)全切片图像(WSI)进行整体肿瘤分割起着重要作用,尤其是作为自动排除健康组织的预处理过滤器,服务于组织学-分子相关性挖掘等下游病理任务。由于 HCC 固有的高度异质性,以及现有方法在大视野下缺乏依赖关系学习,该分割任务仍然具有挑战性。在这篇文章中,我们提出了一种新的深度学习架构——层次 Transformer 编码器 HiTrans,用于在扩展的 4096×4096 WSI 图块内学习全局依赖关系。与最先进的全卷积神经网络(FCNN)相比,HiTrans 以更大的感受野对图块进行编码和解码,并利用所学的全局依赖关系。实验证明,通过同时考虑区域与全局依赖信息,HiTrans 能取得更好的分割性能。

3D Medical Image Segmentation based on multi-scale MPU-Net

  • paper_url: http://arxiv.org/abs/2307.05799
  • repo_url: https://github.com/Stefan-Yu404/MP-UNet
  • paper_authors: Zeqiu. Yu, Shuo. Han, Ziheng. Song
  • for: 这个论文是为了提出一种基于Transformer的快速准确肿瘤分割模型,以解决自动化肿瘤分割的问题。
  • methods: 这个模型使用了Transformer搭配全球注意机制,以便更好地捕捉肿瘤的深度相关性和多尺度信息。它还具有多尺度模块和交叉注意机制,以增强特征抽取和整合。
  • results: 根据 LiTS 2017 数据集测试,MPU-Net 模型比标准的 U-Net 模型显著提高了肿瘤分割效果,其最佳分割结果的 Dice、准确率、精确率、特异度、IoU 和 MCC 指标分别达到 92.17%、99.08%、91.91%、99.52%、85.91% 和 91.74%。这些优异的指标表明该框架在自动医学图像分割中表现卓越。
    Abstract The high cure rate of cancer is inextricably linked to physicians' accuracy in diagnosis and treatment, therefore a model that can accomplish high-precision tumor segmentation has become a necessity in many applications of the medical industry. It can effectively lower the rate of misdiagnosis while considerably lessening the burden on clinicians. However, fully automated target organ segmentation is problematic due to the irregular stereo structure of 3D volume organs. As a basic model for this class of real applications, U-Net excels. It can learn certain global and local features, but still lacks the capacity to grasp spatial long-range relationships and contextual information at multiple scales. This paper proposes a tumor segmentation model MPU-Net for patient volume CT images, which is inspired by Transformer with a global attention mechanism. By combining image serialization with the Position Attention Module, the model attempts to comprehend deeper contextual dependencies and accomplish precise positioning. Each layer of the decoder is also equipped with a multi-scale module and a cross-attention mechanism. The capability of feature extraction and integration at different levels has been enhanced, and the hybrid loss function developed in this study can better exploit high-resolution characteristic information. Moreover, the suggested architecture is tested and evaluated on the Liver Tumor Segmentation Challenge 2017 (LiTS 2017) dataset. Compared with the benchmark model U-Net, MPU-Net shows excellent segmentation results. The dice, accuracy, precision, specificity, IOU, and MCC metrics for the best model segmentation results are 92.17%, 99.08%, 91.91%, 99.52%, 85.91%, and 91.74%, respectively. Outstanding indicators in various aspects illustrate the exceptional performance of this framework in automatic medical image segmentation.
    摘要 由于三维器官的立体结构不规则,全自动目标器官分割仍然困难,导致医学影像自动分割的准确率偏低。为了解决这一问题,本文提出了一种受 Transformer 全局注意力机制启发的肿瘤分割模型 MPU-Net,旨在提高分割精度。该模型通过图像序列化与位置注意力模块来理解更深层的上下文依赖关系,并通过多尺度模块和交叉注意力机制来强化不同层次的特征提取与整合。在 LiTS 2017 数据集上测试和评估,MPU-Net 的分割结果优于基准模型 U-Net:最佳分割结果的 Dice、准确率、精确率、特异度、IoU 和 MCC 分别为 92.17%、99.08%、91.91%、99.52%、85.91% 和 91.74%。这些各方面的优异指标表明该框架在自动医学图像分割中具有出色的性能。

SepHRNet: Generating High-Resolution Crop Maps from Remote Sensing imagery using HRNet with Separable Convolution

  • paper_url: http://arxiv.org/abs/2307.05700
  • repo_url: None
  • paper_authors: Priyanka Goyal, Sohan Patnaik, Adway Mitra, Manjira Sinha
  • for: 这个研究旨在提高遥感图像分析的精度,以支撑粮食安全、资源管理和可持续农业实践。
  • methods: 本研究使用深度学习技术分析高分辨率卫星图像,将 HRNet 与可分离卷积层和自注意力层相结合,以捕捉空间和时间特征。HRNet 模型作为骨干网络提取高分辨率特征,浅层中的可分离卷积能更有效地捕捉细致的作物图案,自注意力层则捕捉长期的时间依赖关系。最后,一个 CNN 解码器从聚合表示中生成作物地图。
  • results: 本研究在 Zuericrop 数据集上实现了 97.5% 的分类准确率和 55.2% 的 IoU,优于现有模型。
    Abstract The accurate mapping of crop production is crucial for ensuring food security, effective resource management, and sustainable agricultural practices. One way to achieve this is by analyzing high-resolution satellite imagery. Deep Learning has been successful in analyzing images, including remote sensing imagery. However, capturing intricate crop patterns is challenging due to their complexity and variability. In this paper, we propose a novel Deep learning approach that integrates HRNet with Separable Convolutional layers to capture spatial patterns and Self-attention to capture temporal patterns of the data. The HRNet model acts as a backbone and extracts high-resolution features from crop images. Spatially separable convolution in the shallow layers of the HRNet model captures intricate crop patterns more effectively while reducing the computational cost. The multi-head attention mechanism captures long-term temporal dependencies from the encoded vector representation of the images. Finally, a CNN decoder generates a crop map from the aggregated representation. Adaboost is used on top of this to further improve accuracy. The proposed algorithm achieves a high classification accuracy of 97.5\% and IoU of 55.2\% in generating crop maps. We evaluate the performance of our pipeline on the Zuericrop dataset and demonstrate that our results outperform state-of-the-art models such as U-Net++, ResNet50, VGG19, InceptionV3, DenseNet, and EfficientNet. This research showcases the potential of Deep Learning for Earth Observation Systems.
    摘要 准确地绘制农作物生产地图是确保粮食安全、有效管理资源以及实现可持续农业的关键。一种实现这一目标的方法是分析高分辨率卫星图像。深度学习在图像分析(包括遥感图像)方面取得了成功,但由于农作物图案的复杂性和变化性,捕捉其细致模式仍然困难。在这篇论文中,我们提出了一种新的深度学习方法,将 HRNet 作为骨干网络,并结合可分离卷积层与自注意力机制,以同时捕捉空间和时间特征。HRNet 模型从农作物图像中提取高分辨率特征;浅层中的空间可分离卷积在降低计算成本的同时更有效地捕捉细致的农作物图案;多头注意力机制则从图像的编码表示中捕捉长期时间相关性。最后,一个 CNN 解码器从聚合表示中生成农作物地图,并使用 AdaBoost 进一步提高准确率。我们的算法在 Zuericrop 数据集上实现了 97.5% 的分类精度和 55.2% 的 IoU,在生成农作物地图方面表现出色,超过了 U-Net++、ResNet50、VGG19、InceptionV3、DenseNet 和 EfficientNet 等最先进模型。这项研究展示了深度学习在地球观测系统中的潜力。
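The separable convolution used in the shallow layers can be sketched as a depthwise filter followed by a 1x1 pointwise projection; channel counts and kernel size below are placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class SeparableConv2d(nn.Module):
    """Depthwise-separable convolution: a per-channel spatial filter followed by
    a 1x1 projection, cheaper than a full dense convolution."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

if __name__ == "__main__":
    x = torch.randn(1, 32, 64, 64)                 # e.g. an embedded satellite patch
    print(SeparableConv2d(32, 64)(x).shape)        # torch.Size([1, 64, 64, 64])
```

The factorization reduces the parameter count from `in_ch * out_ch * k * k` to roughly `in_ch * k * k + in_ch * out_ch`, which is why it is attractive in the shallow, high-resolution layers.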

cs.SD - 2023-07-11

ProgGP: From GuitarPro Tablature Neural Generation To Progressive Metal Production

  • paper_url: http://arxiv.org/abs/2307.05328
  • repo_url: None
  • paper_authors: Jackson Loth, Pedro Sarmento, CJ Carr, Zack Zukowski, Mathieu Barthet
  • for: 本研究旨在使用符号音乐生成技术,通过人类-AI合作创作Progressive Metal乐曲。
  • methods: 我们使用一个预训练的Transformer模型,在ProgGP数据集上进行微调,以生成多个吉他、贝司、鼓、钢琴和乐队部分。
  • results: 我们通过结合计算音乐学和实践研究两种方法进行验证,并证明模型能够生成有效的乐曲。最后,我们使用这个模型创作了一首完整的Progressive Metal乐曲,由人类金属制作人混音和混音。
    Abstract Recent work in the field of symbolic music generation has shown value in using a tokenization based on the GuitarPro format, a symbolic representation supporting guitar expressive attributes, as an input and output representation. We extend this work by fine-tuning a pre-trained Transformer model on ProgGP, a custom dataset of 173 progressive metal songs, for the purposes of creating compositions from that genre through a human-AI partnership. Our model is able to generate multiple guitar, bass guitar, drums, piano and orchestral parts. We examine the validity of the generated music using a mixed methods approach by combining quantitative analyses following a computational musicology paradigm and qualitative analyses following a practice-based research paradigm. Finally, we demonstrate the value of the model by using it as a tool to create a progressive metal song, fully produced and mixed by a human metal producer based on AI-generated music.
    摘要 近期符号音乐生成领域的研究表明,使用支持吉他演奏表达属性的 GuitarPro 格式进行词元化,作为输入与输出表示是有价值的。我们在此基础上进一步扩展,在 ProgGP(一个包含 173 首前卫金属歌曲的自建数据集)上对预训练 Transformer 模型进行微调,以便通过人机协作创作该流派的作品。我们的模型可以生成多个吉他、贝斯、鼓、钢琴和管弦乐声部。我们结合计算音乐学范式下的定量分析与基于实践研究范式的定性分析,以混合方法检验生成音乐的有效性。最后,我们将该模型用作工具,创作了一首前卫金属歌曲,并由一位人类金属制作人基于 AI 生成的音乐完成制作与混音,从而展示了模型的价值。

ShredGP: Guitarist Style-Conditioned Tablature Generation

  • paper_url: http://arxiv.org/abs/2307.05324
  • repo_url: None
  • paper_authors: Pedro Sarmento, Adarsh Kumar, Dekun Xie, CJ Carr, Zack Zukowski, Mathieu Barthet
  • for: 本研究旨在生成符合不同电吉他演奏风格的Tablature notation,用于模拟四位知名电吉他手的演奏风格。
  • methods: 本研究使用Transformer预测器生成Tablature notation,并采用计算音乐学方法分析Token的特征,以评估每位吉他手的独特性。
  • results: 研究发现,使用多种乐器资源的ShredGP模型和使用 solo guitar数据的ShredGP模型均能生成符合目标吉他手风格的Tablature notation,并且使用BERT模型对生成的例子进行分类,结果表明ShredGP模型能够生成与目标吉他手风格相符的内容。
    Abstract GuitarPro format tablatures are a type of digital music notation that encapsulates information about guitar playing techniques and fingerings. We introduce ShredGP, a GuitarPro tablature generative Transformer-based model conditioned to imitate the style of four distinct iconic electric guitarists. In order to assess the idiosyncrasies of each guitar player, we adopt a computational musicology methodology by analysing features computed from the tokens yielded by the DadaGP encoding scheme. Statistical analyses of the features evidence significant differences between the four guitarists. We trained two variants of the ShredGP model, one using a multi-instrument corpus, the other using solo guitar data. We present a BERT-based model for guitar player classification and use it to evaluate the generated examples. Overall, results from the classifier show that ShredGP is able to generate content congruent with the style of the targeted guitar player. Finally, we reflect on prospective applications for ShredGP for human-AI music interaction.
    摘要 GuitarPro 格式的吉他谱是一种数字音乐记谱形式,包含吉他演奏技巧和指法信息。我们介绍 ShredGP,一种基于 Transformer 的 GuitarPro 吉他谱生成模型,通过条件化来模仿四位标志性电吉他手的风格。为了评估每位吉他手的特点,我们采用计算音乐学方法,分析由 DadaGP 编码方案产生的词元所计算出的特征。统计分析表明四位吉他手之间存在显著差异。我们训练了两种 ShredGP 模型变体,一种使用多乐器语料,另一种仅使用吉他独奏数据。我们使用基于 BERT 的吉他手分类模型来评估生成的示例。总体而言,分类器的结果表明 ShredGP 能够生成与目标吉他手风格相符的内容。最后,我们讨论了 ShredGP 在人机音乐交互方面的潜在应用。

Point to the Hidden: Exposing Speech Audio Splicing via Signal Pointer Nets

  • paper_url: http://arxiv.org/abs/2307.05641
  • repo_url: None
  • paper_authors: Denise Moussa, Germans Hirsch, Sebastian Wankerl, Christian Riess
  • for: 协助刑事调查中证明语音记录的完整性
  • methods: 使用分析和深度学习方法检测语音掉包操作
  • results: 在具有压缩和噪声的语音数据上实现6-10%的性能提升
    Abstract Verifying the integrity of voice recording evidence for criminal investigations is an integral part of an audio forensic analyst's work. Here, one focus is on detecting deletion or insertion operations, so called audio splicing. While this is a rather easy approach to alter spoken statements, careful editing can yield quite convincing results. For difficult cases or big amounts of data, automated tools can support in detecting potential editing locations. To this end, several analytical and deep learning methods have been proposed by now. Still, few address unconstrained splicing scenarios as expected in practice. With SigPointer, we propose a pointer network framework for continuous input that uncovers splice locations naturally and more efficiently than existing works. Extensive experiments on forensically challenging data like strongly compressed and noisy signals quantify the benefit of the pointer mechanism with performance increases between about 6 to 10 percentage points.
    摘要 确认语音录音证据的完整性是专业Audio forensic analyst的重要任务之一。在这里,一个重点是检测删除或插入操作,也就是语音拼接。这是轻松地修改说话的方法,但是精心编辑可以获得非常有条理的结果。对于困难的案例或大量数据,自动工具可以帮助检测可能的编辑位置。为了解决这个问题,一些分析和深度学习方法已经被提出。然而,大多数方法仍然无法处理无条件拼接情况,这是实际应用中的一个挑战。我们透过SigPointer提出了一个指标网络框架,可以自然地检测拼接位置,并且较 existing works 效率高。实际实验表明,在专业挑战性的数据上,SigPointer 能够提高性能约6到10 percentage points。

Aeroacoustic testing on a full aircraft model at high Reynolds numbers in the European Transonic Windtunnel

  • paper_url: http://arxiv.org/abs/2307.05140
  • repo_url: None
  • paper_authors: Thomas Ahlefeldt, Daniel Ernst, Armin Goudarzi, Hans-Georg-Raumer, Carsten Spehr
  • for: 这篇论文旨在评估在接近真实雷诺数条件下进行的加压与低温风洞测量结果。
  • methods: 论文采用端到端方法,讨论了麦克风的选择、测量参数、阵列设计和流动参数的选取,并设计了不同的风洞工况,以便将雷诺数与马赫数的影响分离开来,同时比较开槽与封闭试验段的影响。
  • results: 论文给出了采用 CLEAN-SC 解卷积的三维波束成形结果、所选的感兴趣区域以及相应的声源频谱。结果表明,与封闭试验段相比,开槽试验段对波束成形结果的影响很小;雷诺数对气动声学辐射有深刻的非线性影响,且随雷诺数增大而减弱。此外,在固定雷诺数下,声源表现出非线性的马赫数依赖关系,但在所观察的马赫数范围内具有自相似性。这些结果表明,可以在真实雷诺数下利用小尺度全模型研究真实世界现象,为未来诸如声源指向性等进一步研究提供可能。
    Abstract This paper presents an end-to-end approach for the assessment of pressurized and cryogenic wind tunnel measurements of an EMBRAER scaled full model close to real-world Reynolds numbers. The choice of microphones, measurement parameters, the design of the array, and the selection of flow parameters are discussed. Different wind tunnel conditions are proposed which allow separating the influence of the Reynolds number from the Mach number, as well as the influence of slotted and closed test sections. The paper provides three-dimensional beamforming results with CLEAN-SC deconvolution, the selection of regions of interest, and the corresponding source spectra. The results suggest that slotted test sections have little influence on the beamforming results compared to closed test sections and that the Reynolds number has a profound, non-linear impact on the aeroacoustic emission that lessens with increasing Reynolds number. Further, sources show a non-linear Mach number dependency at constant Reynolds number but are self-similar in the observed Mach number range. The findings suggest that it is possible to study real-world phenomena on small-scale full models at real-world Reynolds numbers, which enable further investigations in the future such as the directivity of sources.
    摘要

Optimizing Feature Extraction for Symbolic Music

  • paper_url: http://arxiv.org/abs/2307.05107
  • repo_url: https://github.com/didoneproject/music_symbolic_features
  • paper_authors: Federico Simonetta, Ana Llorens, Martín Serrano, Eduardo García-Portugués, Álvaro Torrente
  • for: 本研究探讨了现有的符号音乐特征提取工具,并对其性能进行了比较,以确定一个给定乐谱的音乐风格的最佳特征集。
  • methods: 我们提出了一种新的特征提取工具 musif,并评估了其在不同曲目和文件格式(如 MIDI、MusicXML 和 **kern)上的性能。musif 在计算效率上与现有的 jSymbolic 和 music21 工具相近,同时为自定义特征开发提供了更好的可用性。
  • results: 我们发现,使用不同的特征集可以提高分类精度,而且每个特征集需要不同的计算资源。我们发现,将各个工具中的最佳特征集结合使用,而不是单一工具的特征集,能够获得最佳的结果。为便于未来的音乐信息检索研究,我们发布了工具的源代码和benchmark。
    Abstract This paper presents a comprehensive investigation of existing feature extraction tools for symbolic music and contrasts their performance to determine the set of features that best characterizes the musical style of a given music score. In this regard, we propose a novel feature extraction tool, named musif, and evaluate its efficacy on various repertoires and file formats, including MIDI, MusicXML, and **kern. Musif approximates existing tools such as jSymbolic and music21 in terms of computational efficiency while attempting to enhance the usability for custom feature development. The proposed tool also enhances classification accuracy when combined with other sets of features. We demonstrate the contribution of each set of features and the computational resources they require. Our findings indicate that the optimal tool for feature extraction is a combination of the best features from each tool rather than those of a single one. To facilitate future research in music information retrieval, we release the source code of the tool and benchmarks.
    摘要 这篇论文对现有的符号音乐特征提取工具进行了全面的比较,以确定最能刻画给定乐谱音乐风格的特征集。在此基础上,我们提出了一种新的特征提取工具 musif,并评估了它在不同曲目和文件格式(如 MIDI、MusicXML 和 **kern)上的表现。musif 在计算效率方面与 jSymbolic 和 music21 等工具相近,同时尝试提高自定义特征开发的可用性;与其他特征集结合使用时还能提高分类准确率。我们的发现表明,最优的做法是组合各工具中表现最好的特征,而不是只依赖单一工具。为便于未来的音乐信息检索研究,我们公开了该工具的源代码和基准测试。
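For readers unfamiliar with symbolic feature extraction, the sketch below computes a few toy global descriptors with music21; it is not musif or jSymbolic code, the feature set is deliberately minimal, and the in-memory example score stands in for a real file parsed with `converter.parse`.

```python
from music21 import note, stream

def simple_symbolic_features(score) -> dict:
    """score: a music21 Stream (e.g. the result of converter.parse on a
    MusicXML, MIDI or **kern file). Returns a few toy global descriptors."""
    pitches, durations = [], []
    for n in score.recurse().notes:
        if isinstance(n, note.Note):
            pitches.append(n.pitch.midi)
            durations.append(float(n.quarterLength))
    return {
        "n_notes": len(pitches),
        "pitch_range": max(pitches) - min(pitches) if pitches else 0,
        "mean_pitch": sum(pitches) / len(pitches) if pitches else 0.0,
        "mean_duration_quarters": sum(durations) / len(durations) if durations else 0.0,
    }

if __name__ == "__main__":
    # Tiny in-memory example so the sketch runs without an external file.
    s = stream.Stream([note.Note("C4", quarterLength=1.0),
                       note.Note("E4", quarterLength=0.5),
                       note.Note("G4", quarterLength=0.5)])
    print(simple_symbolic_features(s))
```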

The smarty4covid dataset and knowledge base: a framework enabling interpretable analysis of audio signals

  • paper_url: http://arxiv.org/abs/2307.05096
  • repo_url: None
  • paper_authors: Konstantia Zarkogianni, Edmund Dervakos, George Filandrianos, Theofanis Ganitidis, Vasiliki Gkatzou, Aikaterini Sakagianni, Raghu Raghavendra, C. L. Max Nikias, Giorgos Stamou, Konstantina S. Nikita
  • for: This paper is written for developing and validating a framework for generating counterfactual explanations in opaque AI-based COVID-19 risk detection models using the smarty4covid dataset.
  • methods: The paper uses the smarty4covid dataset, which contains audio signals of cough, regular breathing, deep breathing, and voice, as well as other self-reported information, to develop and validate a framework for generating counterfactual explanations in opaque AI-based COVID-19 risk detection models.
  • results: The paper proposes a new framework for generating counterfactual explanations in opaque AI-based COVID-19 risk detection models using the smarty4covid dataset, and validates the effectiveness of the framework through experiments.
    Abstract Harnessing the power of Artificial Intelligence (AI) and m-health towards detecting new bio-markers indicative of the onset and progress of respiratory abnormalities/conditions has greatly attracted the scientific and research interest especially during COVID-19 pandemic. The smarty4covid dataset contains audio signals of cough (4,676), regular breathing (4,665), deep breathing (4,695) and voice (4,291) as recorded by means of mobile devices following a crowd-sourcing approach. Other self reported information is also included (e.g. COVID-19 virus tests), thus providing a comprehensive dataset for the development of COVID-19 risk detection models. The smarty4covid dataset is released in the form of a web-ontology language (OWL) knowledge base enabling data consolidation from other relevant datasets, complex queries and reasoning. It has been utilized towards the development of models able to: (i) extract clinically informative respiratory indicators from regular breathing records, and (ii) identify cough, breath and voice segments in crowd-sourced audio recordings. A new framework utilizing the smarty4covid OWL knowledge base towards generating counterfactual explanations in opaque AI-based COVID-19 risk detection models is proposed and validated.
    摘要 利用人工智能(AI)与移动医疗(m-health)来发现指示呼吸系统异常/疾病发生与进展的新生物标志物,在 COVID-19 大流行期间引起了科研界的广泛关注。smarty4covid 数据集包含通过众包方式、使用移动设备录制的咳嗽(4,676 段)、平静呼吸(4,665 段)、深呼吸(4,695 段)和语音(4,291 段)音频信号,并包含其他自报信息(如 COVID-19 病毒检测结果),从而为开发 COVID-19 风险检测模型提供了全面的数据集。smarty4covid 数据集以网络本体语言(OWL)知识库的形式发布,支持与其他相关数据集的整合、复杂查询和推理。它已被用于开发以下模型:(i)从平静呼吸记录中提取具有临床信息量的呼吸指标;(ii)在众包音频记录中识别咳嗽、呼吸和语音片段。此外,本文提出并验证了一个利用 smarty4covid OWL 知识库为不透明的 AI COVID-19 风险检测模型生成反事实解释的新框架。

Musical Excellence of Mridangam: an introductory review

  • paper_url: http://arxiv.org/abs/2307.09425
  • repo_url: None
  • paper_authors: Arvind Shankar Kumar
  • for: 本论文主要针对listeners, artistes和制造者,旨在通过科学方法探讨印度古典鼓rument——Mridangam的独特音色性。
  • methods: 本论文使用了音乐分析的科学方法,从Dr. CV Raman的开创性研究开始,介绍了Musical Excellence of Mridangam中的基本科学概念,并对之前的科学研究进行了简要的讨论。
  • results: 本论文通过分析Musical Excellence of Mridangam中的各章节,揭示了Mridangam的独特音色性,包括其各种音高、音域、音色和演奏技巧等方面的特点。最后,本论文结合了这些科学研究结果,总结了Mridangam的音乐价值和科学意义。
    Abstract This is an introductory review of Musical Excellence of Mridangam by Dr. Umayalpuram K Sivaraman, Dr. T Ramasami and Dr. Naresh, which is a scientific treatise exploring the unique tonal properties of the ancient Indian classical percussive instrument -- the Mridangam. This review aims to bridge the gap between the primary intended audience of Musical Excellence of Mridangam - listeners, artistes and makers -- and the scientific rigour with which the original treatise is written, by first introducing the concepts of musical analysis and then presenting and explaining the discoveries made within this context. The first three chapters of this review introduce the basic scientific concepts used in Musical Excellence of Mridangam and provides background to previous scientific research into this instrument, starting from the seminal work of Dr. CV Raman. This also includes brief discussions of the corresponding chapters in Musical Excellence of Mridangam. The next chapters all serve the purpose of explaining the main scientific results presented in Musical Excellence of Mridangam in each of the corresponding chapters in the treatise, and finally summarizing the relevance of the work.
    摘要 这是对 Umayalpuram K Sivaraman 博士、T Ramasami 博士和 Naresh 博士所著《Musical Excellence of Mridangam》的介绍性评论,该书是一部探讨古印度古典打击乐器 Mridangam 独特音色特性的科学专著。本评论旨在弥合原著的主要目标读者(听众、演奏家和制琴师)与其科学严谨性之间的差距:先介绍音乐分析的概念,再在此背景下呈现并解释书中的发现。本评论的前三章介绍原著所用的基本科学概念,并回顾从 C. V. Raman 的开创性工作开始的既有科学研究背景,同时简要讨论原著中相应的章节。随后各章分别解释原著对应章节中的主要科学结果,最后总结这项工作的意义。

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

  • paper_url: http://arxiv.org/abs/2307.04760
  • repo_url: None
  • paper_authors: Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
  • for: 本研究旨在从第一人称(egocentric)视频中的空间音视对应关系学习表示,以提升主动说话人检测和空间音频去噪等需要空间理解的任务的性能。
  • methods: 本研究使用掩码自编码(masked auto-encoding)框架,通过音频与视觉的协同重建被掩码的双耳音频,从而学习两种模态之间有用的空间关系。
  • results: 大量实验表明,我们的特征在 EgoCom 和 EasyCom 这两个公开的第一人称视频数据集上超越了多个最先进基线。
    Abstract We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos. In particular, our method leverages a masked auto-encoding framework to synthesize masked binaural audio through the synergy of audio and vision, thereby learning useful spatial relationships between the two modalities. We use our pretrained features to tackle two downstream video tasks requiring spatial understanding in social scenarios: active speaker detection and spatial audio denoising. We show through extensive experiments that our features are generic enough to improve over multiple state-of-the-art baselines on two public challenging egocentric video datasets, EgoCom and EasyCom. Project: http://vision.cs.utexas.edu/projects/ego_av_corr.
    摘要 我们提出了一种自助学习的方法,通过 egocentric 视频中的空间声音视觉对应关系学习表示。特别是,我们利用了一个掩码自动编码框架,通过声音和视觉之间的同步,Synthesize 掩码声音,从而学习了声音和视觉之间有用的空间关系。我们使用我们预训练的特征来解决两个需要社交场景中的空间理解的视频任务:活跃人员检测和空间声音降噪。我们通过广泛的实验表明,我们的特征可以超越多个州OF-the-art 基线在两个公共的 egocentric 视频数据集上,EgoCom 和 EasyCom。项目:http://vision.cs.utexas.edu/projects/ego_av_corr。

Retrieval of phonemes and Kohonen algorithm

  • paper_url: http://arxiv.org/abs/2307.07407
  • repo_url: None
  • paper_authors: Brunello Tirozzi, Orchidea Maria Lecian
  • for: 这个研究是为了提出一种phoneme-retrieval技术,这种技术基于网络的特殊构建方式。
  • methods: 这个研究使用的方法是给出一个初始集合neurons,这些neurons的数量大约等于数据中典型结构的数量。例如,如果网络是用于语音 retrieve then neurons的数量必须等于语言中所使用的phonemes的数量。
  • results: 这个研究的结果表明,这种phoneme-retrieval技术可以很好地处理语音和图像等多种数据类型,但是它的性能受到学习样本的影响很大。例如,如果学习样本只包含一些特定的语音,那么网络就只能用于这些语音的recognition。
    Abstract A phoneme-retrieval technique is proposed, which is due to the particular way of the construction of the network. An initial set of neurons is given. The number of these neurons is approximately equal to the number of typical structures of the data. For example if the network is built for voice retrieval then the number of neurons must be equal to the number of characteristic phonemes of the alphabet of the language spoken by the social group to which the particular person belongs. Usually this task is very complicated and the network can depend critically on the samples used for the learning. If the network is built for image retrieval then it works only if the data to be retrieved belong to a particular set of images. If the network is built for voice recognition it works only for some particular set of words. A typical example is the words used for the flight of airplanes. For example a command like the "airplane should make a turn of 120 degrees towards the east" can be easily recognized by the network if a suitable learning procedure is used.
    摘要 提出了一种phoneme-retrieval技术,它归功于网络的特定构建方式。给定一个初始集合neurons。这些neurons的数量约等于数据的典型结构数量。例如,如果建立了语音 Retrieval 网络,那么neurons的数量必须等于语言中所用的特征 Phone 的数量。通常,这个任务非常复杂,网络的学习过程取决于采用的样本。如果建立了图像 Retrieval 网络,它只能Recognize特定集合的图像。如果建立了语音识别网络,它只能识别某些特定集合的单词。例如,飞机航行时使用的命令"飞机应该向东方偏转120度"可以轻松地被网络识别,如果采用合适的学习过程。
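A minimal 1-D Kohonen self-organizing map illustrates the idea that the number of units should roughly match the number of typical structures (e.g. phonemes) expected in the data; the feature dimension, learning rate, and the toy clusters below are invented for illustration.

```python
import numpy as np

def train_som(data: np.ndarray, n_units: int, epochs: int = 20,
              lr: float = 0.5, sigma: float = 1.5, seed: int = 0) -> np.ndarray:
    """Minimal 1-D Kohonen self-organizing map: n_units should roughly match the
    number of typical structures (e.g. phonemes) expected in the data."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(n_units, data.shape[1]))
    positions = np.arange(n_units)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        for x in rng.permutation(data):
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))   # best matching unit
            # Neighborhood function pulls nearby units toward the sample
            h = np.exp(-((positions - bmu) ** 2) / (2 * (sigma * decay + 1e-3) ** 2))
            weights += (lr * decay) * h[:, None] * (x - weights)
    return weights

if __name__ == "__main__":
    # Toy "phoneme" features: 3 clusters standing in for 3 characteristic phonemes
    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal(c, 0.1, size=(50, 12)) for c in (-1.0, 0.0, 1.0)])
    codebook = train_som(data, n_units=3)
    print(codebook.round(2))
```

After training, each codebook row sits near one cluster center, so retrieval amounts to finding the best matching unit for a new feature vector.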

Vocal Tract Area Estimation by Gradient Descent

  • paper_url: http://arxiv.org/abs/2307.04702
  • repo_url: https://github.com/dsuedholt/vocal-tract-grad
  • paper_authors: David Südholt, Mateo Cámara, Zhiyuan Xu, Joshua D. Reiss
  • for: 本研究旨在提供一种可读取和灵活控制的方法,以便通过重synthesis来synthesize人声。
  • methods: 本研究使用白盒优化技术来估算glottal source参数和声道形状直接从音频录音中。
  • results: 本研究发现,使用白盒优化技术可以准确地重建控制函数,以便与给定的声音匹配。与遗传优化算法和基于音频预测的神经网络相比,本研究的方法显示出更高的Subjective评价。
    Abstract Articulatory features can provide interpretable and flexible controls for the synthesis of human vocalizations by allowing the user to directly modify parameters like vocal strain or lip position. To make this manipulation through resynthesis possible, we need to estimate the features that result in a desired vocalization directly from audio recordings. In this work, we propose a white-box optimization technique for estimating glottal source parameters and vocal tract shapes from audio recordings of human vowels. The approach is based on inverse filtering and optimizing the frequency response of a waveguide model of the vocal tract with gradient descent, propagating error gradients through the mapping of articulatory features to the vocal tract area function. We apply this method to the task of matching the sound of the Pink Trombone, an interactive articulatory synthesizer, to a given vocalization. We find that our method accurately recovers control functions for audio generated by the Pink Trombone itself. We then compare our technique against evolutionary optimization algorithms and a neural network trained to predict control parameters from audio. A subjective evaluation finds that our approach outperforms these black-box optimization baselines on the task of reproducing human vocalizations.
    摘要 发音特征可以为人声合成提供可解释且灵活的控制,允许用户直接修改诸如声带张力或嘴唇位置等参数。要通过重合成实现这种操控,我们需要直接从音频录音中估计产生目标发声的特征。在这项工作中,我们提出了一种白盒优化技术,用于从人类元音录音中估计声门源参数和声道形状。该方法基于逆滤波,并利用梯度下降优化声道波导模型的频率响应,将误差梯度通过发音特征到声道面积函数的映射进行反向传播。我们将该方法应用于让交互式发音合成器 Pink Trombone 的声音与给定发声相匹配的任务。结果表明,我们的方法能够准确恢复由 Pink Trombone 自身生成的音频的控制函数。随后,我们将该技术与进化优化算法以及一个从音频预测控制参数的神经网络进行比较。主观评估表明,在复现人声的任务上,我们的方法优于这些黑盒优化基线。
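The white-box strategy, fitting control parameters by gradient descent through a differentiable frequency response, can be caricatured with a toy resonator model in PyTorch. The "vocal tract" below is just a sum of Gaussian resonance peaks standing in for the paper's waveguide model, and the formant values are invented.

```python
import torch

def toy_vocal_tract_response(formants: torch.Tensor, freqs: torch.Tensor) -> torch.Tensor:
    """Toy differentiable 'vocal tract': magnitude response as Gaussian bumps at
    the formant frequencies (a stand-in for the paper's waveguide model)."""
    return torch.exp(-((freqs[None, :] - formants[:, None]) / 100.0) ** 2).sum(dim=0)

if __name__ == "__main__":
    freqs = torch.linspace(0, 4000, 256)
    target = toy_vocal_tract_response(torch.tensor([700.0, 1200.0, 2600.0]), freqs)

    formants = torch.tensor([500.0, 1500.0, 2500.0], requires_grad=True)  # initial guess
    opt = torch.optim.Adam([formants], lr=5.0)
    for step in range(500):
        opt.zero_grad()
        pred = toy_vocal_tract_response(formants, freqs)
        loss = torch.mean((pred - target) ** 2)   # spectral matching loss
        loss.backward()                            # gradients flow through the model
        opt.step()
    print("estimated formants:", formants.detach().round())
```

The same pattern (differentiable synthesis model, spectral loss, gradient descent on control parameters) is what lets errors propagate back to interpretable articulatory controls.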

VampNet: Music Generation via Masked Acoustic Token Modeling

  • paper_url: http://arxiv.org/abs/2307.04686
  • repo_url: https://github.com/hugofloresgarcia/vampnet
  • paper_authors: Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo
  • for: 这篇论文介绍了一种用于音乐合成、压缩、修补(inpainting)和变奏的掩码声学词元建模方法。
  • methods: 该方法在训练中使用可变的掩码比例计划,以便在推断时通过多种掩码方式(称为"提示")从模型中采样连贯的音乐。模型是非自回归的,采用双向 Transformer 架构,在一次前向传播中同时关注所有词元。只需 36 次采样,即可生成连贯的高保真音乐波形。
  • results: 通过不同的提示方式,VampNet 可应用于音乐压缩、修补、外延、续写以及带变化的循环(vamping)等任务,同时保持音乐的风格、流派、乐器编制等高层特征。这种灵活的提示能力使 VampNet 成为一种强大的音乐共创工具。
    Abstract We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
    摘要 我们介绍 VampNet,一种用于音乐合成、压缩、修补和变奏的掩码声学词元建模方法。在训练中我们使用可变的掩码计划,从而在推断时通过多种掩码方式(称为"提示")从模型中采样连贯的音乐。VampNet 是非自回归的,采用双向 Transformer 架构,在一次前向传播中关注序列中的所有词元。只需 36 次采样,VampNet 即可生成连贯的高保真音乐波形。我们展示了通过不同方式提示 VampNet,可以将其用于音乐压缩、修补、外延、续写以及带变化的循环(vamping)等任务;在适当提示下,VampNet 能够保持音乐的风格、流派、乐器编制等高层特征。这种灵活的提示能力使 VampNet 成为一种强大的音乐共创工具。代码和音频示例已在线发布。
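Masked acoustic token modeling with a variable masking ratio can be sketched as follows. The codebook size, transformer size, and mask schedule are invented placeholders, and the code is a generic masked-token trainer, not VampNet.

```python
import torch
import torch.nn as nn

class MaskedTokenModel(nn.Module):
    """Toy non-autoregressive masked token model: embed discrete acoustic tokens,
    replace a random subset with a MASK token, and predict the originals."""
    def __init__(self, vocab: int = 1024, dim: int = 256, layers: int = 4):
        super().__init__()
        self.mask_id = vocab                       # extra id used as the MASK token
        self.embed = nn.Embedding(vocab + 1, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens: torch.Tensor, mask_ratio: float) -> torch.Tensor:
        mask = torch.rand_like(tokens, dtype=torch.float) < mask_ratio
        corrupted = tokens.masked_fill(mask, self.mask_id)
        logits = self.head(self.encoder(self.embed(corrupted)))
        # Loss only on the masked positions, as in masked modeling generally
        return nn.functional.cross_entropy(logits[mask], tokens[mask])

if __name__ == "__main__":
    model = MaskedTokenModel()
    tokens = torch.randint(0, 1024, (2, 128))      # stand-in for codec token sequences
    ratio = torch.rand(1).item() * 0.9 + 0.1        # variable masking schedule
    print(model(tokens, ratio).item())
```

Because every position attends to every other in one pass, inference can fill in many masked tokens per step, which is what keeps the number of sampling passes small.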

eess.AS - 2023-07-11

Predicting Tuberculosis from Real-World Cough Audio Recordings and Metadata

  • paper_url: http://arxiv.org/abs/2307.04842
  • repo_url: None
  • paper_authors: George P. Kafentzis, Stephane Tetsing, Joe Brew, Lola Jover, Mindaugas Galvosas, Carlos Chaccour, Peter M. Small
  • for: 这个研究旨在通过手机应用程序记录咳嗽声,并利用频谱与时域特征进行分类,以提高结核病筛查和诊断的效率。
  • methods: 该研究使用了一个来自非洲东南部、印度和东南亚的超大规模结核与非结核咳嗽声数据集,由全自动手机应用程序(Hyfe)采集,无需人工标注。研究者基于咳嗽声的频谱和时域特征,并结合参与者的人口学信息和临床特征,训练统计分类器。
  • results: 研究发现,仅使用咳嗽声即可达到约 0.70±0.05 的平均曲线下面积(AUC);加入人口学与临床特征后,性能提升至约 0.81±0.05。这些结果表明,整合临床症状与咳嗽声分析的手机应用程序可以帮助社区卫生工作者和卫生服务项目提高结核病例发现效率并降低成本,从而改善公共卫生。
    Abstract Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis and primarily affects the lungs, as well as other body parts. TB is spread through the air when an infected person coughs, sneezes, or talks. Medical doctors diagnose TB in patients via clinical examinations and specialized tests. However, coughing is a common symptom of respiratory diseases such as TB. Literature suggests that cough sounds coming from different respiratory diseases can be distinguished by both medical doctors and computer algorithms. Therefore, cough recordings associated with patients with and without TB seems to be a reasonable avenue of investigation. In this work, we utilize a very large dataset of TB and non-TB cough audio recordings obtained from the south-east of Africa, India, and the south-east of Asia using a fully automated phone-based application (Hyfe), without manual annotation. We fit statistical classifiers based on spectral and time domain features with and without clinical metadata. A stratified grouped cross-validation approach shows that an average Area Under Curve (AUC) of approximately 0.70 $\pm$ 0.05 both for a cough-level and a participant-level classification can be achieved using cough sounds alone. The addition of demographic and clinical factors increases performance, resulting in an average AUC of approximately 0.81 $\pm$ 0.05. Our results suggest mobile phone-based applications that integrate clinical symptoms and cough sound analysis could help community health workers and, most importantly, health service programs to improve TB case-finding efforts while reducing costs, which could substantially improve public health.
    摘要 结核病(TB)是由结核分枝杆菌引起的传染病,主要累及肺部,也可侵犯其他身体部位。感染者咳嗽、打喷嚏或说话时,结核病会经空气传播。医生通过临床检查和专门检测来诊断结核病。然而,咳嗽是包括结核病在内的多种呼吸系统疾病的常见症状,且文献表明,来自不同呼吸系统疾病的咳嗽声既可以被医生也可以被计算机算法区分。因此,研究结核与非结核患者的咳嗽录音是一条合理的路径。在这项工作中,我们使用了一个来自非洲东南部、印度和东南亚、由全自动手机应用程序(Hyfe)采集且无需人工标注的超大规模结核与非结核咳嗽音频数据集。我们基于频谱和时域特征拟合统计分类器,并比较了加入与不加入临床元数据的情形。分层分组交叉验证表明,仅使用咳嗽声即可在咳嗽级别和受试者级别分类上达到约 0.70±0.05 的平均曲线下面积(AUC);加入人口学和临床因素后,性能提升至约 0.81±0.05。我们的结果表明,整合临床症状与咳嗽声分析的手机应用程序有望帮助社区卫生工作者,尤其是卫生服务项目,在降低成本的同时改进结核病例发现工作,从而显著改善公共卫生。
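A bare-bones version of this pipeline, spectral features per cough recording plus a statistical classifier evaluated with cross-validated AUC, might look like the sketch below. It uses generic MFCC summaries and synthetic clips with balanced fake labels purely to show the workflow; it is not the paper's feature set or the Hyfe data.

```python
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def cough_features(y: np.ndarray, sr: int) -> np.ndarray:
    """Summarize one cough recording as mean/std of its MFCCs (generic features)."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    # Stand-in dataset: random 1-second clips with balanced fake TB / non-TB labels
    X = np.stack([cough_features(rng.normal(size=sr).astype(np.float32), sr)
                  for _ in range(40)])
    y = np.array([0, 1] * 20)
    clf = LogisticRegression(max_iter=1000)
    aucs = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
    print("AUC: %.2f +/- %.2f" % (aucs.mean(), aucs.std()))
```

With random noise and random labels the AUC hovers around chance; the point is only the structure: per-recording features, a simple classifier, and cross-validated AUC as in the paper.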

Behavioral Analysis of Pathological Speaker Embeddings of Patients During Oncological Treatment of Oral Cancer

  • paper_url: http://arxiv.org/abs/2307.04744
  • repo_url: None
  • paper_authors: Jenthe Thienpondt, Caroline M. Speksnijder, Kris Demuynck
  • for: 本研究探讨了口腔癌治疗期间 speaker embedding 的行为。
  • methods: 研究者使用 speaker embedding 分析了口腔癌患者在不同治疗阶段的发音特征。
  • results: 研究发现,术前与术后的说话人嵌入存在显著差异,表明治疗后语音特征发生了明显变化;不过术后 12 个月时,语音特征部分恢复到术前水平。此外,同一说话人在不同治疗阶段之间的相似度与健康说话人相当,说明嵌入能够捕捉到即使严重受损语音的刻画特征。最后,说话人验证分析表明,将不同治疗阶段的语音样本组合后,假阳性率保持稳定而假阴性率发生变化,这说明说话人嵌入对其他说话人具有鲁棒性,同时仍能反映治疗过程中语音特征的变化。
    Abstract In this paper, we analyze the behavior of speaker embeddings of patients during oral cancer treatment. First, we found that pre- and post-treatment speaker embeddings differ significantly, notifying a substantial change in voice characteristics. However, a partial recovery to pre-operative voice traits is observed after 12 months post-operation. Secondly, the same-speaker similarity at distinct treatment stages is similar to healthy speakers, indicating that the embeddings can capture characterizing features of even severely impaired speech. Finally, a speaker verification analysis signifies a stable false positive rate and variable false negative rate when combining speech samples of different treatment stages. This indicates robustness of the embeddings towards other speakers, while still capturing the changing voice characteristics during treatment. To the best of our knowledge, this is the first analysis of speaker embeddings during oral cancer treatment of patients.
    摘要 在这篇论文中,我们分析了口腔癌治疗期间患者说话人嵌入的行为。首先,我们发现治疗前后的说话人嵌入存在显著差异,表明语音特征发生了明显变化;不过术后 12 个月时可观察到语音特征部分恢复到术前水平。其次,同一说话人在不同治疗阶段之间的相似度与健康说话人相当,表明嵌入即使对严重受损的语音也能捕捉到其刻画特征。最后,说话人验证分析显示,将不同治疗阶段的语音样本组合后,假阳性率保持稳定而假阴性率发生变化,这表明嵌入对其他说话人具有鲁棒性,同时仍能捕捉治疗期间语音特征的变化。据我们所知,这是首次对口腔癌治疗期间患者说话人嵌入进行分析。
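The pre/post-treatment comparison ultimately rests on similarity between speaker embeddings; the sketch below uses random placeholder vectors (any off-the-shelf embedding extractor could supply real ones) and invented noise levels, just to make the kind of comparison concrete.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pre_op = rng.normal(size=192)                        # embedding before treatment
    post_op = pre_op + rng.normal(scale=0.8, size=192)   # larger voice change after surgery
    month_12 = pre_op + rng.normal(scale=0.4, size=192)  # partial recovery later on
    print("pre vs post:  %.3f" % cosine_similarity(pre_op, post_op))
    print("pre vs 12 mo: %.3f" % cosine_similarity(pre_op, month_12))
```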

cs.CV - 2023-07-11

On the Vulnerability of DeepFake Detectors to Attacks Generated by Denoising Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.05397
  • repo_url: None
  • paper_authors: Marija Ivanovska, Vitomir Štruc
  • for: 本研究检测恶意深伪图像的漏洞,以确保检测器能够检测新型生成模型生成的图像修改。
  • methods: 本研究使用单个图像深伪检测器,并对 FaceForensics++ 数据集进行实验,包括不同的面换和面reenactment 技术生成的 Deepfakes。
  • results: 结果表明,仅需一步去噪扩散重建,即可显著降低所有被测检测器的准确率,且不会引入可察觉的图像变化。
    Abstract The detection of malicious Deepfakes is a constantly evolving problem, that requires continuous monitoring of detectors, to ensure they are able to detect image manipulations generated by the latest emerging models. In this paper, we present a preliminary study that investigates the vulnerability of single-image Deepfake detectors to attacks created by a representative of the newest generation of generative methods, i.e. Denoising Diffusion Models (DDMs). Our experiments are run on FaceForensics++, a commonly used benchmark dataset, consisting of Deepfakes generated with various techniques for face swapping and face reenactment. The analysis shows, that reconstructing existing Deepfakes with only one denoising diffusion step significantly decreases the accuracy of all tested detectors, without introducing visually perceptible image changes.
    摘要 恶意深度伪造(Deepfake)检测是一个不断演化的问题,需要持续监测检测器,以确保其能够检测由最新生成模型产生的图像篡改。在这篇论文中,我们提出了一项初步研究,考察单图像 Deepfake 检测器对最新一代生成方法代表——去噪扩散模型(DDM)所构造攻击的脆弱性。我们在常用的基准数据集 FaceForensics++ 上进行实验,该数据集包含由多种换脸和面部重演技术生成的 Deepfakes。分析表明,仅用一步去噪扩散重建现有 Deepfakes,即可显著降低所有被测检测器的准确率,且不会引入可察觉的图像变化。

Self-supervised adversarial masking for 3D point cloud representation learning

  • paper_url: http://arxiv.org/abs/2307.05325
  • repo_url: https://github.com/szacho/pointcam
  • paper_authors: Michał Szachniewicz, Wojciech Kozłowski, Michał Stypułkowski, Maciej Zięba
  • for: 学习深度表示的3D点云数据自助方法
  • methods: 提出了一种新的对抗方法,通过学习掩码函数来提高自助方法的性能
  • results: 对多个下游任务进行评估,实现了状态略于竞争或者优于其他方法的表现
    Abstract Self-supervised methods have been proven effective for learning deep representations of 3D point cloud data. Although recent methods in this domain often rely on random masking of inputs, the results of this approach can be improved. We introduce PointCAM, a novel adversarial method for learning a masking function for point clouds. Our model utilizes a self-distillation framework with an online tokenizer for 3D point clouds. Compared to previous techniques that optimize patch-level and object-level objectives, we postulate applying an auxiliary network that learns how to select masks instead of choosing them randomly. Our results show that the learned masking function achieves state-of-the-art or competitive performance on various downstream tasks. The source code is available at https://github.com/szacho/pointcam.
    摘要 自监督方法已被证明能够有效学习三维点云数据的深度表示。虽然该领域的最新方法常采用随机掩码输入,但这种做法的结果仍有改进空间。我们引入 PointCAM,一种用于学习点云掩码函数的新型对抗方法。我们的模型采用自蒸馏框架,并针对三维点云使用在线分词器。与以往优化图块级和物体级目标的技术不同,我们提出用一个辅助网络学习如何选择掩码,而非随机选取。实验结果表明,所学的掩码函数在多种下游任务上达到了当前最优或具有竞争力的性能。源代码见 https://github.com/szacho/pointcam。

Class Instance Balanced Learning for Long-Tailed Classification

  • paper_url: http://arxiv.org/abs/2307.05322
  • repo_url: None
  • paper_authors: Marc-Antoine Lavoie, Steven Waslander
  • for: 本研究旨在提高深度神经网络在长尾图像分类任务中的性能,即在训练数据中类别频率差异较大的情况下。
  • methods: 本研究提出了一种新的类实例平衡损失函数(CIBL),该函数根据训练批处的类实例频率进行权重调整,以优化长尾图像分类任务中的表现。
  • results: 研究结果表明,采用CIBL损失函数可以提高长尾图像分类任务中的表现,并且可以根据需要调整表现的类别分布。此外,将线性分类器头换为高斯分类器可以在更少的训练轮数下达到类似性能。
    Abstract The long-tailed image classification task remains important in the development of deep neural networks as it explicitly deals with large imbalances in the class frequencies of the training data. While uncommon in engineered datasets, this imbalance is almost always present in real-world data. Previous approaches have shown that combining cross-entropy and contrastive learning can improve performance on the long-tailed task, but they do not explore the tradeoff between head and tail classes. We propose a novel class instance balanced loss (CIBL), which reweights the relative contributions of a cross-entropy and a contrastive loss as a function of the frequency of class instances in the training batch. This balancing favours the contrastive loss for more common classes, leading to a learned classifier with a more balanced performance across all class frequencies. Furthermore, increasing the relative weight on the contrastive head shifts performance from common (head) to rare (tail) classes, allowing the user to skew the performance towards these classes if desired. We also show that changing the linear classifier head with a cosine classifier yields a network that can be trained to similar performance in substantially fewer epochs. We obtain competitive results on both CIFAR-100-LT and ImageNet-LT.
    摘要 长尾图像分类任务仍然是深度神经网络的发展中非常重要的问题,因为它直接面临巨大的类频率偏置在训练数据中。虽然在工程化数据集中不常见,但在实际世界数据中它总是存在。先前的方法已经显示过将cross-entropy和对比学习结合可以提高长尾任务的性能,但它们不探讨类别头和尾类之间的负担平衡。我们提出了一种新的类实例平衡损失函数(CIBL),它在训练批处理中类别实例的频率上重新调整cross-entropy和对比损失的相对贡献。这种平衡使得对于更常见的类别,增加对于对比损失的权重,从而使得学习出来的分类器在所有类别频率上具有更好的平衡性。此外,通过增加对于对比头的权重,可以使得性能倾斜向常见类别(头)和罕见类别(尾)之间,这样用户可以根据需要调整性能的方向。我们还证明了在cosine类ifier头上改变线性类ifier可以在相当多的epoch内达到相同的性能。我们在CIFAR-100-LT和ImageNet-LT上获得了竞争性的结果。
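The core idea, blending a cross-entropy and a contrastive term with per-sample weights derived from in-batch class frequency, can be sketched roughly as follows. The exact weight function and the simplified supervised-contrastive term are stand-ins, not the paper's CIBL definition.

```python
import torch
import torch.nn.functional as F

def class_instance_balanced_loss(logits, features, labels, tau: float = 0.1):
    """Blend cross-entropy with a simple supervised contrastive term, giving the
    contrastive part more weight for classes that are frequent in the batch."""
    counts = torch.bincount(labels, minlength=logits.size(1)).float()
    freq = counts[labels] / labels.numel()                    # in-batch frequency per sample

    ce = F.cross_entropy(logits, labels, reduction="none")    # per-sample cross-entropy

    feats = F.normalize(features, dim=1)
    sim = feats @ feats.t() / tau
    diag = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    denom = torch.logsumexp(sim.masked_fill(diag, float("-inf")), dim=1, keepdim=True)
    log_prob = sim - denom
    pos_mask = (labels[:, None] == labels[None, :]).float() - diag.float()
    contrastive = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)

    w = freq                                                   # favour contrastive term for head classes
    return ((1 - w) * ce + w * contrastive).mean()

if __name__ == "__main__":
    B, C, D = 16, 5, 32
    logits, feats = torch.randn(B, C), torch.randn(B, D)
    labels = torch.randint(0, C, (B,))
    print(class_instance_balanced_loss(logits, feats, labels).item())
```

Scaling `w` up or down is the knob the abstract describes: putting more relative weight on the contrastive head shifts performance between common (head) and rare (tail) classes.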

  • paper_url: http://arxiv.org/abs/2307.05288
  • repo_url: https://github.com/sharmasushil/navigating-uncertainty-trajectory-prediction
  • paper_authors: Sushil Sharma, Ganesh Sistu, Lucie Yahiaoui, Arindam Das, Mark Halton, Ciarán Eising
  • for: The paper is written for the task of short-term trajectory prediction for autonomous vehicles, with a focus on safe and efficient driving.
  • methods: The paper uses a synthetic dataset created using the CARLA simulator, which includes a variety of complex scenarios such as pedestrians crossing the road and vehicles overtaking. The authors also develop an end-to-end model using convolutional neural networks (CNN) and long short-term memory (LSTM) networks to predict short-term trajectories.
  • results: The paper reports that the proposed model can handle corner cases such as slowing down near zebra crossings and stopping when pedestrians cross the road without the need for explicit encoding of the surrounding environment. The authors also release their dataset and model to the research community for further research and development.
    Abstract Autonomous vehicles require accurate and reliable short-term trajectory predictions for safe and efficient driving. While most commercial automated vehicles currently use state machine-based algorithms for trajectory forecasting, recent efforts have focused on end-to-end data-driven systems. Often, the design of these models is limited by the availability of datasets, which are typically restricted to generic scenarios. To address this limitation, we have developed a synthetic dataset for short-term trajectory prediction tasks using the CARLA simulator. This dataset is extensive and incorporates what is considered complex scenarios - pedestrians crossing the road, vehicles overtaking - and comprises 6000 perspective view images with corresponding IMU and odometry information for each frame. Furthermore, an end-to-end short-term trajectory prediction model using convolutional neural networks (CNN) and long short-term memory (LSTM) networks has also been developed. This model can handle corner cases, such as slowing down near zebra crossings and stopping when pedestrians cross the road, without the need for explicit encoding of the surrounding environment. In an effort to accelerate this research and assist others, we are releasing our dataset and model to the research community. Our datasets are publicly available on https://github.com/sharmasushil/Navigating-Uncertainty-Trajectory-Prediction .
    摘要 自动驾驶车辆需要准确且可靠的短期轨迹预测,以实现安全高效的驾驶。目前大多数商用自动驾驶车辆采用基于状态机的算法进行轨迹预测,而近期的研究则聚焦于端到端的数据驱动系统。这类模型的设计常常受限于数据集的可得性,而现有数据集通常只覆盖一般场景。为解决这一限制,我们使用 CARLA 仿真器构建了一个用于短期轨迹预测任务的合成数据集。该数据集规模庞大,涵盖行人横穿马路、车辆超车等复杂场景,包含 6000 张透视视角图像以及每帧对应的 IMU 和里程计信息。此外,我们还开发了一个基于卷积神经网络(CNN)与长短期记忆(LSTM)网络的端到端短期轨迹预测模型。该模型无需显式编码周围环境,即可处理在斑马线附近减速、行人横穿马路时停车等极端情况(corner cases)。为了加速相关研究并帮助其他研究者,我们将数据集和模型公开给研究社区,详见 https://github.com/sharmasushil/Navigating-Uncertainty-Trajectory-Prediction 。
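An end-to-end CNN + LSTM predictor in the spirit described here could be sketched as below; image size, hidden width, and prediction horizon are made-up values and this is not the released model.

```python
import torch
import torch.nn as nn

class CnnLstmPredictor(nn.Module):
    """Encode each past frame with a small CNN, feed the per-frame features to an
    LSTM, and regress a short horizon of future (x, y) waypoints."""
    def __init__(self, horizon: int = 10, hidden: int = 128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(input_size=32, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon * 2)
        self.horizon = horizon

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, T, 3, H, W) sequence of past perspective-view images
        B, T = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(B, T, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1]).view(B, self.horizon, 2)   # future (x, y) waypoints

if __name__ == "__main__":
    frames = torch.randn(2, 8, 3, 96, 96)
    print(CnnLstmPredictor()(frames).shape)    # torch.Size([2, 10, 2])
```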

Unbiased Scene Graph Generation via Two-stage Causal Modeling

  • paper_url: http://arxiv.org/abs/2307.05276
  • repo_url: None
  • paper_authors: Shuzhou Sun, Shuaifeng Zhi, Qing Liao, Janne Heikkilä, Li Liu
  • for: 本研究旨在提出一种基于 causal inference 的 scene graph生成任务减偏方法,以提高 scene graph 生成模型的不偏性。
  • methods: 本研究使用了 causal modeling 技术,包括 structural causal model (SCM) 和 population loss (P-Loss),以及 adaptive logit adjustment (AL-Adjustment) 等方法来解决 scene graph 生成任务中的减偏问题。
  • results: 实验结果表明,使用本研究提出的 two-stage causal modeling (TsCM) 方法可以在 popular scene graph backbones 和 benchmarks 上达到 state-of-the-art 的 mean recall rate,并且 TsCM 可以更好地平衡 head 和 tail 关系的准确率。
    Abstract Despite the impressive performance of recent unbiased Scene Graph Generation (SGG) methods, the current debiasing literature mainly focuses on the long-tailed distribution problem, whereas it overlooks another source of bias, i.e., semantic confusion, which makes the SGG model prone to yield false predictions for similar relationships. In this paper, we explore a debiasing procedure for the SGG task leveraging causal inference. Our central insight is that the Sparse Mechanism Shift (SMS) in causality allows independent intervention on multiple biases, thereby potentially preserving head category performance while pursuing the prediction of high-informative tail relationships. However, the noisy datasets lead to unobserved confounders for the SGG task, and thus the constructed causal models are always causal-insufficient to benefit from SMS. To remedy this, we propose Two-stage Causal Modeling (TsCM) for the SGG task, which takes the long-tailed distribution and semantic confusion as confounders to the Structural Causal Model (SCM) and then decouples the causal intervention into two stages. The first stage is causal representation learning, where we use a novel Population Loss (P-Loss) to intervene in the semantic confusion confounder. The second stage introduces the Adaptive Logit Adjustment (AL-Adjustment) to eliminate the long-tailed distribution confounder to complete causal calibration learning. These two stages are model agnostic and thus can be used in any SGG model that seeks unbiased predictions. Comprehensive experiments conducted on the popular SGG backbones and benchmarks show that our TsCM can achieve state-of-the-art performance in terms of mean recall rate. Furthermore, TsCM can maintain a higher recall rate than other debiasing methods, which indicates that our method can achieve a better tradeoff between head and tail relationships.
    摘要 尽管现代无偏Scene Graph Generation(SGG)方法已经表现出色,当前的偏见文献主要关注长条分布问题,而忽略了另一种偏见源,即semantic confusion,这使SGG模型容易生成错误的关系预测。在这篇论文中,我们探索了基于 causal inference 的 SGG 任务中的偏见处理方法。我们的中心思想是,在 causality 中的罕见机制shift (SMS) 可以独立地 intervene 多种偏见,从而可能保持 head category 性能 while pursuing the prediction of high-informative tail relationships。然而,噪声数据导致 SGG 任务中的隐藏偏见,因此构建的 causal 模型总是 causal-insufficient,不能得到 SMS 的好处。为此,我们提议 Two-stage Causal Modeling (TsCM) 方法,该方法将 long-tailed distribution 和 semantic confusion 作为 SGG 任务中的隐藏偏见,并将 causal intervention 分成两个阶段。第一阶段是 causal representation learning,我们使用一种新的 Population Loss (P-Loss) 来 intervene in semantic confusion confounder。第二阶段引入 Adaptive Logit Adjustment (AL-Adjustment),以消除 long-tailed distribution confounder,以完成 causal calibration learning。这两个阶段是模型无关的,可以在任何 seeking unbiased predictions 的 SGG 模型中使用。我们在流行的 SGG 背景和标准 benchmark 上进行了广泛的实验,结果表明,我们的 TsCM 可以在 terms of mean recall rate achieve state-of-the-art performance。此外,TsCM 可以保持 higher recall rate than other debiasing methods,这表明我们的方法可以更好地平衡 head 和 tail 关系。
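The long-tailed half of the calibration can be illustrated with a generic logit adjustment by class frequency; the paper's adaptive, per-relation version (AL-Adjustment) is more involved, so treat this only as a stand-in for the basic mechanism.

```python
import torch
import torch.nn.functional as F

def frequency_logit_adjustment(logits: torch.Tensor, class_counts: torch.Tensor,
                               tau: float = 1.0) -> torch.Tensor:
    """Subtract tau * log(prior) from each class logit so rare (tail) relations are
    no longer drowned out by frequent (head) ones at prediction time."""
    prior = class_counts.float() / class_counts.sum()
    return logits - tau * torch.log(prior + 1e-12)

if __name__ == "__main__":
    counts = torch.tensor([10000, 500, 20])        # head, body, tail relation counts
    logits = torch.tensor([[2.0, 1.8, 1.5]])       # raw predicate scores
    print(F.softmax(logits, dim=1))                                       # head dominates
    print(F.softmax(frequency_logit_adjustment(logits, counts), dim=1))   # tail recovers
```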

APRF: Anti-Aliasing Projection Representation Field for Inverse Problem in Imaging

  • paper_url: http://arxiv.org/abs/2307.05270
  • repo_url: None
  • paper_authors: Zixuan Chen, Lingxiao Yang, Jianhuang Lai, Xiaohua Xie
  • for: 提高SVCT重建的精度和准确性,降低噪声和扭曲 artifacts。
  • methods: 使用自我监督学习的Anti-Aliasing Projection Representation Field(APRF)方法,通过空间约束来建立连续的投影视图对应关系,从而提高重建的精度和准确性。
  • results: 对CT图像进行SVCT重建,比州当前方法更加准确和精细,减少了噪声和扭曲 artifacts。
    Abstract Sparse-view Computed Tomography (SVCT) reconstruction is an ill-posed inverse problem in imaging that aims to acquire high-quality CT images based on sparsely-sampled measurements. Recent works use Implicit Neural Representations (INRs) to build the coordinate-based mapping between sinograms and CT images. However, these methods have not considered the correlation between adjacent projection views, resulting in aliasing artifacts on SV sinograms. To address this issue, we propose a self-supervised SVCT reconstruction method -- Anti-Aliasing Projection Representation Field (APRF), which can build the continuous representation between adjacent projection views via the spatial constraints. Specifically, APRF only needs SV sinograms for training, which first employs a line-segment sampling module to estimate the distribution of projection views in a local region, and then synthesizes the corresponding sinogram values using center-based line integral module. After training APRF on a single SV sinogram itself, it can synthesize the corresponding dense-view (DV) sinogram with consistent continuity. High-quality CT images can be obtained by applying re-projection techniques on the predicted DV sinograms. Extensive experiments on CT images demonstrate that APRF outperforms state-of-the-art methods, yielding more accurate details and fewer artifacts. Our code will be publicly available soon.
    摘要 稀疏视图计算机断层成像(SVCT)重建是成像中的一个不适定逆问题,旨在基于稀疏采样测量获得高质量 CT 图像。近期研究使用隐式神经表示(INR)构建正弦图(sinogram)与 CT 图像之间基于坐标的映射。然而,这些方法没有考虑相邻投影视角之间的相关性,导致稀疏视图正弦图中出现混叠伪影。为解决该问题,我们提出了一种自监督的 SVCT 重建方法——抗混叠投影表示场(APRF),它能通过空间约束在相邻投影视角之间建立连续表示。具体而言,APRF 仅需稀疏视图正弦图即可训练:先用线段采样模块估计局部区域内投影视角的分布,再用基于中心的线积分模块合成相应的正弦图值。在单个稀疏视图正弦图上训练后,APRF 即可合成具有一致连续性的密集视图(DV)正弦图,再通过重投影技术从预测的 DV 正弦图获得高质量 CT 图像。在 CT 图像上的大量实验表明,APRF 优于现有最先进方法,细节更准确、伪影更少。我们的代码将很快公开。

OpenAL: An Efficient Deep Active Learning Framework for Open-Set Pathology Image Classification

  • paper_url: http://arxiv.org/abs/2307.05254
  • repo_url: None
  • paper_authors: Linhao Qu, Yingfan Ma, Zhiwei Yang, Manning Wang, Zhijian Song
  • for: 本研究旨在解决实际临床任务中,未标注样本池中同时存在目标类与非目标类样本时,现有主动学习方法无法有效工作的问题。
  • methods: 本文提出了一种开集主动学习(OpenAL)框架,可以有效地从同时包含目标类和非目标类样本的未标注样本池中选择样本进行查询。
  • results: 在细粒度病理图像分类任务上,OpenAL可以显著提高目标类样本的查询质量,并相比当前最先进的主动学习方法取得更高的性能。
    Abstract Active learning (AL) is an effective approach to select the most informative samples to label so as to reduce the annotation cost. Existing AL methods typically work under the closed-set assumption, i.e., all classes existing in the unlabeled sample pool need to be classified by the target model. However, in some practical clinical tasks, the unlabeled pool may contain not only the target classes that need to be fine-grainedly classified, but also non-target classes that are irrelevant to the clinical tasks. Existing AL methods cannot work well in this scenario because they tend to select a large number of non-target samples. In this paper, we formulate this scenario as an open-set AL problem and propose an efficient framework, OpenAL, to address the challenge of querying samples from an unlabeled pool with both target class and non-target class samples. Experiments on fine-grained classification of pathology images show that OpenAL can significantly improve the query quality of target class samples and achieve higher performance than current state-of-the-art AL methods. Code is available at https://github.com/miccaiif/OpenAL.
    摘要 活动学习(AL)是一种有效的方法,选择最有用的样本用于标注,以降低标注成本。现有的AL方法通常在关闭集合假设下工作,即所有在未标注样本池中存在的类都需要被目标模型分类。然而,在一些实际的医疗任务中,未标注池可能包含不仅目标类,还有无关的医疗任务类。现有的AL方法无法在这种场景下工作,因为它们往往选择大量的非目标样本。在这篇论文中,我们将这种场景描述为开集AL问题,并提出一种高效的框架,OpenAL,用于从未标注池中查询目标类和非目标类样本。实验表明,OpenAL可以明显提高目标类样本的查询质量,并在当前状态艺术AL方法的基础上 достичь更高的性能。代码可以在https://github.com/miccaiif/OpenAL上获取。
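As a rough illustration of the open-set active-learning setting described above, the sketch below scores unlabeled samples by their similarity to already-labeled target-class features and queries the most target-like ones. This is a generic prototype-similarity heuristic, not the paper's OpenAL algorithm; the function names, feature dimensions, and query budget are illustrative assumptions.

```python
# A minimal sketch (not the paper's exact OpenAL algorithm) of one open-set
# active-learning query round: unlabeled samples are scored by how close their
# features lie to already-labeled target-class prototypes, so that likely
# non-target samples are pushed to the bottom of the query list.
import numpy as np

def query_round(unlabeled_feats, labeled_feats, labeled_is_target, budget=16):
    """Return indices of unlabeled samples to send to the annotator."""
    protos = labeled_feats[labeled_is_target]           # target-class prototypes
    u = unlabeled_feats / np.linalg.norm(unlabeled_feats, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sim = (u @ p.T).max(axis=1)                          # "target-likeness" score
    return np.argsort(-sim)[:budget]                     # query the most target-like

# toy usage with random features
rng = np.random.default_rng(0)
unlabeled = rng.normal(size=(100, 128))
labeled = rng.normal(size=(20, 128))
is_target = np.arange(20) % 2 == 0                       # which labeled samples are target-class
print(query_round(unlabeled, labeled, is_target, budget=8))
```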

Evidence-based Hand Hygiene. Can You Trust the Fluorescent-based Assessment Methods?

  • paper_url: http://arxiv.org/abs/2307.05650
  • repo_url: None
  • paper_authors: Száva Bánsághi, Viola Sári, Péter Szerémy, Ákos Lehotsky, Bence Takács, Brigitta K. Tóth, Tamás Haidegger
  • for: 这个研究的目的是调查不同专家对同一张UV图像是否给出不同的评估结果,并与微生物学鉴定结果进行对比。
  • methods: 这个研究使用了4种不同的UV盒子设备,通过CCD摄像头拍摄受试者手部在UV光下的图像,并由4名独立的感染控制专家人工标注图像中未被消毒的区域。
  • results: 研究发现,专家之间对同一张UV图像的评估结果高度不相关且不一致(评估者间信度低),并且与微生物学鉴定结果只有较弱的相关性。在8只受试者手部中,有50%的情况下,人工评估与微生物学鉴定结果之间存在10%以上的差异。这表明,基于荧光方法评估手部卫生的数据质量不足以作为患者安全质量保障系统的基础。
    Abstract Healthcare-Associated Infections present a major threat to patient safety globally. According to studies, more than 50% of HAI could be prevented by proper hand hygiene. Effectiveness of hand hygiene is regularly evaluated with the fluorescent method: performing hand hygiene with a handrub containing an ultra violet (UV) fluorescent marker. Typically, human experts evaluate the hands under UV-A light, and decide whether the applied handrub covered the whole hand surface. The aim of this study was to investigate how different experts judge the same UV-pattern, and compare that to microbiology for objective validation. Hands of volunteer participants were contaminated with high concentration of a Staphylococcus epidermidis suspension. Hands were incompletely disinfected with UV-labeled handrub. Four different UV-box type devices were used to take CCD pictures of the hands under UV light. Size of inadequately disinfected areas on the hands were determined in two different ways. First, based on microbiology; the areas where colonies were grown were measured. Second, four independent senior infection control specialists were asked to mark the missed areas on printed image, captured under UV light. 8 hands of healthy volunteers were examined. Expert evaluations were highly uncorrelated (regarding interrater reliability) and inconsistent. Microbiology results weakly correlated with the expert evaluations. In half of the cases, there were more than 10% difference in the size of properly disinfected area, as measured by microbiology versus human experts. Considering the result of the expert evaluations, variability was disconcertingly high. Evaluating the fluorescent method is challenging, even for highly experienced professionals. A patient safety quality assurance system cannot be built on these data quality.
    摘要 医疗相关感染(Healthcare-Associated Infections,HAI)对全球患者安全构成重大威胁。研究表明,超过50%的HAI可以通过正确的手部卫生来预防。手部卫生效果通常用荧光法进行评估:使用含紫外线(UV)荧光标记的消毒液进行手部消毒,随后由专家在UV-A光下检查双手,判断消毒液是否覆盖了整个手部表面。本研究的目的是调查不同专家对同一UV图案的判断差异,并与微生物学结果进行客观对比验证。志愿者的手部先被高浓度表皮葡萄球菌(Staphylococcus epidermidis)悬浮液污染,随后用带UV标记的消毒液进行不完全消毒。研究使用4种UV盒子设备在UV光下拍摄手部的CCD图像,并以两种方式测定消毒不充分区域的大小:其一基于微生物学,测量菌落生长的区域;其二由4名独立的资深感染控制专家在UV光下拍摄的打印图像上标注遗漏区域。研究共检查了8名健康志愿者的手部。结果显示,专家之间的评估高度不相关(评估者间信度低)且不一致,微生物学结果与专家评估仅呈弱相关。在一半的情况下,微生物学测得的充分消毒区域与专家评估之间的差异超过10%。专家评估结果的离散程度令人担忧。即使对经验丰富的专业人员而言,荧光法的评估也颇具挑战性,以这样的数据质量无法建立患者安全质量保障体系。

DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.05249
  • repo_url: None
  • paper_authors: Zhiwen Yang, Yang Zhou, Hui Zhang, Bingzheng Wei, Yubo Fan, Yan Xu
  • for: 多中心正电子发射断层扫描(PET)图像合成,恢复来自多个不同中心的低剂量PET图像。
  • methods: 我们开发了一种通用模型,该模型在多中心研究中共享架构和参数,以利用多中心之间的共同知识。但是,这种通用模型可能会受到中心干扰问题的影响,即不同中心的数据分布不同,导致不同中心的梯度方向不一致甚至相反。为解决这一问题,我们引入了一种新的动态路由策略,该策略通过跨层连接将来自不同中心的数据分配给不同的专家。
  • results: 我们的带动态路由的通用模型(DRMC)在多中心研究中表现出色,展现了优秀的跨中心泛化能力。
    Abstract Multi-center positron emission tomography (PET) image synthesis aims at recovering low-dose PET images from multiple different centers. The generalizability of existing methods can still be suboptimal for a multi-center study due to domain shifts, which result from non-identical data distribution among centers with different imaging systems/protocols. While some approaches address domain shifts by training specialized models for each center, they are parameter inefficient and do not well exploit the shared knowledge across centers. To address this, we develop a generalist model that shares architecture and parameters across centers to utilize the shared knowledge. However, the generalist model can suffer from the center interference issue, \textit{i.e.} the gradient directions of different centers can be inconsistent or even opposite owing to the non-identical data distribution. To mitigate such interference, we introduce a novel dynamic routing strategy with cross-layer connections that routes data from different centers to different experts. Experiments show that our generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers. Code and data are available at: https://github.com/Yaziwel/Multi-Center-PET-Image-Synthesis.
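To make the dynamic-routing idea in the DRMC abstract above more concrete, the toy PyTorch sketch below routes per-sample features through a small set of shared experts with learned mixing weights. It is a generic mixture-of-experts routing block under assumed layer sizes and expert count, not the authors' published architecture.

```python
# A toy sketch (not the published DRMC architecture) of dynamic routing:
# features from different centers pass through shared experts, and a learned
# router produces per-sample mixing weights, so centers whose gradients would
# otherwise conflict can lean on different experts.
import torch
import torch.nn as nn

class DynamicRoutingBlock(nn.Module):
    def __init__(self, dim=64, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList(nn.Sequential(nn.Linear(dim, dim), nn.GELU())
                                     for _ in range(num_experts))
        self.router = nn.Linear(dim, num_experts)        # per-sample routing logits

    def forward(self, x):                                # x: (batch, dim)
        weights = torch.softmax(self.router(x), dim=-1)              # (batch, E)
        expert_out = torch.stack([e(x) for e in self.experts], 1)    # (batch, E, dim)
        return (weights.unsqueeze(-1) * expert_out).sum(dim=1)       # routed mixture

block = DynamicRoutingBlock()
print(block(torch.randn(8, 64)).shape)   # torch.Size([8, 64])
```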

  • paper_url: http://arxiv.org/abs/2307.05241
  • repo_url: https://github.com/gama-ufsc/brain-age
  • paper_authors: Bruno Machado Pacheco, Victor Hugo Rocha de Oliveira, Augusto Braga Fernandes Antunes, Saulo Domingos de Souza Pedro, Danilo Silva
  • for: 这研究旨在探讨深度学习模型在脑年龄预测方面的可靠性和效率,以及这些模型在脑健康和衰老过程中的应用前景。
  • methods: 我们在这研究中使用了深度学习模型,并在这些模型的预训练阶段使用了脑相关任务来提高模型的性能。
  • results: 我们在ADNI数据集上进行了实验,并发现了脑年龄预测模型的性能有所提高,但是更好的模型性能不一定意味着更可靠的脑年龄生物标记。
    Abstract Brain age prediction using neuroimaging data has shown great potential as an indicator of overall brain health and successful aging, as well as a disease biomarker. Deep learning models have been established as reliable and efficient brain age estimators, being trained to predict the chronological age of healthy subjects. In this paper, we investigate the impact of a pre-training step on deep learning models for brain age prediction. More precisely, instead of the common approach of pre-training on natural imaging classification, we propose pre-training the models on brain-related tasks, which led to state-of-the-art results in our experiments on ADNI data. Furthermore, we validate the resulting brain age biomarker on images of patients with mild cognitive impairment and Alzheimer's disease. Interestingly, our results indicate that better-performing deep learning models in terms of brain age prediction on healthy patients do not result in more reliable biomarkers.

Generative Pretraining in Multimodality

  • paper_url: http://arxiv.org/abs/2307.05222
  • repo_url: https://github.com/baaivision/emu
  • paper_authors: Quan Sun, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Yueze Wang, Hongcheng Gao, Jingjing Liu, Tiejun Huang, Xinlong Wang
  • for: 本研究旨在开发一种基于Transformer的多Modal基础模型,可以无需特殊设定处理不同的多Modal数据,并在一个模型上进行权重学习。
  • methods: 该模型使用一种混合的输入序列,将视觉信号编码成嵌入,并与文本token组成一个混合输入序列。然后通过一个简单的损失函数进行一对多的权重学习,以实现多Modal任务的同时进行。
  • results: 对多种零shot/几shot任务进行评估,包括图像描述、视觉问答、视频问答和文本到图像生成等,模型表现出色,与当前最大多Modal模型相比,具有更高的性能。此外,通过调整指令,模型还可以实现多Modal助手的功能,如语音助手和图像生成助手等。
    Abstract We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context. This omnivore model can take in any single-modality or multimodal data input indiscriminately (e.g., interleaved image, text and video) through a one-model-for-all autoregressive training process. First, visual signals are encoded into embeddings, and together with text tokens form an interleaved input sequence. Emu is then end-to-end trained with a unified objective of classifying the next text token or regressing the next visual embedding in the multimodal sequence. This versatile multimodality empowers the exploration of diverse pretraining data sources at scale, such as videos with interleaved frames and text, webpages with interleaved images and text, as well as web-scale image-text pairs and video-text pairs. Emu can serve as a generalist multimodal interface for both image-to-text and text-to-image tasks, and supports in-context image and text generation. Across a broad range of zero-shot/few-shot tasks including image captioning, visual question answering, video question answering and text-to-image generation, Emu demonstrates superb performance compared to state-of-the-art large multimodal models. Extended capabilities such as multimodal assistants via instruction tuning are also demonstrated with impressive performance.
    摘要 我们介绍Emu,一个基于Transformer的多媒体基础模型,可以无障碍地生成图像和文本在多媒体上下文中。这个“食物”模型可以将任何单一模式或多媒体资料输入(例如:混合图像、文本和视频)通过一个“一模型 для所有”的采取过程进行采样训练。首先,视觉信号被编码成嵌入,与文本token共同形成混合输入序列。Emu然后以终端训练的方式,实现统一的目标,即预测下一个文本token或调整下一个视觉嵌入在多媒体序列中。这种多元的多媒体特性使得可以大规模地探索不同的预训练数据源,例如:录像带中的混合帧和文本、网页中的混合图像和文本,以及网页级的图像-文本对和视频-文本对。Emu可以作为一个通用的多媒体界面,用于图像-文本和文本-图像任务,并且支持在Context中生成图像和文本。在零shot/几shot任务中,包括图像描述、视觉问题回答、视频问题回答和文本-图像生成等,Emu展示了较好的性能,与现有的大型多媒体模型相比。此外,我们还展示了增强多媒体助手的能力,例如:透过执行调整来实现多媒体助手。
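The unified objective described above (classify the next text token, or regress the next visual embedding, depending on the position in the interleaved sequence) can be sketched as follows. The transformer backbone is omitted, and the tensor shapes, vocabulary size, and text/visual position split are assumptions; this is only a schematic of the loss, not Emu's training code.

```python
# A schematic sketch of the unified objective described in the abstract:
# at each position of the interleaved sequence, the model either classifies the
# next text token (cross-entropy) or regresses the next visual embedding (L2).
import torch
import torch.nn.functional as F

def unified_loss(text_logits, text_targets, vis_pred, vis_targets, is_text):
    """is_text marks which sequence positions are supervised as text tokens."""
    ce = F.cross_entropy(text_logits[is_text], text_targets[is_text])
    l2 = F.mse_loss(vis_pred[~is_text], vis_targets[~is_text])
    return ce + l2

# toy tensors: 16 sequence positions, 1000-word vocab, 32-dim visual embeddings
T, V, D = 16, 1000, 32
is_text = torch.arange(T) % 2 == 0            # alternating text/visual positions (toy)
loss = unified_loss(torch.randn(T, V), torch.randint(0, V, (T,)),
                    torch.randn(T, D), torch.randn(T, D), is_text)
print(loss.item())
```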

The Staged Knowledge Distillation in Video Classification: Harmonizing Student Progress by a Complementary Weakly Supervised Framework

  • paper_url: http://arxiv.org/abs/2307.05201
  • repo_url: None
  • paper_authors: Chao Wang, Zheng Tang
  • for: 提高视频分类任务中的知识蒸馏效率和准确率
  • methods: 利用学生子阶段及对应子阶段之间的相关性来进行知识蒸馏,并使用渐进级联训练方法来缓解教师与学生容量差距过大导致的准确率损失
  • results: 在真实数据集和模拟数据集上进行了广泛的实验,证明所提方法在视频分类任务的知识蒸馏中优于现有的蒸馏方法
    Abstract In the context of label-efficient learning on video data, the distillation method and the structural design of the teacher-student architecture have a significant impact on knowledge distillation. However, the relationship between these factors has been overlooked in previous research. To address this gap, we propose a new weakly supervised learning framework for knowledge distillation in video classification that is designed to improve the efficiency and accuracy of the student model. Our approach leverages the concept of substage-based learning to distill knowledge based on the combination of student substages and the correlation of corresponding substages. We also employ the progressive cascade training method to address the accuracy loss caused by the large capacity gap between the teacher and the student. Additionally, we propose a pseudo-label optimization strategy to improve the initial data label. To optimize the loss functions of different distillation substages during the training process, we introduce a new loss method based on feature distribution. We conduct extensive experiments on both real and simulated data sets, demonstrating that our proposed approach outperforms existing distillation methods in terms of knowledge distillation for video classification tasks. Our proposed substage-based distillation approach has the potential to inform future research on label-efficient learning for video data.
    摘要 在视频数据上进行标签效率学习中,distillation方法和教师学生架构的结构设计具有重要的影响。然而,这些因素之间的关系在前期研究中受到了忽略。为了bridging这个差距,我们提出了一种新的弱监督学习框架 для知识储存在视频分类中,旨在提高学生模型的效率和准确性。我们的方法利用学生子阶段的概念,基于学生子阶段和相应的子阶段之间的相关性来进行知识储存。此外,我们采用了进度式遮盖训练方法,以Address the accuracy loss caused by the large capacity gap between the teacher and the student。此外,我们还提出了一种 Pseudo-label优化策略,以提高初始数据标签。在训练过程中,我们引入了一种基于特征分布的新损失方法,以便在不同的储存子阶段中优化损失函数。我们在实际和模拟数据集上进行了广泛的实验,并证明了我们的提出的方法在视频分类任务中的知识储存性能比既有的储存方法高。我们的提出的子阶段基于储存方法具有潜在的推动未来标签效率学习的前景。

ResMatch: Residual Attention Learning for Local Feature Matching

  • paper_url: http://arxiv.org/abs/2307.05180
  • repo_url: https://github.com/acuooooo/resmatch
  • paper_authors: Yuxin Deng, Jiayi Ma
  • for: 本研究旨在探讨特征匹配学习中自然语言注意力机制的工作方式,以便更好地理解和改进特征匹配和筛选器的学习。
  • methods: 本研究提出了一种基于传统特征匹配和筛选器的十字和自我注意力机制,并在十字和自我注意力机制中注入描述符相似性和相对位置相关性,以便学习差异匹配和筛选器函数。此外,我们还提出了一种精简的注意力学习策略,可以在每个点的邻域内进行精简的注意力计算,以提高计算效率。
  • results: 我们通过了多种实验,包括特征匹配、pose估计和视觉地理位置估计,证明了我们的网络在特征匹配和筛选器学习方面具有优越性。
    Abstract Attention-based graph neural networks have made great progress in feature matching learning. However, insight of how attention mechanism works for feature matching is lacked in the literature. In this paper, we rethink cross- and self-attention from the viewpoint of traditional feature matching and filtering. In order to facilitate the learning of matching and filtering, we inject the similarity of descriptors and relative positions into cross- and self-attention score, respectively. In this way, the attention can focus on learning residual matching and filtering functions with reference to the basic functions of measuring visual and spatial correlation. Moreover, we mine intra- and inter-neighbors according to the similarity of descriptors and relative positions. Then sparse attention for each point can be performed only within its neighborhoods to acquire higher computation efficiency. Feature matching networks equipped with our full and sparse residual attention learning strategies are termed ResMatch and sResMatch respectively. Extensive experiments, including feature matching, pose estimation and visual localization, confirm the superiority of our networks.
    摘要 Traditional feature matching and filtering 的视角下,我们重新思考了交叉和自身注意力机制。为了促进匹配和筛选的学习,我们在交叉和自身注意力分数中注入了描述符之间的相似性和相对位置的相关性。这样,注意力可以专注于学习差异匹配和筛选函数,基于视觉和空间相关性的基本函数。此外,我们根据描述符之间的相似性和相对位置来 Mine intra-和inter-邻居。然后,对每个点进行精度的注意力分配,只在其邻居中进行 sparse attention,以提高计算效率。具有我们的全局和精度匹配学习策略的特征匹配网络被称为 ResMatch 和 sResMatch。广泛的实验,包括特征匹配、pose estimation和视觉地标定,证明了我们的网络的优越性。

HistoColAi: An Open-Source Web Platform for Collaborative Digital Histology Image Annotation with AI-Driven Predictive Integration

  • paper_url: http://arxiv.org/abs/2307.07525
  • repo_url: None
  • paper_authors: Cristian Camilo Pulgarín-Ospina, Rocío del Amor, Adrián Colomera, Julio Silva-Rodríguez, Valery Naranjo
  • for: 该论文旨在提供一个高效的在线图像标注工具,以便在数字patology中进行图像分析。
  • methods: 该论文使用了深度学习基本的方法来支持图像分析,并提供了一个用于数字 histological 图像的可视化和标注工具。
  • results: 该论文包括了一个使用该工具进行皮肤细胞肿瘤诊断的用例,以及一项用户体验研究,证明了该工具的可用性。
    Abstract Digital pathology has become a standard in the pathology workflow due to its many benefits. These include the level of detail of the whole slide images generated and the potential immediate sharing of cases between hospitals. Recent advances in deep learning-based methods for image analysis make them of potential aid in digital pathology. However, a major limitation in developing computer-aided diagnostic systems for pathology is the lack of an intuitive and open web application for data annotation. This paper proposes a web service that efficiently provides a tool to visualize and annotate digitized histological images. In addition, to show and validate the tool, in this paper we include a use case centered on the diagnosis of spindle cell skin neoplasm for multiple annotators. A usability study of the tool is also presented, showing the feasibility of the developed tool.
    摘要 数字病理学已成为病理过程中的标准,它具有许多优点,如整个报告图像的细节水平和医院之间的案例 immediate 共享。在最近的深度学习基本方法中,图像分析方面的进步可能为数字病理学提供帮助。然而,开发计算机助动诊系统的主要限制是病理图像数据标注的intuitive和开放的web应用程序的缺乏。这篇论文提议一种高效的web服务,可以让用户观看和标注数字化 histological 图像。此外,为证明和验证工具的可用性,我们在本文中包括了多个标注者中心的皮肤癌诊断用例。此外,我们还进行了一项用户体验研究,表明开发的工具可行。

A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings

  • paper_url: http://arxiv.org/abs/2307.05158
  • repo_url: https://github.com/idiap/multimodal_gaze_target_prediction
  • paper_authors: Anshul Gupta, Samy Tafasca, Jean-Marc Odobez
  • for: 本文主要针对的问题是预测人员的视线方向,这是一个复杂的任务,需要理解人员的视线和场景内容,以及人员的境界和情况(是否操作?交流?观察别人?注意力?),以检测视线的干扰或应用人类注意力的约束。
  • methods: 本文提出了一种基于多Modal的听说见解决方案,利用明确 derivation的深度和pose特征,通过注意力机制进行组合。该架构可以在隐私敏感的场景中广泛应用,如Surveillance和医疗领域,因为不能泄露个人可识别信息。
  • results: 在GazeFollow和VideoAttentionTarget公共数据集上,本文进行了广泛的实验,取得了领先的性能,并在隐私设定中表现出了非常竞争力的结果。
    Abstract Predicting where a person is looking is a complex task, requiring to understand not only the person's gaze and scene content, but also the 3D scene structure and the person's situation (are they manipulating? interacting or observing others? attentive?) to detect obstructions in the line of sight or apply attention priors that humans typically have when observing others. In this paper, we hypothesize that identifying and leveraging such priors can be better achieved through the exploitation of explicitly derived multimodal cues such as depth and pose. We thus propose a modular multimodal architecture allowing to combine these cues using an attention mechanism. The architecture can naturally be exploited in privacy-sensitive situations such as surveillance and health, where personally identifiable information cannot be released. We perform extensive experiments on the GazeFollow and VideoAttentionTarget public datasets, obtaining state-of-the-art performance and demonstrating very competitive results in the privacy setting case.
    摘要 预测人员看向的位置是一项复杂任务,需要理解人员的 gaze 和场景内容,以及人员的情况(是否操作?与他人互动?注意?),以探测视线方向上的障碍物或应用人类注意力的偏好。在这篇论文中,我们假设可以通过利用显式 derive 的多模态cue来更好地实现这些偏好。我们因此提议一种模块化多模态架构,可以结合这些cue使用注意力机制。这种架构可以自然地在隐私保护 Situation 中使用,如监视和医疗,无需发布个人隐私信息。我们在 GazeFollow 和 VideoAttentionTarget 公共数据集上进行了广泛的实验,得到了状态对应的表现,并在隐私设定情况下示出了非常竞争力的结果。

ExFaceGAN: Exploring Identity Directions in GAN’s Learned Latent Space for Synthetic Identity Generation

  • paper_url: http://arxiv.org/abs/2307.05151
  • repo_url: https://github.com/fdbtrs/exfacegan
  • paper_authors: Fadi Boutros, Marcel Klemt, Meiling Fang, Arjan Kuijper, Naser Damer
  • for: 本研究旨在提出一种框架,即ExFaceGAN,以分离预训练的GAN中的人脸信息,以便生成多个任意的人脸样本。
  • methods: ExFaceGAN使用了一种新的方法,即学习人脸方向边界,以分离GAN的隐藏空间。这个方法可以在不需要专门的架构或属性分类器的情况下,生成多个人脸样本。
  • results: ExFaceGAN在三种SOTA GAN方法的预训练空间中进行了集成,并得到了丰富的实验结果,证明了ExFaceGAN的一致性和有效性。此外,通过使用ExFaceGAN生成的数据,我们还证明了这些数据可以成功地训练人脸识别模型。
    Abstract Deep generative models have recently presented impressive results in generating realistic face images of random synthetic identities. To generate multiple samples of a certain synthetic identity, previous works proposed to disentangle the latent space of GANs by incorporating additional supervision or regularization, enabling the manipulation of certain attributes. Others proposed to disentangle specific factors in unconditional pretrained GANs latent spaces to control their output, which also requires supervision by attribute classifiers. Moreover, these attributes are entangled in GAN's latent space, making it difficult to manipulate them without affecting the identity information. We propose in this work a framework, ExFaceGAN, to disentangle identity information in pretrained GANs latent spaces, enabling the generation of multiple samples of any synthetic identity. Given a reference latent code of any synthetic image and latent space of pretrained GAN, our ExFaceGAN learns an identity directional boundary that disentangles the latent space into two sub-spaces, with latent codes of samples that are either identity similar or dissimilar to a reference image. By sampling from each side of the boundary, our ExFaceGAN can generate multiple samples of synthetic identity without the need for designing a dedicated architecture or supervision from attribute classifiers. We demonstrate the generalizability and effectiveness of ExFaceGAN by integrating it into learned latent spaces of three SOTA GAN approaches. As an example of the practical benefit of our ExFaceGAN, we empirically prove that data generated by ExFaceGAN can be successfully used to train face recognition models (\url{https://github.com/fdbtrs/ExFaceGAN}).
    摘要 深度生成模型最近几年来已经展示了生成真实面部图像的辉煌成绩。为生成特定的合成人脸图像,先前的工作提出了在GAN的含义空间中拓展准确的约束或正则化,以便控制特定的特征。其他人则在预训练GAN的含义空间中特征化特定因素,以控制其输出,但这也需要由特征分类器提供超级视。然而,这些特征在GAN的含义空间中相互杂化,使其难以分离而不影响人脸信息。我们在这个工作中提出了一个框架,即ExFaceGAN,以分离预训练GAN的含义空间中的人脸信息。给定任意合成图像的参考含义代码和预训练GAN的含义空间,我们的ExFaceGAN学习了一个方向性边界,将预训练GAN的含义空间分解成两个子空间,每个子空间的含义代码都是与参考图像的含义相似或不相似的样本。通过采样每个边界两侧的样本,我们的ExFaceGAN可以生成多个基于参考图像的合成人脸图像,无需设计专门的架构或由特征分类器提供超级视。我们在三个SOTA GAN方法的学习含义空间中集成了ExFaceGAN,并证明了它的一致性和效果。例如,我们通过实际证明,通过ExFaceGAN生成的数据可以成功地训练面Recognition模型(参考链接:https://github.com/fdbtrs/ExFaceGAN)。
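The identity-boundary idea above can be illustrated with a small sketch: given a reference latent code and the unit normal of a learned separating hyperplane, codes shifted to one side stay identity-similar, while codes on the other side become identity-dissimilar. The boundary normal below is random stand-in data, and the offset/noise scales are assumptions, not the trained boundary from the paper.

```python
# A minimal sketch of sampling on the two sides of an identity boundary in a
# GAN latent space: `normal` stands in for the unit normal of a hyperplane
# learned on identity similarity; +normal keeps identity, -normal changes it.
import numpy as np

def sample_identity_codes(ref_code, normal, num=4, offset=2.0, noise=0.3, seed=0):
    rng = np.random.default_rng(seed)
    n = normal / np.linalg.norm(normal)
    jitter = rng.normal(scale=noise, size=(num, ref_code.size))
    similar    = ref_code + offset * n + jitter      # same synthetic identity
    dissimilar = ref_code - offset * n + jitter      # different identity
    return similar, dissimilar

z_ref = np.random.default_rng(1).normal(size=512)    # e.g. a StyleGAN-style latent
n_dir = np.random.default_rng(2).normal(size=512)    # stand-in for a learned boundary
same_id, diff_id = sample_identity_codes(z_ref, n_dir)
print(same_id.shape, diff_id.shape)                  # (4, 512) (4, 512)
```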

Unveiling the Invisible: Enhanced Detection and Analysis of Deteriorated Areas in Solar PV Modules Using Unsupervised Sensing Algorithms and 3D Augmented Reality

  • paper_url: http://arxiv.org/abs/2307.05136
  • repo_url: None
  • paper_authors: Adel Oulefki, Yassine Himeur, Thaweesak Trongtiraku, Kahina Amara, Sos Agaian, Samir Benbelkacem, Mohamed Amine Guerroudji, Mohamed Zemmouri, Sahla Ferhat, Nadia Zenati, Shadi Atalla, Wathiq Mansoor
  • for: 提高太阳能电池模块的维护效率和能量产量
  • methods: 使用不监督学习算法和3D增强现实视觉化来自动识别和分析太阳能电池模块中的异常
  • results: 通过计算机模拟和实际图像数据验证,提出的方法可以准确地识别受损区域,并且可以大幅降低太阳能电池维护成本。
    Abstract Solar Photovoltaic (PV) is increasingly being used to address the global concern of energy security. However, hot spot and snail trails in PV modules caused mostly by crakes reduce their efficiency and power capacity. This article presents a groundbreaking methodology for automatically identifying and analyzing anomalies like hot spots and snail trails in Solar Photovoltaic (PV) modules, leveraging unsupervised sensing algorithms and 3D Augmented Reality (AR) visualization. By transforming the traditional methods of diagnosis and repair, our approach not only enhances efficiency but also substantially cuts down the cost of PV system maintenance. Validated through computer simulations and real-world image datasets, the proposed framework accurately identifies dirty regions, emphasizing the critical role of regular maintenance in optimizing the power capacity of solar PV modules. Our immediate objective is to leverage drone technology for real-time, automatic solar panel detection, significantly boosting the efficacy of PV maintenance. The proposed methodology could revolutionize solar PV maintenance, enabling swift, precise anomaly detection without human intervention. This could result in significant cost savings, heightened energy production, and improved overall performance of solar PV systems. Moreover, the novel combination of unsupervised sensing algorithms with 3D AR visualization heralds new opportunities for further research and development in solar PV maintenance.
    摘要

DFR: Depth from Rotation by Uncalibrated Image Rectification with Latitudinal Motion Assumption

  • paper_url: http://arxiv.org/abs/2307.05129
  • repo_url: https://github.com/zhangtaxue/dfr
  • paper_authors: Yongcong Zhang, Yifei Xue, Ming Liao, Huiqing Zhang, Yizhen Lao
  • for: 解决不精度的静止摄像机拍摄问题,提高摄像机拍摄的精度和效率。
  • methods: 提出了一种基于旋转的深度估计方法,通过分析两个图像的匹配点来直接计算图像的恢复变换。同时,提出了一种自适应缓冲策略来降低投射变换后的几何扭曲。
  • results: 对于 synthetic 和实际数据进行了广泛的实验,结果表明,提出的方法在效率和精度两个方面比现有方法有显著的优势。
    Abstract Despite the increasing prevalence of rotating-style capture (e.g., surveillance cameras), conventional stereo rectification techniques frequently fail due to the rotation-dominant motion and small baseline between views. In this paper, we tackle the challenge of performing stereo rectification for uncalibrated rotating cameras. To that end, we propose Depth-from-Rotation (DfR), a novel image rectification solution that analytically rectifies two images with two-point correspondences and serves for further depth estimation. Specifically, we model the motion of a rotating camera as the camera rotates on a sphere with fixed latitude. The camera's optical axis lies perpendicular to the sphere's surface. We call this latitudinal motion assumption. Then we derive a 2-point analytical solver from directly computing the rectified transformations on the two images. We also present a self-adaptive strategy to reduce the geometric distortion after rectification. Extensive synthetic and real data experiments demonstrate that the proposed method outperforms existing works in effectiveness and efficiency by a significant margin.
    摘要 尽管旋转风格捕捉(例如surveillance camera)的使用逐渐增加,但传统的斯特瑞套件技术经常失败,这是因为旋转动作占主导地位,基线间距离小。在这篇论文中,我们面临了不调整的旋转相机中的斯特瑞套件问题。为解决这问题,我们提出了深度从旋转(DfR),一种新的图像正则化解决方案。具体来说,我们模拟了旋转相机的运动为旋转在球面上的Fixed latitude的运动,相机的光学轴沿着球面表面垂直。我们称这为纬度运动假设。然后,我们从直接计算两个图像的正则化变换而 derivation of a 2-point analytical solver。我们还提出了一种自适应策略来减少正则化后的几何扭曲。广泛的 sintetic和实际数据实验表明,我们提出的方法在效果和效率方面与现有方法相比,具有显著的优势。

One-Shot Learning for Periocular Recognition: Exploring the Effect of Domain Adaptation and Data Bias on Deep Representations

  • paper_url: http://arxiv.org/abs/2307.05128
  • repo_url: None
  • paper_authors: Kevin Hernandez-Diaz, Fernando Alonso-Fernandez, Josef Bigun
  • for: This paper focuses on the challenge of biometric recognition under extreme data scarcity, specifically One-Shot periocular recognition.
  • methods: The authors use widely used CNN models and analyze the behavior of deep representations in these models under data scarcity. They also employ Domain Adaptation and evaluate the method's robustness concerning data normalization and generalization.
  • results: The authors achieve state-of-the-art results on the Cross-Eyed dataset, reducing the EER by 67% and 79% in the Close-World and Open-World protocols, respectively. They also demonstrate that traditional algorithms like SIFT can outperform CNNs in certain situations, such as limited data or unseen classes.
    Abstract One weakness of machine-learning algorithms is the need to train the models for a new task. This presents a specific challenge for biometric recognition due to the dynamic nature of databases and, in some instances, the reliance on subject collaboration for data collection. In this paper, we investigate the behavior of deep representations in widely used CNN models under extreme data scarcity for One-Shot periocular recognition, a biometric recognition task. We analyze the outputs of CNN layers as identity-representing feature vectors. We examine the impact of Domain Adaptation on the network layers' output for unseen data and evaluate the method's robustness concerning data normalization and generalization of the best-performing layer. We improved state-of-the-art results that made use of networks trained with biometric datasets with millions of images and fine-tuned for the target periocular dataset by utilizing out-of-the-box CNNs trained for the ImageNet Recognition Challenge and standard computer vision algorithms. For example, for the Cross-Eyed dataset, we could reduce the EER by 67% and 79% (from 1.70% and 3.41% to 0.56% and 0.71%) in the Close-World and Open-World protocols, respectively, for the periocular case. We also demonstrate that traditional algorithms like SIFT can outperform CNNs in situations with limited data or scenarios where the network has not been trained with the test classes like the Open-World mode. SIFT alone was able to reduce the EER by 64% and 71.6% (from 1.7% and 3.41% to 0.6% and 0.97%) for Cross-Eyed in the Close-World and Open-World protocols, respectively, and a reduction of 4.6% (from 3.94% to 3.76%) in the PolyU database for the Open-World and single biometric case.
    摘要 我们发现,使用 widely used convolutional neural network (CNN) 模型进行一shot periocular recognition task 时,需要进行训练,这会带来一些挑战,主要是因为数据库的动态性和需要Subject collaboration для数据采集。在这篇论文中,我们研究了深度表示的 CNN 模型在极端数据缺乏情况下的行为,我们分析了 CNN 层的输出作为标识特征向量,并评估了领域适应对不可见数据的影响。我们还评估了方法的数据 нормализа和最佳层的泛化性。我们发现,通过使用 out-of-the-box CNNs 训练 ImageNet Recognition Challenge 和标准计算机视觉算法,可以提高 state-of-the-art 结果。例如,在 Cross-Eyed 数据集上,我们可以降低 EER 值by 67% 和 79% (从 1.70% 和 3.41% 降至 0.56% 和 0.71%) 在 Close-World 和 Open-World 协议中。我们还证明,传统算法如 SIFT 可以在有限数据或测试类不同于网络训练类的情况下表现更好。SIFT 独立地降低了 EER 值by 64% 和 71.6% (从 1.7% 和 3.41% 降至 0.6% 和 0.97%) 在 Cross-Eyed 数据集上,并在 PolyU 数据库中降低了 4.6% (从 3.94% 降至 3.76%)。
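In the spirit of the abstract's finding that off-the-shelf ImageNet CNNs can serve as one-shot periocular matchers, the sketch below extracts globally pooled ResNet-50 features with torchvision and compares two images by cosine similarity. The chosen layer and the decision threshold are assumptions, not the best-performing layer identified in the paper.

```python
# A sketch of one-shot matching with an off-the-shelf ImageNet CNN: an
# intermediate (globally pooled) layer output is used as the identity
# descriptor and pairs are compared with cosine similarity.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
backbone.fc = torch.nn.Identity()          # keep the 2048-d pooled features
backbone.eval()

prep = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                  T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

@torch.no_grad()
def descriptor(path):
    x = prep(Image.open(path).convert("RGB")).unsqueeze(0)
    f = backbone(x).squeeze(0)
    return f / f.norm()                    # unit-norm identity descriptor

def same_identity(path_a, path_b, threshold=0.75):    # threshold is illustrative
    return float(descriptor(path_a) @ descriptor(path_b)) >= threshold
```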

Hyperspherical Embedding for Point Cloud Completion

  • paper_url: http://arxiv.org/abs/2307.05634
  • repo_url: https://github.com/haomengz/hyperpc
  • paper_authors: Junming Zhang, Haomeng Zhang, Ram Vasudevan, Matthew Johnson-Roberson
  • for: 提高3D点云补充任务的完teness和精度。
  • methods: 提出了一种卷积扩展模块,可以将encoder中提取的嵌入特征转换到卷积扩展模块,并在这个模块中进行正则化。这使得输出的卷积扩展嵌入具有更好的稳定性和更紧凑的分布。
  • results: 实验结果显示,在单任务和多任务学习中,提出的方法可以有效地提高点云补充任务的完teness和精度。
    Abstract Most real-world 3D measurements from depth sensors are incomplete, and to address this issue the point cloud completion task aims to predict the complete shapes of objects from partial observations. Previous works often adapt an encoder-decoder architecture, where the encoder is trained to extract embeddings that are used as inputs to generate predictions from the decoder. However, the learned embeddings have sparse distribution in the feature space, which leads to worse generalization results during testing. To address these problems, this paper proposes a hyperspherical module, which transforms and normalizes embeddings from the encoder to be on a unit hypersphere. With the proposed module, the magnitude and direction of the output hyperspherical embedding are decoupled and only the directional information is optimized. We theoretically analyze the hyperspherical embedding and show that it enables more stable training with a wider range of learning rates and more compact embedding distributions. Experiment results show consistent improvement of point cloud completion in both single-task and multi-task learning, which demonstrates the effectiveness of the proposed method.
    摘要 大多数实际世界3D测量从深度传感器是不完整的,为了解决这个问题,点云完成任务目标是预测对象的完整形状从部分观察记录。先前的工作通常采用了编码器-解码器架构,其中编码器训练以提取嵌入 Vector,并将其用于解码器生成预测。然而,学习的嵌入 Vector 有稀疏分布在特征空间,这会导致测试时的泛化结果更差。为解决这些问题,这篇论文提出了偏球模块,它将编码器输出的嵌入 Vector 转换并 норralize,使其在单位偏球上。通过该模块,输出偏球嵌入的大小和方向分解,只有方向信息被优化。我们 theoretically 分析了偏球嵌入,并证明它使得更稳定地训练,并在更广泛的学习速率范围内更加紧凑地表示。实验结果表明,提案方法在单任务和多任务学习中都具有稳定的改进效果,这证明了提案的方法的有效性。
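The core operation described above, projecting encoder embeddings onto the unit hypersphere so that only their direction is optimized, reduces to an L2 normalization. Below is a minimal sketch with assumed dimensions, without the surrounding point-cloud completion network.

```python
# A minimal sketch of the hyperspherical idea from the abstract: encoder
# embeddings are L2-normalized onto the unit hypersphere, decoupling magnitude
# from direction so only the direction carries the representation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HypersphericalHead(nn.Module):
    def __init__(self, in_dim=1024, out_dim=512):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, feat):                   # feat: (batch, in_dim)
        z = self.proj(feat)
        return F.normalize(z, p=2, dim=-1)     # unit-norm embedding

head = HypersphericalHead()
z = head(torch.randn(4, 1024))
print(z.norm(dim=-1))                          # ~1.0 for every sample
```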

Offline and Online Optical Flow Enhancement for Deep Video Compression

  • paper_url: http://arxiv.org/abs/2307.05092
  • repo_url: None
  • paper_authors: Chuanbo Tang, Xihua Sheng, Zhuoyuan Li, Haotian Zhang, Li Li, Dong Liu
  • for: 提高深度视频压缩网络的效率,使其更好地利用视频帧之间的时间相似性。
  • methods: 在两个阶段进行增强:在线阶段使用梯度下降算法对视频进行适应性优化,而在离线阶段使用训练过的光流估计网络进行光流估计,并与传统视频压缩方案(如H.266/VVC)的运动信息进行结合。
  • results: 对一种现有的深度视频压缩方案DCVC进行实验,实验结果表明,将在线和离线增强结合使用可以在测试视频上平均获得12.8%的比特率减少,而无需增加解码器的模型或计算复杂度。
    Abstract Video compression relies heavily on exploiting the temporal redundancy between video frames, which is usually achieved by estimating and using the motion information. The motion information is represented as optical flows in most of the existing deep video compression networks. Indeed, these networks often adopt pre-trained optical flow estimation networks for motion estimation. The optical flows, however, may be less suitable for video compression due to the following two factors. First, the optical flow estimation networks were trained to perform inter-frame prediction as accurately as possible, but the optical flows themselves may cost too many bits to encode. Second, the optical flow estimation networks were trained on synthetic data, and may not generalize well enough to real-world videos. We address the twofold limitations by enhancing the optical flows in two stages: offline and online. In the offline stage, we fine-tune a trained optical flow estimation network with the motion information provided by a traditional (non-deep) video compression scheme, e.g. H.266/VVC, as we believe the motion information of H.266/VVC achieves a better rate-distortion trade-off. In the online stage, we further optimize the latent features of the optical flows with a gradient descent-based algorithm for the video to be compressed, so as to enhance the adaptivity of the optical flows. We conduct experiments on a state-of-the-art deep video compression scheme, DCVC. Experimental results demonstrate that the proposed offline and online enhancement together achieves on average 12.8% bitrate saving on the tested videos, without increasing the model or computational complexity of the decoder side.
    摘要 视频压缩通过利用视频帧之间的时间重复来实现,通常是通过计算运动信息来实现。运动信息通常被表示为光流在大多数现有的深度视频压缩网络中。然而,这些网络经常采用预训练的光流估计网络进行运动估计。然而,光流可能不适合视频压缩,因为以下两点:首先,光流估计网络通常是为了尽可能准确地进行间帧预测,但光流本身可能需要太多比特来编码。其次,光流估计网络通常是在 sintetic 数据上训练的,可能无法适应实际视频。我们解决这两个限制,在两个阶段进行增强:离线阶段和在线阶段。在离线阶段,我们使用已经训练的光流估计网络,并将其与传统非深度视频压缩方案,如 H.266/VVC 提供的运动信息进行精度调整。在线阶段,我们使用一种基于梯度下降的算法来优化压缩中的缓存特征,以提高视频的适应性。我们在一个现有的深度视频压缩方案,DCVC 上进行了实验,实验结果表明,我们的离线和在线增强结合使得在测试视频上 average 12.8% 比特率减少,而无需增加解码器的模型或计算复杂度。
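The online stage described above, refining the flow latents by gradient descent for the specific video being compressed, can be sketched as below. The objective shown (warping MSE plus an L1 rate proxy) and the stand-in decoder/warping modules are illustrative assumptions, not the actual codec loss used in the paper.

```python
# A schematic sketch of the online enhancement stage: the latent features of
# the optical flow are treated as free variables for the video being compressed
# and refined by a few gradient-descent steps.
import torch

def refine_flow_latent(flow_latent, decode_flow, warp, ref_frame, cur_frame,
                       steps=10, lr=1e-2, rate_weight=1e-4):
    latent = flow_latent.clone().requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        flow = decode_flow(latent)                     # latent -> dense flow
        pred = warp(ref_frame, flow)                   # motion-compensated frame
        loss = torch.mean((pred - cur_frame) ** 2) \
             + rate_weight * latent.abs().mean()       # crude bitrate proxy
        opt.zero_grad(); loss.backward(); opt.step()
    return latent.detach()

# toy usage with stand-in modules (not a real codec)
dec = torch.nn.Conv2d(8, 2, 3, padding=1)
toy_warp = lambda frame, flow: frame + flow.mean(dim=1, keepdim=True)  # placeholder warp
z = refine_flow_latent(torch.randn(1, 8, 16, 16), dec, toy_warp,
                       torch.randn(1, 3, 16, 16), torch.randn(1, 3, 16, 16))
print(z.shape)
```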

SAR-NeRF: Neural Radiance Fields for Synthetic Aperture Radar Multi-View Representation

  • paper_url: http://arxiv.org/abs/2307.05087
  • repo_url: None
  • paper_authors: Zhengxin Lei, Feng Xu, Jiangtao Wei, Feng Cai, Feng Wang, Ya-Qiu Jin
  • for: 本研究旨在提出一种基于NeRF的SAR图像生成模型,以增强SAR图像的多视图表示和泛化能力。
  • methods: 该模型根据SAR探测机制和神经网络结合,使用可导渠Rendering方程式来表示SAR图像的生成。
  • results: 经过量化实验表明,SAR-NeRF可以有效地表示SAR图像的多视图特征,并且可以在少量学习setup下提高SAR目标分类精度。 Specifically, with only 12 images per class, the model achieved a 10-type classification accuracy of 91.6%.
    Abstract SAR images are highly sensitive to observation configurations, and they exhibit significant variations across different viewing angles, making it challenging to represent and learn their anisotropic features. As a result, deep learning methods often generalize poorly across different view angles. Inspired by the concept of neural radiance fields (NeRF), this study combines SAR imaging mechanisms with neural networks to propose a novel NeRF model for SAR image generation. Following the mapping and projection pinciples, a set of SAR images is modeled implicitly as a function of attenuation coefficients and scattering intensities in the 3D imaging space through a differentiable rendering equation. SAR-NeRF is then constructed to learn the distribution of attenuation coefficients and scattering intensities of voxels, where the vectorized form of 3D voxel SAR rendering equation and the sampling relationship between the 3D space voxels and the 2D view ray grids are analytically derived. Through quantitative experiments on various datasets, we thoroughly assess the multi-view representation and generalization capabilities of SAR-NeRF. Additionally, it is found that SAR-NeRF augumented dataset can significantly improve SAR target classification performance under few-shot learning setup, where a 10-type classification accuracy of 91.6\% can be achieved by using only 12 images per class.
    摘要 SAR图像具有高敏感性,因此在不同视角下 exhibit 显著的变化,这使得深度学习方法很难generalize。为了解决这问题,本研究提出了基于NeRF的SAR图像生成模型。通过mapping和projection原理,我们模型了SAR图像为attenuation coefficients和scattering intensities的函数在3D图像空间中。然后,我们构建了SAR-NeRF模型来学习voxels中的分布attenuation coefficients和scattering intensities。我们通过多个实验证明了SAR-NeRF的多视角表示和泛化能力。此外,我们发现SAR-NeRF的增强集合可以大幅提高SAR目标分类性能,特别是在少量学习 setup 下,只需使用12张图像每类就可以达到91.6%的10种分类精度。

Estimating label quality and errors in semantic segmentation data via any model

  • paper_url: http://arxiv.org/abs/2307.05080
  • repo_url: None
  • paper_authors: Vedang Lad, Jonas Mueller
  • for: 提高 semantic segmentation 数据集的标注质量,减少人工标注错误。
  • methods: 使用 probabilistic 预测来自 segmentation 模型,对每个像素的标注进行评分,以便优先级排序需要审核的数据。
  • results: 通过在 DeepLabV3+ 和 FPN segmentation 模型上进行多种标注质量评分方法的研究,发现使用 soft-minimum 方法可以最 effectively 检测多种标注错误。
    Abstract The labor-intensive annotation process of semantic segmentation datasets is often prone to errors, since humans struggle to label every pixel correctly. We study algorithms to automatically detect such annotation errors, in particular methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled. This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset, which is critical in sensitive applications such as medical imaging and autonomous vehicles. Widely applicable, our label quality scores rely on probabilistic predictions from a trained segmentation model -- any model architecture and training procedure can be utilized. Here we study 7 different label quality scoring methods used in conjunction with a DeepLabV3+ or a FPN segmentation model to detect annotation errors in a version of the SYNTHIA dataset. Precision-recall evaluations reveal a score -- the soft-minimum of the model-estimated likelihoods of each pixel's annotated class -- that is particularly effective to identify images that are mislabeled, across multiple types of annotation error.
    摘要 人工准备繁琐的标注过程经常存在错误,因为人们很难对每个像素进行正确的标注。我们研究自动检测标注错误的算法,特别是用于评估标注质量的方法,以便优先级化审核数据,以确保高质量的训练/评估数据集,这对敏感应用如医学成像和自动驾驶来说非常重要。我们的标注质量分数可以适用于任何模型架构和训练过程。在这里,我们研究了7种不同的标注质量分数方法,与DeepLabV3+或FPN segmentation模型结合使用,以检测SYNTHIA数据集中的标注错误。精度-回快评估表明,使用模型估计每个像素的类别概率的软最小值分数是特别有效地标识错tilted图像,并且适用于多种标注错误类型。
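One plausible reading of the best-performing score above, the soft-minimum of each pixel's model-estimated likelihood for its annotated class, is sketched below: gather the annotated-class probabilities and pool them with a temperature-controlled soft minimum, so a few confidently mislabeled pixels dominate the image's score. The temperature value and the exact pooling form are assumptions, not the authors' formula.

```python
# One plausible implementation (an assumption, not the paper's exact formula) of
# an image-level label-quality score: the soft minimum over each pixel's
# predicted probability of its annotated class.
import numpy as np

def label_quality_score(probs, labels, temperature=0.1):
    """probs: (H, W, C) softmax output; labels: (H, W) annotated class ids."""
    h, w, _ = probs.shape
    per_pixel = probs[np.arange(h)[:, None], np.arange(w)[None, :], labels]
    # soft-minimum: smooth approximation of min(per_pixel) as temperature -> 0
    x = per_pixel.ravel() / temperature
    return -temperature * (np.log(np.mean(np.exp(-x + x.min()))) - x.min())

# toy check: an image with one confidently mislabeled pixel scores lower
good = np.zeros((4, 4, 3)); good[..., 0] = 0.9; good[..., 1:] = 0.05
labels = np.zeros((4, 4), dtype=int)
bad = good.copy(); bad[0, 0] = [0.01, 0.98, 0.01]     # annotated class 0 looks wrong
print(label_quality_score(good, labels) > label_quality_score(bad, labels))  # True
```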

Disentangled Contrastive Image Translation for Nighttime Surveillance

  • paper_url: http://arxiv.org/abs/2307.05038
  • repo_url: None
  • paper_authors: Guanzhou Lan, Bin Zhao, Xuelong Li
  • for: 本研究旨在提高夜间监控质量,增强安全性。
  • methods: 本文提出了一种夜间监控到日间监控的翻译方法,包括一种学习物理约束(色彩不变),以及一种分离表示(auxiliary pretext task)。
  • results: 对比 existed 方法,本研究的方法在高精度翻译下表现出色,并且可以自动提取 semantics。
    Abstract Nighttime surveillance suffers from degradation due to poor illumination and arduous human annotations. It is challengable and remains a security risk at night. Existing methods rely on multi-spectral images to perceive objects in the dark, which are troubled by low resolution and color absence. We argue that the ultimate solution for nighttime surveillance is night-to-day translation, or Night2Day, which aims to translate a surveillance scene from nighttime to the daytime while maintaining semantic consistency. To achieve this, this paper presents a Disentangled Contrastive (DiCo) learning method. Specifically, to address the poor and complex illumination in the nighttime scenes, we propose a learnable physical prior, i.e., the color invariant, which provides a stable perception of a highly dynamic night environment and can be incorporated into the learning pipeline of neural networks. Targeting the surveillance scenes, we develop a disentangled representation, which is an auxiliary pretext task that separates surveillance scenes into the foreground and background with contrastive learning. Such a strategy can extract the semantics without supervision and boost our model to achieve instance-aware translation. Finally, we incorporate all the modules above into generative adversarial networks and achieve high-fidelity translation. This paper also contributes a new surveillance dataset called NightSuR. It includes six scenes to support the study on nighttime surveillance. This dataset collects nighttime images with different properties of nighttime environments, such as flare and extreme darkness. Extensive experiments demonstrate that our method outperforms existing works significantly. The dataset and source code will be released on GitHub soon.
    摘要 夜间监测受到质量下降的影响,主要是因为低光照和复杂的人工标注。这是一个安全风险。现有方法使用多spectral图像来感知夜间 объекts,但这些图像受到低分辨率和颜色缺失的限制。我们认为夜间监测的最终解决方案是夜晚到白天(Night2Day)翻译,它目的是在保持semantic相同性的前提下,将夜间监测场景翻译成白天场景。为此,本文提出了一种分解对比(DiCo)学习方法。Specifically,我们提出了一个学习可能的物理先验(color invariant),以稳定夜间场景中的高度动态环境,并将其integrated into neural networks的学习管道。Targeting Surveillance scenes,我们开发了一种分解表示(disentangled representation),它是一个auxiliary pretext task,通过对比学习,将surveillance scene分解成背景和前景。这种策略可以不supervision提取 semantics,并且提高我们的模型实现instancet-aware翻译。最后,我们将所有模块集成到生成对抗网络中,实现高效翻译。此外,我们还提供了一个新的监测 Datasetcalled NightSuR,它包括六个场景,以支持夜间监测的研究。这个数据集收集了不同夜间环境的夜间图像,例如炸彩和极暗。我们的方法在实验中表现出色,与现有方法相比,具有显著的优势。数据集和源代码将很快在 GitHub上发布。

Towards Anytime Optical Flow Estimation with Event Cameras

  • paper_url: http://arxiv.org/abs/2307.05033
  • repo_url: https://github.com/yaozhuwa/eva-flow
  • paper_authors: Yaozu Ye, Hao Shi, Kailun Yang, Ze Wang, Xiaoting Yin, Yaonan Wang, Kaiwei Wang
  • for: 该研究旨在使用事件摄像头实现高 Frame Rate 低延迟的光流估计,并提供高 Frame Rate 事件摄像头的数据集。
  • methods: 该研究使用了 Unified Voxel Grid 和 EVent-based Anytime Flow estimation 网络(EVA-Flow),以及 Stacked Spatiotemporal Motion Refinement(SMR)模块来实现高 Frame Rate 低延迟的光流估计。
  • results: 该研究实现了竞争性的性能,包括超低延迟(5毫秒)、最快的推理(9.2毫秒)、时间密集的运动估计(200Hz)和强大的总体化。
    Abstract Event cameras are capable of responding to log-brightness changes in microseconds. Its characteristic of producing responses only to the changing region is particularly suitable for optical flow estimation. In contrast to the super low-latency response speed of event cameras, existing datasets collected via event cameras, however, only provide limited frame rate optical flow ground truth, (e.g., at 10Hz), greatly restricting the potential of event-driven optical flow. To address this challenge, we put forward a high-frame-rate, low-latency event representation Unified Voxel Grid, sequentially fed into the network bin by bin. We then propose EVA-Flow, an EVent-based Anytime Flow estimation network to produce high-frame-rate event optical flow with only low-frame-rate optical flow ground truth for supervision. The key component of our EVA-Flow is the stacked Spatiotemporal Motion Refinement (SMR) module, which predicts temporally-dense optical flow and enhances the accuracy via spatial-temporal motion refinement. The time-dense feature warping utilized in the SMR module provides implicit supervision for the intermediate optical flow. Additionally, we introduce the Rectified Flow Warp Loss (RFWL) for the unsupervised evaluation of intermediate optical flow in the absence of ground truth. This is, to the best of our knowledge, the first work focusing on anytime optical flow estimation via event cameras. A comprehensive variety of experiments on MVSEC, DESC, and our EVA-FlowSet demonstrates that EVA-Flow achieves competitive performance, super-low-latency (5ms), fastest inference (9.2ms), time-dense motion estimation (200Hz), and strong generalization. Our code will be available at https://github.com/Yaozhuwa/EVA-Flow.
    摘要 事件摄像机可以在微秒级响应差值变化。其特点是仅响应变化区域,特别适合光流估计。相比事件摄像机的极低延迟响应速度,现有的事件摄像机采集的数据集只提供有限帧率光流真实值,例如10Hz,很大程度地限制了事件驱动的光流的潜力。为解决这个挑战,我们提出了高帧率低延迟事件表示Unified Voxel Grid,逐步传输到网络中bin处。然后,我们提出了EVENT-based Anytime Flow estimation Network(EVA-Flow),用于生成高帧率事件光流,只使用低帧率光流真实值作为监督。EVA-Flow的关键组件是堆叠的空间时间运动级化(SMR)模块,可以预测时间密集的光流,并通过空间时间运动级化进行精度提高。SMR模块使用的时间密集特征扭曲提供了隐式的监督 для中间光流。此外,我们还提出了Rectified Flow Warp Loss(RFWL),用于无监督评估中间光流的 absence of ground truth。这是我们知道的第一个关注在事件摄像机上的任何时间光流估计工作。我们的实验表明,EVA-Flow在MVSEC、DESC和我们自己的EVA-FlowSet上达到了竞争性表现,超低延迟(5ms),最快执行(9.2ms),时间密集运动估计(200Hz)和强大总体化。我们的代码将在https://github.com/Yaozhuwa/EVA-Flow上发布。

TRansPose: Large-Scale Multispectral Dataset for Transparent Object

  • paper_url: http://arxiv.org/abs/2307.05016
  • repo_url: None
  • paper_authors: Jeongyun Kim, Myung-Hwan Jeon, Sangwoo Jung, Wooseong Yang, Minwoo Jung, Jaeho Shin, Ayoung Kim
  • for: 本研究目的是提供一个大规模多光谱数据集,以促进透明物体研究。
  • methods: 本研究使用立体RGB-D相机、热红外(TIR)相机和物体位姿数据来构建大规模的TRansPose数据集。
  • results: 本研究提供了333,819帧影像和4,000,056个标注,包括透明物体与非透明物体的实例级分割掩码、真实位姿以及补全的深度信息等。
    Abstract Transparent objects are encountered frequently in our daily lives, yet recognizing them poses challenges for conventional vision sensors due to their unique material properties, not being well perceived from RGB or depth cameras. Overcoming this limitation, thermal infrared cameras have emerged as a solution, offering improved visibility and shape information for transparent objects. In this paper, we present TRansPose, the first large-scale multispectral dataset that combines stereo RGB-D, thermal infrared (TIR) images, and object poses to promote transparent object research. The dataset includes 99 transparent objects, encompassing 43 household items, 27 recyclable trashes, 29 chemical laboratory equivalents, and 12 non-transparent objects. It comprises a vast collection of 333,819 images and 4,000,056 annotations, providing instance-level segmentation masks, ground-truth poses, and completed depth information. The data was acquired using a FLIR A65 thermal infrared (TIR) camera, two Intel RealSense L515 RGB-D cameras, and a Franka Emika Panda robot manipulator. Spanning 87 sequences, TRansPose covers various challenging real-life scenarios, including objects filled with water, diverse lighting conditions, heavy clutter, non-transparent or translucent containers, objects in plastic bags, and multi-stacked objects. TRansPose dataset can be accessed from the following link: https://sites.google.com/view/transpose-dataset
    摘要 TRansPose 是一个大规模多spectral数据集,包含了333819张图像和4000056个注释,用于促进透明物体研究。该数据集包含99个透明物体,其中43个家用品、27个可回收垃圾、29个化学实验室 equipments 和12个不透明物体。它包括87个序列,涵盖了各种实际生活中的挑战,如水填充的物体、多样的照明条件、拥挤的环境、不透明或半透明容器、物体在塑料袋中、多层物体等。TRansPose 数据集可以从以下链接获取:https://sites.google.com/view/transpose-dataset。

Test-Time Training on Video Streams

  • paper_url: http://arxiv.org/abs/2307.05014
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
  • for: further improve a trained model at test time
  • methods: online test-time training (TTT) with masked autoencoders
  • results: significant improvement (45%-66%) in instance and panoptic segmentation tasks compared to fixed-model baseline, and outperformed offline TTT with more information
    Abstract Prior work has established test-time training (TTT) as a general framework to further improve a trained model at test time. Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders. We extend TTT to the streaming setting, where multiple test instances - video frames in our case - arrive in temporal order. Our extension is online TTT: The current model is initialized from the previous model, then trained on the current frame and a small window of frames immediately before. Online TTT significantly outperforms the fixed-model baseline for four tasks, on three real-world datasets. The relative improvement is 45% and 66% for instance and panoptic segmentation. Surprisingly, online TTT also outperforms its offline variant that accesses more information, training on all frames from the entire test video regardless of temporal order. This differs from previous findings using synthetic videos. We conceptualize locality as the advantage of online over offline TTT. We analyze the role of locality with ablations and a theory based on bias-variance trade-off.
    摘要 先前的工作已经确立了测试时重定型(TTT)为一种通用的改进已经训练的模型的框架。在测试每个实例之前,模型会在同一个实例上使用自我超级vised任务,如图像重建WithMasked autoencoders进行训练。我们将TTT扩展到流动设定, где多个测试实例(视频帧在我们的情况下)会在时间顺序中到达。我们的扩展是在线 TTT:当前模型从前一个模型初始化,然后在当前帧和前一个小窗口的帧上进行训练。在线 TTT在四个任务上表现出色,在三个实际世界数据集上显著超过固定模型基线。相对提升为45%和66% дляINSTANCE和panoptic segmentation。 surprisingly,在线 TTT还超过了其停机 variant,即训练所有测试视频帧的整体信息,无论时间顺序。这与之前使用 synthetic videos 的发现不同。我们认为地域性是在线 TTT 的优势。我们通过拓展和基于偏差-variance 质量的理论进行分析地域性的角色。
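A schematic of the online test-time-training loop described above: each incoming frame triggers a few self-supervised (masked-reconstruction) updates on the current frame plus a small window of recent frames before the prediction is made. The window size, step count, and the toy model that conflates the reconstruction head with the main-task head are assumptions made for brevity.

```python
# A schematic sketch of online TTT: the model carries over the previous frame's
# weights, adapts with masked-pixel reconstruction (a stand-in for the masked
# autoencoder task) on a short sliding window, then predicts on the new frame.
from collections import deque
import torch

def online_ttt(model, frames, window=4, steps=2, lr=1e-4, mask_ratio=0.75):
    history, outputs = deque(maxlen=window), []
    for frame in frames:                               # frames arrive in temporal order
        history.append(frame)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(steps):                         # self-supervised adaptation
            batch = torch.stack(list(history))
            mask = (torch.rand_like(batch) < mask_ratio).float()
            recon = model(batch * (1 - mask))          # reconstruct masked pixels
            loss = ((recon - batch) * mask).pow(2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            outputs.append(model(frame.unsqueeze(0)))  # prediction after adaptation
    return outputs

# toy usage: a linear "autoencoder" over flattened 8x8 frames
toy = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(64, 64),
                          torch.nn.Unflatten(1, (1, 8, 8)))
outs = online_ttt(toy, [torch.randn(1, 8, 8) for _ in range(5)])
print(len(outs), outs[0].shape)
```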

Neural Point-based Volumetric Avatar: Surface-guided Neural Points for Efficient and Photorealistic Volumetric Head Avatar

  • paper_url: http://arxiv.org/abs/2307.05000
  • repo_url: None
  • paper_authors: Cong Wang, Di Kang, Yanpei Cao, Linchao Bao, Ying Shan, Song-Hai Zhang
  • for: 提高 AR/VR 和视频会议应用中人头的写实感和动态运动
  • methods: 使用神经点表示和神经体积渲染过程,避免使用 mesh-based 方法的固定连接和硬对应
  • results: 在 Multiface 数据集的三个对象上进行实验,表现优于先前方法,特别是在处理困难的面部区域时。
    Abstract Rendering photorealistic and dynamically moving human heads is crucial for ensuring a pleasant and immersive experience in AR/VR and video conferencing applications. However, existing methods often struggle to model challenging facial regions (e.g., mouth interior, eyes, hair/beard), resulting in unrealistic and blurry results. In this paper, we propose {\fullname} ({\name}), a method that adopts the neural point representation as well as the neural volume rendering process and discards the predefined connectivity and hard correspondence imposed by mesh-based approaches. Specifically, the neural points are strategically constrained around the surface of the target expression via a high-resolution UV displacement map, achieving increased modeling capacity and more accurate control. We introduce three technical innovations to improve the rendering and training efficiency: a patch-wise depth-guided (shading point) sampling strategy, a lightweight radiance decoding process, and a Grid-Error-Patch (GEP) ray sampling strategy during training. By design, our {\name} is better equipped to handle topologically changing regions and thin structures while also ensuring accurate expression control when animating avatars. Experiments conducted on three subjects from the Multiface dataset demonstrate the effectiveness of our designs, outperforming previous state-of-the-art methods, especially in handling challenging facial regions.

$\mathrm{SAM^{Med}}$: A medical image annotation framework based on large vision model

  • paper_url: http://arxiv.org/abs/2307.05617
  • repo_url: None
  • paper_authors: Chenglong Wang, Dexuan Li, Sucheng Wang, Chengxiu Zhang, Yida Wang, Yun Liu, Guang Yang
  • for: 这个研究旨在应用大规模视觉模型,尤其是Segment Anything Model(SAM),以提高医疗影像标注的效率和精度。
  • methods: 本研究提出了一个增强的框架,名为$\mathrm{SAM^{Med}}$,它利用SAM的能力以及提示学习的方法来处理医疗影像标注的下游任务。$\mathrm{SAM^{Med}}$框架包括两个子模组,即$\mathrm{SAM^{assist}}$和$\mathrm{SAM^{auto}}$。
  • results: 研究结果显示,$\mathrm{SAM^{Med}}$在医疗影像标注任务中具有优秀的效率和精度。具体而言,所提出的SAP-Net模型仅用5个标注切片,就在肾脏和肝脏分割上分别取得了0.80和0.82的平均Dice系数。
    Abstract Recently, the large vision model Segment Anything Model (SAM) has revolutionized the computer vision field, especially for image segmentation. SAM presented a new promptable segmentation paradigm that exhibits remarkable zero-shot generalization ability. Extensive research has explored the potential and limits of SAM in various downstream tasks. In this study, we present $\mathrm{SAM^{Med}}$, an enhanced framework for medical image annotation that leverages the capabilities of SAM. The $\mathrm{SAM^{Med}}$ framework consists of two submodules, namely $\mathrm{SAM^{assist}}$ and $\mathrm{SAM^{auto}}$. The $\mathrm{SAM^{assist}}$ demonstrates the generalization ability of SAM to the downstream medical segmentation task using the prompt-learning approach. Results show a significant improvement in segmentation accuracy with only approximately 5 input points. The $\mathrm{SAM^{auto}}$ model aims to accelerate the annotation process by automatically generating input prompts. The proposed SAP-Net model achieves superior segmentation performance with only five annotated slices, achieving an average Dice coefficient of 0.80 and 0.82 for kidney and liver segmentation, respectively. Overall, $\mathrm{SAM^{Med}}$ demonstrates promising results in medical image annotation. These findings highlight the potential of leveraging large-scale vision models in medical image annotation tasks.
    摘要 最近,大型视觉模型Segment Anything Model(SAM)在计算机视觉领域中引起了革命性的变革,特别是在图像分割方面。SAM提出了一种新的可Promptable分割 парадигма,其表现出了强大的零学习泛化能力。多个研究已经探讨了SAM在不同下游任务中的潜力和局限性。在这项研究中,我们提出了 $\mathrm{SAM^{Med}$ 框架,这是基于 SAM 的医疗图像注释框架。 $\mathrm{SAM^{Med}$ 框架由两个子模块组成:$\mathrm{SAM^{assist}$ 和 $\mathrm{SAM^{auto}$。 $\mathrm{SAM^{assist}$ 通过示例学习方法来表明 SAM 在下游医疗图像分割任务中的泛化能力。结果表明,只需要约5个输入点,就可以 achieve significanly 提高分割精度。 $\mathrm{SAM^{auto}$ 模型则目的是加速注释过程,通过自动生成输入点来减少人工干预。我们提出的 SAP-Net 模型在只有5个注释 slice 的情况下,实现了平均的 dice 系数为 0.80 和 0.82 для肾脏和肝脏分割任务。总的来说, $\mathrm{SAM^{Med}$ 表现出了出色的结果在医疗图像注释任务中。这些发现 highlights 大型视觉模型在医疗图像注释任务中的潜力。
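For context, point-prompted inference with the public segment-anything package, the kind of interaction that $\mathrm{SAM^{assist}}$ builds on with roughly five clicks per structure, looks like the sketch below. The checkpoint path, the dummy image, and the click coordinates are placeholders; this is not the authors' $\mathrm{SAM^{Med}}$ or SAP-Net code.

```python
# A minimal example of point-prompted inference with the public
# `segment-anything` package, shown only as the backdrop the framework builds on.
import numpy as np
from segment_anything import SamPredictor, sam_model_registry

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # local checkpoint (placeholder path)
predictor = SamPredictor(sam)

image = np.zeros((512, 512, 3), dtype=np.uint8)        # stand-in for a CT/MR slice
predictor.set_image(image)

points = np.array([[100, 120], [110, 140], [130, 150], [150, 160], [170, 150]])
labels = np.ones(len(points), dtype=int)                # 1 = foreground click
masks, scores, _ = predictor.predict(point_coords=points,
                                     point_labels=labels,
                                     multimask_output=True)
best = masks[np.argmax(scores)]                         # keep the highest-scoring mask
print(best.shape, scores)
```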

A Multi-view Impartial Decision Network for Frontotemporal Dementia Diagnosis

  • paper_url: http://arxiv.org/abs/2307.04981
  • repo_url: None
  • paper_authors: Guoyao Deng, Ke Zou, Meng Wang, Xuedong Yuan, Sancong Ying, Huazhu Fu
  • for: 本研究旨在提出一种基于多视图功能磁共振成像(fMRI)的可靠额颞叶痴呆(FTD)诊断方法,以解决现有FTD诊断方法存在的两个局限。
  • methods: 我们提出了一种可靠的多视图不偏诊断网络(MID-Net),使用多个专家模型提取fMRI图像中丰富的神经网络信息,并使用Dirichlet分布来描述专家类别概率分布。我们还提出了一种新的不偏决定器(IDer),以合并不同专家意见而不需要额外计算成本。
  • results: 我们在高质量的FTD fMRI数据集上进行了广泛实验,证明MID-Net优于之前的方法,并能为难以分类的样本给出较高的不确定性。我们认为,该方法是在多专家条件下实现可靠FTD决策的重要一步。
    Abstract Frontotemporal Dementia (FTD) diagnosis has progressed successfully with deep learning techniques. However, current FTD identification methods suffer from two limitations. Firstly, they do not exploit the potential of multi-view functional magnetic resonance imaging (fMRI) for classifying FTD. Secondly, they do not consider the reliability of the multi-view FTD diagnosis. To address these limitations, we propose a reliable multi-view impartial decision network (MID-Net) for FTD diagnosis in fMRI. Our MID-Net provides confidence for each view and generates a reliable prediction without any conflict. To achieve this, we employ multiple expert models to extract evidence from the abundant neural network information contained in fMRI images. We then introduce the Dirichlet Distribution to characterize the expert class probability distribution from an evidence level. Additionally, a novel Impartial Decision Maker (IDer) is proposed to combine the different opinions inductively to arrive at an unbiased prediction without additional computation cost. Overall, our MID-Net dynamically integrates the decisions of different experts on FTD disease, especially when dealing with multi-view high-conflict cases. Extensive experiments on a high-quality FTD fMRI dataset demonstrate that our model outperforms previous methods and provides high uncertainty for hard-to-classify examples. We believe that our approach represents a significant step toward the deployment of reliable FTD decision-making under multi-expert conditions. We will release the codes for reproduction after acceptance.
    摘要 额颞叶痴呆(FTD)的诊断已借助深度学习技术取得进展。然而,当前的FTD识别方法存在两个局限:其一,未能利用多视图功能磁共振成像(fMRI)对FTD进行分类;其二,未考虑多视图FTD诊断的可靠性。为解决这些局限,我们提出了一种用于fMRI中FTD诊断的可靠多视图公正决策网络(MID-Net)。MID-Net为每个视图给出置信度,并在无冲突的情况下生成可靠预测。为此,我们采用多个专家模型从fMRI图像丰富的神经网络信息中提取证据,并引入Dirichlet分布在证据层面刻画专家的类别概率分布。此外,我们提出了一种新的公正决策器(IDer),在不增加计算成本的前提下归纳地融合不同专家意见,得到无偏预测。总体而言,MID-Net能够动态整合不同专家对FTD疾病的判断,尤其是在多视图高冲突的情况下。在高质量FTD fMRI数据集上的大量实验表明,我们的模型优于已有方法,并为难以分类的样本给出较高的不确定性。我们认为,该方法是在多专家条件下部署可靠FTD决策的重要一步。论文被接收后我们将发布代码以供复现。
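
MID-Net's IDer fuses per-view Dirichlet opinions without extra computation. As a hedged illustration of the general idea (not the paper's exact combination rule), the sketch below turns each expert's non-negative evidence into belief masses plus an uncertainty mass, and fuses two views with a reduced Dempster-style rule commonly used in evidential multi-view classification.

```python
import numpy as np

def dirichlet_opinion(evidence):
    """Turn non-negative evidence over K classes into belief masses and uncertainty."""
    evidence = np.asarray(evidence, dtype=np.float64)
    K = evidence.shape[-1]
    alpha = evidence + 1.0                 # Dirichlet parameters
    S = alpha.sum(-1, keepdims=True)       # Dirichlet strength
    belief = evidence / S                  # per-class belief mass
    uncertainty = float(K / S.squeeze(-1)) # leftover mass = uncertainty
    return belief, uncertainty

def fuse_two_views(b1, u1, b2, u2):
    """Reduced Dempster-style combination of two opinions (one possible IDer choice)."""
    K = len(b1)
    conflict = sum(b1[i] * b2[j] for i in range(K) for j in range(K) if i != j)
    scale = 1.0 / (1.0 - conflict)
    b = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    u = scale * (u1 * u2)
    return b, u

# Example with 2 classes and two fMRI "experts" (numbers are illustrative only).
b1, u1 = dirichlet_opinion([4.0, 1.0])
b2, u2 = dirichlet_opinion([0.5, 0.5])
b, u = fuse_two_views(b1, u1, b2, u2)
```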

Diffusion idea exploration for art generation

  • paper_url: http://arxiv.org/abs/2307.04978
  • repo_url: None
  • paper_authors: Nikhil Verma
  • for: 这篇论文旨在以文本和粗略草图作为引导信息来生成创意艺术图像。
  • methods: 论文使用最先进的扩散模型生成图像:模型从一个随机点组成的图案出发,在输入引导信息的约束下逐步将其转化为设计图像。
  • results: 初步实验展示了有前景的定性结果。
    Abstract Cross-modal learning tasks have picked up pace in recent times. With a plethora of applications in diverse areas, generating novel content using multiple modalities of data has remained a challenging problem. To address this, various generative modelling techniques have been proposed for specific tasks. Novel and creative image generation is one important aspect for industrial applications and could serve as an arm for novel content generation. Previously proposed techniques used Generative Adversarial Networks (GAN), autoregressive models and Variational Autoencoders (VAE) to accomplish similar tasks. These approaches are limited in their capability to produce images guided by either text instructions or rough sketch images, which decreases the overall performance of the image generator. We used state-of-the-art diffusion models to generate creative art by primarily leveraging text with additional support from rough sketches. Diffusion starts with a pattern of random dots and slowly converts that pattern into a design image using the guiding information fed into the model. Diffusion models have recently outperformed other generative models in image generation tasks that use cross-modal data as guiding information. The initial experiments for this task of novel image generation demonstrated promising qualitative results.
    摘要 跨模态学习任务近来发展迅速。尽管在各个领域有大量应用,利用多种模态数据生成新内容仍是一个具有挑战性的问题。为此,人们针对特定任务提出了多种生成建模技术。新颖且有创意的图像生成是一个重要的工业应用方向,可以作为新内容生成的有力工具。以往的方法使用生成对抗网络(GAN)、自回归模型和变分自编码器(VAE)完成类似任务,但这些方法在根据文本指令或粗略草图生成图像方面能力有限,降低了图像生成器的整体性能。我们使用最先进的扩散模型,主要以文本为引导并辅以粗略草图来生成创意艺术。扩散模型从一个随机点图案出发,利用输入模型的引导信息逐步将其转化为设计图像。在以跨模态数据为引导信息的图像生成任务中,扩散模型最近已超越其他生成模型。针对这一新图像生成任务的初步实验展示了有前景的定性结果。

SAM-U: Multi-box prompts triggered uncertainty estimation for reliable SAM in medical image

  • paper_url: http://arxiv.org/abs/2307.04973
  • repo_url: None
  • paper_authors: Guoyao Deng, Ke Zou, Kai Ren, Meng Wang, Xuedong Yuan, Sancong Ying, Huazhu Fu
  • for: 这个研究旨在提高Segmenting Anything(SAM)的可靠性和公平性,特别在医疗领域。
  • methods: 该研究提出了多个框架触发的uncertainty估计方法,通过Monte Carlo方法使用不同的提示参数来估计SAM预测结果的分布。
  • results: 实验结果表明,多框提示增强可以提升SAM性能,并为每个像素提供不确定性估计,从而给出了第一个可靠SAM的范式。
    Abstract Recently, Segment Anything has taken an important step towards general artificial intelligence. At the same time, its reliability and fairness have also attracted great attention, especially in the field of health care. In this study, we propose multi-box-prompt-triggered uncertainty estimation for SAM cues to demonstrate the reliability of segmented lesions or tissues. We estimate the distribution of SAM predictions via Monte Carlo simulation with prior distribution parameters, employing different prompts as a form of test-time augmentation. Our experimental results show that multi-box prompt augmentation improves SAM performance and endows each pixel with an uncertainty estimate. This provides the first paradigm for a reliable SAM.
    摘要 最近,Segment Anything(SAM)向通用人工智能迈出了重要一步。与此同时,它的可靠性与公平性也受到广泛关注,尤其是在医疗领域。本研究提出由多框提示触发的不确定性估计,用以证明病灶或组织分割结果的可靠性。我们以不同提示作为测试时增强的形式,通过带先验分布参数的Monte Carlo方法估计SAM预测结果的分布。实验结果显示,多框提示增强可以提升SAM性能,并为每个像素赋予不确定性。这提供了可靠SAM的第一个范式。
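
A minimal sketch of the general recipe the abstract describes (Monte Carlo over perturbed box prompts) is below. `segment_with_box` is a hypothetical wrapper around any promptable segmenter, and the jitter magnitude and entropy-based uncertainty are assumptions, not the paper's exact formulation.

```python
import numpy as np

def mc_box_uncertainty(image, box_xyxy, segment_with_box, n_samples=10, jitter=5.0, seed=0):
    """Estimate per-pixel foreground probability and uncertainty by jittering a box prompt.

    segment_with_box(image, box) -> binary mask (H, W); assumed to be provided.
    """
    rng = np.random.default_rng(seed)
    box = np.asarray(box_xyxy, dtype=np.float32)
    masks = []
    for _ in range(n_samples):
        noisy_box = box + rng.normal(0.0, jitter, size=4)        # perturb x1, y1, x2, y2
        masks.append(segment_with_box(image, noisy_box).astype(np.float32))
    p = np.mean(masks, axis=0)                                   # per-pixel foreground probability
    eps = 1e-6
    entropy = -(p * np.log(p + eps) + (1 - p) * np.log(1 - p + eps))  # per-pixel uncertainty
    return p, entropy
```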

Image Reconstruction using Enhanced Vision Transformer

  • paper_url: http://arxiv.org/abs/2307.05616
  • repo_url: None
  • paper_authors: Nikhil Verma, Deepkamal Kaur, Lydia Chau
  • for: 这个项目的目标是提高计算机视觉领域中图像去噪的能力,以便提高图像的量化测量精度。
  • methods: 该项目提出了一种基于视觉变换器(ViT)的图像重建框架,可用于图像去噪、去模糊和修补等任务。项目还集成了四种优化技术来提高模型的重建能力,包括局部敏感注意力(LSA)、移位图块标记化(SPT)、旋转位置嵌入(RoPE)以及受生成对抗网络(GANs)启发的对抗损失函数。前三者使变换器能更有效地从数据集中学习,对抗损失则提升了重建图像的分辨率。
  • results: 实验表明,所提架构在图像去噪和修补任务上的重建性能比基准U-Net模型高出3.5%以上的结构相似性指标(SSIM);加入上述优化后,相对基准进一步达到约5%的SSIM提升。
    Abstract Removing noise from images is a challenging and fundamental problem in the field of computer vision. Images captured by modern cameras are inevitably degraded by noise which limits the accuracy of any quantitative measurements on those images. In this project, we propose a novel image reconstruction framework which can be used for tasks such as image denoising, deblurring or inpainting. The model proposed in this project is based on Vision Transformer (ViT) that takes 2D images as input and outputs embeddings which can be used for reconstructing denoised images. We incorporate four additional optimization techniques in the framework to improve the model reconstruction capability, namely Locality Sensitive Attention (LSA), Shifted Patch Tokenization (SPT), Rotary Position Embeddings (RoPE) and adversarial loss function inspired from Generative Adversarial Networks (GANs). LSA, SPT and RoPE enable the transformer to learn from the dataset more efficiently, while the adversarial loss function enhances the resolution of the reconstructed images. Based on our experiments, the proposed architecture outperforms the benchmark U-Net model by more than 3.5% structural similarity (SSIM) for the reconstruction tasks of image denoising and inpainting. The proposed enhancements further show an improvement of ~5% SSIM over the benchmark for both tasks.
    摘要 去除图像中的噪声是计算机视觉领域一个基础且具有挑战性的问题。现代相机拍摄的图像不可避免地受到噪声影响,从而限制了基于这些图像的定量测量精度。在本项目中,我们提出了一种新的图像重建框架,可用于图像去噪、去模糊和修补等任务。该模型基于视觉变换器(ViT),以2D图像为输入,输出可用于重建去噪图像的嵌入。我们在框架中加入了四种优化技术以提升重建能力:局部敏感注意力(LSA)、移位图块标记化(SPT)、旋转位置嵌入(RoPE)以及受生成对抗网络(GANs)启发的对抗损失函数。LSA、SPT和RoPE使变换器能更有效地从数据集中学习,对抗损失函数则提升了重建图像的分辨率。实验表明,所提架构在图像去噪和修补任务上比基准U-Net模型高出3.5%以上的结构相似性指标(SSIM),而所提的增强进一步带来约5%的SSIM提升。
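
Of the components listed above, rotary position embeddings are the easiest to illustrate in isolation. Below is a hedged, minimal 1D rotary-style embedding in PyTorch (the "rotate-half" formulation); how the paper integrates it into its ViT, and the LSA/SPT modules, are not shown.

```python
import torch

def apply_rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Apply 1D rotary position embeddings to x of shape (batch, seq_len, dim), dim even."""
    b, n, d = x.shape
    half = d // 2
    # Per-dimension rotation frequencies and per-position angles.
    freqs = base ** (-torch.arange(half, dtype=torch.float32, device=x.device) / half)
    angles = torch.arange(n, dtype=torch.float32, device=x.device)[:, None] * freqs[None, :]
    cos, sin = angles.cos()[None], angles.sin()[None]          # (1, n, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Typically applied to the query and key tensors inside self-attention, e.g.:
# q, k = apply_rope(q), apply_rope(k)
```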

PKU-GoodsAD: A Supermarket Goods Dataset for Unsupervised Anomaly Detection and Segmentation

  • paper_url: http://arxiv.org/abs/2307.04956
  • repo_url: https://github.com/jianzhang96/goodsad
  • paper_authors: Jian Zhang, Runwei Ding, Miaoju Ban, Ge Yang
  • for: 这个研究是为了实现自动化超市商品异常检测,以扩展计算机视觉领域内的异常检测应用和研究。
  • methods: 这个研究使用了现有的无监督异常检测方法,并对其进行了评估。
  • results: 研究发现,一些在工业异常检测数据集(例如MVTec AD)上表现良好的方法,在这个全面、多对象的超市商品异常检测数据集上表现不佳。
    Abstract Visual anomaly detection is essential and commonly used for many tasks in the field of computer vision. Recent anomaly detection datasets mainly focus on industrial automated inspection, medical image analysis and video surveillance. In order to broaden the application and research of anomaly detection in unmanned supermarkets and smart manufacturing, we introduce the supermarket goods anomaly detection (GoodsAD) dataset. It contains 6124 high-resolution images of 484 different appearance goods divided into 6 categories. Each category contains several common different types of anomalies such as deformation, surface damage and opened. Anomalies contain both texture changes and structural changes. It follows the unsupervised setting and only normal (defect-free) images are used for training. Pixel-precise ground truth regions are provided for all anomalies. Moreover, we also conduct a thorough evaluation of current state-of-the-art unsupervised anomaly detection methods. This initial benchmark indicates that some methods which perform well on the industrial anomaly detection dataset (e.g., MVTec AD), show poor performance on our dataset. This is a comprehensive, multi-object dataset for supermarket goods anomaly detection that focuses on real-world applications.
    摘要 视觉异常检测是计算机视觉领域中非常重要且常用的任务之一。现有的异常检测数据集主要集中在工业自动化检测、医疗影像分析和视频监测等领域。为扩展无人超市和智能制造领域中的异常检测应用和研究,我们介绍了超市商品异常检测(GoodsAD)数据集。该数据集包含6124张高分辨率图像,涵盖484种外观不同的商品,分为6个类别;每个类别含有多种常见的异常类型,如形变、表面损坏和包装开启等,异常同时包括纹理变化和结构变化。数据集遵循无监督设置,仅使用正常(无缺陷)图像进行训练,并为所有异常提供像素级精确的真实区域标注。此外,我们还对当前最先进的无监督异常检测方法进行了全面评估。这一初步基准表明,一些在工业异常检测数据集(例如MVTec AD)上表现出色的方法,在我们的数据集上表现不佳。这是一个面向实际应用的全面、多对象超市商品异常检测数据集。

Compact Twice Fusion Network for Edge Detection

  • paper_url: http://arxiv.org/abs/2307.04952
  • repo_url: https://github.com/li-yachuan/ctfn-pytorch-master
  • paper_authors: Yachuan Li, Zongmin Li, Xavier Soria P., Chaozhi Yang, Qian Xiao, Yun Bai, Hua Li, Xiangdong Wang
  • for: 这篇论文旨在提出一种参数量和计算成本都较低的多尺度特征融合网络,以实现高精度的边缘检测。
  • methods: 该方法包括两个轻量级多尺度特征融合模块:语义增强模块(SEM),利用粗尺度特征中的语义信息引导细尺度特征的学习;以及伪像素级加权(PPW)模块,通过为所有特征分配权重来聚合多尺度特征的互补优点。针对难以正确分类的像素,还提出了动态焦点损失(Dynamic Focal Loss)。
  • results: 与最先进方法相比,CTFN在BSDS500、NYUDv2和BIPEDv2三个数据集上取得了有竞争力的精度,同时参数量和计算成本更低。除骨干网络外,CTFN仅需0.1M额外参数,其计算成本仅为其他最先进方法的60%。代码可在 https://github.com/Li-yachuan/CTFN-pytorch-master 获取。
    Abstract The significance of multi-scale features has been gradually recognized by the edge detection community. However, the fusion of multi-scale features increases the complexity of the model, which is not friendly to practical application. In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while maintaining the compactness of the model. CTFN includes two lightweight multi-scale feature fusion modules: a Semantic Enhancement Module (SEM) that can utilize the semantic information contained in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module that aggregate the complementary merits of multi-scale features by assigning weights to all features. Notwithstanding all this, the interference of texture noise makes the correct classification of some pixels still a challenge. For these hard samples, we propose a novel loss function, coined Dynamic Focal Loss, which reshapes the standard cross-entropy loss and dynamically adjusts the weights to correct the distribution of hard samples. We evaluate our method on three datasets, i.e., BSDS500, NYUDv2, and BIPEDv2. Compared with state-of-the-art methods, CTFN achieves competitive accuracy with less parameters and computational cost. Apart from the backbone, CTFN requires only 0.1M additional parameters, which reduces its computation cost to just 60% of other state-of-the-art methods. The codes are available at https://github.com/Li-yachuan/CTFN-pytorch-master.
    摘要 “多尺度特征的重要性逐渐被边缘检测社区所认可。然而,将多尺度特征融合到模型中增加了模型的复杂度,这不符合实际应用的需求。在这个工作中,我们提出了一个名为Compact Twice Fusion Network(CTFN)的方法,可以充分融合多尺度特征,同时保持模型的简洁性。CTFN包括两个轻量级多尺度特征融合模组:一个具有Semantic Enhancement Module(SEM),可以利用粗细度特征中的 semantics信息来引导细节特征的学习;另一个则是一个名为Pseudo Pixel-level Weighting(PPW)模组,可以将多尺度特征的补偿特点相互融合。不过,由于Texture noise的干扰,使得某些像素的正确分类仍然是一个挑战。为了解决这个问题,我们提出了一个新的损失函数,即Dynamic Focal Loss,它可以重新定义标准的交叉熵损失函数,并在适当的情况下动态地调整权重,以正确地对待困难的样本。我们在BSDS500、NYUDv2和BIPEDv2三个dataset上评估了我们的方法,与现有的方法相比,CTFN实现了竞争的精度,仅需额外0.1M参数,对应的计算成本只有60%。代码可以在https://github.com/Li-yachuan/CTFN-pytorch-master中找到。”
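
The Dynamic Focal Loss reshapes cross-entropy to re-weight hard pixels. The abstract does not spell out the dynamic weighting schedule, so the sketch below shows only the standard binary focal-loss starting point for edge maps, with gamma left as a tunable (possibly time-varying) knob; it is an assumption-laden illustration, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits, targets, gamma=2.0, alpha=0.75, eps=1e-6):
    """Focal reshaping of binary cross-entropy for edge maps.

    logits, targets: (B, 1, H, W); targets in {0, 1}. gamma could be scheduled
    during training, which is roughly the knob a "dynamic" variant would adjust.
    """
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    loss = alpha_t * (1 - p_t).clamp(min=eps) ** gamma * ce
    return loss.mean()
```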

DDGM: Solving inverse problems by Diffusive Denoising of Gradient-based Minimization

  • paper_url: http://arxiv.org/abs/2307.04946
  • repo_url: None
  • paper_authors: Kyle Luther, H. Sebastian Seung
  • for: 该论文旨在提出一种求解逆问题的简单方法,将传统的基于梯度的重建误差最小化与去噪相结合。
  • methods: 该方法使用卷积去噪网络,在每次迭代中交替执行数据项的梯度下降、网络去噪和噪声注入;所加噪声的水平与去噪步长均随时间呈指数衰减,使迭代过程类似于朗之万(Langevin)或扩散过程。
  • results: 研究发现,使用这种方法可以在50个去噪步骤中获得高精度的重建结果,并且比DDRM和DPS等更复杂的扩散方法更高精度(按照MSE和SSIM的评价)。此外,该方法还可以在处理任意大小的图像上进行重建。
    Abstract Inverse problems generally require a regularizer or prior for a good solution. A recent trend is to train a convolutional net to denoise images, and use this net as a prior when solving the inverse problem. Several proposals depend on a singular value decomposition of the forward operator, and several others backpropagate through the denoising net at runtime. Here we propose a simpler approach that combines the traditional gradient-based minimization of reconstruction error with denoising. Noise is also added at each step, so the iterative dynamics resembles a Langevin or diffusion process. Both the level of added noise and the size of the denoising step decay exponentially with time. We apply our method to the problem of tomographic reconstruction from electron micrographs acquired at multiple tilt angles. With empirical studies using simulated tilt views, we find parameter settings for our method that produce good results. We show that high accuracy can be achieved with as few as 50 denoising steps. We also compare with DDRM and DPS, more complex diffusion methods of the kinds mentioned above. These methods are less accurate (as measured by MSE and SSIM) for our tomography problem, even after the generation hyperparameters are optimized. Finally we extend our method to reconstruction of arbitrary-sized images and show results on 128 $\times$ 1568 pixel images
    摘要 逆问题通常需要正则化项或先验才能获得良好的解。近来的一种趋势是训练卷积网络来对图像去噪,并在求解逆问题时将该网络用作先验。一些方案依赖前向算子的奇异值分解,另一些则在运行时对去噪网络进行反向传播。在这里,我们提出一种更简单的方法,将传统的基于梯度的重建误差最小化与去噪相结合。每一步还会加入噪声,因此迭代动力学类似于朗之万(Langevin)或扩散过程。所加噪声的水平与去噪步长均随时间呈指数衰减。我们将该方法应用于由多个倾斜角度采集的电子显微图像的断层重建问题。通过对模拟倾斜视图的实验研究,我们找到了能产生良好结果的参数设置,并证明仅需50次去噪步骤即可获得高精度重建。我们还与DDRM和DPS等上述更复杂的扩散方法进行了比较:即使优化了其生成超参数,这些方法在我们的断层重建问题上的精度(以MSE和SSIM衡量)仍然较低。最后,我们将方法扩展到任意尺寸图像的重建,并展示了在128×1568像素图像上的结果。
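
The abstract describes alternating a gradient step on the data term, a denoising step, and noise injection, with both the noise level and the denoising step decaying exponentially. A hedged NumPy-style sketch of that loop is below; `forward_op`, its adjoint, and `denoiser` are placeholders for the tomographic projector and the trained denoising network, and the decay schedule is an assumption.

```python
import numpy as np

def ddgm_reconstruct(y, forward_op, adjoint_op, denoiser, x0, n_steps=50,
                     step0=1.0, sigma0=0.1, decay=0.9, seed=0):
    """Gradient-based reconstruction interleaved with denoising and noise injection.

    forward_op(x) -> measurements, adjoint_op(r) -> image-space gradient direction,
    denoiser(x, sigma) -> denoised image. All three are assumed to be provided.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    step, sigma = step0, sigma0
    for _ in range(n_steps):
        residual = forward_op(x) - y
        x = x - step * adjoint_op(residual)            # gradient step on ||Ax - y||^2 / 2
        x = denoiser(x, sigma)                         # pull the iterate toward the image prior
        x = x + sigma * rng.standard_normal(x.shape)   # Langevin-like noise injection
        step *= decay                                  # exponential decay of step size
        sigma *= decay                                 # exponential decay of noise level
    return x
```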

Count-Free Single-Photon 3D Imaging with Race Logic

  • paper_url: http://arxiv.org/abs/2307.04924
  • repo_url: None
  • paper_authors: Atul Ingle, David Maier
  • For: The paper is written for the development of an online approach for distance estimation using single-photon cameras (SPCs) that can reduce bandwidth and power consumption while maintaining similar distance reconstruction accuracy as conventional processing methods.
  • Methods: The paper uses race logic to process photon streams in the time-delay domain and constructs count-free equi-depth histograms using a binner element that converges on the median of a distribution.
  • Results: The paper shows that the proposed approach can provide an order of magnitude reduction in bandwidth and power consumption while maintaining similar distance reconstruction accuracy as conventional processing methods.
    Abstract Single-photon cameras (SPCs) have emerged as a promising technology for high-resolution 3D imaging. A single-photon 3D camera determines the round-trip time of a laser pulse by capturing the arrival of individual photons at each camera pixel. Constructing photon-timestamp histograms is a fundamental operation for a single-photon 3D camera. However, in-pixel histogram processing is computationally expensive and requires large amount of memory per pixel. Digitizing and transferring photon timestamps to an off-sensor histogramming module is bandwidth and power hungry. Here we present an online approach for distance estimation without explicitly storing photon counts. The two key ingredients of our approach are (a) processing photon streams using race logic, which maintains photon data in the time-delay domain, and (b) constructing count-free equi-depth histograms. Equi-depth histograms are a succinct representation for ``peaky'' distributions, such as those obtained by an SPC pixel from a laser pulse reflected by a surface. Our approach uses a binner element that converges on the median (or, more generally, to another quantile) of a distribution. We cascade multiple binners to form an equi-depth histogrammer that produces multi-bin histograms. Our evaluation shows that this method can provide an order of magnitude reduction in bandwidth and power consumption while maintaining similar distance reconstruction accuracy as conventional processing methods.
    摘要 单光子相机(SPC)已成为高分辨率3D成像的一项有前景的技术。单光子3D相机通过在每个像素处捕获单个光子的到达时间来确定激光脉冲的往返时间。构建光子时间戳直方图是单光子3D相机的一项基础操作,但在像素内进行直方图处理计算开销大,且每个像素需要大量内存;而将光子时间戳数字化并传输到片外直方图模块又非常耗费带宽和功耗。本文提出一种无需显式存储光子计数的在线距离估计方法。该方法有两个关键要素:(a) 使用竞态逻辑(race logic)处理光子流,将光子数据保持在时间延迟域中;(b) 构建无计数的等深直方图。等深直方图是对"峰状"分布(例如SPC像素从表面反射的激光脉冲所得到的分布)的一种简洁表示。我们的方法使用一个会收敛到分布中位数(或更一般地,其他分位数)的分箱单元(binner),并级联多个分箱单元构成等深直方图器,从而生成多箱直方图。评估表明,该方法可在保持与传统处理方法相近的距离重建精度的同时,将带宽和功耗降低一个数量级。
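
The count-free binner nudges an estimate toward the median of the photon-timestamp stream without storing counts. A hedged software sketch of one such streaming quantile estimator (a Frugal-style stochastic update) is below; the actual race-logic hardware implementation is, of course, quite different, and the step size is an assumption.

```python
import random

class StreamingQuantileBinner:
    """Count-free estimator that converges on a chosen quantile of a timestamp stream."""

    def __init__(self, init=0.0, step=1.0, quantile=0.5):
        self.estimate = init
        self.step = step
        self.q = quantile

    def update(self, sample: float) -> float:
        # Move up with probability q when the sample is above the estimate,
        # down with probability (1 - q) when it is below; q = 0.5 targets the median.
        if sample > self.estimate and random.random() < self.q:
            self.estimate += self.step
        elif sample < self.estimate and random.random() < (1.0 - self.q):
            self.estimate -= self.step
        return self.estimate

# Cascading several binners at different quantiles yields an equi-depth histogram.
```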

Kinematically-Decoupled Impedance Control for Fast Object Visual Servoing and Grasping on Quadruped Manipulators

  • paper_url: http://arxiv.org/abs/2307.04918
  • repo_url: None
  • paper_authors: Riccardo Parosi, Mattia Risiglione, Darwin G. Caldwell, Claudio Semini, Victor Barasuol
  • for: 该论文旨在提出一个基于运动学解耦机械臂运动链与阻抗控制的物体搜索、接近与抓取(SAG)控制管线,并与基于图像的视觉伺服(IBVS)相集成。
  • methods: 该管线利用运动学解耦的机械臂运动链,实现末端执行器的快速运动与恢复,从而获得鲁棒的视觉伺服。
  • results: 在我们140公斤HyQReal四脚机器人上测试了该管道,并在不同的动态移动、外部干扰和快速目标物移动等情况下,表现出了高效和稳定的性能。
    Abstract We propose a control pipeline for SAG (Searching, Approaching, and Grasping) of objects, based on a decoupled arm kinematic chain and impedance control, which integrates image-based visual servoing (IBVS). The kinematic decoupling allows for fast end-effector motions and recovery that leads to robust visual servoing. The whole approach and pipeline can be generalized for any mobile platform (wheeled or tracked vehicles), but is most suitable for dynamically moving quadruped manipulators thanks to their reactivity against disturbances. The compliance of the impedance controller makes the robot safer for interactions with humans and the environment. We demonstrate the performance and robustness of the proposed approach with various experiments on our 140 kg HyQReal quadruped robot equipped with a 7-DoF manipulator arm. The experiments consider dynamic locomotion, tracking under external disturbances, and fast motions of the target object.
    摘要 我们提出了一个用于物体搜索、接近与抓取(SAG)的控制管线,基于运动学解耦的机械臂运动链与阻抗控制,并集成了基于图像的视觉伺服(IBVS)。运动学解耦使末端执行器能够快速运动和恢复,从而实现鲁棒的视觉伺服。整个方法和管线可推广到任何移动平台(轮式或履带式车辆),但由于四足操作机器人对扰动反应迅速,该方法最适合动态运动的四足操作机器人。阻抗控制器的柔顺性使机器人在与人和环境交互时更加安全。我们在装有7自由度机械臂的140公斤HyQReal四足机器人上进行了多种实验,验证了所提方法的性能与鲁棒性;实验包括动态行走、外部扰动下的跟踪以及目标物体的快速运动。

Rapid Deforestation and Burned Area Detection using Deep Multimodal Learning on Satellite Imagery

  • paper_url: http://arxiv.org/abs/2307.04916
  • repo_url: https://github.com/h2oai/cvpr-multiearth-deforestation-segmentation
  • paper_authors: Gabor Fodor, Marcos V. Conde
  • for: 这项研究的目的是提出一种基于多模态卫星影像和远程感知技术的方法,用于估计亚马逊盆地的森林破坏和野火检测。
  • methods: 该研究采用深度学习方法(卷积神经网络)和全面的数据处理技术,并构建了一个新的精心整理的数据集,以提高森林砍伐估计和火烧区域检测的精度。
  • results: 该研究成功地实现了高精度的森林破坏估计和野火检测,并且在未见图像上也能够达到高精度。code、模型和数据集都是开源的:https://github.com/h2oai/cvpr-multiearth-deforestation-segmentation。
    Abstract Deforestation estimation and fire detection in the Amazon forest poses a significant challenge due to the vast size of the area and the limited accessibility. However, these are crucial problems that lead to severe environmental consequences, including climate change, global warming, and biodiversity loss. To effectively address this problem, multimodal satellite imagery and remote sensing offer a promising solution for estimating deforestation and detecting wildfire in the Amazonia region. This research paper introduces a new curated dataset and a deep learning-based approach to solve these problems using convolutional neural networks (CNNs) and comprehensive data processing techniques. Our dataset includes curated images and diverse channel bands from Sentinel, Landsat, VIIRS, and MODIS satellites. We design the dataset considering different spatial and temporal resolution requirements. Our method successfully achieves high-precision deforestation estimation and burned area detection on unseen images from the region. Our code, models and dataset are open source: https://github.com/h2oai/cvpr-multiearth-deforestation-segmentation
    摘要 亚马逊雨林的森林耗损和野火探测存在巨大的挑战,主要是因为这个地区的面积非常广阔,同时访问受限。但这些问题对环境造成严重的影响,包括气候变化、全球暖化和生物多样性损失。为了有效解决这个问题,多模态卫星影像和远程感知技术提供了一个有前途的解决方案。本研究论文介绍了一个新的准备过的数据集和深度学习基于 convolutional neural networks (CNNs) 和全面的数据处理技术来解决森林耗损和野火探测问题。我们的数据集包括准备过的图像和多种通道频谱的卫星数据,包括 Sentinel、Landsat、VIIRS 和 MODIS 卫星。我们设计了数据集,考虑了不同的空间和时间分辨率要求。我们的方法在未看到图像上实现了高精度的森林耗损和烧毁地带探测。我们的代码、模型和数据集都是开源的,可以在 GitHub 上找到:https://github.com/h2oai/cvpr-multiearth-deforestation-segmentation。

Planar Curve Registration using Bayesian Inversion

  • paper_url: http://arxiv.org/abs/2307.04909
  • repo_url: None
  • paper_authors: Andreas Bock, Colin J. Cotter, Robert C. Kirby
  • For: 该研究将与参数化无关的闭合平面曲线匹配问题表述为贝叶斯逆问题。
  • Methods: 该研究用作用于环境空间的微分同胚群上的曲线来建模曲线的运动,采用Wu-Xu有限元求解曲线匹配问题的Hamilton方程,并通过贝叶斯反演求解关于动量的逆问题。
  • Results: 研究采用集合卡尔曼反演(ensemble Kalman inversion),并以负Sobolev范数失配惩罚来度量目标形状与集合均值形状之间的差异,给出了若干数值算例验证该方法。
    Abstract We study parameterisation-independent closed planar curve matching as a Bayesian inverse problem. The motion of the curve is modelled via a curve on the diffeomorphism group acting on the ambient space, leading to a large deformation diffeomorphic metric mapping (LDDMM) functional penalising the kinetic energy of the deformation. We solve Hamilton's equations for the curve matching problem using the Wu-Xu element [S. Wu, J. Xu, Nonconforming finite element spaces for $2m^\text{th}$ order partial differential equations on $\mathbb{R}^n$ simplicial grids when $m=n+1$, Mathematics of Computation 88 (316) (2019) 531-551] which provides mesh-independent Lipschitz constants for the forward motion of the curve, and solve the inverse problem for the momentum using Bayesian inversion. Since this element is not affine-equivalent we provide a pullback theory which expedites the implementation and efficiency of the forward map. We adopt ensemble Kalman inversion using a negative Sobolev norm mismatch penalty to measure the discrepancy between the target and the ensemble mean shape. We provide several numerical examples to validate the approach.
    摘要 我们将与参数化无关的闭合平面曲线匹配问题作为贝叶斯逆问题来研究。曲线的运动被建模为作用于环境空间的微分同胚群上的一条曲线,由此得到一个对形变动能进行惩罚的大形变微分同胚度量映射(LDDMM)泛函。我们使用Wu-Xu单元 [S. Wu, J. Xu, Nonconforming finite element spaces for $2m^\text{th}$ order partial differential equations on $\mathbb{R}^n$ simplicial grids when $m=n+1$, Mathematics of Computation 88 (316) (2019) 531-551] 求解曲线匹配问题的Hamilton方程,该单元为曲线的前向运动提供了与网格无关的Lipschitz常数;随后用贝叶斯反演求解关于动量的逆问题。由于该单元不是仿射等价的,我们给出了一套拉回(pullback)理论,以加速前向映射的实现并提高其效率。我们采用集合卡尔曼反演,并以负Sobolev范数失配惩罚来度量目标形状与集合均值形状之间的差异。我们给出了若干数值算例来验证该方法。
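
For readers unfamiliar with LDDMM, a generic form of the matching functional being minimized is sketched below in our own notation, assuming a simple squared-norm data term; the paper's exact discrepancy uses a negative Sobolev norm instead.

```latex
% Generic LDDMM matching energy (illustrative notation, not the paper's exact form):
% a time-dependent velocity field v_t deforms the template curve q(0) toward q_target.
\begin{equation*}
E(v) \;=\; \frac{1}{2}\int_0^1 \lVert v_t \rVert_V^2 \, dt
  \;+\; \frac{1}{2\sigma^2}\, \bigl\lVert q(1) - q_{\mathrm{target}} \bigr\rVert^2,
\qquad \dot q(t) = v_t\bigl(q(t)\bigr).
\end{equation*}
```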

Unsupervised Domain Adaptation with Deep Neural-Network

  • paper_url: http://arxiv.org/abs/2307.05601
  • repo_url: https://github.com/jetwev/domain-adaptation
  • paper_authors: Artem Bituitskii
  • for: 本研究对无监督领域自适应(Unsupervised Domain Adaptation)的现有方法进行分析,引入一种新方法,并展示在不同领域间改进视觉识别任务的潜力。
  • methods: 本研究使用了现有的方法和新提出的方法来进行领域适应。
  • results: 本研究的结果表明,通过采用新的方法和技术,可以在不同领域下进行视觉识别任务的改进。
    Abstract This report contributes to the field of unsupervised domain adaptation by providing an analysis of existing methods, introducing a new approach, and demonstrating the potential for improving visual recognition tasks across different domains. The results of this study open up opportunities for further study and development of advanced methods in the field of domain adaptation.
    摘要 这份报告对无监督领域自适应的现有方法进行了分析,提出了一种新方法,并展示了在不同领域间提升视觉识别任务的潜力。这些研究结果为领域自适应领域中先进方法的进一步研究与发展开辟了机会。

Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.04859
  • repo_url: None
  • paper_authors: Alexander W. Bergman, Wang Yifan, Gordon Wetzstein
  • for: 文章旨在提供一种基于文本描述的3D人物头像生成方法,以满足数字化人物创建、虚拟现实等领域的需求。
  • methods: 该方法基于预训练的2D文本到图像扩散模型,直接利用这些模型生成3D多视角一致的辐射场,以实现3D人物头像的生成。新的优化策略对几何和纹理引入约束,以保证生成的3D头像与文本描述保持一致。
  • results: 实验结果表明,与基于CLIP的其他方法相比,我们基于扩散模型的3D头像生成方法可以提供更高的多样性和准确性。
    Abstract The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education. Recent work on text-guided 3D object generation has shown great promise in addressing these needs. These methods directly leverage pre-trained 2D text-to-image diffusion models to generate 3D-multi-view-consistent radiance fields of generic objects. However, due to the lack of geometry and texture priors, these methods have limited control over the generated 3D objects, making it difficult to operate inside a specific domain, e.g., human heads. In this work, we develop a new approach to text-guided 3D head avatar generation to address this limitation. Our framework directly operates on the geometry and texture of an articulable 3D morphable model (3DMM) of a head, and introduces novel optimization procedures to update the geometry and texture while keeping the 2D and 3D facial features aligned. The result is a 3D head avatar that is consistent with the text description and can be readily articulated using the deformation model of the 3DMM. We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task. The latter are typically based on CLIP, which is known to provide limited diversity of generation and accuracy for 3D object generation.
    摘要 生成多样化的3D可驱动人头头像的能力对增强现实、电影制作和教育等众多应用至关重要。近期关于文本引导3D物体生成的工作在满足这些需求方面展现出很大潜力。这些方法直接利用预训练的2D文本到图像扩散模型,为一般物体生成3D多视角一致的辐射场。然而,由于缺乏几何和纹理先验,这些方法对生成的3D物体的控制能力有限,难以在特定领域(例如人头)内工作。在这项工作中,我们提出了一种新的文本引导3D头像生成方法来解决这一限制。我们的框架直接在可驱动的3D可变形头部模型(3DMM)的几何与纹理上进行操作,并引入新的优化过程,在更新几何和纹理的同时保持2D与3D面部特征的对齐。其结果是一个与文本描述一致、且可通过3DMM的形变模型方便地驱动的3D头像。我们证明,基于扩散模型的可驱动头像优于该任务上的最新方法;后者通常基于CLIP,而众所周知,CLIP在3D物体生成中的多样性和准确性有限。

AmadeusGPT: a natural language interface for interactive animal behavioral analysis

  • paper_url: http://arxiv.org/abs/2307.04858
  • repo_url: https://github.com/adaptivemotorcontrollab/amadeusgpt
  • paper_authors: Shaokai Ye, Jessy Lauer, Mu Zhou, Alexander Mathis, Mackenzie W. Mathis
  • for: 该论文旨在提供一种自然语言界面,使得动物行为分析可以轻松地转化为机器可读代码。
  • methods: 该论文使用大型语言模型(LLM),如GPT3.5和GPT4,并提出一种新的双记忆机制,以解决自然语言界面中上下文窗口大小的限制。
  • results: 作者通过基准测试表明,AmadeusGPT可以在MABE 2022行为挑战任务上达到最先进的性能。
    Abstract The process of quantifying and analyzing animal behavior involves translating the naturally occurring descriptive language of their actions into machine-readable code. Yet, codifying behavior analysis is often challenging without deep understanding of animal behavior and technical machine learning knowledge. To limit this gap, we introduce AmadeusGPT: a natural language interface that turns natural language descriptions of behaviors into machine-executable code. Large-language models (LLMs) such as GPT3.5 and GPT4 allow for interactive language-based queries that are potentially well suited for making interactive behavior analysis. However, the comprehension capability of these LLMs is limited by the context window size, which prevents it from remembering distant conversations. To overcome the context window limitation, we implement a novel dual-memory mechanism to allow communication between short-term and long-term memory using symbols as context pointers for retrieval and saving. Concretely, users directly use language-based definitions of behavior and our augmented GPT develops code based on the core AmadeusGPT API, which contains machine learning, computer vision, spatio-temporal reasoning, and visualization modules. Users then can interactively refine results, and seamlessly add new behavioral modules as needed. We benchmark AmadeusGPT and show we can produce state-of-the-art performance on the MABE 2022 behavior challenge tasks. Note, an end-user would not need to write any code to achieve this. Thus, collectively AmadeusGPT presents a novel way to merge deep biological knowledge, large-language models, and core computer vision modules into a more naturally intelligent system. Code and demos can be found at: https://github.com/AdaptiveMotorControlLab/AmadeusGPT.
    摘要 量化和分析动物行为的过程,需要将对动物行为的自然语言描述转化为机器可读的代码。然而,如果缺乏对动物行为的深入理解和机器学习技术知识,对行为分析进行编码往往十分困难。为缩小这一差距,我们提出了AmadeusGPT:一个可以将行为的自然语言描述转化为可执行代码的自然语言界面。GPT3.5和GPT4等大型语言模型(LLM)支持基于语言的交互式查询,可能非常适合交互式行为分析;然而,这些LLM的理解能力受到上下文窗口大小的限制,无法记住较早的对话。为克服这一限制,我们实现了一种新的双记忆机制,利用符号作为上下文指针进行检索和保存,实现短期记忆与长期记忆之间的通信。具体而言,用户直接用语言给出行为定义,我们增强后的GPT基于AmadeusGPT核心API(包含机器学习、计算机视觉、时空推理和可视化模块)生成代码;用户随后可以交互式地细化结果,并按需无缝添加新的行为模块。我们对AmadeusGPT进行了基准测试,结果表明它在MABE 2022行为挑战任务上可以达到最先进的性能,而最终用户无需编写任何代码。总的来说,AmadeusGPT提供了一种将深入的生物学知识、大型语言模型与核心计算机视觉模块融合为更自然智能系统的新途径。代码与演示见:https://github.com/AdaptiveMotorControlLab/AmadeusGPT。

CREPE: Learnable Prompting With CLIP Improves Visual Relationship Prediction

  • paper_url: http://arxiv.org/abs/2307.04838
  • repo_url: https://github.com/llnl/crepe
  • paper_authors: Rakshith Subramanyam, T. S. Jayram, Rushil Anirudh, Jayaraman J. Thiagarajan
  • for: 该论文旨在探讨视觉语言模型(VLM),具体来说是CLIP,在预测视觉对象关系方面的潜力。
  • methods: 该论文采用UVTransE关系预测框架,将关系学习为一个平移嵌入,由主体框、客体框和并集框的嵌入构成。
  • results: 该论文将CLIP表示与UVTransE框架结合,在Visual Genome基准上取得了谓词预测的最先进性能(mR@5为27.79,mR@20为31.95),在mR@20上比最近的最先进方法提高了15.3%。
    Abstract In this paper, we explore the potential of Vision-Language Models (VLMs), specifically CLIP, in predicting visual object relationships, which involves interpreting visual features from images into language-based relations. Current state-of-the-art methods use complex graphical models that utilize language cues and visual features to address this challenge. We hypothesize that the strong language priors in CLIP embeddings can simplify these graphical models paving for a simpler approach. We adopt the UVTransE relation prediction framework, which learns the relation as a translational embedding with subject, object, and union box embeddings from a scene. We systematically explore the design of CLIP-based subject, object, and union-box representations within the UVTransE framework and propose CREPE (CLIP Representation Enhanced Predicate Estimation). CREPE utilizes text-based representations for all three bounding boxes and introduces a novel contrastive training strategy to automatically infer the text prompt for union-box. Our approach achieves state-of-the-art performance in predicate estimation, mR@5 27.79, and mR@20 31.95 on the Visual Genome benchmark, achieving a 15.3\% gain in performance over recent state-of-the-art at mR@20. This work demonstrates CLIP's effectiveness in object relation prediction and encourages further research on VLMs in this challenging domain.
    摘要 在这篇论文中,我们探索了视觉语言模型(VLM),尤其是CLIP,在预测视觉对象关系方面的潜力。现有的状态艺术方法使用复杂的图形模型,利用语言提示和视觉特征来解决这个挑战。我们假设CLIP的强语言偏好可以简化这些图形模型,为更简单的方法开辟道路。我们采用UVTransE关系预测框架,该框架学习关系为视图的翻译嵌入,从场景中获得主题、 объек、并 union 盒子嵌入。我们系统地探索CLIP基于主题、 объек、并 union 盒子表示的设计,并提出CREPE(CLIP Representation Enhanced Predicate Estimation)方法。CREPE利用文本基于表示 для所有三个 bounding 盒子,并 introduces 一种新的对比训练策略,自动生成 union 盒子的文本提示。我们的方法在Visual Genome benchmark上实现了 predicate estimation 的状态艺术性成绩,mR@5 27.79,mR@20 31.95,与最近的状态艺术提高15.3%。这项工作证明了CLIP在对象关系预测中的效iveness,并鼓励进一步的VLM在这一领域的研究。

Semantic-SAM: Segment and Recognize Anything at Any Granularity

  • paper_url: http://arxiv.org/abs/2307.04767
  • repo_url: https://github.com/ux-decoder/semantic-sam
  • paper_authors: Feng Li, Hao Zhang, Peize Sun, Xueyan Zou, Shilong Liu, Jianwei Yang, Chunyuan Li, Lei Zhang, Jianfeng Gao
  • for: 本研究旨在开发一种通用的图像分割模型,以实现任何粒度的分割和识别。
  • methods: 我们的模型具有两个优势:语义感知(semantic-awareness)和粒度丰富(granularity-abundance)。我们将三个粒度层级上的多个数据集进行整合,并引入对物体(object)与部件(part)解耦的分类,以实现语义感知。为实现多粒度能力,我们提出了多选学习方案,使每次点击可以生成多个粒度层级的mask,并与多个真实粒度的mask相匹配。
  • results: 实验结果和可视化表明,我们的模型成功实现了语义感知和粒度丰富。此外,将SA-1B训练与其他分割任务(如全景分割和部件分割)相结合,可以进一步提高性能。我们将提供代码和演示,以便进一步探索和评估。
    Abstract In this paper, we introduce Semantic-SAM, a universal image segmentation model to enable segment and recognize anything at any desired granularity. Our model offers two key advantages: semantic-awareness and granularity-abundance. To achieve semantic-awareness, we consolidate multiple datasets across three granularities and introduce decoupled classification for objects and parts. This allows our model to capture rich semantic information. For the multi-granularity capability, we propose a multi-choice learning scheme during training, enabling each click to generate masks at multiple levels that correspond to multiple ground-truth masks. Notably, this work represents the first attempt to jointly train a model on SA-1B, generic, and part segmentation datasets. Experimental results and visualizations demonstrate that our model successfully achieves semantic-awareness and granularity-abundance. Furthermore, combining SA-1B training with other segmentation tasks, such as panoptic and part segmentation, leads to performance improvements. We will provide code and a demo for further exploration and evaluation.
    摘要 在本文中,我们介绍Semantic-SAM,一种通用的图像分割模型,可以在任意期望的粒度上进行分割和识别。我们的模型具有两个优势:语义感知和粒度丰富。为实现语义感知,我们整合了三个粒度层级上的多个数据集,并引入对物体与部件解耦的分类,使模型能够捕捉丰富的语义信息。为实现多粒度能力,我们在训练中提出了多选学习方案,使每次点击可以生成多个层级的mask,并与多个真实粒度的mask相匹配。值得注意的是,这是首次在SA-1B、通用分割和部件分割数据集上联合训练模型的尝试。实验结果和可视化显示,我们的模型成功实现了语义感知和粒度丰富。此外,将SA-1B训练与其他分割任务(如全景分割和部件分割)相结合可以提升性能。我们将提供代码和演示,以便进一步的探索和评估。

Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos

  • paper_url: http://arxiv.org/abs/2307.04760
  • repo_url: None
  • paper_authors: Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
  • for: 从第一人称(egocentric)视频中的空间视听对应关系学习表示,以提升社交场景中的空间理解能力。
  • methods: 使用掩码自编码(masked auto-encoding)框架,借助音频与视觉的协同作用合成被掩码的双耳音频,从而学习两种模态之间有用的空间关系。
  • results: 大量实验表明,我们的特征足够通用,在EgoCom和EasyCom两个具有挑战性的公开第一人称视频数据集上,在主动说话人检测和空间音频去噪两项任务中均超越了多个最先进的基线方法。
    Abstract We propose a self-supervised method for learning representations based on spatial audio-visual correspondences in egocentric videos. In particular, our method leverages a masked auto-encoding framework to synthesize masked binaural audio through the synergy of audio and vision, thereby learning useful spatial relationships between the two modalities. We use our pretrained features to tackle two downstream video tasks requiring spatial understanding in social scenarios: active speaker detection and spatial audio denoising. We show through extensive experiments that our features are generic enough to improve over multiple state-of-the-art baselines on two public challenging egocentric video datasets, EgoCom and EasyCom. Project: http://vision.cs.utexas.edu/projects/ego_av_corr.
    摘要 我们提出了一种自监督方法,基于第一人称视频中的空间视听对应关系来学习表示。具体而言,该方法利用掩码自编码框架,借助音频与视觉的协同作用合成被掩码的双耳音频,从而学习两种模态之间有用的空间关系。我们使用预训练得到的特征来处理两个需要在社交场景中进行空间理解的下游视频任务:主动说话人检测和空间音频去噪。大量实验表明,我们的特征足够通用,在EgoCom和EasyCom这两个具有挑战性的公开第一人称视频数据集上均优于多个最先进的基线方法。项目:http://vision.cs.utexas.edu/projects/ego_av_corr。

Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

  • paper_url: http://arxiv.org/abs/2307.04751
  • repo_url: None
  • paper_authors: Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox
  • for: 该系统用于重新摆放场景中的物体,以实现期望的物体-场景摆放关系,例如把一本书插入书架上的空槽中。
  • methods: 该系统以3D点云为输入,从示教数据中训练而来,可泛化到场景与物体的新几何形状、姿态和布局。它通过迭代的姿态去噪训练过程来拟合多模态示教数据,并生成多模态输出,同时保持精确与准确。
  • results: 该系统在三个需要处理多模态性并对物体形状和姿态进行泛化的重新摆放任务中(包括仿真与真实环境)得到了验证;方法仅以相关的局部几何特征为条件,而忽略会损害泛化与精度的无关全局结构。
    Abstract We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal/
    摘要 我们提出一个系统,用于重新排序场景中的物体,以实现想要的物体-场景占位关系,如一本书被插入打开的书架中的一个开口中。我们的管道可以涵盖新的几何结构、姿态和布局,并从示例学习而来操作直接3D点云。我们的系统可以解决场景中存在许多几何相似的重新排序解的挑战。通过利用循环pose的净化训练过程,我们可以适应多模态示例数据,并产生多模态输出,同时保持精度和准确。我们还表明了在conditioning on relevante的本地几何特征,而忽略不相关的全局结构,可以提高generalization和精度。我们在三个不同的重新排序任务中展示了我们的方法,这些任务需要处理多模态和对物体形状和姿态的泛化。项目网站、代码和视频:https://anthonysimeonov.github.io/rpdiff-multi-modal/

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

  • paper_url: http://arxiv.org/abs/2307.04725
  • repo_url: https://github.com/guoyww/animatediff
  • paper_authors: Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai
  • for: 这篇论文旨在提出一个实用框架,使大多数现有的个性化文本到图像模型都可以一次性地实现动画化。
  • methods: 该框架的核心是在冻结的文本到图像模型中插入一个新初始化的运动建模模块,并在视频片段上训练该模块,以蒸馏出合理的运动先验。
  • results: 在多个有代表性的公开个性化文本到图像模型(涵盖动漫图片与真实照片)上的评估表明,该框架可以使这些模型生成时间上平滑的动画片段,同时保持其领域特性和输出多样性。
    Abstract With the advance of text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable cost. Subsequently, there is a great demand for image animation techniques to further combine generated static images with motion dynamics. In this report, we propose a practical framework to animate most of the existing personalized text-to-image models once and for all, saving efforts in model-specific tuning. At the core of the proposed framework is to insert a newly initialized motion modeling module into the frozen text-to-image model and train it on video clips to distill reasonable motion priors. Once trained, by simply injecting this motion modeling module, all personalized versions derived from the same base T2I readily become text-driven models that produce diverse and personalized animated images. We conduct our evaluation on several public representative personalized text-to-image models across anime pictures and realistic photographs, and demonstrate that our proposed framework helps these models generate temporally smooth animation clips while preserving the domain and diversity of their outputs. Code and pre-trained weights will be publicly available at https://animatediff.github.io/ .
    摘要 随着文本到图像模型(如Stable Diffusion)及DreamBooth、LoRA等个性化技术的发展,每个人都能以可承受的成本将想象力转化为高质量图像。随之而来的是对图像动画技术的巨大需求,以便将生成的静态图像与运动动态结合起来。在本报告中,我们提出一个实用框架,可一次性地为大多数现有的个性化文本到图像模型赋予动画能力,免去针对具体模型的调优工作。该框架的核心是在冻结的文本到图像模型中插入一个新初始化的运动建模模块,并在视频片段上训练它以蒸馏合理的运动先验。训练完成后,只需注入该运动建模模块,从同一基础文本到图像模型派生的所有个性化版本便可立即成为由文本驱动、能够生成多样且个性化动画图像的模型。我们在涵盖动漫图片与真实照片的多个有代表性的公开个性化文本到图像模型上进行评估,结果表明所提框架能帮助这些模型生成时间上平滑的动画片段,同时保持其领域特性与输出多样性。代码和预训练权重将公开于 https://animatediff.github.io/ 。

CVPR MultiEarth 2023 Deforestation Estimation Challenge:SpaceVision4Amazon

  • paper_url: http://arxiv.org/abs/2307.04715
  • repo_url: None
  • paper_authors: Sunita Arya, S Manthira Moorthi, Debajyoti Dhar
  • for: 本研究开发了一种基于注意力引导UNet架构的森林砍伐估计方法,使用光学(EO)和合成孔径雷达(SAR)卫星图像。
  • methods: 本研究使用Landsat-8的光学图像和Sentinel-1的SAR图像进行训练与验证;由于缺乏时空配准的数据,每个传感器分别训练了单独的模型。
  • results: 训练时,Landsat-8模型的训练和验证像素精度为93.45%,Sentinel-2模型的像素精度为83.87%;在测试集评估中,模型的像素精度为84.70%,F1分数为0.79,IoU为0.69。
    Abstract In this paper, we present a deforestation estimation method based on an attention-guided UNet architecture using Electro-Optical (EO) and Synthetic Aperture Radar (SAR) satellite imagery. For optical images, Landsat-8 and for SAR imagery, Sentinel-1 data have been used to train and validate the proposed model. Due to the unavailability of temporally and spatially collocated data, an individual model has been trained for each sensor. During training, the Landsat-8 model achieved training and validation pixel accuracy of 93.45% and the Sentinel-2 model achieved 83.87% pixel accuracy. During the test set evaluation, the model achieved pixel accuracy of 84.70% with F1-Score of 0.79 and IoU of 0.69.
    摘要 在本文中,我们提出了一种基于注意力引导UNet架构的森林砍伐估计方法,使用光学(EO)和合成孔径雷达(SAR)卫星图像。对于光学图像,我们使用Landsat-8数据进行训练和验证;对于SAR图像,我们使用Sentinel-1数据。由于缺乏时空配准的数据,我们为每个传感器单独训练了模型。训练时,Landsat-8模型达到了93.45%的训练和验证像素精度,Sentinel-2模型达到了83.87%的像素精度。在测试集评估中,模型达到了84.70%的像素精度,F1分数为0.79,IoU为0.69。

FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing

  • paper_url: http://arxiv.org/abs/2307.04684
  • repo_url: https://github.com/lpengyang/freedrag
  • paper_authors: Pengyang Ling, Lin Chen, Pan Zhang, Huaian Chen, Yi Jin
  • for: 提高图像编辑的精度和灵活性,解决DragGAN的漏跟踪(miss tracking)与歧义跟踪(ambiguous tracking)问题。
  • methods: 采用面向特征的方法,结合自适应模板特征、线搜索和模糊定位技术,实现稳定高效的基于点的图像编辑。
  • results: 比DragGAN更高效和稳定,能在具有相似结构、细节丰富或多点目标等复杂场景下实现稳定的基于点的编辑。
    Abstract To serve the intricate and varied demands of image editing, precise and flexible manipulation of image content is indispensable. Recently, DragGAN has achieved impressive editing results through point-based manipulation. However, we have observed that DragGAN struggles with miss tracking, where DragGAN encounters difficulty in effectively tracking the desired handle points, and ambiguous tracking, where the tracked points are situated within other regions that bear resemblance to the handle points. To deal with the above issues, we propose FreeDrag, which adopts a feature-oriented approach to free the burden on point tracking within the point-oriented methodology of DragGAN. The FreeDrag incorporates adaptive template features, line search, and fuzzy localization techniques to perform stable and efficient point-based image editing. Extensive experiments demonstrate that our method is superior to the DragGAN and enables stable point-based editing in challenging scenarios with similar structures, fine details, or under multi-point targets.
    摘要 为了满足图像修改的细致和多样化需求,图像内容的精准和灵活修改是不可或缺的。近期,DragGAN已经实现了印象深刻的编辑结果通过点基修改。然而,我们发现DragGAN在跟踪把钩点时存在着困难和混淆跟踪问题,其中跟踪点可能会被其他相似区域所吸引。为了解决这些问题,我们提出了FreeDrag,它采用了特征 ориентирован的方法来减轻DragGAN中点跟踪的负担。FreeDragintegrates了适应模板特征、线搜索和朴素地理化技术,以实现稳定和高效的点基图像修改。广泛的实验表明,我们的方法比DragGAN更加稳定和可靠,并且在复杂的场景下,包括类似结构、细节和多个目标下,也能够实现稳定的点基修改。

cs.AI - 2023-07-11

Handwritten Text Recognition Using Convolutional Neural Network

  • paper_url: http://arxiv.org/abs/2307.05396
  • repo_url: https://github.com/sushant097/Handwritten-Line-Text-Recognition-using-Deep-Learning-with-Tensorflow
  • paper_authors: Atman Mishra, A. Sharath Ram, Kavyashree C
  • for: 这篇论文的目的是提出一种基于Convolutional Neural Network(CNN)模型的智能字符识别系统,以便将手写或印刷的字符转换为ASCII文本。
  • methods: 该论文使用了NIST数据集中的超过100,000张图像进行训练,并通过提取图像中的特征来生成每个图像的概率分类结果。
  • results: 该论文在NIST数据集上达到了90.54%的准确率,并且loss为2.53%。
    Abstract OCR (Optical Character Recognition) is a technology that offers comprehensive alphanumeric recognition of handwritten and printed characters at electronic speed by merely scanning the document. Recently, the understanding of visual data has been termed Intelligent Character Recognition (ICR). Intelligent Character Recognition (ICR) is the OCR module that can convert scans of handwritten or printed characters into ASCII text. ASCII data is the standard format for data encoding in electronic communication. ASCII assigns standard numeric values to letters, numerals, symbols, white-spaces and other characters. In more technical terms, OCR is the process of using an electronic device to transform 2-Dimensional textual information into machine-encoded text. Anything that contains text, whether machine-written or handwritten, can be scanned with a scanner, or simply a picture of the text is enough for the recognition system to distinguish the text. The goal of this paper is to show the results of a Convolutional Neural Network model trained on the National Institute of Standards and Technology (NIST) dataset containing over 100,000 images. The network learns from the features extracted from the images and uses them to generate the probability of each class to which the picture belongs. We have achieved an accuracy of 90.54% with a loss of 2.53%.
    摘要 OCR(光学字符识别)技术可以通过扫描文档,以电子速度对手写和印刷字符进行全面的字母数字识别。近年来,对视觉数据的这种理解被称为智能字符识别(ICR)。ICR模块可以将扫描得到的手写或印刷字符转换为ASCII文本。ASCII是电子通信中数据编码的标准格式,它为字母、数字、符号、空格及其他字符赋予标准数值。用更技术性的说法,OCR是利用电子设备将二维文本信息转换为机器编码文本的过程。任何包含文本的内容,无论是机器打印的还是手写的,都可以通过扫描仪扫描,或仅凭一张文本照片即可供识别系统辨认。本文的目标是展示一个卷积神经网络模型的结果:该模型在包含超过100,000张图像的国家标准与技术研究院(NIST)数据集上训练,从图像中提取特征,并据此计算图片属于各类别的概率。我们取得了90.54%的准确率,损失为2.53%。
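
As a hedged illustration of the kind of model the paper trains (the layer sizes and dropout below are assumptions, not the authors' exact architecture), a minimal character-classification CNN in PyTorch:

```python
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    """Small CNN for classifying fixed-size grayscale character images (e.g. 28x28)."""

    def __init__(self, num_classes: int = 62):   # digits + upper/lowercase letters
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28 -> 14
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, num_classes),          # per-class scores; softmax gives probabilities
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# logits = CharCNN()(torch.randn(8, 1, 28, 28))   # batch of 8 grayscale images
```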

Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators

  • paper_url: http://arxiv.org/abs/2307.05358
  • repo_url: None
  • paper_authors: Sikai Bai, Shuaicheng Li, Weiming Zhuang, Jie Zhang, Song Guo, Kunlin Yang, Jun Hou, Shuai Zhang, Junyu Gao, Shuai Yi
  • for: 这个研究旨在解决联邦半监督学习(Federated Semi-supervised Learning, FSSL)中数据分布不均衡的问题,包括客户端上标签稀缺的问题。
  • methods: 这个研究提出了一个名为 FedDure 的新 FSSL 框架,使用两种调节器:粗粒度调节器(C-reg)和细粒度调节器(F-reg),将客户端的模型训练表述为双层(bi-level)优化。
  • results: 这个研究通过实验显示 FedDure 在多种设定下表现出色,特别是在 CIFAR-10 和 CINIC-10 数据集上取得了超过 11% 的提升。
    Abstract Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution is different not only across clients but also within a client between labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure. FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the updating of the local model by tracking the learning effect on labeled data distribution; F-reg learns an adaptive weighting scheme tailored for unlabeled instances in each client. We further formulate the client model training as bi-level optimization that adaptively optimizes the model in the client with two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11% on CIFAR-10 and CINIC-10 datasets.
    摘要 联邦学习已成为从去中心化异构数据中学习的流行方法。由于去中心化客户端上标签稀缺,联邦半监督学习(FSSL)应运而生,仅利用一小部分有标签数据来训练模型。现有的FSSL方法假设各客户端的有标签数据独立同分布(IID),且客户端内部有标签数据与无标签数据的类别分布一致。本工作研究一种更实际、更具挑战性的FSSL场景:数据分布不仅在客户端之间不同,而且在客户端内部的有标签数据与无标签数据之间也不同。为应对这一挑战,我们提出了带有双调节器的新型FSSL框架FedDure。FedDure通过粗粒度调节器(C-reg)和细粒度调节器(F-reg)摆脱了上述假设:C-reg通过跟踪在有标签数据分布上的学习效果来正则化本地模型的更新;F-reg为每个客户端的无标签样本学习一种自适应加权方案。我们进一步将客户端模型训练表述为双层优化问题,在两个调节器的作用下自适应地优化客户端内的模型。理论上,我们给出了双调节器的收敛性保证;实验上,我们证明FedDure在各种设置下均优于现有方法,尤其是在CIFAR-10和CINIC-10数据集上提升超过11%。

ProgGP: From GuitarPro Tablature Neural Generation To Progressive Metal Production

  • paper_url: http://arxiv.org/abs/2307.05328
  • repo_url: None
  • paper_authors: Jackson Loth, Pedro Sarmento, CJ Carr, Zack Zukowski, Mathieu Barthet
  • for: 这个论文的目的是创建一种能够生成进步金属音乐的人工智能工具,通过人机合作来创作乐曲。
  • methods: 这个论文使用了一个已经预训练的 Transformer 模型,并在 ProgGP 数据集上进行了微调,以便使用 GuitarPro 格式的符号化表示来生成多个吉他、贝司、鼓、钢琴和管弦乐部分。
  • results: 研究人员使用了一种混合方法,结合计算音乐学和实践研究两种方法,来评估生成的乐曲的有效性。最终,他们使用了这种模型来创作一首完整的进步金属歌曲,并由人类重金属制作人在 AI 生成的音乐基础上进行了完整的制作和混音。
    Abstract Recent work in the field of symbolic music generation has shown value in using a tokenization based on the GuitarPro format, a symbolic representation supporting guitar expressive attributes, as an input and output representation. We extend this work by fine-tuning a pre-trained Transformer model on ProgGP, a custom dataset of 173 progressive metal songs, for the purposes of creating compositions from that genre through a human-AI partnership. Our model is able to generate multiple guitar, bass guitar, drums, piano and orchestral parts. We examine the validity of the generated music using a mixed methods approach by combining quantitative analyses following a computational musicology paradigm and qualitative analyses following a practice-based research paradigm. Finally, we demonstrate the value of the model by using it as a tool to create a progressive metal song, fully produced and mixed by a human metal producer based on AI-generated music.
    摘要 近期符号音乐生成领域的工作表明,采用基于GuitarPro格式(一种支持吉他表现力属性的符号表示)的标记化作为输入与输出表示是有价值的。我们在此基础上,在包含173首前卫金属歌曲的自建数据集ProgGP上对预训练的Transformer模型进行微调,以便通过人机协作创作该风格的作品。我们的模型能够生成多个吉他、贝斯、鼓、钢琴和管弦乐声部。我们采用混合方法评估生成音乐的有效性,将计算音乐学范式下的定量分析与基于实践研究范式的定性分析相结合。最后,我们以该模型为工具创作了一首前卫金属歌曲,由一位人类金属音乐制作人基于AI生成的音乐完成整首歌曲的制作与混音,以此展示模型的价值。

Automatic Generation of Semantic Parts for Face Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.05317
  • repo_url: https://github.com/TFonta/Semantic-VAE
  • paper_authors: Tomaso Fontanini, Claudio Ferrari, Massimo Bertozzi, Andrea Prati
  • for: 本研究旨在自动操控或生成语义分割掩码中对象类别(特别是人脸)的形状,从而为语义图像合成提供对形状与纹理的自动控制。
  • methods: 我们提出了一种网络架构,可将掩码按类别嵌入到一个潜在空间中,每个类别的嵌入可以被独立编辑;随后,一个双向LSTM模块和一个卷积解码器输出新的、经过局部修改的掩码。
  • results: 我们在CelebMask-HQ数据集上进行了定量和定性评估,结果显示我们的模型既能忠实地重建分割掩码,也能在类别层面对其进行修改。此外,我们还展示了模型可置于语义图像合成生成器之前,从而实现对图像形状与纹理的完全自动生成控制。
    Abstract Semantic image synthesis (SIS) refers to the problem of generating realistic imagery given a semantic segmentation mask that defines the spatial layout of object classes. Most of the approaches in the literature, other than the quality of the generated images, put effort in finding solutions to increase the generation diversity in terms of style i.e. texture. However, they all neglect a different feature, which is the possibility of manipulating the layout provided by the mask. Currently, the only way to do so is manually by means of graphical users interfaces. In this paper, we describe a network architecture to address the problem of automatically manipulating or generating the shape of object classes in semantic segmentation masks, with specific focus on human faces. Our proposed model allows embedding the mask class-wise into a latent space where each class embedding can be independently edited. Then, a bi-directional LSTM block and a convolutional decoder output a new, locally manipulated mask. We report quantitative and qualitative results on the CelebMask-HQ dataset, which show our model can both faithfully reconstruct and modify a segmentation mask at the class level. Also, we show our model can be put before a SIS generator, opening the way to a fully automatic generation control of both shape and texture. Code available at https://github.com/TFonta/Semantic-VAE.
    摘要 语义图像合成(SIS)指在给定定义了各对象类别空间布局的语义分割掩码的情况下生成逼真图像的问题。文献中的大多数方法除了关注生成图像的质量外,还致力于提升风格(即纹理)方面的生成多样性,却忽视了另一个特性:对掩码所提供布局的操控。目前,这只能通过图形用户界面手动完成。在本文中,我们描述了一种网络架构,用于自动操控或生成语义分割掩码中对象类别的形状,并特别关注人脸。我们提出的模型可将掩码按类别嵌入到一个潜在空间中,每个类别的嵌入可被独立编辑;随后,一个双向LSTM模块和一个卷积解码器输出新的、经过局部修改的掩码。我们在CelebMask-HQ数据集上报告了定量与定性结果,表明该模型既能在类别层面忠实重建分割掩码,也能对其进行修改。此外,我们还展示了模型可置于SIS生成器之前,为形状与纹理的完全自动生成控制铺平道路。代码见 https://github.com/TFonta/Semantic-VAE。

Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering

  • paper_url: http://arxiv.org/abs/2307.05314
  • repo_url: https://github.com/pengfeiliheu/mumc
  • paper_authors: Pengfei Li, Gang Liu, Jinlong He, Zixu Zhao, Shenjun Zhong
  • for: 这 paper 的目的是提出一种新的自动化医学图像问答系统,以便回答基于医学图像的临床问题。
  • methods: 该 paper 使用了一种新的自我supervised方法,通过利用医学图像描述集和文本描述集来学习输入图像和文本的单模态和多模态特征表示,并使用了掩码语言模型和图像文本匹配作为预训练目标。
  • results: 该 paper 的实验结果表明,使用该自我supervised方法可以在三个公开available的医学图像问答数据集上实现状态机器的表现,并且与之前的最佳性能相比,具有2.2%、14.7%和1.7%的准确率提升。
    Abstract Medical visual question answering (VQA) is a challenging task that requires answering clinical questions about a given medical image by taking into consideration both visual and language information. However, due to the small scale of training data for medical VQA, pre-training fine-tuning paradigms have been a commonly used solution to improve model generalization performance. In this paper, we present a novel self-supervised approach that learns unimodal and multimodal feature representations of input images and text using medical image caption datasets, by leveraging both unimodal and multimodal contrastive losses, along with masked language modeling and image text matching as pretraining objectives. The pre-trained model is then transferred to downstream medical VQA tasks. The proposed approach achieves state-of-the-art (SOTA) performance on three publicly available medical VQA datasets with significant accuracy improvements of 2.2%, 14.7%, and 1.7% respectively. Besides, we conduct a comprehensive analysis to validate the effectiveness of different components of the approach and study different pre-training settings. Our codes and models are available at https://github.com/pengfeiliHEU/MUMC.

Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration

  • paper_url: http://arxiv.org/abs/2307.05300
  • repo_url: https://github.com/mikewangwzhl/solo-performance-prompting
  • paper_authors: Zhenhailong Wang, Shaoguang Mao, Wenshan Wu, Tao Ge, Furu Wei, Heng Ji
  • for: turning a single large language model (LLM) into a cognitive synergist that improves problem-solving and overall performance through multi-turn self-collaboration among multiple personas.
  • methods: proposes Solo Performance Prompting (SPP), which dynamically identifies and simulates multiple fine-grained personas based on the task input, letting the LLM combine their individual strengths and knowledge; an illustrative prompt sketch follows below.
  • results: on three challenging tasks (Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle), SPP improves problem-solving, elicits internal knowledge acquisition, and reduces hallucination while maintaining strong reasoning.
    Abstract Human intelligence thrives on the concept of cognitive synergy, where collaboration and information integration among different cognitive processes yield superior outcomes compared to individual cognitive processes in isolation. Although Large Language Models (LLMs) have demonstrated promising performance as general task-solving agents, they still struggle with tasks that require intensive domain knowledge and complex reasoning. In this work, we propose Solo Performance Prompting (SPP), which transforms a single LLM into a cognitive synergist by engaging in multi-turn self-collaboration with multiple personas. A cognitive synergist refers to an intelligent agent that collaborates with multiple minds, combining their individual strengths and knowledge, to enhance problem-solving and overall performance in complex tasks. By dynamically identifying and simulating different personas based on task inputs, SPP unleashes the potential of cognitive synergy in LLMs. We have discovered that assigning multiple, fine-grained personas in LLMs elicits better problem-solving abilities compared to using a single or fixed number of personas. We evaluate SPP on three challenging tasks: Trivia Creative Writing, Codenames Collaborative, and Logic Grid Puzzle, encompassing both knowledge-intensive and reasoning-intensive types. Unlike previous works, such as Chain-of-Thought, that solely enhance the reasoning abilities in LLMs, SPP effectively elicits internal knowledge acquisition abilities, reduces hallucination, and maintains strong reasoning capabilities. Code, data, and prompts can be found at: https://github.com/MikeWangWZHL/Solo-Performance-Prompting.git.
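A minimal sketch of the kind of multi-persona, multi-turn prompt SPP describes. The persona names, wording, and the way the prompt is assembled are illustrative assumptions, not the authors' exact template; the released prompts are in the linked repository.

```python
# Minimal sketch of a multi-persona self-collaboration prompt in the spirit of SPP.
# The persona list and process wording are illustrative placeholders, not the
# authors' exact template.

def build_spp_prompt(task: str, personas: list[str]) -> str:
    """Assemble a single prompt that asks one LLM to simulate several personas."""
    lines = [
        "You will solve a task by simulating a collaboration between several personas.",
        f"Task: {task}",
        "Participants: " + ", ".join(["AI Assistant (you)"] + personas),
        "Process:",
        "1. Each persona gives initial ideas or domain knowledge relevant to the task.",
        "2. The AI Assistant drafts a solution using those contributions.",
        "3. Each persona criticizes the draft and suggests concrete fixes.",
        "4. Repeat until every persona agrees, then output the final answer.",
        "Begin the collaboration now.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_spp_prompt(
        task="Write a short story that correctly states the release year of 'Spirited Away'.",
        personas=["Film Historian", "Creative Writer", "Fact Checker"],
    )
    print(prompt)  # send this string to any chat-style LLM endpoint
```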

RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named Entity Recognition

  • paper_url: http://arxiv.org/abs/2307.07417
  • repo_url: None
  • paper_authors: Sihan Song, Furao Shen, Jian Zhao
  • for: tackling data sparsity in low-resource named entity recognition (NER) with Robust Prompt-based Data Augmentation (RoPDA).
  • methods: RoPDA uses continuous prompts with pre-trained language models (PLMs) and performs entity and context augmentation through five fundamental augmentation operations, generating label-flipping and label-preserving examples.
  • results: extensive experiments on three benchmarks from different domains show that RoPDA clearly outperforms strong baselines and also surpasses state-of-the-art semi-supervised learning methods when unlabeled data is included.
    Abstract Data augmentation has been widely used in low-resource NER tasks to tackle the problem of data sparsity. However, previous data augmentation methods have the disadvantages of disrupted syntactic structures, token-label mismatch, and requirement for external knowledge or manual effort. To address these issues, we propose Robust Prompt-based Data Augmentation (RoPDA) for low-resource NER. Based on pre-trained language models (PLMs) with continuous prompt, RoPDA performs entity augmentation and context augmentation through five fundamental augmentation operations to generate label-flipping and label-preserving examples. To optimize the utilization of the augmented samples, we present two techniques: Self-Consistency Filtering and mixup. The former effectively eliminates low-quality samples, while the latter prevents performance degradation arising from the direct utilization of label-flipping samples. Extensive experiments on three benchmarks from different domains demonstrate that RoPDA significantly improves upon strong baselines, and also outperforms state-of-the-art semi-supervised learning methods when unlabeled data is included.
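RoPDA pairs Self-Consistency Filtering with mixup so that label-flipping samples can be used without degrading performance. The snippet below is a generic mixup interpolation over already-encoded examples and soft labels; the vector sizes and the assumption that augmented NER examples have been reduced to fixed-size representations are illustrative, not the paper's exact formulation.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """Generic mixup: convex combination of two encoded examples and their
    one-hot (or soft) label vectors. Assumes inputs are already vectorized."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)          # interpolation coefficient in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2       # mixed representation
    y = lam * y1 + (1.0 - lam) * y2       # mixed (soft) training target
    return x, y, lam


# Example: blend a label-preserving sample with a label-flipping one so the
# flipped label only contributes partially to the training target.
x_keep, y_keep = np.random.randn(768), np.array([1.0, 0.0, 0.0])
x_flip, y_flip = np.random.randn(768), np.array([0.0, 1.0, 0.0])
x_mix, y_mix, lam = mixup(x_keep, y_keep, x_flip, y_flip)
```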

On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets

  • paper_url: http://arxiv.org/abs/2307.05284
  • repo_url: https://github.com/namkoong-lab/whyshift
  • paper_authors: Jiashuo Liu, Tianyu Wang, Peng Cui, Hongseok Namkoong
  • For: The paper aims to address the issue of distribution shifts in tabular data and their impact on machine learning models' performance.
  • Methods: The paper uses a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations to identify the most prevalent type of distribution shift, $Y|X$-shifts. The authors also build an empirical testbed called WhyShift to characterize the type of shift they benchmark performance over.
  • Results: The paper finds that $Y|X$-shifts are most prevalent in tabular settings and identifies covariate regions that suffer the biggest $Y|X$-shifts. The authors discuss the implications for algorithmic and data-based interventions and highlight the importance of future research to build an understanding of how distributions differ.
    Abstract Different distribution shifts require different algorithmic and operational interventions. Methodological research must be grounded by the specific shifts they address. Although nascent benchmarks provide a promising empirical foundation, they implicitly focus on covariate shifts, and the validity of empirical findings depends on the type of shift, e.g., previous observations on algorithmic performance can fail to be valid when the $Y|X$ distribution changes. We conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and find that $Y|X$-shifts are most prevalent. To encourage researchers to develop a refined language for distribution shifts, we build WhyShift, an empirical testbed of curated real-world shifts where we characterize the type of shift we benchmark performance over. Since $Y|X$-shifts are prevalent in tabular settings, we identify covariate regions that suffer the biggest $Y|X$-shifts and discuss implications for algorithmic and data-based interventions. Our testbed highlights the importance of future research that builds an understanding of how distributions differ.

U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

  • paper_url: http://arxiv.org/abs/2307.05260
  • repo_url: https://github.com/exploration-lab/il-pcr
  • paper_authors: Abhinav Joshi, Akshat Sharma, Sai Kiran Tanikella, Ashutosh Modi
  • for: automating the Prior Case Retrieval (PCR) task in the legal domain.
  • methods: introduces a new large benchmark, the IL-PCR (Indian Legal Prior Case Retrieval) corpus, and proposes U-CREAT, an unsupervised retrieval pipeline based on event extraction, to improve PCR performance.
  • results: the unsupervised retrieval method achieves state-of-the-art performance on benchmarks for two legal systems (the IL-PCR and COLIEE corpora) and is considerably faster than BM25.
    Abstract The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of legal documents, BM25 remains a strong baseline for ranking the cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic, we show that it generalizes across two different legal systems (Indian and Canadian), and it shows state-of-the-art performance on the benchmarks for both the legal systems (IL-PCR and COLIEE corpora).

Integrated Planning in Hospitals: A Review

  • paper_url: http://arxiv.org/abs/2307.05258
  • repo_url: None
  • paper_authors: Sebastian Rachuba, Melanie Reuter-Oppermann, Clemens Thielen
  • for: reviewing the Operations Research and Management Science literature on hospital resource planning, focusing specifically on integrated planning of multiple resources.
  • methods: collects the relevant literature and analyzes it with respect to aspects such as uncertainty modeling and the use of real-life data; cross comparisons reveal relations between the modeling and solution methods used and the practical implementation of the approaches.
  • results: provides a high-level taxonomy for classifying resource-focused integration approaches and points out gaps in the literature as well as promising directions for future research.
    Abstract Efficient planning of scarce resources in hospitals is a challenging task for which a large variety of Operations Research and Management Science approaches have been developed since the 1950s. While efficient planning of single resources such as operating rooms, beds, or specific types of staff can already lead to enormous efficiency gains, integrated planning of several resources has been shown to hold even greater potential, and a large number of integrated planning approaches have been presented in the literature over the past decades. This paper provides the first literature review that focuses specifically on the Operations Research and Management Science literature related to integrated planning of different resources in hospitals. We collect the relevant literature and analyze it regarding different aspects such as uncertainty modeling and the use of real-life data. Several cross comparisons reveal interesting insights concerning, e.g., relations between the modeling and solution methods used and the practical implementation of the approaches developed. Moreover, we provide a high-level taxonomy for classifying different resource-focused integration approaches and point out gaps in the literature as well as promising directions for future research.

Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning

  • paper_url: http://arxiv.org/abs/2307.05213
  • repo_url: None
  • paper_authors: Mattia Silvestri, Senne Berden, Jayanta Mandi, Ali İrfan Mahmutoğulları, Maxime Mulamba, Allegra De Filippo, Tias Guns, Michele Lombardi
  • for: predicting unknown parameters of optimization problems so that the downstream optimization outcome improves.
  • methods: decision-focused learning (DFL) trains the machine learning model by directly minimizing the downstream task loss, but existing DFL methods rely on structural assumptions (e.g., linearity) and can only predict parameters that appear in the objective function; this paper instead predicts distributions over parameters and uses score function gradient estimation (SFGE) to compute decision-focused updates, widening the applicability of DFL.
  • results: with SFGE the approach can (1) handle predicted parameters that appear both in the objective function and in the constraints, and (2) effectively tackle two-stage stochastic optimization problems.
    Abstract Many real-world optimization problems contain unknown parameters that must be predicted prior to solving. To train the predictive machine learning (ML) models involved, the commonly adopted approach focuses on maximizing predictive accuracy. However, this approach does not always lead to the minimization of the downstream task loss. Decision-focused learning (DFL) is a recently proposed paradigm whose goal is to train the ML model by directly minimizing the task loss. However, state-of-the-art DFL methods are limited by the assumptions they make about the structure of the optimization problem (e.g., that the problem is linear) and by the fact that can only predict parameters that appear in the objective function. In this work, we address these limitations by instead predicting \textit{distributions} over parameters and adopting score function gradient estimation (SFGE) to compute decision-focused updates to the predictive model, thereby widening the applicability of DFL. Our experiments show that by using SFGE we can: (1) deal with predictions that occur both in the objective function and in the constraints; and (2) effectively tackle two-stage stochastic optimization problems.
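The score function (REINFORCE) estimator is what makes the decision-focused update possible when the downstream loss comes from a black-box solver: the gradient of E_{z~p_θ}[f(z)] with respect to θ is E[f(z) ∇_θ log p_θ(z)]. Below is a minimal Monte Carlo sketch assuming a Gaussian distribution over a single predicted parameter; the distribution family, baseline, and toy task loss are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def sfge_gradient(theta_mean, theta_log_std, task_loss, n_samples=256, rng=None):
    """Score function gradient of E_{z ~ N(mean, std)}[task_loss(z)] w.r.t. the
    distribution parameters. `task_loss` may be a black box (e.g., the objective
    value returned by a solver run with the sampled parameter z)."""
    rng = rng or np.random.default_rng()
    std = np.exp(theta_log_std)
    z = rng.normal(theta_mean, std, size=n_samples)     # sample candidate parameters
    losses = np.array([task_loss(zi) for zi in z])
    centered = losses - losses.mean()                   # mean baseline for variance reduction
    # d log N(z; mean, std) / d mean and / d log_std
    score_mean = (z - theta_mean) / std**2
    score_log_std = (z - theta_mean) ** 2 / std**2 - 1.0
    return np.mean(centered * score_mean), np.mean(centered * score_log_std)


# Toy downstream task: the decision loss is smallest when the sampled parameter
# matches an unknown true value 3.0 (stand-in for an actual solver call).
grad_mean, grad_log_std = sfge_gradient(0.0, 0.0, task_loss=lambda z: (z - 3.0) ** 2)
```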

Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05209
  • repo_url: None
  • paper_authors: Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren
  • for: improving the adaptability of deep reinforcement learning agents and reducing overfitting to the training task.
  • methods: represents the current task with reward machines (RMs), state-machine abstractions that induce subtasks from the task's rewards and dynamics; agents receive symbolic representations of optimal transitions from their current abstract state and are rewarded for achieving them, with these representations shared across tasks.
  • results: improves sample efficiency and few-shot transfer in a variety of domains.
    Abstract Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RM), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
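A minimal reward machine, for readers unfamiliar with the abstraction: a finite-state machine whose transitions fire on high-level propositional events and emit rewards. The events, rewards, and the key-then-door task below are illustrative assumptions, not the paper's experimental domains or exact formalism.

```python
# Minimal reward machine (RM): finite states, transitions triggered by
# propositional events, rewards emitted on transitions.

class RewardMachine:
    def __init__(self, initial_state, transitions, terminal_states=frozenset()):
        # transitions: {(state, event): (next_state, reward)}
        self.state = initial_state
        self.transitions = transitions
        self.terminal_states = set(terminal_states)

    def step(self, event):
        """Advance the RM on an observed event; unknown events keep the state and give 0."""
        next_state, reward = self.transitions.get((self.state, event),
                                                  (self.state, 0.0))
        self.state = next_state
        return reward, self.state in self.terminal_states


# Example: "pick up the key, then open the door".
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "got_key"):   ("u1", 0.1),   # subtask reward for reaching the key
        ("u1", "door_open"): ("u2", 1.0),   # task completed
    },
    terminal_states={"u2"},
)
for event in ["moved", "got_key", "moved", "door_open"]:
    reward, done = rm.step(event)
```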

Differentially Private Statistical Inference through $β$-Divergence One Posterior Sampling

  • paper_url: http://arxiv.org/abs/2307.05194
  • repo_url: None
  • paper_authors: Jack Jewson, Sahra Ghalebikesabi, Chris Holmes
  • for: providing privacy-preserving statistical analysis, so that results computed on sensitive data can be released without compromising the privacy of any participant.
  • methods: Bayesian posterior sampling from a generalized posterior targeting the minimization of the β-divergence between the model and the data-generating process (βD-Bayes), which yields differentially private estimates without clipping the data or injecting noise.
  • results: delivers differentially private estimation for complex classifiers and continuous regression models such as neural networks for the first time, with more precise inference than existing approaches for the same privacy guarantees.
    Abstract Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian posterior distributions has been shown to be a special case of the exponential mechanism, producing consistent, and efficient private estimates without altering the data generative process. The application of current approaches has, however, been limited by their strong bounding assumptions which do not hold for basic models, such as simple linear regressors. To ameliorate this, we propose $\beta$D-Bayes, a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process. This provides private estimation that is generally applicable without requiring changes to the underlying model and consistently learns the data generating parameter. We show that $\beta$D-Bayes produces more precise inference estimation for the same privacy guarantees, and further facilitates differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks for the first time.
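For reference, one common parameterization of the β-divergence (density power divergence) loss for a single observation, and the generalized posterior it induces, is sketched below. Constants, learning-rate/calibration weights, and the exact form used by βD-Bayes may differ from this sketch.

```latex
% beta-divergence (density power divergence) loss for observation x under model
% density f_theta, and the generalised posterior built from it (sketch only).
\[
  \ell_{\beta}(\theta; x)
  \;=\; -\frac{1}{\beta}\, f_{\theta}(x)^{\beta}
        \;+\; \frac{1}{\beta + 1} \int f_{\theta}(z)^{\beta + 1}\, dz,
  \qquad
  \pi_{\beta}(\theta \mid x_{1:n})
  \;\propto\;
  \pi(\theta)\, \exp\!\Big(-\textstyle\sum_{i=1}^{n} \ell_{\beta}(\theta; x_i)\Big).
\]
```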

Can I say, now machines can think?

  • paper_url: http://arxiv.org/abs/2307.07526
  • repo_url: https://github.com/rprokap/pset-9
  • paper_authors: Nitisha Aggarwal, Geetika Jain Saxena, Sanjeev Singh, Amit Pundir
  • for: examining the capabilities of AI-enabled machines and the possibilities and challenges these technologies raise across different domains.
  • methods: analysis and assessment of existing generative AI capabilities, a revisit of Turing's concept of thinking machines in light of recent technological advances, and a discussion of objections, consequences, and available techniques for evaluating machines' cognitive abilities.
  • results: generative AI has made significant progress in areas such as image generation, question answering, and code writing; the paper concludes that the Turing Test remains a critical aspect of evaluating machine ability, while noting that there are other aspects of intelligence, most of which AI machines exhibit.
    Abstract Generative AI techniques have opened the path for new generations of machines in diverse domains. These machines have various capabilities for example, they can produce images, generate answers or stories, and write codes based on the "prompts" only provided by users. These machines are considered 'thinking minds' because they have the ability to generate human-like responses. In this study, we have analyzed and explored the capabilities of artificial intelligence-enabled machines. We have revisited on Turing's concept of thinking machines and compared it with recent technological advancements. The objections and consequences of the thinking machines are also discussed in this study, along with available techniques to evaluate machines' cognitive capabilities. We have concluded that Turing Test is a critical aspect of evaluating machines' ability. However, there are other aspects of intelligence too, and AI machines exhibit most of these aspects.

CAT-ViL: Co-Attention Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

  • paper_url: http://arxiv.org/abs/2307.05182
  • repo_url: https://github.com/longbai1006/cat-vil
  • paper_authors: Long Bai, Mobarakol Islam, Hongliang Ren
  • for: helping medical students and junior surgeons learn and understand surgical skills from recorded surgical videos via visual question localized-answering (VQLA).
  • methods: proposes an end-to-end Transformer with a Co-Attention gaTed Vision-Language (CAT-ViL) embedding that fuses visual and textual features without detection-based feature extraction; the fused embedding feeds a DeiT module followed by parallel classification and detection heads for joint prediction.
  • results: experiments on public surgical videos from the MICCAI EndoVis Challenge 2017 and 2018 show superior performance and robustness compared with state-of-the-art approaches.
    Abstract Medical students and junior surgeons often rely on senior surgeons and specialists to answer their questions when learning surgery. However, experts are often busy with clinical and academic work, and have little time to give guidance. Meanwhile, existing deep learning (DL)-based surgical Visual Question Answering (VQA) systems can only provide simple answers without the location of the answers. In addition, vision-language (ViL) embedding is still a less explored research in these kinds of tasks. Therefore, a surgical Visual Question Localized-Answering (VQLA) system would be helpful for medical students and junior surgeons to learn and understand from recorded surgical videos. We propose an end-to-end Transformer with the Co-Attention gaTed Vision-Language (CAT-ViL) embedding for VQLA in surgical scenarios, which does not require feature extraction through detection models. The CAT-ViL embedding module is designed to fuse multimodal features from visual and textual sources. The fused embedding will feed a standard Data-Efficient Image Transformer (DeiT) module, before the parallel classifier and detector for joint prediction. We conduct the experimental validation on public surgical videos from MICCAI EndoVis Challenge 2017 and 2018. The experimental results highlight the superior performance and robustness of our proposed model compared to the state-of-the-art approaches. Ablation studies further prove the outstanding performance of all the proposed components. The proposed method provides a promising solution for surgical scene understanding, and opens up a primary step in the Artificial Intelligence (AI)-based VQLA system for surgical training. Our code is publicly available.
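To illustrate the gating idea in a vision-language embedding, here is a generic gated fusion of pooled visual and textual vectors. This is only a sketch of the gating mechanism; the actual CAT-ViL module also uses co-attention and operates on token sequences, and the dimensions below are assumptions.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Generic gated fusion of pooled visual and textual embeddings: a learned
    sigmoid gate decides, per dimension, how much visual vs. textual signal to keep."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([visual, text], dim=-1))  # (batch, dim), values in [0, 1]
        return g * visual + (1.0 - g) * text              # per-dimension blend


fused = GatedFusion(dim=768)(torch.randn(4, 768), torch.randn(4, 768))
```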

Enriching Verbal Feedback from Usability Testing: Automatic Linking of Thinking-Aloud Recordings and Stimulus using Eye Tracking and Mouse Data

  • paper_url: http://arxiv.org/abs/2307.05171
  • repo_url: None
  • paper_authors: Supriya Murali, Tina Walber, Christoph Schaefer, Sezen Lim
  • for: This paper aims to automatically analyze verbal protocols and investigate the link between spoken feedback and the stimulus using eye tracking and mouse tracking.
  • methods: The paper uses eye tracking and mouse tracking to record the verbal responses, eye movements, and cursor movements of participants as they view and provide feedback on three websites.
  • results: The results show that the hit rate for gaze data is significantly higher than for mouse data, indicating that eye tracking data provides more detailed information and valuable insights about the verbalizations compared to mouse data.
    Abstract The think aloud method is an important and commonly used tool for usability optimization. However, analyzing think aloud data could be time consuming. In this paper, we put forth an automatic analysis of verbal protocols and test the link between spoken feedback and the stimulus using eye tracking and mouse tracking. The gained data - user feedback linked to a specific area of the stimulus - could be used to let an expert review the feedback on specific web page elements or to visualize on which parts of the web page the feedback was given. Specifically, we test if participants fixate on or point with the mouse to the content of the webpage that they are verbalizing. During the testing, participants were shown three websites and asked to verbally give their opinion. The verbal responses, along with the eye and cursor movements were recorded. We compared the hit rate, defined as the percentage of verbally mentioned areas of interest (AOIs) that were fixated with gaze or pointed to with the mouse. The results revealed a significantly higher hit rate for the gaze compared to the mouse data. Further investigation revealed that, while the mouse was mostly used passively to scroll, the gaze was often directed towards relevant AOIs, thus establishing a strong association between spoken words and stimuli. Therefore, eye tracking data possibly provides more detailed information and more valuable insights about the verbalizations compared to the mouse data.
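The hit rate compared in the study is simply the share of verbally mentioned areas of interest (AOIs) that were also fixated with the gaze (or pointed at with the mouse). A small sketch, with illustrative AOI names and made-up data:

```python
def hit_rate(mentioned_aois, attended_aois):
    """Share of verbally mentioned AOIs that were also fixated (gaze) or
    pointed at (mouse) in the same time window."""
    mentioned = set(mentioned_aois)
    if not mentioned:
        return 0.0
    return len(mentioned & set(attended_aois)) / len(mentioned)


# Illustrative data: AOIs named while thinking aloud vs. AOIs hit by gaze/mouse.
mentioned = ["nav_menu", "search_box", "hero_image"]
gaze_hits = ["search_box", "hero_image", "footer"]
mouse_hits = ["footer"]

print(hit_rate(mentioned, gaze_hits))   # 0.67 -> gaze follows speech closely
print(hit_rate(mentioned, mouse_hits))  # 0.0  -> mouse mostly used to scroll
```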

Neural Quantile Optimization for Edge-Cloud Computing

  • paper_url: http://arxiv.org/abs/2307.05170
  • repo_url: None
  • paper_authors: Bin Du, He Zhang, Xiangle Cheng, Lei Zhang
  • for: designing an efficient traffic allocation scheme for edge-cloud computing networks that satisfies the network constraints and minimizes cost under burstable billing.
  • methods: formulates a family of integer programming problems with random traffic-demand parameters for a fixed network topology, generalizes the Gumbel-softmax reparameterization to turn the discrete problem into a regularized continuous one, and trains a Gumbel-softmax sampling network, whose structure reflects the edge-cloud topology, via unsupervised learning to minimize the expected cost.
  • results: the trained network acts as an efficient traffic allocation scheme sampler, clearly outperforming the random strategy in feasibility and cost; it generalizes to more time steps and users, and its solutions can warm-start existing integer optimization solvers to accelerate short-time iteration.
    Abstract We seek the best traffic allocation scheme for the edge-cloud computing network that satisfies constraints and minimizes the cost based on burstable billing. First, for a fixed network topology, we formulate a family of integer programming problems with random parameters describing the various traffic demands. Then, to overcome the difficulty caused by the discrete feature of the problem, we generalize the Gumbel-softmax reparameterization method to induce an unconstrained continuous optimization problem as a regularized continuation of the discrete problem. Finally, we introduce the Gumbel-softmax sampling network to solve the optimization problems via unsupervised learning. The network structure reflects the edge-cloud computing topology and is trained to minimize the expectation of the cost function for unconstrained continuous optimization problems. The trained network works as an efficient traffic allocation scheme sampler, remarkably outperforming the random strategy in feasibility and cost function value. Besides testing the quality of the output allocation scheme, we examine the generalization property of the network by increasing the time steps and the number of users. We also feed the solution to existing integer optimization solvers as initial conditions and verify the warm-starts can accelerate the short-time iteration process. The framework is general with solid performance, and the decoupled feature of the random neural networks is adequate for practical implementations.
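The Gumbel-softmax reparameterization at the core of the method relaxes sampling from a categorical distribution into a differentiable operation, so discrete allocation choices can be trained end to end. A minimal sketch (not the paper's full sampling network); the toy routing example and cost vector are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Differentiable relaxation of a categorical sample: add Gumbel(0, 1) noise
    to the logits and apply a temperature-scaled softmax. As tau -> 0 the output
    approaches a one-hot sample."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return F.softmax((logits + gumbel) / tau, dim=-1)


# Toy use: logits over 4 candidate routes for one traffic flow; the soft sample
# is pushed through a differentiable cost and gradients flow back to the logits.
logits = torch.zeros(4, requires_grad=True)
route_weights = gumbel_softmax_sample(logits, tau=0.5)
cost = (route_weights * torch.tensor([3.0, 1.0, 2.0, 5.0])).sum()
cost.backward()
```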

SuryaKiran at MEDIQA-Sum 2023: Leveraging LoRA for Clinical Dialogue Summarization

  • paper_url: http://arxiv.org/abs/2307.05162
  • repo_url: None
  • paper_authors: Kunal Suri, Prakhar Mishra, Saumajit Saha, Atul Singh
  • for: improving results for domain-specific use cases by fine-tuning large language models, here clinical dialogue summarization.
  • methods: Parameter Efficient Fine Tuning (PEFT), which keeps the large language model as a fixed base, adds additional layers, and fine-tunes only those layers; the paper evaluates Low Rank Adaptation (LoRA).
  • results: on clinical dialogue summarization, LoRA performs on par with end-to-end fine-tuning of the large language model.
    Abstract Finetuning Large Language Models helps improve the results for domain-specific use cases. End-to-end finetuning of large language models is time and resource intensive and has high storage requirements to store the finetuned version of the large language model. Parameter Efficient Fine Tuning (PEFT) methods address the time and resource challenges by keeping the large language model as a fixed base and add additional layers, which the PEFT methods finetune. This paper demonstrates the evaluation results for one such PEFT method Low Rank Adaptation (LoRA), for Clinical Dialogue Summarization. The evaluation results show that LoRA works at par with end-to-end finetuning for a large language model. The paper presents the evaluations done for solving both the Subtask A and B from ImageCLEFmedical {https://www.imageclef.org/2023/medical}
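LoRA keeps the pretrained weight matrix frozen and learns a low-rank update ΔW = (α/r)·BA, so only a small number of parameters are trained. Below is a from-scratch sketch of the idea for a single linear layer; it is not the configuration or library call used in the paper, and the dimensions and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update:
    output = base(x) + (alpha / r) * x A^T B^T, with only A and B trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                           # keep pretrained weights fixed
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)


layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
out = layer(torch.randn(2, 10, 768))   # only the two low-rank factors are trainable
```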

Multiobjective Hydropower Reservoir Operation Optimization with Transformer-Based Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05643
  • repo_url: None
  • paper_authors: Rixin Wu, Ran Wang, Jie Hao, Qiang Wu, Ping Wang
  • for: jointly operating multiple hydropower reservoirs to balance power generation, ecological protection, and residential water supply.
  • methods: a deep reinforcement learning approach built on a transformer framework, in which the encoder's multi-head attention extracts information from reservoirs and residential areas and a multireservoir attention network in the decoder generates suitable operational decisions.
  • results: experiments on Lake Mead and Lake Powell show that the transformer-based approach produces appropriate operational outcomes, generating 10.11% more electricity, reducing the amended annual proportional flow deviation by 39.69%, and increasing water supply revenue by 4.10% compared with a state-of-the-art method.
    Abstract Due to shortage of water resources and increasing water demands, the joint operation of multireservoir systems for balancing power generation, ecological protection, and the residential water supply has become a critical issue in hydropower management. However, the numerous constraints and nonlinearity of multiple reservoirs make solving this problem time-consuming. To address this challenge, a deep reinforcement learning approach that incorporates a transformer framework is proposed. The multihead attention mechanism of the encoder effectively extracts information from reservoirs and residential areas, and the multireservoir attention network of the decoder generates suitable operational decisions. The proposed method is applied to Lake Mead and Lake Powell in the Colorado River Basin. The experimental results demonstrate that the transformer-based deep reinforcement learning approach can produce appropriate operational outcomes. Compared to a state-of-the-art method, the operation strategies produced by the proposed approach generate 10.11% more electricity, reduce the amended annual proportional flow deviation by 39.69%, and increase water supply revenue by 4.10%. Consequently, the proposed approach offers an effective method for the multiobjective operation of multihydropower reservoir systems.

On the Effectiveness of Speech Self-supervised Learning for Music

  • paper_url: http://arxiv.org/abs/2307.05161
  • repo_url: None
  • paper_authors: Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu
  • for: investigating how well self-supervised learning (SSL) models designed for speech transfer to music information retrieval (MIR).
  • methods: adapts two speech-oriented SSL models, data2vec1.0 and HuBERT, to music recordings (referred to as music2vec and musicHuBERT), training 12 SSL models with 95M parameters under various pre-training configurations and evaluating them on 13 MIR tasks.
  • results: training with music data generally improves performance on MIR tasks even under speech-oriented paradigms, but existing speech-oriented designs are limited in modeling polyphonic information; empirical suggestions are given for future musical SSL strategies.
    Abstract Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaption of SSL with two distinctive speech-related models, data2vec1.0 and Hubert, and refer to them as music2vec and musicHuBERT, respectively. We train $12$ SSL models with 95M parameters under various pre-training configurations and systematically evaluate the MIR task performances with 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, empirical suggestions are also given for designing future musical SSL strategies and paradigms.

Stable Normative Explanations: From Argumentation to Deontic Logic

  • paper_url: http://arxiv.org/abs/2307.05156
  • repo_url: None
  • paper_authors: Cecilia Di Florio, Guido Governatori, Antonino Rotolo, Giovanni Sartor
  • for: examining how the notion of stable explanation developed elsewhere in Defeasible Logic can be expressed in the context of formal argumentation.
  • methods: formal reconstruction of the notion, including building argumentation neighborhood structures for deontic logic in which this kind of explanation can be characterized, together with a discussion of its deontic meaning.
  • results: offers some direct complexity results.
    Abstract This paper examines how a notion of stable explanation developed elsewhere in Defeasible Logic can be expressed in the context of formal argumentation. With this done, we discuss the deontic meaning of this reconstruction and show how to build from argumentation neighborhood structures for deontic logic where this notion of explanation can be characterised. Some direct complexity results are offered.

A Modal Logic for Explaining some Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05150
  • repo_url: None
  • paper_authors: Pierre Nunn, François Schwarzentruber
  • for: a modal logic in which counting modalities appear in linear inequalities, aimed at explaining some graph neural networks.
  • methods: shows that each formula can be transformed into an equivalent graph neural network (GNN) and, conversely, that each GNN can be transformed into an equivalent formula.
  • results: the satisfiability problem for this logic is decidable, and several variants are in PSPACE.
    Abstract In this paper, we propose a modal logic in which counting modalities appear in linear inequalities. We show that each formula can be transformed into an equivalent graph neural network (GNN). We also show that each GNN can be transformed into a formula. We show that the satisfiability problem is decidable. We also discuss some variants that are in PSPACE.

Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05639
  • repo_url: https://github.com/dannyzx/grbf-nns
  • paper_authors: Danny D’Agostino, Ilija Ilievski, Christine Annette Shoemaker
  • for: building machine learning models that combine strong predictive performance with human interpretability.
  • methods: modifies the Radial Basis Function Neural Network by equipping its Gaussian kernel with a learnable precision matrix; after training, the spectrum of the precision matrix reveals the model's active subspace and yields a ranking of input variables by their importance to the prediction task.
  • results: on regression, classification, and feature selection tasks, the proposed model achieves attractive predictive performance compared with popular machine learning models and state-of-the-art deep learning-based embedding feature selection techniques, while providing interpretable results that can support decision-making.
    Abstract Providing a model that achieves a strong predictive performance and at the same time is interpretable by humans is one of the most difficult challenges in machine learning research due to the conflicting nature of these two objectives. To address this challenge, we propose a modification of the Radial Basis Function Neural Network model by equipping its Gaussian kernel with a learnable precision matrix. We show that precious information is contained in the spectrum of the precision matrix that can be extracted once the training of the model is completed. In particular, the eigenvectors explain the directions of maximum sensitivity of the model revealing the active subspace and suggesting potential applications for supervised dimensionality reduction. At the same time, the eigenvectors highlight the relationship in terms of absolute variation between the input and the latent variables, thereby allowing us to extract a ranking of the input variables based on their importance to the prediction task enhancing the model interpretability. We conducted numerical experiments for regression, classification, and feature selection tasks, comparing our model against popular machine learning models and the state-of-the-art deep learning-based embedding feature selection techniques. Our results demonstrate that the proposed model does not only yield an attractive prediction performance with respect to the competitors but also provides meaningful and interpretable results that potentially could assist the decision-making process in real-world applications. A PyTorch implementation of the model is available on GitHub at the following link. https://github.com/dannyzx/GRBF-NNs
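A sketch of an RBF unit with a learnable precision matrix, parameterized through a factor L so that Λ = L Lᵀ stays positive semi-definite by construction. This illustrates the idea only; the authors' implementation (see the linked repository) may parameterize and regularize the precision matrix differently, and the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class GaussianRBFLayer(nn.Module):
    """RBF units exp(-(x - c)^T Lambda (x - c)) with a shared learnable
    precision matrix Lambda = L L^T."""
    def __init__(self, in_dim: int, n_centers: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_centers, in_dim))
        self.factor = nn.Parameter(torch.eye(in_dim))       # L, so Lambda = L L^T

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        diff = x.unsqueeze(1) - self.centers.unsqueeze(0)    # (batch, centers, dim)
        proj = diff @ self.factor                            # (x - c)^T L
        quad = (proj ** 2).sum(dim=-1)                       # (x - c)^T L L^T (x - c)
        return torch.exp(-quad)

    def importance_directions(self):
        """Eigendecomposition of Lambda: eigenvectors give the most sensitive
        directions (active subspace), eigenvalues rank their importance."""
        precision = self.factor @ self.factor.T
        return torch.linalg.eigh(precision)


phi = GaussianRBFLayer(in_dim=5, n_centers=16)(torch.randn(32, 5))
```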

A Comprehensive Survey of Deep Transfer Learning for Anomaly Detection in Industrial Time Series: Methods, Applications, and Directions

  • paper_url: http://arxiv.org/abs/2307.05638
  • repo_url: None
  • paper_authors: Peng Yan, Ahmed Abdulkadir, Matthias Rosenthal, Gerrit A. Schatte, Benjamin F. Grewe, Thilo Stadelmann
  • for: surveying deep transfer learning for anomaly detection in industrial time series, where promptly detecting abnormal events enables timely interventions and quality optimization.
  • methods: reviews deep transfer learning, which leverages knowledge from related tasks and accounts for variations in data distributions so that new tasks can be solved with little or no additional labeled data; examines the problem settings of transfer learning and classifies the prevailing methods.
  • results: covers time-series anomaly detection tasks in primary industrial domains such as manufacturing process monitoring, predictive maintenance, energy management, and infrastructure facility monitoring, and outlines challenges, limitations, and practical directions for solution design and implementation.
    Abstract Automating the monitoring of industrial processes has the potential to enhance efficiency and optimize quality by promptly detecting abnormal events and thus facilitating timely interventions. Deep learning, with its capacity to discern non-trivial patterns within large datasets, plays a pivotal role in this process. Standard deep learning methods are suitable to solve a specific task given a specific type of data. During training, the algorithms demand large volumes of labeled training data. However, due to the dynamic nature of processes and the environment, it is impractical to acquire the needed data for standard deep learning training for every slightly different case anew. Deep transfer learning offers a solution to this problem. By leveraging knowledge from related tasks and accounting for variations in data distributions, this learning framework solves new tasks even with little or no additional labeled data. The approach bypasses the need to retrain a model from scratch for every new setup and dramatically reduces the labeled data requirement. This survey provides an in-depth review of deep transfer learning, examining the problem settings of transfer learning and classifying the prevailing deep transfer learning methods. Moreover, we delve into applying deep transfer learning in the context of a broad spectrum of time series anomaly detection tasks prevalent in primary industrial domains, e.g., manufacturing process monitoring, predictive maintenance, energy management, and infrastructure facility monitoring. We conclude this survey by underlining the challenges and limitations of deep transfer learning in industrial contexts. We also provide practical directions for solution design and implementation for these tasks, leading to specific, actionable suggestions.

TIAM – A Metric for Evaluating Alignment in Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2307.05134
  • repo_url: https://github.com/grimalpaul/tiam
  • paper_authors: Paul Grimal, Hervé Le Borgne, Olivier Ferret, Julien Tourille
  • for: assessing the quality of images generated by text-to-image (T2I) models, in particular how well the generated image matches the important content of the prompt.
  • methods: a new metric based on prompt templates that characterizes the alignment between the prompt and the generated image in terms of the type of the specified objects, their number, and their color.
  • results: image quality can vary drastically depending on the latent noise used as a seed; the number and order of concepts in the prompt and their color attributes also affect alignment, and some latent seeds produce better images than others, opening new directions of research.
    Abstract The progress in the generation of synthetic images has made it crucial to assess their quality. While several metrics have been proposed to assess the rendering of images, it is crucial for Text-to-Image (T2I) models, which generate images based on a prompt, to consider additional aspects such as to which extent the generated image matches the important content of the prompt. Moreover, although the generated images usually result from a random starting point, the influence of this one is generally not considered. In this article, we propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images. It allows us to better characterize the alignment in terms of the type of the specified objects, their number, and their color. We conducted a study on several recent T2I models about various aspects. An additional interesting result we obtained with our approach is that image quality can vary drastically depending on the latent noise used as a seed for the images. We also quantify the influence of the number of concepts in the prompt, their order as well as their (color) attributes. Finally, our method allows us to identify some latent seeds that produce better images than others, opening novel directions of research on this understudied topic.

A Deep Dive into Perturbations as Evaluation Technique for Time Series XAI

  • paper_url: http://arxiv.org/abs/2307.05104
  • repo_url: https://github.com/visual-xai-for-time-series/time-series-xai-perturbation-analysis
  • paper_authors: Udo Schlegel, Daniel A. Keim
  • for: evaluating the quality of explanations (attributions) produced by XAI techniques for time series models.
  • methods: perturbation analysis, which systematically modifies the input data and evaluates the impact on the attributions generated by the XAI method; applied to several state-of-the-art XAI techniques on three time series classification datasets.
  • results: the perturbation analysis approach effectively evaluates attribution quality and provides insights into the strengths and limitations of XAI techniques, guiding the selection of XAI methods for time series data.
    Abstract Explainable Artificial Intelligence (XAI) has gained significant attention recently as the demand for transparency and interpretability of machine learning models has increased. In particular, XAI for time series data has become increasingly important in finance, healthcare, and climate science. However, evaluating the quality of explanations, such as attributions provided by XAI techniques, remains challenging. This paper provides an in-depth analysis of using perturbations to evaluate attributions extracted from time series models. A perturbation analysis involves systematically modifying the input data and evaluating the impact on the attributions generated by the XAI method. We apply this approach to several state-of-the-art XAI techniques and evaluate their performance on three time series classification datasets. Our results demonstrate that the perturbation analysis approach can effectively evaluate the quality of attributions and provide insights into the strengths and limitations of XAI techniques. Such an approach can guide the selection of XAI methods for time series data, e.g., focusing on return time rather than precision, and facilitate the development of more reliable and interpretable machine learning models for time series analysis.
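The core of a perturbation analysis is to occlude the time steps an attribution ranks highest and compare the resulting drop in the model's score against occluding random time steps: a faithful attribution should cause a much larger drop. The sketch below uses a zero baseline and a toy model; the exact perturbation strategies and metrics in the paper differ.

```python
import numpy as np

def perturbation_drop(model, x, attribution, k=10, baseline=0.0, rng=None):
    """Change in model score after replacing time steps with a baseline value.
    `model` is any callable returning a scalar score for a 1-D series `x`."""
    rng = rng or np.random.default_rng()
    original = model(x)

    def perturb(indices):
        x_pert = x.copy()
        x_pert[indices] = baseline
        return original - model(x_pert)

    top_idx = np.argsort(attribution)[-k:]                  # most relevant time steps
    rand_idx = rng.choice(len(x), size=k, replace=False)    # random control condition
    return perturb(top_idx), perturb(rand_idx)


# Toy check with a model that only looks at the first 10 time steps and a
# "perfect" attribution that marks exactly those steps.
toy_model = lambda s: s[:10].sum()
series = np.random.rand(100)
attr = np.concatenate([np.ones(10), np.zeros(90)])
drop_top, drop_rand = perturbation_drop(toy_model, series, attr)
```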

ATWM: Defense against adversarial malware based on adversarial training

  • paper_url: http://arxiv.org/abs/2307.05095
  • repo_url: None
  • paper_authors: Kun Li, Fan Zhang, Wei Guo
  • for: defending Windows malware detection models against adversarial malware.
  • methods: an adversarial-training-based defense that first uses preprocessing to handle simple adversarial examples, reducing the difficulty of adversarial training, and then improves the model's robustness through adversarial training.
  • results: experiments with three attack methods on two datasets show improved adversarial robustness without reducing model accuracy.
    Abstract Deep learning technology has made great achievements in the field of image. In order to defend against malware attacks, researchers have proposed many Windows malware detection models based on deep learning. However, deep learning models are vulnerable to adversarial example attacks. Malware can generate adversarial malware with the same malicious function to attack the malware detection model and evade detection of the model. Currently, many adversarial defense studies have been proposed, but existing adversarial defense studies are based on image sample and cannot be directly applied to malware sample. Therefore, this paper proposes an adversarial malware defense method based on adversarial training. This method uses preprocessing to defend simple adversarial examples to reduce the difficulty of adversarial training. Moreover, this method improves the adversarial defense capability of the model through adversarial training. We experimented with three attack methods in two sets of datasets, and the results show that the method in this paper can improve the adversarial defense capability of the model without reducing the accuracy of the model.

OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning

  • paper_url: http://arxiv.org/abs/2307.05082
  • repo_url: https://github.com/knowledge-ukraine/ontochatgpt
  • paper_authors: Oleksandr Palagin, Vladislav Kaverinskiy, Anna Litvin, Kyrylo Malakhov
  • for: developing an ontology-driven structured prompt system that works in interplay with ChatGPT and similar large language models (LLMs).
  • methods: develops formal information and functional models and establishes the methodological foundations for integrating ontology-driven prompts with ChatGPT's meta-learning capabilities, demonstrated in the Ukrainian language within the rehabilitation domain.
  • results: the OntoChatGPT system extracts entities from context, classifies them, and generates relevant responses; the methodology is applicable across languages and domains and extends to other LLM-based chatbots, such as Google's Bard using the PaLM 2 LLM.
    Abstract This research presents a comprehensive methodology for utilizing an ontology-driven structured prompts system in interplay with ChatGPT, a widely used large language model (LLM). The study develops formal models, both information and functional, and establishes the methodological foundations for integrating ontology-driven prompts with ChatGPT's meta-learning capabilities. The resulting productive triad comprises the methodological foundations, advanced information technology, and the OntoChatGPT system, which collectively enhance the effectiveness and performance of chatbot systems. The implementation of this technology is demonstrated using the Ukrainian language within the domain of rehabilitation. By applying the proposed methodology, the OntoChatGPT system effectively extracts entities from contexts, classifies them, and generates relevant responses. The study highlights the versatility of the methodology, emphasizing its applicability not only to ChatGPT but also to other chatbot systems based on LLMs, such as Google's Bard utilizing the PaLM 2 LLM. The underlying principles of meta-learning, structured prompts, and ontology-driven information retrieval form the core of the proposed methodology, enabling their adaptation and utilization in various LLM-based systems. This versatile approach opens up new possibilities for NLP and dialogue systems, empowering developers to enhance the performance and functionality of chatbot systems across different domains and languages.

Uni-Removal: A Semi-Supervised Framework for Simultaneously Addressing Multiple Degradations in Real-World Images

  • paper_url: http://arxiv.org/abs/2307.05075
  • repo_url: None
  • paper_authors: Yongheng Zhang, Danfeng Yan, Yuanqiang Cai
  • for: removes multiple degradations (haze, rain, and blur) from real-world images
  • methods: uses a two-stage semi-supervised framework with a unified model and parameters, leverages a supervised multi-teacher and student architecture, and incorporates an adversarial discriminator and generative adversarial loss for domain adaptation
  • results: demonstrates effective removal of degradations in real-world images, outperforming state-of-the-art supervised and unsupervised methods in dehazing, deraining, and deblurring simultaneously
    Abstract Removing multiple degradations, such as haze, rain, and blur, from real-world images poses a challenging and illposed problem. Recently, unified models that can handle different degradations have been proposed and yield promising results. However, these approaches focus on synthetic images and experience a significant performance drop when applied to realworld images. In this paper, we introduce Uni-Removal, a twostage semi-supervised framework for addressing the removal of multiple degradations in real-world images using a unified model and parameters. In the knowledge transfer stage, Uni-Removal leverages a supervised multi-teacher and student architecture in the knowledge transfer stage to facilitate learning from pretrained teacher networks specialized in different degradation types. A multi-grained contrastive loss is introduced to enhance learning from feature and image spaces. In the domain adaptation stage, unsupervised fine-tuning is performed by incorporating an adversarial discriminator on real-world images. The integration of an extended multi-grained contrastive loss and generative adversarial loss enables the adaptation of the student network from synthetic to real-world domains. Extensive experiments on real-world degraded datasets demonstrate the effectiveness of our proposed method. We compare our Uni-Removal framework with state-of-the-art supervised and unsupervised methods, showcasing its promising results in real-world image dehazing, deraining, and deblurring simultaneously.

Retrieval-augmented GPT-3.5-based Text-to-SQL Framework with Sample-aware Prompting and Dynamic Revision Chain

  • paper_url: http://arxiv.org/abs/2307.05074
  • repo_url: None
  • paper_authors: Chunxi Guo, Zhiliang Tian, Jintao Tang, Shasha Li, Zhihua Wen, Kaixuan Wang, Ting Wang
  • for: improving LLM-based text-to-SQL so that it can handle a wider range of natural language questions.
  • methods: retrieval-augmented prompting with sample-aware demonstrations (covering the composition of SQL operators and fine-grained information about the question) and a dynamic revision chain; two retrieval strategies help find questions with similar intents, including using the LLM to simplify the original question and clarify the user's intent, while the revision chain iteratively adapts fine-grained feedback from previously generated SQL.
  • results: experiments on three text-to-SQL benchmarks show clear improvements over strong baseline models.
    Abstract Text-to-SQL aims at generating SQL queries for the given natural language questions and thus helping users to query databases. Prompt learning with large language models (LLMs) has emerged as a recent approach, which designs prompts to lead LLMs to understand the input question and generate the corresponding SQL. However, it faces challenges with strict SQL syntax requirements. Existing work prompts the LLMs with a list of demonstration examples (i.e. question-SQL pairs) to generate SQL, but the fixed prompts can hardly handle the scenario where the semantic gap between the retrieved demonstration and the input question is large. In this paper, we propose a retrieval-augmented prompting method for a LLM-based Text-to-SQL framework, involving sample-aware prompting and a dynamic revision chain. Our approach incorporates sample-aware demonstrations, which include the composition of SQL operators and fine-grained information related to the given question. To retrieve questions sharing similar intents with input questions, we propose two strategies for assisting retrieval. Firstly, we leverage LLMs to simplify the original questions, unifying the syntax and thereby clarifying the users' intentions. To generate executable and accurate SQLs without human intervention, we design a dynamic revision chain which iteratively adapts fine-grained feedback from the previously generated SQL. Experimental results on three Text-to-SQL benchmarks demonstrate the superiority of our method over strong baseline models.
    摘要 文本到SQL是一个目标,它的目的是生成基于自然语言问题的SQL查询。使用大型语言模型(LLM)作为现代方法,它设计了提示来使LLM理解输入问题并生成相应的SQL。然而,它遇到了严格的SQL语法要求的挑战。现有的工作通过提供一列示例问题(即问题-SQL对)来引导LLM生成SQL,但固定的提示很难处理输入问题和示例问题之间的semantic gap。在这篇论文中,我们提议一种基于LLM的文本到SQL框架,包括示例感知提示和动态修订链。我们的方法包括示例感知示例,其中包括SQL运算符的组合和基于给定问题的细化信息。为了助于搜索相似意图的问题,我们提出了两种策略。首先,我们利用LLM来简化原始问题,使其语法统一,从而明确用户的意图。其次,我们设计了动态修订链,以每次生成SQL后收集细化反馈,以便在下一次生成SQL时进行适应。我们的实验结果表明,我们的方法在三个Text-to-SQL标准测试集上具有显著优势。
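
The sample-aware prompting and dynamic revision chain described above can be pictured with a short sketch. The helper names below (`call_llm`, `execute_sql`, the retrieval `index`) are hypothetical placeholders, not the authors' interfaces; this is a minimal outline of the loop, assuming an LLM client and a database executor are supplied by the caller.

```python
# Minimal sketch of the retrieval-augmented prompting loop described above.
# `call_llm`, `execute_sql`, and the retrieval index are placeholders, not the
# authors' implementation.

def simplify_question(question, call_llm):
    """Ask the LLM to rewrite the question in a unified, simplified syntax."""
    return call_llm(f"Rewrite this question in simple, canonical form: {question}")

def retrieve_demonstrations(simplified, index, k=4):
    """Return the k most similar (question, SQL) pairs from a demonstration pool."""
    return index.search(simplified, k)

def generate_sql(question, demos, feedback, call_llm):
    demo_text = "\n".join(f"Q: {q}\nSQL: {s}" for q, s in demos)
    prompt = f"{demo_text}\n\nQ: {question}\n{feedback}\nSQL:"
    return call_llm(prompt)

def dynamic_revision_chain(question, index, call_llm, execute_sql, max_rounds=3):
    simplified = simplify_question(question, call_llm)
    demos = retrieve_demonstrations(simplified, index)
    feedback = ""
    sql = generate_sql(question, demos, feedback, call_llm)
    for _ in range(max_rounds):
        ok, message = execute_sql(sql)        # fine-grained feedback, e.g. DB errors
        if ok:
            break
        feedback = f"The previous SQL failed with: {message}. Please revise it."
        sql = generate_sql(question, demos, feedback, call_llm)
    return sql
```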

Aggregating Credences into Beliefs: Agenda Conditions for Impossibility Results

  • paper_url: http://arxiv.org/abs/2307.05072
  • repo_url: None
  • paper_authors: Minkyung Wang, Chisu Kim
  • for: This paper is written for researchers and scholars interested in judgment aggregation, belief binarization, and agenda-theoretic approaches to understanding the limitations of collective decision-making processes.
  • methods: The paper uses an agenda-theoretic approach to generalize previous results and determine the necessary and sufficient conditions for the impossibility theorems to arise in binarizing belief aggregation. The authors use path-connectedness, even-negatability, negation-connectedness, blockedness, and other conditions to characterize the agenda conditions for different results.
  • results: The paper presents three main results: (1) path-connectedness and even-negatability constitute the exact agenda condition for the oligarchy result, (2) negation-connectedness is the condition for the triviality result, and (3) blockedness is the condition for the impossibility result. The authors also compare these findings with existing agenda-theoretic characterization theorems in judgment aggregation and belief binarization.
    Abstract Binarizing belief aggregation addresses how to rationally aggregate individual probabilistic beliefs into collective binary beliefs. Similar to the development of judgment aggregation theory, formulating axiomatic requirements, proving impossibility theorems, and identifying exact agenda conditions of impossibility theorems are natural and important research topics in binarizing belief aggregation. Building on our previous research on impossibility theorems, we use an agenda-theoretic approach to generalize the results and to determine the necessary and sufficient level of logical interconnection between the issues in an agenda for the impossibility theorems to arise. We demonstrate that (1) path-connectedness and even-negatability constitute the exact agenda condition for the oligarchy result stating that binarizing belief aggregation satisfying proposition-wise independence and deductive closure of collective beliefs yields the oligarchies under minor conditions; (2) negation-connectedness is the condition for the triviality result obtained by adding anonymity to the oligarchy result; and (3) blockedness is the condition for the impossibility result, which follows by adding completeness and consistency of collective beliefs. Moreover, we compare these novel findings with existing agenda-theoretic characterization theorems in judgment aggregation and belief binarization.
    摘要 binarizing belief aggregation 关注如何合理地将个人概率信仰聚合到集体二进制信仰上。与判断聚合理论的发展相似,我们需要明确 axiomatic 要求,证明不可能性定理,并确定不可能性定理的准确议程条件。基于我们之前的研究,我们使用 agenda-theoretic 方法推广结果,并确定了合理议程条件。我们证明了以下结论:1. Path-connectedness 和 even-negatability 是合理议程条件,即在这些条件下,binarizing belief aggregation 满足 Proposition-wise independence 和 deductive closure of collective beliefs 时,会出现 oligarchy 结果,只要满足一些轻微条件。2. negation-connectedness 是添加匿名性后的轻微条件,可以得到 oligarchy 结果。3. blockedness 是添加完整性和一致性的 collective beliefs 后的不可能性条件。此外,我们还与判断聚合理论和 belief binarization 的 agenda-theoretic 特征进行比较。

Mining for Unknown Unknowns

  • paper_url: http://arxiv.org/abs/2307.05071
  • repo_url: https://github.com/jcborges/PeriodicEventMining
  • paper_authors: Bernard Sinclair-Desgagné
  • for: 提高寻找未知未知(Unknown Unknowns)的能力
  • methods: 使用Formal Concept Analysis(FCA),一种基于格理论的数据挖掘和组织技术
  • results: 提出了一个简单的框架,用于系统地思考和搜寻未知未知。
    Abstract Unknown unknowns are future relevant contingencies that lack an ex ante description. While there are numerous retrospective accounts showing that significant gains or losses might have been achieved or avoided had such contingencies been previously uncovered, getting hold of unknown unknowns still remains elusive, both in practice and conceptually. Using Formal Concept Analysis (FCA) - a subfield of lattice theory which is increasingly applied for mining and organizing data - this paper introduces a simple framework to systematically think out of the box and direct the search for unknown unknowns.
    摘要 未知未知是未来重要的不确定因素,它们缺乏先前的描述。虽然有很多回顾账户表明,had these contingencies been previously uncovered, significant gains or losses might have been achieved or avoided,但捕捉未知未知仍然是一个艰难的任务,具体来说是在实践和概念上都存在困难。本文使用正式概念分析(FCA)——一种数据挖掘和组织的子领域——提出了一个简单的框架,以系统地思考和搜索未知未知。
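
As a rough illustration of how Formal Concept Analysis organizes a cross-table of objects and attributes, the sketch below enumerates the formal concepts of a tiny, invented risk-scenario context; it is not the paper's framework, only the standard FCA closure machinery it builds on.

```python
# Illustrative sketch of Formal Concept Analysis on a tiny object-attribute context.
# The scenario data is invented for illustration only.
from itertools import combinations

context = {
    "supplier_failure": {"external", "disruptive"},
    "cyber_attack":     {"external", "disruptive", "intentional"},
    "fraud":            {"internal", "intentional"},
}
all_attributes = set().union(*context.values())

def common_attributes(objects):
    """Attributes shared by every object in the set (the ' operator on objects)."""
    return set.intersection(*(context[o] for o in objects)) if objects else set(all_attributes)

def common_objects(attributes):
    """Objects possessing every attribute in the set (the ' operator on attributes)."""
    return {o for o, attrs in context.items() if attributes <= attrs}

def concepts():
    """Enumerate all formal concepts (extent, intent) by closing attribute sets."""
    found = set()
    attrs = sorted(all_attributes)
    for r in range(len(attrs) + 1):
        for combo in combinations(attrs, r):
            extent = common_objects(set(combo))
            intent = common_attributes(extent)
            found.add((frozenset(extent), frozenset(intent)))
    return found

for extent, intent in sorted(concepts(), key=lambda c: len(c[0])):
    print(sorted(extent), "<->", sorted(intent))
```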

Cognitive Bias and Belief Revision

  • paper_url: http://arxiv.org/abs/2307.05069
  • repo_url: None
  • paper_authors: Panagiotis Papadamos, Nina Gierasimczuk
  • for: 本研究围绕认知偏见的三种类型进行了正式化,并在信念修复框架中应用。
  • methods: 本研究使用了三种常见的信念修复方法:条件修复、lexicographic revision和最小修复。
  • results: 研究发现,偏见信念修复方法在真实追踪中的可靠性不高。计算机实验也表明,偏见信念修复在随机场景中的性能不佳。
    Abstract In this paper we formalise three types of cognitive bias within the framework of belief revision: confirmation bias, framing bias, and anchoring bias. We interpret them generally, as restrictions on the process of iterated revision, and we apply them to three well-known belief revision methods: conditioning, lexicographic revision, and minimal revision. We investigate the reliability of biased belief revision methods in truth tracking. We also run computer simulations to assess the performance of biased belief revision in random scenarios.
    摘要 在这篇论文中,我们将三种认知偏见视为修订信念的框架之下的限制。这三种偏见分别是确认偏见、帧偏见和锚偏见。我们将它们通常 интерпретирова为修订过程中的约束,并应用于三种常见的修订方法:条件修订、lexicographic修订和最小修订。我们研究了偏见修订方法的真实性追踪可靠性。我们还运行了Random Scenario中的计算机实验来评估偏见修订方法的性能。
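
One possible (and deliberately crude) way to picture a bias as a restriction on iterated revision is sketched below: a plausibility order over worlds, lexicographic revision, and a wrapper that simply ignores evidence contradicting the current belief. This is an illustrative toy, not the paper's formalization of confirmation bias.

```python
# A toy sketch (not the authors' formalization): plausibility orders over worlds,
# lexicographic revision, and a crude "confirmation bias" that ignores evidence
# contradicting the agent's current belief.

def believes(ranking, prop):
    """prop is a set of worlds; believed iff it holds in all minimal-rank worlds."""
    best = min(ranking.values())
    return all(w in prop for w, r in ranking.items() if r == best)

def lexicographic_revision(ranking, prop):
    """Make all prop-worlds strictly more plausible than all non-prop worlds."""
    offset = max(ranking.values()) + 1
    return {w: (r if w in prop else r + offset) for w, r in ranking.items()}

def biased_revision(ranking, prop, worlds):
    """Confirmation bias: discard evidence that conflicts with the current belief."""
    if believes(ranking, worlds - prop):   # currently believes the negation
        return ranking                     # evidence is ignored
    return lexicographic_revision(ranking, prop)

worlds = {"w1", "w2", "w3"}
ranking = {"w1": 0, "w2": 1, "w3": 2}      # w1 most plausible
rainy = {"w2", "w3"}                       # incoming evidence: it is raining
print(believes(biased_revision(ranking, rainy, worlds), rainy))   # False: bias blocks revision
print(believes(lexicographic_revision(ranking, rainy), rainy))    # True: unbiased revision succeeds
```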

A Theory of Bounded Inductive Rationality

  • paper_url: http://arxiv.org/abs/2307.05068
  • repo_url: None
  • paper_authors: Caspar Oesterheld, Abram Demski, Vincent Conitzer
  • for: 这篇论文旨在创造一种不假设完美知识的合理决策理论,用于解决具有较大复杂性和不确定性的决策问题。
  • methods: 这篇论文使用了一种基于循环推理的方法,要求合理的推理机器在面临决策问题时,不断测试每个可计算的假设,并遵循这些假设的承诺。
  • results: 论文的主要结果是提供了一种合理的决策理论,可以应对不具有完美知识的决策问题。此外,论文还证明了这种理论的其他愉悦特点,如能够评估随机和 Pseudo-Random 抽签的价值。最后,论文研究了不同代理人之间的竞争交互,并证明了这些代理人可以 converges to 的策略。
    Abstract The dominant theories of rational choice assume logical omniscience. That is, they assume that when facing a decision problem, an agent can perform all relevant computations and determine the truth value of all relevant logical/mathematical claims. This assumption is unrealistic when, for example, we offer bets on remote digits of pi or when an agent faces a computationally intractable planning problem. Furthermore, the assumption of logical omniscience creates contradictions in cases where the environment can contain descriptions of the agent itself. Importantly, strategic interactions as studied in game theory are decision problems in which a rational agent is predicted by its environment (the other players). In this paper, we develop a theory of rational decision making that does not assume logical omniscience. We consider agents who repeatedly face decision problems (including ones like betting on digits of pi or games against other agents). The main contribution of this paper is to provide a sensible theory of rationality for such agents. Roughly, we require that a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows those hypotheses that keep their promises of high rewards. We then prove that agents that are rational in this sense have other desirable properties. For example, they learn to value random and pseudo-random lotteries at their expected reward. Finally, we consider strategic interactions between different agents and prove a folk theorem for what strategies bounded rational inductive agents can converge to.
    摘要 主流的选择理论假设了推理完整性。即在处理决策问题时,一个代理人可以完成所有相关的计算和决定所有相关的逻辑/数学laim的真伪值。这个假设是不现实的,当我们提供有关距离数字π的赌博或代理人面临 computationally intractable 的观念问题时。此外,这个假设也会导致环境中的描述与代理人之间产生矛盾。在这篇文章中,我们开发了不假设推理完整性的决策理论。我们考虑了面临多次决策问题(包括有关数字π的赌博或与其他代理人的游戏)的代理人。我们的主要贡献是为这种代理人提供一个有道理的做法。简而言之,我们要求这些代理人在可以有效计算的假设上进行无限次测试,并且遵循这些假设的承诺高回数。然后,我们证明了这些代理人具有其他有利的性格。例如,它们对Random和pseudo-Random的抽签有价值。最后,我们考虑了不同的代理人之间的战略互动,并证明了这些代理人可以转化为 Folk Theorem 中的战略。

Exploiting Asymmetry in Logic Puzzles: Using ZDDs for Symbolic Model Checking Dynamic Epistemic Logic

  • paper_url: http://arxiv.org/abs/2307.05067
  • repo_url: None
  • paper_authors: Daniel Miedema, Malvin Gattinger
  • for: 避免状态爆发问题,使用binary decision diagrams(BDDs)进行模型检查。
  • methods: 使用Zero-suppressed Decision Diagrams(ZDDs)来对Kripke模型进行符号编码,以便在多智能体系统中进行知识和信息动态逻辑推理。
  • results: 对三个文献中的Muddy Children、Sum and Product puzzle和Dining Cryptographers问题进行比较,发现使用合适的ZDD变体可以减少内存使用量,这表明ZDDs是用于模型检查多智能体系统的有用工具。
    Abstract Binary decision diagrams (BDDs) are widely used to mitigate the state-explosion problem in model checking. A variation of BDDs are Zero-suppressed Decision Diagrams (ZDDs) which omit variables that must be false, instead of omitting variables that do not matter. We use ZDDs to symbolically encode Kripke models used in Dynamic Epistemic Logic, a framework to reason about knowledge and information dynamics in multi-agent systems. We compare the memory usage of different ZDD variants for three well-known examples from the literature: the Muddy Children, the Sum and Product puzzle and the Dining Cryptographers. Our implementation is based on the existing model checker SMCDEL and the CUDD library. Our results show that replacing BDDs with the right variant of ZDDs can significantly reduce memory usage. This suggests that ZDDs are a useful tool for model checking multi-agent systems.
    摘要 二元决策图(BDD)被广泛用于缓解模型检查中的状态爆炸问题。零抑制决策图(ZDD)是 BDD 的一种变体,它省略的是必须为假的变量,而不是无关紧要的变量。我们使用 ZDD 对动态认知逻辑中的 Kripke 模型进行符号化编码,用于刻画多智能体系统中的知识与信息动态。我们在文献中三个著名的例子(Muddy Children、Sum and Product puzzle 和 Dining Cryptographers)上比较了不同 ZDD 变体的内存使用情况。我们的实现基于现有的模型检查器 SMCDEL 和 CUDD 库。结果表明,用合适的 ZDD 变体替换 BDD 可以显著减少内存使用量,这说明 ZDD 是多智能体系统模型检查的有用工具。
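
The difference between the BDD and ZDD reduction rules, which drives the memory savings reported above, can be shown on a toy node representation; real tools such as SMCDEL with CUDD are of course far more involved. The encoding below of the set family {{p}} is a hand-picked example.

```python
# A minimal sketch of the reduction rules that distinguish ZDDs from BDDs,
# using a naive node cache; real model checkers (e.g. via CUDD) are far more elaborate.

TRUE, FALSE = "1", "0"
cache = {}

def mk_bdd(var, low, high):
    """BDD rule: drop a node whose two children are identical (variable irrelevant)."""
    if low == high:
        return low
    return cache.setdefault(("bdd", var, low, high), ("node", var, low, high))

def mk_zdd(var, low, high):
    """ZDD rule: drop a node whose 'high' (variable = true) child is the 0-terminal."""
    if high == FALSE:
        return low
    return cache.setdefault(("zdd", var, low, high), ("node", var, low, high))

# Encoding the family {{p}} (p true, q false) over variables p, q:
# the q-node is suppressed by the ZDD rule because setting q to true leads to 0.
zdd = mk_zdd("p", FALSE, mk_zdd("q", TRUE, FALSE))
bdd = mk_bdd("p", FALSE, mk_bdd("q", TRUE, FALSE))
print(zdd)  # ('node', 'p', '0', '1') -- the q-node disappears
print(bdd)  # ('node', 'p', '0', ('node', 'q', '1', '0'))
```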

Tableaux for the Logic of Strategically Knowing How

  • paper_url: http://arxiv.org/abs/2307.05066
  • repo_url: None
  • paper_authors: Yanjun Li
  • for: 这篇论文探讨了目标导向的知识如何扩展标准的认知逻辑,并引入了知识如何Operator。
  • methods: 本论文使用了表格过程来处理多代理版本的知识如何逻辑,并证明了这种表格过程的声明性和完整性。
  • results: 本论文证明了知识如何逻辑的满足问题可以在PSPACE中决定,并且展示了这种逻辑的表格过程的声明性和完整性。
    Abstract The logic of goal-directed knowing-how extends the standard epistemic logic with an operator of knowing-how. The knowing-how operator is interpreted as that there exists a strategy such that the agent knows that the strategy can make sure that p. This paper presents a tableau procedure for the multi-agent version of the logic of strategically knowing-how and shows the soundness and completeness of this tableau procedure. This paper also shows that the satisfiability problem of the logic can be decided in PSPACE.
    摘要 这个目的导向知识如何逻辑延伸了标准的知识逻辑,添加了知识如何操作。知识如何操作被解释为存在一个策略,使得代理人知道这个策略可以确保p。这篇文章提供了多代理人版本的知识如何逻辑的桌子程式,证明这个桌子程式的有效性和完整性。此外,文章还证明了这个逻辑的满意问题可以在PSPACE中解决。

Belief Revision from Probability

  • paper_url: http://arxiv.org/abs/2307.05632
  • repo_url: None
  • paper_authors: Jeremy Goodman, Bernhard Salow
  • for: 本研究探讨了一种问题相关的概率论信念观。
  • methods: 该论文使用了推理推论和概率论方法来探讨信念的动态。
  • results: 研究发现该论文的原则比正统的AGM理论弱,但比洛克信念论强。此外,研究还发现一种限定的模型,适用于许多应用程序,并确定了这个模型下的自然原则。
    Abstract In previous work ("Knowledge from Probability", TARK 2021) we develop a question-relative, probabilistic account of belief. On this account, what someone believes relative to a given question is (i) closed under entailment, (ii) sufficiently probable given their evidence, and (iii) sensitive to the relative probabilities of the answers to the question. Here we explore the implications of this account for the dynamics of belief. We show that the principles it validates are much weaker than those of orthodox theories of belief revision like AGM, but still stronger than those valid according to the popular Lockean theory of belief, which equates belief with high subjective probability. We then consider a restricted class of models, suitable for many but not all applications, and identify some further natural principles valid on this class. We conclude by arguing that the present framework compares favorably to the rival probabilistic accounts of belief developed by Leitgeb and by Lin and Kelly.
    摘要 在我们之前的工作("知识从概率", TARK 2021)中,我们发展了问题相关的、概率论的信念观。根据这种观,对于某个问题,某个人的信念是(i)闭合于推论下,(ii)基于证据足够有可能性,以及(iii)受到问题的答案之间的相对概率影响。在这里,我们研究了这种观的动态效应。我们发现这些原则比正统的信念修订理论AGM更弱,但 still stronger than以 Lockean 信念论,该等同于高Subjective 概率。然后,我们考虑了一种限制的模型,适用于许多但不是所有应用,并识别出了这类模型的自然原理。最后,我们 argue that我们的框架与Leitgeb和Lin和Kelly所发展的概率信念观相比,更加有利。

System of Spheres-based Two Level Credibility-limited Revisions

  • paper_url: http://arxiv.org/abs/2307.05062
  • repo_url: None
  • paper_authors: Marco Garapa, Eduardo Ferme, Maurício D. L. Reis
  • for: 本文提出了一种基于Grove的系统圆的两级信任有限修订算法,用于修订信任度较高的句子。
  • methods: 本文使用了系统圆的构造和axiomaic characterization来定义和分析这种修订算法。
  • results: 本文提出的修订算法可以帮助解决信任度较高的句子的修订问题,并且可以保证修订后的信任度仍然满足一定的条件。
    Abstract Two level credibility-limited revision is a non-prioritized revision operation. When revising by a two level credibility-limited revision, two levels of credibility and one level of incredibility are considered. When revising by a sentence at the highest level of credibility, the operator behaves as a standard revision, if the sentence is at the second level of credibility, then the outcome of the revision process coincides with a standard contraction by the negation of that sentence. If the sentence is not credible, then the original belief set remains unchanged. In this paper, we propose a construction for two level credibility-limited revision operators based on Grove's systems of spheres and present an axiomatic characterization for these operators.
    摘要 两级可信度受限修正是一种非优先化的修正操作。在这种修正中,需要考虑两级可信度和一级不可信度。当用处于最高可信度级别的句子进行修正时,算子表现为标准修正;若句子处于第二可信度级别,则修正结果与用该句子的否定进行的标准收缩一致;若句子不可信,则原信念集保持不变。本文基于 Grove 的球体系统给出了两级可信度受限修正算子的构造,并给出了这些算子的公理化刻画。

On Imperfect Recall in Multi-Agent Influence Diagrams

  • paper_url: http://arxiv.org/abs/2307.05059
  • repo_url: None
  • paper_authors: James Fox, Matt MacDermott, Lewis Hammond, Paul Harrenstein, Alessandro Abate, Michael Wooldridge
  • for: 这种模型适用于具有忘记和缺失记忆的多代理情况,并提供了解决方案来寻找 Nash 平衡。
  • methods: 这篇文章使用混合策略和两种相关平衡来解决 MAIDs 中的忘记和缺失记忆问题。
  • results: 文章分析了 MAIDs 中关键决策问题的计算复杂性,并描述了在 Markov 游戏和团队情况中的应用。
    Abstract Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks. In some settings, MAIDs offer significant advantages over extensive-form game representations. Previous work on MAIDs has assumed that agents employ behavioural policies, which set independent conditional probability distributions over actions for each of their decisions. In settings with imperfect recall, however, a Nash equilibrium in behavioural policies may not exist. We overcome this by showing how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium. We also analyse the computational complexity of key decision problems in MAIDs, and explore tractable cases. Finally, we describe applications of MAIDs to Markov games and team situations, where imperfect recall is often unavoidable.
    摘要 多智能体影响图(MAID)是一种基于贝叶斯网络的流行博弈论模型。在某些情形下,MAID 相比扩展式博弈表示具有显著优势。先前关于 MAID 的工作假设智能体使用行为策略,即为每个决策独立地设定动作的条件概率分布;然而在非完全回忆的情形下,行为策略下的纳什均衡可能不存在。我们通过混合策略和两类相关均衡,展示了如何求解含有健忘和心不在焉智能体的 MAID。我们还分析了 MAID 中关键决策问题的计算复杂度,并探讨了可高效求解的情形。最后,我们描述了 MAID 在马尔可夫博弈和团队情形中的应用,在这些场景中非完全回忆往往不可避免。

Causal Kripke Models

  • paper_url: http://arxiv.org/abs/2307.05631
  • repo_url: None
  • paper_authors: Yiwen Ding, Krishna Manoorkar, Apostolos Tzimoulis, Ruoding Wang, Xiaolong Wang
  • for: 扩展了 Halpern 和 pearl causal models 用于可能世界 semantics 环境中的实际 causality 模型。
  • methods: 使用这种框架,引入了 causality 逻辑,允许在多个可能性、时间、知识和不确定性情况下进行 causality 推理。
  • results: 通过一些例子,证明了这种逻辑的有效性,并提出了未来研究的一些方向。
    Abstract This work extends Halpern and Pearl's causal models for actual causality to a possible world semantics environment. Using this framework we introduce a logic of actual causality with modal operators, which allows for reasoning about causality in scenarios involving multiple possibilities, temporality, knowledge and uncertainty. We illustrate this with a number of examples, and conclude by discussing some future directions for research.
    摘要 这项工作扩展了戴尔和珀尔的 causal models для实际 causality 到可能世界 semantics 环境中。使用这个框架,我们引入了一种实际 causality 逻辑,其允许在多个可能性、时间、知识和不确定性方面进行 causality 的推理。我们通过一些示例来说明,并将在未来的研究方向中讨论一些可能性。

Characterization of AGM Belief Contraction in Terms of Conditionals

  • paper_url: http://arxiv.org/abs/2307.05629
  • repo_url: None
  • paper_authors: Giacomo Bonanno
  • for: 本研究准备了 AGM 信仰缩小的 semantic caracterization,基于帧,包括 Kripke 信仰关系和 Stalnaker-Lewis 选择函数。
  • methods: 本研究使用 Kripke 信仰关系和 Stalnaker-Lewis 选择函数来准备 AGM 信仰缩小的 semantic caracterization。
  • results: 本研究显示,AGM 信仰缩小可以通过使用 Kripke 信仰关系和 Stalnaker-Lewis 选择函数来实现 semantic caracterization。
    Abstract We provide a semantic characterization of AGM belief contraction based on frames consisting of a Kripke belief relation and a Stalnaker-Lewis selection function. The central idea is as follows. Let K be the initial belief set and K-A be the contraction of K by the formula A; then B belongs to the set K-A if and only if, at the actual state, the agent believes B and believes that if not-A is (were) the case then B is (would be) the case.
    摘要 我们提供了AGM信念缩小的语义特征化,基于框架,包括基于Kripke信念关系和Stalnaker-Lewis选择函数。中心思想如下:假设K是初始信念集,则K-A表示通过公式A缩小K得到的新信念集,其中B属于K-A如果且只如果,在实际状态下,代理人认为B是真并认为,如果不是A的情况下,B是真的。
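
A small possible-worlds sketch of the semantic clause above is given below; the Kripke belief relation, the selection function, and the propositions are invented toy data, and the code only mirrors the stated biconditional rather than the paper's full framework.

```python
# A small sketch of the semantic clause described above: B survives contraction by A
# iff, at the actual state, the agent believes B and believes the conditional
# "if not-A were the case, B would be the case". Model and formulas are toy examples.

states = {"s1", "s2", "s3"}
belief = {"s1": {"s1", "s2"}}              # doxastically accessible states from s1
A = {"s1", "s2"}                           # proposition A (set of states where A holds)
B = {"s1", "s2", "s3"}                     # proposition B

def selection(state, prop):
    """Stalnaker-Lewis selection: closest prop-state seen from `state` (toy ordering)."""
    closest = {"s1": ["s1", "s2", "s3"], "s2": ["s2", "s1", "s3"], "s3": ["s3", "s2", "s1"]}
    return next(({s} for s in closest[state] if s in prop), set())

def believes(state, prop):
    return belief[state] <= prop

def believes_conditional(state, antecedent, consequent):
    """Agent believes (antecedent []-> consequent): it holds at every accessible state."""
    return all(selection(t, antecedent) <= consequent for t in belief[state])

def in_contraction(state, A, B):
    not_A = states - A
    return believes(state, B) and believes_conditional(state, not_A, B)

print(in_contraction("s1", A, B))   # True: B is retained when contracting by A
```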

Strengthening Consistency Results in Modal Logic

  • paper_url: http://arxiv.org/abs/2307.05053
  • repo_url: None
  • paper_authors: Samuel Allen Alexander, Arthur Paul Pedersen
  • for: 这篇论文是为了研究Modal逻辑中的一致性问题而写的。
  • methods: 这篇论文使用了Generic Theory这种方法,用以建立Modal逻辑中的一致性。
  • results: 这篇论文得到了一些有关Modal逻辑中一致性的结论,并且这些结论可以帮助解决一些关于Modal逻辑、判断、推理和决策的问题。
    Abstract A fundamental question asked in modal logic is whether a given theory is consistent. But consistent with what? A typical way to address this question identifies a choice of background knowledge axioms (say, S4, D, etc.) and then shows the assumptions codified by the theory in question to be consistent with those background axioms. But determining the specific choice and division of background axioms is, at least sometimes, little more than tradition. This paper introduces **generic theories** for propositional modal logic to address consistency results in a more robust way. As building blocks for background knowledge, generic theories provide a standard for categorical determinations of consistency. We argue that the results and methods of this paper help to elucidate problems in epistemology and enjoy sufficient scope and power to have purchase on problems bearing on modalities in judgement, inference, and decision making.

Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps

  • paper_url: http://arxiv.org/abs/2307.05052
  • repo_url: https://github.com/paihengxu/xicl
  • paper_authors: Zongxia Li, Paiheng Xu, Fuxiao Liu, Hyemi Song
  • for: 本研究探讨了大语言模型(LLM)在上下文学习(ICL)性能中不同示例组件的作用。
  • methods: 本研究使用了可解释性NLP(XNLP)方法,并使用了对比示例的抽象图来进行质量和量化分析。
  • results: 研究发现,改变真实标签会导致示例的抽象图发生显著变化,特别是在更大的LLM上。改变输入分布的细节对ICL性能的影响较小,而补充说明在符号逻辑任务中有所助益,但在情感分析任务中的助益相对较少。
    Abstract We investigate the role of various demonstration components in the in-context learning (ICL) performance of large language models (LLMs). Specifically, we explore the impacts of ground-truth labels, input distribution, and complementary explanations, particularly when these are altered or perturbed. We build on previous work, which offers mixed findings on how these elements influence ICL. To probe these questions, we employ explainable NLP (XNLP) methods and utilize saliency maps of contrastive demonstrations for both qualitative and quantitative analysis. Our findings reveal that flipping ground-truth labels significantly affects the saliency, though it's more noticeable in larger LLMs. Our analysis of the input distribution at a granular level reveals that changing sentiment-indicative terms in a sentiment analysis task to neutral ones does not have as substantial an impact as altering ground-truth labels. Finally, we find that the effectiveness of complementary explanations in boosting ICL performance is task-dependent, with limited benefits seen in sentiment analysis tasks compared to symbolic reasoning tasks. These insights are critical for understanding the functionality of LLMs and guiding the development of effective demonstrations, which is increasingly relevant in light of the growing use of LLMs in applications such as ChatGPT. Our research code is publicly available at https://github.com/paihengxu/XICL.
    摘要 我们研究大语言模型(LLM)在上下文学习(ICL)中不同示例组成部分的作用,具体考察真实标签、输入分布和补充解释在被修改或扰动时的影响。以往工作对这些因素如何影响 ICL 的结论并不一致。为探究这些问题,我们采用可解释 NLP(XNLP)方法,利用对比示例的显著性图进行定性与定量分析。我们发现,翻转真实标签会显著改变显著性,且在更大的 LLM 中更为明显。对输入分布的细粒度分析表明,在情感分析任务中把带情感倾向的词改为中性词,其影响不如修改真实标签那样大。最后,补充解释对提升 ICL 性能的作用依任务而异:在情感分析任务中收益有限,而在符号推理任务中收益更大。这些发现对理解 LLM 的工作方式和设计有效示例具有重要意义,在 ChatGPT 等应用日益普及的背景下尤其相关。我们的研究代码公开于 https://github.com/paihengxu/XICL。
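
A self-contained toy version of the label-flipping and saliency analysis is sketched below using a random embedding classifier in place of an LLM; it only illustrates gradient-based token saliency on original versus label-flipped demonstrations and is not the released XICL code.

```python
# A self-contained toy illustration (not the paper's code) of gradient-based saliency
# over demonstration tokens, comparing original vs. label-flipped demonstrations.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab = {"great": 0, "terrible": 1, "positive": 2, "negative": 3, "movie": 4}
embed = nn.Embedding(len(vocab), 8)
classifier = nn.Linear(8, 2)               # toy stand-in model: mean-pooled bag of embeddings

def saliency(tokens):
    ids = torch.tensor([vocab[t] for t in tokens])
    emb = embed(ids).detach().requires_grad_(True)
    logits = classifier(emb.mean(dim=0))
    logits[1].backward()                    # gradient of the "positive"-class logit
    return emb.grad.norm(dim=1)             # per-token saliency scores

demo_original = ["great", "movie", "positive"]      # ground-truth label kept
demo_flipped  = ["great", "movie", "negative"]      # ground-truth label flipped

for name, demo in [("original", demo_original), ("flipped", demo_flipped)]:
    print(name, saliency(demo).tolist())
```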

Depth-bounded Epistemic Logic

  • paper_url: http://arxiv.org/abs/2307.07448
  • repo_url: None
  • paper_authors: Farid Arthaud, Martin Rinard
  • for: 本研究探讨了智能代理人如何理解自己和其他代理人的信念。
  • methods: 本文使用了 epistemic logics,即如何模型代理人的信念和信念之间的关系。
  • results: 本文提出了 DBEL 扩展,它是 S5 的一种扩展,可以模型代理人只能对 epistemic 表达进行有限深度的理解。此外,文章还提出了公共宣布逻辑 DPAL,可以扩展 DBEL,并证明了其完备性和soundness。最后,文章使用这些逻辑来研究了有限深度代理人如何解决经典的泥沼孩子问题。
    Abstract Epistemic logics model how agents reason about their beliefs and the beliefs of other agents. Existing logics typically assume the ability of agents to reason perfectly about propositions of unbounded modal depth. We present DBEL, an extension of S5 that models agents that can reason about epistemic formulas only up to a specific modal depth. To support explicit reasoning about agent depths, DBEL includes depth atoms Ead (agent a has depth exactly d) and Pad (agent a has depth at least d). We provide a sound and complete axiomatization of DBEL. We extend DBEL to support public announcements for bounded depth agents and show how the resulting DPAL logic generalizes standard axioms from public announcement logic. We present two alternate extensions and identify two undesirable properties, amnesia and knowledge leakage, that these extensions have but DPAL does not. We provide axiomatizations of these logics as well as complexity results for satisfiability and model checking. Finally, we use these logics to illustrate how agents with bounded modal depth reason in the classical muddy children problem, including upper and lower bounds on the depth knowledge necessary for agents to successfully solve the problem.
    摘要 知识逻辑如何模型代理人的信念和其他代理人的信念。现有逻辑通常假设代理人可以完美地理解未bounded模态深度的命题。我们提出了DBEL,它是S5的扩展,可以模型代理人只能理解知识命题的特定模态深度。为了支持显式的代理人深度 reasoning,DBEL包含了深度原子 Ead (代理人a有深度 exactly d) 和 Pad (代理人a有深度 at least d)。我们提供了完整的幂等化和DBEL的幂等化。我们将DPAL逻辑扩展到支持公共宣布,并证明DPAL逻辑将标准公共宣布逻辑的公理推理。我们还提出了两种不同的扩展,并证明这些扩展会导致知识泄露和忘却两种不良性。我们还提供了这些逻辑的幂等化和满足性和模板检查的复杂度分析。最后,我们使用这些逻辑来解释代理人具有受限模态深度如何在经典泥沼孩子问题中进行 reasoning,包括知识深度的上限和下限,代理人需要在解决问题时具备的深度知识。
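
A toy model checker for depth-bounded knowledge is sketched below; treating knowledge claims beyond the agent's depth as simply failing is a simplification chosen for illustration and does not reproduce DBEL's depth atoms or its axiomatization.

```python
# A toy model checker sketch for depth-bounded knowledge (not the DBEL axiomatization):
# an agent of depth d can only establish knowledge operators nested up to depth d.

worlds = {"w1", "w2"}
access = {"a": {("w1", "w1"), ("w1", "w2"), ("w2", "w2")}}   # agent a's accessibility
valuation = {"p": {"w1", "w2"}, "q": {"w1"}}

def modal_depth(phi):
    if phi[0] == "atom":
        return 0
    if phi[0] == "not":
        return modal_depth(phi[1])
    if phi[0] == "K":
        return 1 + modal_depth(phi[2])
    raise ValueError(phi)

def holds(phi, w, agent_depth):
    kind = phi[0]
    if kind == "atom":
        return w in valuation[phi[1]]
    if kind == "not":
        return not holds(phi[1], w, agent_depth)
    if kind == "K":
        _, agent, body = phi
        if modal_depth(phi) > agent_depth:
            return False                     # beyond the agent's reasoning depth
        return all(holds(body, v, agent_depth) for u, v in access[agent] if u == w)
    raise ValueError(phi)

# K_a p is within depth 1; K_a K_a q is not, so a depth-1 agent fails to establish it.
print(holds(("K", "a", ("atom", "p")), "w1", agent_depth=1))                    # True
print(holds(("K", "a", ("K", "a", ("atom", "q"))), "w1", agent_depth=1))        # False
```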

Epistemic Syllogistic: First Steps

  • paper_url: http://arxiv.org/abs/2307.05043
  • repo_url: None
  • paper_authors: Yipu Li, Yanjing Wang
  • for: 这篇论文旨在探讨阿里斯多德的模态逻辑问题,尤其是在当代逻辑和哲学研究中的意义。
  • methods: 本文使用自然逻辑程序作为灵感,并对模态逻辑进行了多种变体的研究,包括 epistemic syllogistic 的不同扩展。
  • results: 本文提出了多种 axiomatizations 和完整性证明,以描述模态逻辑中更加复杂的概念。
    Abstract Aristotle's discussions on modal syllogistic have often been viewed as error-prone and have garnered significant attention in the literature due to historical and philosophical interests. However, from a contemporary standpoint, they also introduced natural fragments of first-order modal logic, warranting a comprehensive technical analysis. In this paper, drawing inspiration from the natural logic program, we propose and examine several variants of modal syllogistic within the epistemic context, thereby coining the term Epistemic Syllogistic. Specifically, we concentrate on the de re interpretation of epistemic syllogisms containing non-trivial yet natural expressions such as "all things known to be A are also known to be not B." We explore the epistemic apodeictic syllogistic and its extensions, which accommodate more complex terms. Our main contributions include several axiomatizations of these logics, with completeness proofs that may be of independent interest.
    摘要 亚里斯多德的Modal Syllogistic讨论经常被视为错误多端,吸引了历史和哲学研究的关注。然而,从 contemporaneous 的角度来看,它们实际上揭示了自然逻辑的幻影,值得进行完整的技术分析。在这篇论文中,我们Drawing inspiration from natural logic program,提出并研究了 modal syllogistic 的多种变种,并将其称为 Epistemic Syllogistic。我们专注于 de re 解释epistemic syllogisms 中的非rive yet natural 表达,如 "all things known to be A are also known to be not B"。我们探索了 epistemic apodeictic syllogistic 和其扩展,可以满足更复杂的表达。我们的主要贡献包括这些逻辑的几种 axiomatization,以及完整性证明,可能具有独立的价值。

Neural-Symbolic Recommendation with Graph-Enhanced Information

  • paper_url: http://arxiv.org/abs/2307.05036
  • repo_url: https://github.com/hanzo2020/gnnlr
  • paper_authors: Bang Chen, Wei Peng, Maonian Wu, Bo Zheng, Shaojun Zhu
  • for: 该研究旨在构建一种基于图神经网络和符号逻辑运算的推荐模型,以便同时拥有全球隐藏信息的推荐能力和本地显式逻辑推荐能力。
  • methods: 该模型首先基于互动原理建立ITEM-ITEM图,然后使用图神经网络捕捉全球数据中的隐藏信息。接着,将用户行为转换成符号逻辑表达,以便从认知逻辑的视角进行推荐预测。
  • results: 对五个公共数据集进行了广泛的实验,结果显示,我们的提议模型在比较一些现有方法时表现出色,源代码可以在 [https://github.com/hanzo2020/GNNLR] 上获取。
    Abstract The recommendation system is not only a problem of inductive statistics from data but also a cognitive task that requires reasoning ability. The most advanced graph neural networks have been widely used in recommendation systems because they can capture implicit structured information from graph-structured data. However, like most neural network algorithms, they only learn matching patterns from a perception perspective. Some researchers use user behavior for logic reasoning to achieve recommendation prediction from the perspective of cognitive reasoning, but this kind of reasoning is a local one and ignores implicit information on a global scale. In this work, we combine the advantages of graph neural networks and propositional logic operations to construct a neuro-symbolic recommendation model with both global implicit reasoning ability and local explicit logic reasoning ability. We first build an item-item graph based on the principle of adjacent interaction and use graph neural networks to capture implicit information in global data. Then we transform user behavior into propositional logic expressions to achieve recommendations from the perspective of cognitive reasoning. Extensive experiments on five public datasets show that our proposed model outperforms several state-of-the-art methods; source code is available at [https://github.com/hanzo2020/GNNLR].
    摘要 推薦系統不僅是從資料中進行歸納統計的問題,也是一項需要推理能力的認知任務。目前最先進的圖神經網路已廣泛應用於推薦系統,因為它們能從圖結構資料中捕捉隱含的結構資訊。然而,如同多數神經網路演算法,它們只從感知角度學習匹配模式。一些研究者利用使用者行為進行邏輯推理,從認知推理的角度實現推薦預測,但這種推理是局部的,忽略了全域尺度上的隱含資訊。在本工作中,我們結合圖神經網路與命題邏輯運算的優點,建構一個同時具備全域隱含推理能力與局部顯式邏輯推理能力的神經符號推薦模型。我們首先依據相鄰互動原則建立項目-項目圖,並使用圖神經網路捕捉全域資料中的隱含資訊;接著將使用者行為轉換為命題邏輯表達式,從認知推理的角度實現推薦。在五個公開資料集上的大量實驗顯示,我們提出的模型優於多個最先進方法;源代碼可在 [https://github.com/hanzo2020/GNNLR] 取得。
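
The two ingredients highlighted above, an item-item graph built from adjacent interactions and a propositional reading of a user's history, can be sketched on invented data as follows; the scoring rule is illustrative and is not the GNNLR model.

```python
# A sketch of the two ingredients described above, under invented data: an item-item
# graph built from adjacent interactions, and a user history turned into a simple
# propositional expression whose truth is scored per candidate item.
from collections import defaultdict

histories = {
    "u1": ["i1", "i2", "i3"],
    "u2": ["i2", "i4", "i3"],
}

# 1) Item-item graph from adjacent interactions (edge weight = co-adjacency count).
adj = defaultdict(int)
for items in histories.values():
    for a, b in zip(items, items[1:]):
        adj[(a, b)] += 1
        adj[(b, a)] += 1

# 2) A toy logic view of u1's behaviour: (liked i2 AND liked i3) -> recommend x,
# scored here by how strongly x is connected to the premise items in the graph.
def score(candidate, premise_items):
    return sum(adj[(p, candidate)] for p in premise_items)

premises = ["i2", "i3"]
candidates = ["i1", "i4"]
print({c: score(c, premises) for c in candidates})   # i4 is adjacent to both premise items
```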

Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels

  • paper_url: http://arxiv.org/abs/2307.05025
  • repo_url: None
  • paper_authors: Hui Kang, Sheng Liu, Huaxi Huang, Jun Yu, Bo Han, Dadong Wang, Tongliang Liu
  • for: 含噪标签学习的研究近年来主要关注开发新算法,以在含噪训练标签下保持鲁棒性,同时能泛化到干净数据。
  • methods: 本研究使用交叉熵损失,并结合常用的正则化策略,如学习率衰减、模型权重平均和数据增强。
  • results: 结果表明,这些正则化策略的组合可以超过当前最先进方法。这一发现促使我们重新审视含噪标签学习的基准,并重新考虑专为含噪标签训练设计的特殊学习算法。
    Abstract In recent years, research on learning with noisy labels has focused on devising novel algorithms that can achieve robustness to noisy training labels while generalizing to clean data. These algorithms often incorporate sophisticated techniques, such as noise modeling, label correction, and co-training. In this study, we demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies like learning rate decay, model weights average, and data augmentations, can outperform state-of-the-art methods. Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels. While some of these regularization strategies have been utilized in previous noisy label learning research, their full potential has not been thoroughly explored. Our results encourage a reevaluation of benchmarks for learning with noisy labels and prompt reconsideration of the role of specialized learning algorithms designed for training with noisy labels.
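
The "simple baseline" ingredients discussed above can be sketched in a few lines of PyTorch: plain cross-entropy, cosine learning-rate decay, and averaged model weights, run here on random stand-in data (a real pipeline would add image augmentations and a proper dataset).

```python
# A minimal sketch of the baseline ingredients: cross-entropy, cosine learning-rate
# decay, and averaged model weights, on random stand-in data rather than a benchmark.
import torch
import torch.nn as nn
from torch.optim.swa_utils import AveragedModel

torch.manual_seed(0)
x = torch.randn(256, 32)                       # stand-in features
y = torch.randint(0, 10, (256,))               # (possibly noisy) labels

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
averaged = AveragedModel(model)                # running average of model weights
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
criterion = nn.CrossEntropyLoss()

for epoch in range(50):
    # a crude stand-in for data augmentation: jitter the inputs each epoch
    logits = model(x + 0.05 * torch.randn_like(x))
    loss = criterion(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    averaged.update_parameters(model)

print("final loss:", loss.item())
```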

Feature Activation Map: Visual Explanation of Deep Learning Models for Image Classification

  • paper_url: http://arxiv.org/abs/2307.05017
  • repo_url: None
  • paper_authors: Yi Liao, Yongsheng Gao, Weichuan Zhang
  • for: 本研究旨在为深度学习模型的解释提供有效的方法,使得模型的决策更容易被理解和解释。
  • methods: 本研究提出了一种后处解释算法 named feature activation map (FAM),该算法可以解释深度学习模型没有全连接层的图像分类模型。
  • results: 对于十种深度学习模型的图像分类、对比学习图像分类和图像检索任务,提出的 FAM 算法能够有效地解释模型的决策。
    Abstract Decisions made by convolutional neural networks(CNN) can be understood and explained by visualizing discriminative regions on images. To this end, Class Activation Map (CAM) based methods were proposed as powerful interpretation tools, making the prediction of deep learning models more explainable, transparent, and trustworthy. However, all the CAM-based methods (e.g., CAM, Grad-CAM, and Relevance-CAM) can only be used for interpreting CNN models with fully-connected (FC) layers as a classifier. It is worth noting that many deep learning models classify images without FC layers, e.g., few-shot learning image classification, contrastive learning image classification, and image retrieval tasks. In this work, a post-hoc interpretation tool named feature activation map (FAM) is proposed, which can interpret deep learning models without FC layers as a classifier. In the proposed FAM algorithm, the channel-wise contribution weights are derived from the similarity scores between two image embeddings. The activation maps are linearly combined with the corresponding normalized contribution weights, forming the explanation map for visualization. The quantitative and qualitative experiments conducted on ten deep learning models for few-shot image classification, contrastive learning image classification and image retrieval tasks demonstrate the effectiveness of the proposed FAM algorithm.
    摘要 卷积神经网络(CNN)的决策可以通过可视化图像中的判别区域来理解与解释。为此,基于类激活图(CAM)的方法被提出,作为强有力的解释工具,使深度学习模型的预测更加可解释、透明和可信。然而,所有基于 CAM 的方法(如 CAM、Grad-CAM 和 Relevance-CAM)只能用于解释以全连接(FC)层作为分类器的 CNN 模型。值得注意的是,许多深度学习模型在分类图像时并不使用 FC 层,例如小样本图像分类、对比学习图像分类和图像检索任务。本文提出了一种名为特征激活图(FAM)的后验解释工具,可以解释不以 FC 层作为分类器的深度学习模型。在所提出的 FAM 算法中,通道级贡献权重由两个图像嵌入之间的相似度得分导出;激活图与对应的归一化贡献权重线性组合,形成用于可视化的解释图。在小样本图像分类、对比学习图像分类和图像检索任务上对十个深度学习模型进行的定量与定性实验验证了所提出的 FAM 算法的有效性。
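
One plausible reading of the FAM construction in the abstract is sketched below with random feature maps: per-channel contributions to the similarity of two pooled embeddings weight the query image's activation maps. The pooling and normalization choices here are assumptions, not the authors' exact recipe.

```python
# A hedged sketch of the idea described in the abstract (one plausible reading, not the
# authors' code): channel-wise contributions to the similarity of two embeddings are
# used to weight and combine the activation maps of a query image.
import torch

torch.manual_seed(0)
C, H, W = 8, 7, 7
feat_query = torch.rand(C, H, W)              # last conv feature maps of the query image
feat_other = torch.rand(C, H, W)              # feature maps of the retrieved/compared image

emb_query = feat_query.mean(dim=(1, 2))       # global-average-pooled embeddings
emb_other = feat_other.mean(dim=(1, 2))

# Per-channel contribution to the (unnormalized) inner-product similarity.
contrib = emb_query * emb_other
weights = contrib / contrib.sum()             # normalized contribution weights

fam = (weights[:, None, None] * feat_query).sum(dim=0)    # feature activation map
fam = (fam - fam.min()) / (fam.max() - fam.min() + 1e-8)  # rescale for visualization
print(fam.shape)                              # (7, 7) explanation map
```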

CILF:Causality Inspired Learning Framework for Out-of-Distribution Vehicle Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2307.05624
  • repo_url: None
  • paper_authors: Shengyi Li, Qifan Xue, Yezhuo Zhang, Xuanpeng Li
  • for: 本研究旨在提高自动驾驶车辆的轨迹预测精度,特别是对于不同驾驶场景和环境的预测。
  • methods: 本研究提出了一种基于 causal graph 的 Out-of-Distribution Causal Graph (OOD-CG) 方法,用于解决现有方法强调 correlations 的问题。此外,还提出了一种 Causal Inspired Learning Framework (CILF),包括三个步骤:1) 提取域 invariant causal feature,2) 提取域 variant feature,3) 分离域 variant causal和非 causal feature。
  • results: 实验表明,CILF 在主流的 NGSIM 和 INTERACTION 数据集上提高了预测性能,特别是在不同驾驶场景和环境下。
    Abstract Trajectory prediction is critical for autonomous driving vehicles. Most existing methods tend to model the correlation between history trajectory (input) and future trajectory (output). Since correlation is just a superficial description of reality, these methods rely heavily on the i.i.d. assumption and evince a heightened susceptibility to out-of-distribution data. To address this problem, we propose an Out-of- Distribution Causal Graph (OOD-CG), which explicitly defines the underlying causal structure of the data with three entangled latent features: 1) domain-invariant causal feature (IC), 2) domain-variant causal feature (VC), and 3) domain-variant non-causal feature (VN ). While these features are confounded by confounder (C) and domain selector (D). To leverage causal features for prediction, we propose a Causal Inspired Learning Framework (CILF), which includes three steps: 1) extracting domain-invariant causal feature by means of an invariance loss, 2) extracting domain variant feature by domain contrastive learning, and 3) separating domain-variant causal and non-causal feature by encouraging causal sufficiency. We evaluate the performance of CILF in different vehicle trajectory prediction models on the mainstream datasets NGSIM and INTERACTION. Experiments show promising improvements in CILF on domain generalization.
    摘要 “轨迹预测是自动驾驶车辆的重要任务。现有的方法通常是模型历史轨迹(输入)和未来轨迹(输出)之间的联乘。但这些方法对于实际情况有限的误导,尤其是在非典型数据上表现不佳。为解决这个问题,我们提出了对出现非典型数据的问题的外部干扰概率概念(OOD-CG),它明确地定义了数据的下游结构,包括三个涉及的隐藏特征:1)预测不受领域影响的 causal 特征(IC),2)领域特有的 causal 特征(VC),和3)领域特有的非 causal 特征(VN)。这些特征被混合运动(C)和领域选择器(D)所混淆。为了利用 causal 特征进行预测,我们提出了一个受 causal 革新数据的构成框架(CILF),包括以下三个步骤:1)通过不受领域影响的对称损失提取预测不受领域影响的 causal 特征,2)通过领域对称学习提取领域特有的 causal 特征,和3)通过将领域特有的 causal 和非 causal 特征分开,以便实现 causal 充分性。我们在主流的 NGSIM 和 INTERACTION 数据集上评估了 CILF 的表现,实际上获得了显著的改善。”

A DeepLearning Framework for Dynamic Estimation of Origin-Destination Sequence

  • paper_url: http://arxiv.org/abs/2307.05623
  • repo_url: None
  • paper_authors: Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng
  • for: 估算交通需求矩阵(OD matrix),解决交通领域中的重要问题。
  • methods: 使用深度学习方法推导OD序列的结构,并使用结构约束导航传统的数值优化。
  • results: NN可以有效地推导OD序列的结构,并为数值优化提供实用的约束,解决了延迟问题。
    Abstract OD matrix estimation is a critical problem in the transportation domain. The principle method uses the traffic sensor measured information such as traffic counts to estimate the traffic demand represented by the OD matrix. The problem is divided into two categories: static OD matrix estimation and dynamic OD matrices sequence(OD sequence for short) estimation. The above two face the underdetermination problem caused by abundant estimated parameters and insufficient constraint information. In addition, OD sequence estimation also faces the lag challenge: due to different traffic conditions such as congestion, identical vehicle will appear on different road sections during the same observation period, resulting in identical OD demands correspond to different trips. To this end, this paper proposes an integrated method, which uses deep learning methods to infer the structure of OD sequence and uses structural constraints to guide traditional numerical optimization. Our experiments show that the neural network(NN) can effectively infer the structure of the OD sequence and provide practical constraints for numerical optimization to obtain better results. Moreover, the experiments show that provided structural information contains not only constraints on the spatial structure of OD matrices but also provides constraints on the temporal structure of OD sequence, which solve the effect of the lagging problem well.
    摘要 OD 矩陣估算是交通領域中的一個核心問題。主要方法是利用交通感測器量測到的資訊(如交通流量)來估算以 OD 矩陣表示的交通需求。該問題可分為兩類:靜態 OD 矩陣估算與動態 OD 矩陣序列(簡稱 OD 序列)估算。由於待估參數眾多而約束資訊不足,兩者都面臨不定問題。此外,OD 序列估算還面臨滯後挑戰:由於壅塞等不同交通狀況,同一輛車會在同一觀察時段內出現在不同路段上,使得相同的 OD 需求對應到不同的行程。為此,本文提出一個整合方法,利用深度學習方法推理 OD 序列的結構,並以結構約束引導傳統的數值優化。實驗顯示,神經網路能有效推理 OD 序列的結構,並為數值優化提供實用的約束以取得更好的結果;所提供的結構資訊不僅包含 OD 矩陣的空間結構約束,也包含 OD 序列的時間結構約束,從而很好地解決了滯後問題。

Control as Probabilistic Inference as an Emergent Communication Mechanism in Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05004
  • repo_url: None
  • paper_authors: Tomoaki Nakamura, Akira Taniguchi, Tadahiro Taniguchi
  • for: 这个论文提出了一种生成概率模型,旨在 integrate emergent communication和多个代理人学习奖励。
  • methods: 该模型使用概率推理进行控制,并通过消息来实现代理人之间的交流。
  • results: 通过在网格环境中进行实验,我们显示了该PGM可以推理出有意义的消息,以完成协作任务。Note: “for” refers to the purpose or goal of the paper, “methods” refers to the techniques or approaches used in the paper, and “results” refers to the main findings or outcomes of the paper.
    Abstract This paper proposes a generative probabilistic model integrating emergent communication and multi-agent reinforcement learning. The agents plan their actions by probabilistic inference, called control as inference, and communicate using messages that are latent variables and estimated based on the planned actions. Through these messages, each agent can send information about its actions and know information about the actions of another agent. Therefore, the agents change their actions according to the estimated messages to achieve cooperative tasks. This inference of messages can be considered as communication, and this procedure can be formulated by the Metropolis-Hasting naming game. Through experiments in the grid world environment, we show that the proposed PGM can infer meaningful messages to achieve the cooperative task.

Selective Sampling and Imitation Learning via Online Regression

  • paper_url: http://arxiv.org/abs/2307.04998
  • repo_url: None
  • paper_authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
  • for: 这个论文主要研究了受难学习(Imitation Learning)问题,特别是在受到噪声专家反馈的情况下。
  • methods: 这个论文使用了选择采样算法,通过在不同的动作上请求噪声专家反馈来解决受难学习问题。
  • results: 论文提出了一种新的选择采样算法,可以在涉及到多个动作和概念函数类型的情况下实现最佳的 regret 和查询次数 bound。此外,论文还提供了一种基于函数 aproximation的受难学习算法,其 regret bound 只取决于搜索过程中出现的状态的margin。
    Abstract We consider the problem of Imitation Learning (IL) by actively querying noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation, and relies on an online regression oracle w.r.t.~the given model class to predict actions, and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound that demonstrates that our results are tight. We extend our selective sampling algorithm for IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds only depend on the number of times the optimal policy (and not the noisy expert, or the learner) go to states that have a small margin.
    摘要 我们研究通过主动向带噪声的专家询问反馈来进行模仿学习(IL)的问题。尽管模仿学习在经验上取得了成功,先前的大多数工作都假设可以获得无噪声的专家反馈,而这在许多应用中并不现实。事实上,当只能获得带噪声的专家反馈时,仅依赖离线数据的算法(非交互式 IL)可以被证明需要多到难以接受的样本数量才能成功。与此相反,本文提出了一种交互式的 IL 算法,利用选择性采样主动向带噪声的专家询问反馈。我们的贡献有两方面:首先,我们提出了一种新的选择性采样算法,适用于一般函数类和多个动作,并得到目前已知最优的遗憾界和询问次数界;其次,我们将该分析推广到带噪声专家反馈的 IL 问题,给出一种只需有限次询问的新 IL 算法。我们的选择性采样算法利用函数逼近,并依赖针对给定模型类的在线回归预言机来预测动作,以及决定是否向专家询问其标签。在理论方面,算法的遗憾界由在线回归预言机的遗憾界所控制,而询问复杂度还依赖于模型类的 eluder 维度;我们还给出了一个下界,说明这些结果是紧的。最后,我们将选择性采样算法推广到使用一般函数逼近的 IL,并给出遗憾与向带噪声专家询问次数的界。这里的一个关键新颖之处在于,这些界只取决于最优策略(而非带噪声的专家或学习者)访问小间隔状态的次数。

Empowering recommender systems using automatically generated Knowledge Graphs and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04996
  • repo_url: https://github.com/GhanshyamVerma/Explainable-Recommender-System
  • paper_authors: Ghanshyam Verma, Shovon Sengupta, Simon Simanta, Huan Chen, Janos A. Perge, Devishree Pillai, John P. McCrae, Paul Buitelaar
  • for: 这个研究旨在提高客户体验,提高客户关系管理的质量,通过知识图(KG)应用。
  • methods: 这个研究使用了两种知识图基本方法:一是基于强化学习的方法,另一是基于XGBoost算法的方法。两种方法都利用了一个基于结构化和无结构化数据生成的知识图。
  • results: 这个研究表明,通过使用知识图驱动的高级机器学习技术,可以提供更加可解释的结果,从而促进更好的决策。这个研究也表明了在客户关系管理中, combining 高级机器学习技术和知识图驱动的想法的潜在价值。
    Abstract Personalized recommendations have a growing importance in direct marketing, which motivates research to enhance customer experiences by knowledge graph (KG) applications. For example, in financial services, companies may benefit from providing relevant financial articles to their customers to cultivate relationships, foster client engagement and promote informed financial decisions. While several approaches center on KG-based recommender systems for improved content, in this study we focus on interpretable KG-based recommender systems for decision making. To this end, we present two knowledge graph-based approaches for personalized article recommendations for a set of customers of a large multinational financial services company. The first approach employs Reinforcement Learning and the second approach uses the XGBoost algorithm for recommending articles to the customers. Both approaches make use of a KG generated from both structured (tabular data) and unstructured data (a large body of text data). Using the Reinforcement Learning-based recommender system we could leverage the graph traversal path leading to the recommendation as a way to generate interpretations (Path Directed Reasoning (PDR)). In the XGBoost-based approach, one can also provide explainable results using post-hoc methods such as SHAP (SHapley Additive exPlanations) and ELI5 (Explain Like I am Five). Importantly, our approach offers explainable results, promoting better decision-making. This study underscores the potential of combining advanced machine learning techniques with KG-driven insights to bolster experience in customer relationship management.
    摘要 personalized recommendations 的重要性在直接市场策略中不断增长,这导致了研究人员努力增强客户体验,通过知识图(KG)应用。例如,在金融服务中,公司可能会从提供 relevanter 的金融文章来培养关系,促进客户参与度和提高客户做出的 финанCIAL 决策。虽然许多方法集中在 KG 基于的 recommender 系统上,但在这种研究中,我们专注于可解释 KG 基于的 recommender 系统,以便在决策过程中提供更多的帮助。为此,我们提出了两种基于知识图的方法,用于个性化文章推荐。首先,我们使用 Reinforcement Learning 方法,并利用知识图的搜索路径来生成解释(Path Directed Reasoning )。其次,我们使用 XGBoost 算法,并使用后处方法如 SHAP 和 ELI5 来提供可解释的结果。重要的是,我们的方法提供了可解释的结果,从而促进更好的决策。本研究表明,将先进的机器学习技术与知识图驱动的洞察结合,可以提高客户关系管理的经验。
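
For the XGBoost route, a compact sketch of fitting a model and reading per-customer SHAP attributions is shown below, on synthetic data standing in for KG-derived features; the feature names and dataset are placeholders, not the company's data.

```python
# A small sketch of the XGBoost + SHAP route mentioned above, on synthetic data
# standing in for KG-derived customer/article features (not the actual dataset).
import numpy as np
import shap
import xgboost
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
feature_names = [f"kg_feature_{i}" for i in range(X.shape[1])]   # placeholder names

model = xgboost.XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss")
model.fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])     # per-feature attributions for 5 customers

for i, row in enumerate(shap_values):
    top = int(np.argmax(np.abs(row)))
    print(f"customer {i}: strongest driver = {feature_names[top]} ({row[top]:+.3f})")
```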

Monotone deep Boltzmann machines

  • paper_url: http://arxiv.org/abs/2307.04990
  • repo_url: None
  • paper_authors: Zhili Feng, Ezra Winston, J. Zico Kolter
  • for: 这个论文是研究深度波兰链机制(DBM)的一种可能的扩展,即允许自适应连接的 monotone DBM,以实现高效的approximate inference。
  • methods: 这篇论文使用了 monotone Deep Equilibrium model 的工具,并通过选择特定的活化函数来实现固定点迭代,以获得一个Variational Mean Field解。
  • results: 这个方法可以应用于深度 convolutional Boltzmann 架构,并能够同时完成图像的联合完成和分类任务,而不需要传统 RBM 中的含义场推理。
    Abstract Deep Boltzmann machines (DBMs), one of the first ``deep'' learning methods ever studied, are multi-layered probabilistic models governed by a pairwise energy function that describes the likelihood of all variables/nodes in the network. In practice, DBMs are often constrained, i.e., via the \emph{restricted} Boltzmann machine (RBM) architecture (which does not permit intra-layer connections), in order to allow for more efficient inference. In this work, we revisit the generic DBM approach, and ask the question: are there other possible restrictions to their design that would enable efficient (approximate) inference? In particular, we develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer, but restricts the \emph{weights} in a manner that guarantees the existence and global uniqueness of a mean-field fixed point. To do this, we leverage tools from the recently-proposed monotone Deep Equilibrium model and show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution. While this approach is still largely conceptual, it is the first architecture that allows for efficient approximate inference in fully-general weight structures for DBMs. We apply this approach to simple deep convolutional Boltzmann architectures and demonstrate that it allows for tasks such as the joint completion and classification of images, within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.

Epidemic Modeling with Generative Agents

  • paper_url: http://arxiv.org/abs/2307.04986
  • repo_url: https://github.com/bear96/gabm-epidemic
  • paper_authors: Ross Williams, Niyousha Hosseinichimeh, Aritra Majumdar, Navid Ghaffarzadegan
  • for: This study aims to address the grand challenge of incorporating human behavior in epidemic models by offering a new paradigm of individual-level modeling.
  • methods: The study uses generative artificial intelligence in an agent-based epidemic model, where each agent is empowered to make its own reasonings and decisions via connecting to a large language model such as ChatGPT.
  • results: Through various simulation experiments, the study presents compelling evidence that generative agents mimic real-world behaviors such as quarantining when sick and self-isolation when cases rise, and demonstrate patterns akin to multiple waves observed in recent pandemics followed by an endemic period. Additionally, the agents successfully flatten the epidemic curve.
    Abstract This study offers a new paradigm of individual-level modeling to address the grand challenge of incorporating human behavior in epidemic models. Using generative artificial intelligence in an agent-based epidemic model, each agent is empowered to make its own reasonings and decisions via connecting to a large language model such as ChatGPT. Through various simulation experiments, we present compelling evidence that generative agents mimic real-world behaviors such as quarantining when sick and self-isolation when cases rise. Collectively, the agents demonstrate patterns akin to multiple waves observed in recent pandemics followed by an endemic period. Moreover, the agents successfully flatten the epidemic curve. This study creates potential to improve dynamic system modeling by offering a way to represent human brain, reasoning, and decision making.

Reducing Causality to Functions with Structural Models

  • paper_url: http://arxiv.org/abs/2307.07524
  • repo_url: https://github.com/miaotianyi/annotated-sfm
  • paper_authors: Tianyi Miao
  • for: 本文提出了一种新的 causality 定义方法,即基于结构功能模型(SFM)的减 delta 压缩和对比迭代推理,以生成具有我们直觉含义的 causal 句子。
  • methods: 本文使用了 delta 压缩和对比迭代推理来实现 SFM,并将其应用到了多个 causal 场景中。
  • results: 本文通过对 SFM 的应用和比较,发现了它的 compatibiltiy 性和可靠性,并用于解释自由意志、 causal 解释和心理 causation 等问题。
    Abstract The precise definition of causality is currently an open problem in philosophy and statistics. We believe causality should be defined as functions (in mathematics) that map causes to effects. We propose a reductive definition of causality based on Structural Functional Model (SFM). Using delta compression and contrastive forward inference, SFM can produce causal utterances like "X causes Y" and "X is the cause of Y" that match our intuitions. We compile a dataset of causal scenarios and use SFM in all of them. SFM is compatible with but not reducible to probability theory. We also compare SFM with other theories of causation and apply SFM to downstream problems like free will, causal explanation, and mental causation.
    摘要 因果性的精确定义目前在哲学和统计学中仍是一个未解决的问题。我们认为,因果性应被定义为(数学意义上的)把原因映射到结果的函数。我们基于结构功能模型(SFM)提出了一种还原式的因果性定义。借助 delta 压缩和对比前向推理,SFM 能够生成符合我们直觉的因果表述,例如"X 导致 Y"和"X 是 Y 的原因"。我们编制了一个因果情景数据集,并在其中全部使用 SFM。SFM 与概率论相容,但不能归约为概率论。我们还将 SFM 与其他因果理论进行比较,并将其应用于自由意志、因果解释和心理因果等下游问题。
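
The core idea of treating causality as functions from causes to effects, plus an intervention operator, can be sketched as follows; the variables and structural equations are invented for illustration and are not drawn from the paper's dataset.

```python
# A tiny sketch of treating causality as functions from causes to effects, with an
# intervention operator; variable names and equations are invented for illustration.

functions = {
    "rain":      lambda v: v["season"] == "wet",
    "sprinkler": lambda v: not v["rain"],
    "wet_grass": lambda v: v["rain"] or v["sprinkler"],
}
order = ["rain", "sprinkler", "wet_grass"]      # a topological order of the model

def evaluate(exogenous, interventions=None):
    v = dict(exogenous)
    interventions = interventions or {}
    for name in order:
        v[name] = interventions.get(name, functions[name](v))
    return v

observed = evaluate({"season": "wet"})
forced   = evaluate({"season": "wet"}, interventions={"rain": False})  # do(rain = False)
print(observed["wet_grass"], forced["wet_grass"])   # True True (sprinkler compensates)
```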

Secrets of RLHF in Large Language Models Part I: PPO

  • paper_url: http://arxiv.org/abs/2307.04964
  • repo_url: https://github.com/openlmlab/moss-rlhf
  • paper_authors: Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
  • for: 这个论文的主要目标是提高人工智能的普遍性,并通过人类中心的帮助、诚实和无害的方式来实现这一目标。
  • methods: 该论文使用了奖励模型来衡量人类喜好,并使用Proximal Policy Optimization(PPO)算法来优化策略模型的输出。它还使用过程监视来提高步骤进行逻辑能力。
  • results: 该论文通过分析RLHF框架、重新评估PPO算法的内部工作机制,并探索PPO算法中的部件如何影响策略代理训练。研究发现策略约束是RLHF训练稳定的关键因素。因此,研究者提出了PPO-max算法,以提高策略模型训练稳定性。基于主要结果,研究者进行了RLHF能力的全面分析,并与SFT模型和ChatGPT进行比较。
    Abstract Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Its primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include **reward models** to measure human preferences, **Proximal Policy Optimization** (PPO) to optimize policy model outputs, and **process supervision** to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with huge trial and error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF has still been a puzzle. In the first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints being the key factor for the effective implementation of the PPO algorithm. Therefore, we explore the PPO-max, an advanced version of PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLMs alignment. Therefore, we are eager to release technical reports, reward models and PPO codes, aiming to make modest contributions to the advancement of LLMs.
    摘要 大型语言模型(LLM)为通用人工智能的发展勾画了蓝图,其首要目标是成为以人为中心(有益、诚实、无害)的助手。与人类对齐因而至关重要,而基于人类反馈的强化学习(RLHF)正是支撑这一目标的关键技术范式。目前的技术路线通常包括:用奖励模型度量人类偏好,用近端策略优化(PPO)优化策略模型的输出,并用过程监督提升逐步推理能力。然而,由于奖励设计、环境交互和智能体训练等方面的挑战,加之大型语言模型巨大的试错成本,AI 研究者在推动 LLM 的技术对齐与安全落地方面面临很大障碍,RLHF 的稳定训练至今仍是一个谜。在这第一份报告中,我们剖析了 RLHF 的框架,重新审视 PPO 的内部机制,并探究 PPO 算法各组成部分如何影响策略智能体的训练。我们发现策略约束是 PPO 算法有效实施的关键因素,因此提出了 PPO 的进阶版本 PPO-max,以有效提升策略模型训练的稳定性。基于主要结果,我们对 RLHF 的能力进行了全面分析,并与 SFT 模型和 ChatGPT 进行比较。开源实现的缺失给 LLM 对齐研究带来了显著困难,因此我们将发布技术报告、奖励模型和 PPO 代码,希望为 LLM 的发展做出绵薄贡献。
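
A schematic of the constrained policy loss at the heart of PPO-style RLHF training, a clipped ratio objective plus a KL term toward a frozen reference policy, is sketched below with random placeholder tensors; it is not the released PPO-max implementation.

```python
# A schematic sketch of a constrained PPO policy loss (clipped ratio plus a KL penalty
# toward a reference policy); tensors here are random placeholders rather than real
# rollouts, and this is not the authors' released code.
import torch

torch.manual_seed(0)
logp_new = torch.randn(8, requires_grad=True)          # log pi_theta(a|s) for sampled tokens
logp_old = logp_new.detach() + 0.1 * torch.randn(8)    # behaviour policy log-probs
logp_ref = logp_old - 0.05                             # frozen SFT/reference policy log-probs
advantages = torch.randn(8)

clip_eps, kl_coef = 0.2, 0.05
ratio = torch.exp(logp_new - logp_old)
unclipped = ratio * advantages
clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
policy_loss = -torch.min(unclipped, clipped).mean()

# Token-level KL-style penalty keeping the policy close to the reference model.
kl_penalty = (logp_new - logp_ref).mean()
loss = policy_loss + kl_coef * kl_penalty
loss.backward()
print(loss.item())
```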

Intrinsically motivated graph exploration using network theories of human curiosity

  • paper_url: http://arxiv.org/abs/2307.04962
  • repo_url: https://github.com/spatank/GraphRL
  • paper_authors: Shubhankar P. Patankar, Mathieu Ouellet, Juan Cervino, Alejandro Ribeiro, Kieran A. Murphy, Dani S. Bassett
  • for: 本研究旨在透过两种人类好奇理论来驱动 Graph 结构资料中的探索。
  • methods: 本研究提出了一种基于 Graph Neural Network 的奖励学习方法,使用提议的特征来帮助 Agent 在环境中探索。
  • results: 训练Agent使用提议的奖励学习方法后,可以在更大的环境中和更长的探索路径上获得更好的性能,并且比使用单纯的搜索更加快速。此外,curiosity-based 的推荐系统在真实世界的 Graph 资料上也比 PageRank 中心性更加预测人类行为。
    Abstract Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by the visited nodes in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to larger environments and to longer exploratory walks than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that curiosity-based recommendations are more predictive of human behavior than PageRank centrality for several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
    摘要 内在动机驱动的探索已被证明对强化学习非常有用,即使没有额外的外在奖励。当环境天然地表示为图时,如何最好地引导探索仍是一个开放问题。在这项工作中,我们基于人类好奇心的两种理论,即信息差距理论和压缩进步理论,提出了一种探索图结构数据的新方法。这两种理论将好奇心视为一种内在动机,用于优化由已访问节点所诱导的子图的拓扑特征。我们将这些特征作为基于图神经网络的强化学习的奖励。在多类人工合成的图上,我们发现训练后的智能体能够泛化到比训练时更大的环境和更长的探索路径,而且我们的方法比直接贪婪地评估相关拓扑性质计算效率更高。所提出的内在动机对推荐系统尤其有意义:在 MovieLens、Amazon Books 和 Wikispeedia 等多个真实世界图数据集上,基于好奇心的推荐比 PageRank 中心性更能预测人类行为。
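For the graph-curiosity paper above, here is a minimal sketch of an intrinsic reward computed on the subgraph induced by the visited nodes. The two terms are simple stand-ins for the information-gap and compression-progress features, not the paper's exact topological measures.

```python
import networkx as nx

def curiosity_reward(graph, visited, prev_score=0.0):
    """Illustrative intrinsic reward over the visited-node induced subgraph.
    'gap' is a crude information-gap proxy (fraction of absent edges among
    visited nodes); 'progress' is a crude compression-progress proxy (change
    in subgraph density since the previous step)."""
    sub = graph.subgraph(visited)
    n = sub.number_of_nodes()
    if n < 2:
        return 0.0, prev_score
    possible = n * (n - 1) / 2
    gap = 1.0 - sub.number_of_edges() / possible
    score = nx.density(sub)
    progress = score - prev_score
    return gap + progress, score

# Toy exploratory walk on a random graph.
G = nx.erdos_renyi_graph(30, 0.2, seed=0)
visited, prev = [0], 0.0
for node in [1, 2, 5, 7, 11]:
    visited.append(node)
    r, prev = curiosity_reward(G, visited, prev)
    print(f"visited={len(visited):2d}  intrinsic reward={r:.3f}")
```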

Reinforcement Learning with Non-Cumulative Objective

  • paper_url: http://arxiv.org/abs/2307.04957
  • repo_url: https://github.com/willtop/Reinforcement_Learning_With_Non-Cumulative_Objective
  • paper_authors: Wei Cui, Wei Yu
  • for: 这篇论文主要针对的是在控制和学习中处理非累积目标的问题。
  • methods: 该论文提出了对现有算法的一种修改,以优化非累积目标;具体而言,是将贝尔曼最优方程中的求和运算替换为与目标相对应的广义运算。
  • results: 实验表明,该方法在经典的最优控制与强化学习任务以及两个最大化流量的网络路由问题上都能奏效,并在给定条件下保证广义贝尔曼更新的全局最优收敛。
    Abstract In reinforcement learning, the objective is almost always defined as a \emph{cumulative} function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
    摘要 在强化学习中,目标几乎总是被定义为过程中各步奖励的累积(求和)函数。然而,在许多应用领域(尤其是通信与网络)的最优控制和强化学习问题中,目标并不能自然地表示为奖励的求和。在本文中,我们指出非累积目标在各类问题中的普遍性,并提出对现有算法的一种修改以优化此类目标。具体而言,我们深入研究许多最优控制和强化学习算法的基本构件,即贝尔曼最优方程:为了优化非累积目标,我们将贝尔曼更新规则中原有的求和运算替换为与目标相对应的广义运算。此外,我们给出了广义运算形式的充分条件以及对马尔可夫决策过程的假设,在这些条件下可以保证广义贝尔曼更新的全局最优收敛。我们以瓶颈目标(即由过程中最小奖励决定的目标)为例,在经典的最优控制和强化学习任务以及两个最大化流量的网络路由问题上进行了实验验证。
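A small worked example of the generalized Bellman backup for the bottleneck objective discussed above: the usual "reward plus discounted max" is replaced by "min(reward, max)". The tiny MDP and the optimistic initialization are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def bottleneck_value_iteration(P, R, n_iters=200):
    """Tabular value iteration for a non-cumulative bottleneck objective: the
    return of a trajectory is the minimum reward along it, so the backup uses
    min(r, V(s')) instead of r + gamma * V(s').
    P[s, a, s'] are transition probabilities, R[s, a, s'] per-step rewards."""
    S, A, _ = P.shape
    # Optimistic initialization matters for a min-style backup; starting from
    # zero would keep every value pinned at zero.
    Q = np.full((S, A), R.max())
    for _ in range(n_iters):
        V = Q.max(axis=1)                                   # greedy next-state value
        Q = (P * np.minimum(R, V[None, None, :])).sum(axis=2)
    return Q

# Tiny 3-state, 2-action MDP with deterministic transitions.
P = np.zeros((3, 2, 3)); R = np.zeros((3, 2, 3))
P[0, 0, 1] = 1; R[0, 0, 1] = 5     # action 0 from s0: big first reward...
P[1, :, 2] = 1; R[1, :, 2] = 1     # ...but every way out of s1 is a weak link
P[0, 1, 2] = 1; R[0, 1, 2] = 3     # action 1 from s0: direct hop worth 3
P[2, :, 2] = 1; R[2, :, 2] = 10    # absorbing goal state
Q = bottleneck_value_iteration(P, R)
print(Q[0])   # expect roughly [1., 3.]: the bottleneck makes action 1 better
```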

Impact of Feature Encoding on Malware Classification Explainability

  • paper_url: http://arxiv.org/abs/2307.05614
  • repo_url: None
  • paper_authors: Elyes Manai, Mohamed Mejri, Jaouhar Fattahi
  • for: 这个论文研究了对于可解释人工智能(XAI)算法的特征编码技术的影响。
  • methods: 我们使用了一个架构分类 dataset,并将 XGBoost 模型与两种特征编码方法进行比较:标签编码(LE)和一个热点编码(OHE)。
  • results: 我们发现,使用 OHE 而不是 LE 会导致一些性能下降,但是 OHE 提供的更多的细节解释使得这些下降被补偿。我们还发现,使用 OHE 可以更好地探索全局和本地上下文中的细节,使得更全面的回答。此外,我们发现使用 OHE 可以减少解释文件的大小和人类分析者的分析时间。这些结论强调了在 XAI 研究中考虑特征编码技术的重要性,并提出了进一步探索的可能性,包括添加更多的编码方法和创新的可视化方法。
    Abstract This paper investigates the impact of feature encoding techniques on the explainability of XAI (Explainable Artificial Intelligence) algorithms. Using a malware classification dataset, we trained an XGBoost model and compared the performance of two feature encoding methods: Label Encoding (LE) and One Hot Encoding (OHE). Our findings reveal a marginal performance loss when using OHE instead of LE. However, the more detailed explanations provided by OHE compensated for this loss. We observed that OHE enables deeper exploration of details in both global and local contexts, facilitating more comprehensive answers. Additionally, we observed that using OHE resulted in smaller explanation files and reduced analysis time for human analysts. These findings emphasize the significance of considering feature encoding techniques in XAI research and suggest potential for further exploration by incorporating additional encoding methods and innovative visualization approaches.
    摘要 这篇论文研究了特征编码技术对Explainable Artificial Intelligence(XAI)算法的可解释性影响。使用一个恶意软件分类 dataset,我们使用 XGBoost 模型进行比较两种特征编码方法的性能:Label Encoding(LE)和One Hot Encoding(OHE)。我们发现,使用 OHE 而不是 LE 会导致一些性能下降,但是 OHE 提供的更详细的解释相应弥补了这些损失。我们发现,使用 OHE 可以更深入探索全局和局部上下文中的细节,从而提供更全面的答案。此外,我们发现使用 OHE 会减少解释文件的大小和人工分析者的分析时间。这些发现强调了考虑特征编码技术在 XAI 研究中的重要性,并提出了进一步探索的可能性,包括添加更多的编码方法和创新的视觉化方法。
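To make the label-encoding versus one-hot-encoding comparison above concrete, here is a minimal sketch on a tiny synthetic stand-in for a malware dataset. The features, hyperparameters, and dataset are invented for illustration; only the overall recipe (encode, fit XGBoost, explain with SHAP) mirrors the study.

```python
import numpy as np
import pandas as pd
import shap
from xgboost import XGBClassifier

# Synthetic stand-in: two categorical features and a binary label.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "packer":  rng.choice(["upx", "mpress", "none"], 500),
    "section": rng.choice([".text", ".rsrc", ".evil"], 500),
})
y = ((df["section"] == ".evil") ^ (rng.random(500) < 0.1)).astype(int)

# Label encoding: one integer column per feature (compact, coarser explanations).
X_le = df.apply(lambda col: col.astype("category").cat.codes)
# One-hot encoding: one binary column per category (larger, finer explanations).
X_ohe = pd.get_dummies(df).astype(int)

for name, X in [("label-encoded", X_le), ("one-hot", X_ohe)]:
    model = XGBClassifier(n_estimators=50, max_depth=3).fit(X, y)
    sv = shap.TreeExplainer(model).shap_values(X)
    print(name, "| feature columns:", X.shape[1], "| SHAP matrix shape:", np.shape(sv))
```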

Substance or Style: What Does Your Image Embedding Know?

  • paper_url: http://arxiv.org/abs/2307.05610
  • repo_url: None
  • paper_authors: Cyrus Rashtchian, Charles Herrmann, Chun-Sung Ferng, Ayan Chakrabarti, Dilip Krishnan, Deqing Sun, Da-Cheng Juan, Andrew Tomkins
  • for: 本研究旨在探索 популяр的嵌入模型(如 MAE、SimCLR 和 CLIP)中的非 semantics 信息,以便更好地理解训练算法和这些基础模型的应用场景。
  • methods: 作者设计了一种系统的变换预测任务,用于测试嵌入中的视觉内容。他们使用了多种自然和人工变换来测试嵌入,并发现六个嵌入(包括 SimCLR)能够确定多达数十种变换。
  • results: 研究发现,图像-文本模型(如 CLIP 和 ALIGN)比基于掩码的模型(如 CAN 和 MAE)更善于识别新的风格迁移示例。总体而言,结果表明预训练算法的选择会影响嵌入中所包含的信息类型,且某些模型更适合非语义的下游任务。
    Abstract Probes are small networks that predict properties of underlying data from embeddings, and they provide a targeted, effective way to illuminate the information contained in embeddings. While analysis through the use of probes has become standard in NLP, there has been much less exploration in vision. Image foundation models have primarily been evaluated for semantic content. Better understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) will shed new light both on the training algorithms and on the uses for these foundation models. We design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Surprisingly, six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. We also consider a generalization task, where we group similar transformations and hold out several for testing. We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE). Overall, our results suggest that the choice of pre-training algorithm impacts the types of information in the embedding, and certain models are better than others for non-semantic downstream tasks.
    摘要 probes 是小型网络,可以预测基于嵌入的数据属性,它们提供一种有效、targeted的方式来探索嵌入中包含的信息。在 NLP 领域中,对嵌入进行分析已成为标准的操作,而在视觉领域中,image foundation models 主要被评估为semantic content。我们认为更好地理解流行 embedding 中的非 semantic 信息(例如 MAE、SimCLR 或 CLIP)将为training algorithms 和这些基础模型提供新的灯光。我们设计了一个系统性的转换预测任务,并测量嵌入中的视觉内容在多个轴上,包括图像风格、质量和自然/人工转换的范围。结果显示,六个嵌入(包括 SimCLR)中的信息足够以识别多达数十种转换。我们还考虑了一个通用化任务,将相似的转换分组,并将其中的一些作为测试集保留。我们发现,图像文本模型(CLIP 和 ALIGN)在新的样式转换任务中表现更好,而基于 masking 的模型(CAN 和 MAE)则表现不如其他模型。总之,我们的结果表明,选择预训练算法的类型会影响嵌入中包含的信息,以及选择合适的模型可以对非 semantic 下游任务进行更好的表现。
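The probing setup described above boils down to fitting a small classifier on frozen embeddings. In the sketch below, random vectors with a per-transformation mean shift stand in for real SimCLR/CLIP/MAE features, so the probe has something to detect; only the probe-training recipe itself reflects the paper's methodology.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data: rows are "embeddings" of transformed images, labels say which
# transformation was applied. In practice these vectors would come from a
# frozen foundation model; here they are synthetic.
rng = np.random.default_rng(0)
transforms = ["blur", "jpeg", "rotate", "style_transfer"]
X, y = [], []
for label, name in enumerate(transforms):
    shift = 0.5 * rng.normal(size=512)            # pretend signature of the transform
    X.append(rng.normal(size=(200, 512)) + shift)
    y.extend([label] * 200)
X, y = np.vstack(X), np.array(y)

# A linear probe: if it predicts the transformation well, the embedding
# carries that non-semantic information.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)
print("probe accuracy on held-out embeddings:", probe.score(X_te, y_te))
```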

KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization

  • paper_url: http://arxiv.org/abs/2307.07409
  • repo_url: None
  • paper_authors: Gangwoo Kim, Hajung Kim, Lei Ji, Seongsu Bae, Chanhwi Kim, Mujeen Sung, Hyunjae Kim, Kun Yan, Eric Chang, Jaewoo Kang
  • for: 这篇论文是为了开发一个新的预训条件(Pre-trained Vision-Language Model,VLM),用于胸部X射影领域。
  • methods: 这篇论文使用多种多Modal dataset进行初始化,然后转移到胸部X射影领域。它运用了一个简单的序列对话Schema,让模型从有限的资源中学习需要的知识和技能。
  • results: 这篇论文在 BioNLP 共享任务的benchmark数据集上显示出了优秀的表现,并且在 RadSum23 领域中的隐藏测试集上取得了第一名。
    Abstract In this paper, we introduce CheXOFA, a new pre-trained vision-language model (VLM) for the chest X-ray domain. Our model is initially pre-trained on various multimodal datasets within the general domain before being transferred to the chest X-ray domain. Following a prominent VLM, we unify various domain-specific tasks into a simple sequence-to-sequence schema. It enables the model to effectively learn the required knowledge and skills from limited resources in the domain. Demonstrating superior performance on the benchmark datasets provided by the BioNLP shared task, our model benefits from its training across multiple tasks and domains. With subtle techniques including ensemble and factual calibration, our system achieves first place on the RadSum23 leaderboard for the hidden test set.
    摘要 在这篇论文中,我们介绍CheXOFA,一种新的预训练视觉语言模型(VLM),用于胸部X射影领域。我们的模型首先在通用领域中预训练于多种多Modal数据集,然后将其转移到胸部X射影领域。遵循一种知名的VLM,我们将各个领域特有的任务统一为简单的序列到序列 schema。这使得模型可以很好地从限制的资源中学习需要的知识和技能。在 BioNLP 共同任务提供的标准 datasets 上,我们的模型表现出色,受益于在多个任务和领域进行训练。通过使用 ensemble 和事实抽象,我们的系统在 RadSum23 隐藏测试集上获得了第一名。

Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer

  • paper_url: http://arxiv.org/abs/2307.04895
  • repo_url: https://github.com/azreasoners/recurrent_transformer
  • paper_authors: Zhun Yang, Adam Ishay, Joohyung Lee
  • for: 解决具有约束的问题 (Constraint Satisfaction Problems, CSPs)
  • methods: 使用加入循环机制的 Transformer 实现端到端学习,相比现有方法(如图神经网络、SATNet 和一些神经符号模型)具有明显优势。
  • results: 可以直接应用于视觉约束推理问题,成功解决符号接地(symbol grounding)问题,并能在归纳学习中利用离散约束的演绎知识,实现样本高效学习和 CSP 的半监督学习。
    Abstract Constraint satisfaction problems (CSPs) are about finding values of variables that satisfy the given constraints. We show that Transformer extended with recurrence is a viable approach to learning to solve CSPs in an end-to-end manner, having clear advantages over state-of-the-art methods such as Graph Neural Networks, SATNet, and some neuro-symbolic models. With the ability of Transformer to handle visual input, the proposed Recurrent Transformer can straightforwardly be applied to visual constraint reasoning problems while successfully addressing the symbol grounding problem. We also show how to leverage deductive knowledge of discrete constraints in the Transformer's inductive learning to achieve sample-efficient learning and semi-supervised learning for CSPs.
    摘要 约束满足问题(CSP)旨在为变量找到满足给定约束的取值。我们表明,加入循环机制的 Transformer 是一种可行的端到端学习求解 CSP 的方法,相比图神经网络、SATNet 和一些神经符号模型等最新方法具有明显优势。借助 Transformer 处理视觉输入的能力,所提出的循环 Transformer 可以直接应用于视觉约束推理问题,并成功解决符号接地问题。我们还展示了如何在 Transformer 的归纳学习中利用离散约束的演绎知识,从而实现样本高效学习和 CSP 的半监督学习。
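The recurrence idea above can be illustrated with a weight-shared transformer layer applied for a fixed number of steps over the 81 cells of a Sudoku grid. Model sizes, the number of recurrent steps, and the per-cell classifier head are assumptions for the sketch; the paper's full architecture, losses, and constraint-injection tricks are not reproduced.

```python
import torch
import torch.nn as nn

class RecurrentTransformerCSP(nn.Module):
    """Minimal sketch: embed 81 Sudoku cells, reuse one transformer layer for
    several recurrent steps, then classify the digit of every cell."""
    def __init__(self, d_model=128, steps=8):
        super().__init__()
        self.embed = nn.Embedding(10, d_model)          # digits 0 (blank) .. 9
        self.pos = nn.Parameter(0.02 * torch.randn(81, d_model))
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                                dim_feedforward=256,
                                                batch_first=True)
        self.head = nn.Linear(d_model, 9)               # predict digit 1..9 per cell
        self.steps = steps

    def forward(self, grid):                            # grid: (B, 81) ints in 0..9
        h = self.embed(grid) + self.pos
        for _ in range(self.steps):                     # recurrence: same layer reused
            h = self.layer(h)
        return self.head(h)                             # (B, 81, 9) logits

model = RecurrentTransformerCSP()
puzzle = torch.randint(0, 10, (2, 81))                  # toy batch of partial grids
print(model(puzzle).shape)                              # torch.Size([2, 81, 9])
```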

Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies

  • paper_url: http://arxiv.org/abs/2307.04893
  • repo_url: https://github.com/rubensolv/locallearnerijcai
  • paper_authors: Rubens O. Moraes, David S. Aleixo, Lucas N. Ferreira, Levi H. S. Lelis
  • for: 提供一个集合参考策略来导引搜索算法,以提高两个玩家零点游戏中的策略搜索质量。
  • methods: 使用Local Learner(2L)算法,活动选择一组参考策略,以提高搜索信号。
  • results: 比较IBR、FP和DO等前一代学习算法,2L学习的参考策略提供了更强的搜索信号,并在MicroRTS游戏中synthesize策略时出色表现。
    Abstract This paper introduces Local Learner (2L), an algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Previous learning algorithms, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.
    摘要 本文介绍了局部学习者(Local Learner, 2L)算法,用于为双人零和博弈中的程序化策略搜索提供一组参考策略,以改善搜索信号。先前的学习算法,如迭代最佳响应(IBR)、虚拟对弈(FP)和双神谕(Double-Oracle, DO),要么计算代价高昂,要么会遗漏对引导搜索算法重要的信息。2L 主动选择一组参考策略来增强搜索信号。我们在三个游戏(包括具有挑战性的实时战略游戏 MicroRTS)中引导局部搜索算法合成策略,实证展示了该方法的优势。结果显示,2L 学到的参考策略比 IBR、FP 和 DO 提供了更强的搜索信号。我们还模拟了一场 MicroRTS 锦标赛,使用 2L 的策略合成器击败了最近两届 MicroRTS 竞赛的冠军,而这两个冠军都是由人类程序员编写的程序化策略。

Measuring and Mitigating Interference in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04887
  • repo_url: None
  • paper_authors: Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White
  • for: 本研究旨在提供一种定义和评估Catastrophic interference的方法,以便更好地理解和解决这种问题。
  • methods: 本研究使用了Fitted Q-Iteration和DQN等值基于学习方法,并提出了一种新的interference measure。
  • results: 研究人员通过系统地评估了新的interference measure,发现它与控制性能的不稳定相关,并在多种网络架构上进行了评估。此外,研究人员还提出了一类名为“online-aware”的算法,可以减少interference,并在多个 классических控制环境中提高稳定性和性能。
    Abstract Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before overcoming interference we must understand it better. In this work, we provide a definition and novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference, showing that it correlates with instability in control performance, across a variety of network architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures and study learning algorithms which mitigate interference. Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.
    摘要 灾难性干扰在许多基于神经网络的学习系统中很常见,也有许多缓解它的提议。在克服干扰之前,我们必须先更好地理解它。在这项工作中,我们针对基于值的强化学习方法(如 Fitted Q-Iteration 和 DQN)给出了干扰的定义和一种新的度量。我们系统地评估了该干扰度量,表明它在多种网络架构上都与控制性能的不稳定相关。这一新的度量使我们能够针对常用的深度学习架构提出新的科学问题,并研究能缓解干扰的学习算法。最后,我们提出了一类称为 online-aware 的算法,它们旨在缓解干扰;实验表明,按照我们的度量它们确实降低了干扰,并在若干经典控制环境中提高了稳定性和性能。

ChatGPT for Digital Forensic Investigation: The Good, The Bad, and The Unknown

  • paper_url: http://arxiv.org/abs/2307.10195
  • repo_url: https://github.com/markscanlonucd/chatgpt-for-digital-forensics
  • paper_authors: Mark Scanlon, Frank Breitinger, Christopher Hargreaves, Jan-Niclas Hilgert, John Sheppard
  • for: This paper is written to assess the impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4.
  • methods: The paper uses a series of experiments to assess the capability of ChatGPT across several digital forensic use cases, including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education.
  • results: The paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present or require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, ChatGPT could act as a useful supporting tool in some circumstances.
    Abstract The disruptive application of ChatGPT (GPT-3.5, GPT-4) to a variety of domains has become a topic of much discussion in the scientific community and society at large. Large Language Models (LLMs), e.g., BERT, Bard, Generative Pre-trained Transformers (GPTs), LLaMA, etc., have the ability to take instructions, or prompts, from users and generate answers and solutions based on very large volumes of text-based training data. This paper assesses the impact and potential impact of ChatGPT on the field of digital forensics, specifically looking at its latest pre-trained LLM, GPT-4. A series of experiments are conducted to assess its capability across several digital forensic use cases including artefact understanding, evidence searching, code generation, anomaly detection, incident response, and education. Across these topics, its strengths and risks are outlined and a number of general conclusions are drawn. Overall this paper concludes that while there are some potential low-risk applications of ChatGPT within digital forensics, many are either unsuitable at present, since the evidence would need to be uploaded to the service, or they require sufficient knowledge of the topic being asked of the tool to identify incorrect assumptions, inaccuracies, and mistakes. However, to an appropriately knowledgeable user, it could act as a useful supporting tool in some circumstances.
    摘要 ChatGPT(GPT-3.5、GPT-4)在各个领域的颠覆性应用已成为科学界和社会广泛讨论的话题。大语言模型(LLM),如 BERT、Bard、生成式预训练变换器(GPT)、LLaMA 等,能够接受用户的指令或提示,并基于海量文本训练数据生成答案和解决方案。本文评估了 ChatGPT 对数字取证领域的影响和潜在影响,重点考察其最新的预训练 LLM,即 GPT-4。我们通过一系列实验评估其在多个数字取证应用场景中的能力,包括取证痕迹理解、证据搜索、代码生成、异常检测、事件响应和教育等。针对这些场景,本文概述了其优势与风险,并给出了若干总体结论。总的来说,本文认为虽然 ChatGPT 在数字取证中存在一些低风险的潜在应用,但许多应用目前并不合适:要么需要把证据上传到该服务,要么要求使用者对所询问的主题有足够的知识,以识别其中不正确的假设、不准确之处和错误。不过,对于具备相应知识的用户,ChatGPT 在某些情况下可以作为有用的辅助工具。

AI For Global Climate Cooperation 2023 Competition Proceedings

  • paper_url: http://arxiv.org/abs/2307.06951
  • repo_url: None
  • paper_authors: Yoshua Bengio, Prateek Gupta, Lu Li, Soham Phade, Sunil Srinivasa, Andrew Williams, Tianyu Zhang, Yang Zhang, Stephan Zheng
  • for: The paper aims to design international frameworks for mitigating climate change and promoting economic growth through the use of AI and climate-economic simulations.
  • methods: The paper uses RICE-N, an AI-driven integrated assessment model, to model regional decision-making and assess the climate-economic impact of those decisions into the future.
  • results: The proposals submitted to the second track were evaluated both quantitatively and qualitatively, with a focus on the degree of mitigation of global temperature rise and the increase in economic productivity. An interdisciplinary panel of human experts evaluated the solutions qualitatively, considering effectiveness, simplicity, feasibility, ethics, and notions of climate justice.
    Abstract The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. In addition, these frameworks should also have policy goals fulfillment, and sustained commitment, taking into account climate-economic dynamics and strategic behaviors. These challenges require an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. Towards this objective, we organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks, based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). In particular, RICE-N supports modeling regional decision-making using AI agents. Furthermore, the IAM then models the climate-economic impact of those decisions into the future. Whereas the first track focused only on performance metrics, the proposals submitted to the second track were evaluated both quantitatively and qualitatively. The quantitative evaluation focused on a combination of (i) the degree of mitigation of global temperature rise and (ii) the increase in economic productivity. On the other hand, an interdisciplinary panel of human experts in law, policy, sociology, economics and environmental science, evaluated the solutions qualitatively. In particular, the panel considered the effectiveness, simplicity, feasibility, ethics, and notions of climate justice of the protocols. In the third track, the participants were asked to critique and improve RICE-N.
    摘要 国际社会必须合作,以减缓气候变化并维持经济增长。然而,合作很难实现,部分原因是没有任何全球权威机构能够确保各方遵守国际气候协议。将 AI 与气候-经济模拟相结合,为设计能够促进并激励合作的国际框架(包括谈判协议和气候协议)提供了一个有前景的解决方案。此外,这些框架还应兼顾政策目标的达成与承诺的持续性,并考虑气候-经济动态和策略性行为。这些挑战需要横跨机器学习、经济学、气候科学、法律、政策、伦理等领域的跨学科方法。为此,我们组织了 AI for Global Climate Cooperation(Mila 竞赛),各参赛队伍基于(经修改的)RICE-N 提交国际框架的提案和分析;RICE-N 是一个 AI 驱动的综合评估模型(IAM),支持用 AI 智能体对区域决策进行建模,并模拟这些决策对未来气候与经济的影响。第一赛道只关注性能指标,而提交到第二赛道的提案则同时接受定量和定性评估:定量评估综合考虑(i)对全球气温上升的减缓程度和(ii)经济生产力的提升;定性评估由法律、政策、社会学、经济学和环境科学等领域的人类专家组成的跨学科评审团完成,重点考量方案的有效性、简洁性、可行性、伦理以及气候正义。第三赛道要求参赛者对 RICE-N 提出批评和改进建议。

Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning

  • paper_url: http://arxiv.org/abs/2307.04869
  • repo_url: None
  • paper_authors: Gaurav Bagwe, Xiaoyong Yuan, Miao Pan, Lan Zhang
  • for: 这个论文关注于无需练习的 Federated Continual Learning(FCL),它在分布在客户端上的隐私数据上逐步学习新任务。
  • methods: 这篇论文提出了基于提问学习技术的 Fed-CPrompt,通过异步提问学习和对比性 continual loss 来解决无法访问历史任务数据的忘记问题。
  • results: 实验证明,Fed-CPrompt 可以在无需练习的情况下实现 SOTA 级别的 FCL 性能。
    Abstract Federated continual learning (FCL) learns incremental tasks over time from confidential datasets distributed across clients. This paper focuses on rehearsal-free FCL, which has severe forgetting issues when learning new tasks due to the lack of access to historical task data. To address this issue, we propose Fed-CPrompt based on prompt learning techniques to obtain task-specific prompts in a communication-efficient way. Fed-CPrompt introduces two key components, asynchronous prompt learning, and contrastive continual loss, to handle asynchronous task arrival and heterogeneous data distributions in FCL, respectively. Extensive experiments demonstrate the effectiveness of Fed-CPrompt in achieving SOTA rehearsal-free FCL performance.
    摘要 联邦持续学习(FCL)随时间从分布在各客户端的保密数据中增量地学习新任务。本文关注无重放(rehearsal-free)的 FCL:由于无法访问历史任务数据,学习新任务时会出现严重的遗忘问题。为解决这一问题,我们提出 Fed-CPrompt,基于提示学习技术,以通信高效的方式获取任务特定的提示。Fed-CPrompt 包含两个关键组成部分,即异步提示学习和对比式持续损失,分别用于处理 FCL 中任务的异步到达和异构的数据分布。大量实验证明 Fed-CPrompt 能够实现最先进的无重放 FCL 性能。

Automated Detection of Gait Events and Travel Distance Using Waist-worn Accelerometers Across a Typical Range of Walking and Running Speeds

  • paper_url: http://arxiv.org/abs/2307.04866
  • repo_url: None
  • paper_authors: Albara Ah Ramli, Xin Liu, Kelly Berndt, Chen-Nee Chuah, Erica Goude, Lynea B. Kaethler, Amanda Lopez, Alina Nicorici, Corey Owens, David Rodriguez, Jane Wang, Daniel Aranki, Craig M. McDonald, Erik K. Henricson
  • for: 这项研究的目的是使用商用智能手机的加速度计数据,提取杜氏肌营养不良症(DMD)患儿和典型发育对照(TD)儿童的步态临床特征(CFs),并使用机器学习(ML)方法实现这一目标。
  • methods: 该研究使用多步骤的机器学习流程从加速度计数据中提取步态特征,并将这些特征与实地观察数据进行比较。
  • results: 研究发现,该方法可以在不同步行速度下准确地测量儿童的步态特征,与实地观察数据之间存在很强的相关性(Pearson 相关系数为 -0.9929 至 0.9986,p < 0.0001)。
    Abstract Background: Estimation of temporospatial clinical features of gait (CFs), such as step count and length, step duration, step frequency, gait speed and distance traveled is an important component of community-based mobility evaluation using wearable accelerometers. However, challenges arising from device complexity and availability, cost and analytical methodology have limited widespread application of such tools. Research Question: Can accelerometer data from commercially-available smartphones be used to extract gait CFs across a broad range of attainable gait velocities in children with Duchenne muscular dystrophy (DMD) and typically developing controls (TDs) using machine learning (ML)-based methods Methods: Fifteen children with DMD and 15 TDs underwent supervised clinical testing across a range of gait speeds using 10 or 25m run/walk (10MRW, 25MRW), 100m run/walk (100MRW), 6-minute walk (6MWT) and free-walk (FW) evaluations while wearing a mobile phone-based accelerometer at the waist near the body's center of mass. Gait CFs were extracted from the accelerometer data using a multi-step machine learning-based process and results were compared to ground-truth observation data. Results: Model predictions vs. observed values for step counts, distance traveled, and step length showed a strong correlation (Pearson's r = -0.9929 to 0.9986, p<0.0001). The estimates demonstrated a mean (SD) percentage error of 1.49% (7.04%) for step counts, 1.18% (9.91%) for distance traveled, and 0.37% (7.52%) for step length compared to ground truth observations for the combined 6MWT, 100MRW, and FW tasks. Significance: The study findings indicate that a single accelerometer placed near the body's center of mass can accurately measure CFs across different gait speeds in both TD and DMD peers, suggesting that there is potential for accurately measuring CFs in the community with consumer-level smartphones.
    摘要 背景:利用可穿戴加速度计估计步态的时空临床特征(CF),如步数、步长、单步时长、步频、步速和行走距离,是基于社区的移动能力评估的重要组成部分。然而,设备的复杂性与可获得性、成本以及分析方法等方面的挑战,限制了此类工具的广泛应用。研究问题:能否利用机器学习(ML)方法,从市售智能手机的加速度计数据中,在杜氏肌营养不良症(DMD)患儿和典型发育(TD)儿童可达到的广泛步速范围内提取步态 CF?方法:15 名 DMD 患儿和 15 名 TD 儿童在腰部(靠近身体质心)佩戴基于手机的加速度计,完成了一系列有监督的临床测试,包括 10 米或 25 米跑/走(10MRW、25MRW)、100 米跑/走(100MRW)、6 分钟步行(6MWT)和自由行走(FW)。我们通过多步骤的机器学习流程从加速度计数据中提取步态 CF,并将结果与真实观察数据进行比较。结果:模型预测值与观测值在步数、行走距离和步长上均呈强相关(Pearson r = -0.9929 至 0.9986,p<0.0001)。在合并 6MWT、100MRW 和 FW 任务后,步数、行走距离和步长的平均(SD)百分比误差分别为 1.49%(7.04%)、1.18%(9.91%)和 0.37%(7.52%)。意义:研究结果表明,放置在身体质心附近的单个加速度计即可在不同步速下对 TD 和 DMD 儿童的 CF 进行准确测量,这表明使用消费级智能手机在社区环境中准确测量 CF 具有可行性。
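The gait features above (step count, step duration, cadence) can be made concrete with a classical peak-picking baseline on the acceleration magnitude. This is only a stand-in for the study's multi-step ML pipeline; the filter band, thresholds, and synthetic signal are assumptions chosen for illustration.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def count_steps(acc_xyz, fs=50.0):
    """Baseline step detector for a waist-worn accelerometer: band-pass the
    acceleration magnitude around typical walking cadence and count peaks.
    Returns the step count and the mean step duration in seconds."""
    mag = np.linalg.norm(acc_xyz, axis=1)
    b, a = butter(2, [0.5 / (fs / 2), 3.0 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, mag)
    peaks, _ = find_peaks(filtered, height=0.3, distance=int(0.3 * fs))
    step_times = np.diff(peaks) / fs
    return len(peaks), (step_times.mean() if len(step_times) else float("nan"))

# Synthetic 30 s walk at ~1.8 steps/s with gravity and sensor noise.
fs = 50.0
t = np.arange(0, 30, 1 / fs)
mag = 9.81 + 1.5 * np.sin(2 * np.pi * 1.8 * t) + 0.2 * np.random.randn(t.size)
acc = np.stack([np.zeros_like(mag), np.zeros_like(mag), mag], axis=1)
steps, mean_step_time = count_steps(acc, fs)
print(f"steps={steps}, mean step duration={mean_step_time:.2f}s")
```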

SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

  • paper_url: http://arxiv.org/abs/2307.04850
  • repo_url: None
  • paper_authors: Sanjay Kariyappa, Leonidas Tsepenekas, Freddy Lécué, Daniele Magazzeni
  • for: 本研究旨在提高现有方法的样本效率,以解决 Top-k 特征识别问题(TkIP)。
  • methods: 本研究借鉴多臂老虎机(MAB)文献中的两种技术来提高样本效率:其一,提供更好的停止条件,用于判断何时已满足 PAC 保证从而停止采样;其二,采用一种贪心采样方案,在不同特征之间合理分配样本。
  • results: 通过 KernelSHAP@k 和 SamplingSHAP@k,本研究能够高效求解 TkIP,在样本效率和运行时间上平均带来约 $5\times$ 的提升。
    Abstract The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values. While any method to compute SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem--a well-studied problem related to multi-armed bandits (MAB). This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping-condition (to stop sampling) that identifies when PAC (Probably Approximately Correct) guarantees have been met and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, offering an average improvement of $5\times$ in sample-efficiency and runtime across most common credit related datasets.
    摘要 SHAP 框架通过计算特征重要性,为解释模型预测提供了一种有原则的方法。受金融应用的启发,我们提出了 Top-k 识别问题(TkIP),其目标是找出 SHAP 值最高的 k 个特征。虽然任何能给出带不确定性估计的 SHAP 值的方法(如 KernelSHAP 和 SamplingSHAP)都可以直接用于求解 TkIP,但这样做的样本效率非常低。我们工作的目标是在求解 TkIP 的场景下提升现有方法的样本效率。我们的关键洞见是:TkIP 可以被表述为一个 Explore-m 问题,即多臂老虎机(MAB)领域中一个被充分研究的问题。借助这一联系,我们利用 MAB 文献中的两种技术来提升样本效率:(1)更好的停止条件,用于判断何时已满足 PAC(Probably Approximately Correct)保证从而停止采样;(2)一种贪心采样方案,在不同特征之间合理分配样本。采用这些方法,我们开发了 KernelSHAP@k 和 SamplingSHAP@k 来高效求解 TkIP,在大多数常见的信贷相关数据集上,样本效率和运行时间平均提升约 5 倍。
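The Explore-m framing above can be sketched with a simple bandit-style loop: keep confidence intervals on each feature's mean attribution, allocate extra samples to the features contesting the top-k boundary, and stop once the boundary separates. The confidence-interval formula and allocation rule below are simplified illustrations, not the paper's exact KernelSHAP@k/SamplingSHAP@k procedures.

```python
import numpy as np

def topk_identification(sample_fn, n_features, k, eps=0.02, delta=0.05,
                        batch=20, max_samples=20000):
    """sample_fn(j) returns one noisy estimate of feature j's attribution
    (e.g. a Monte-Carlo marginal contribution). Returns the estimated top-k
    feature indices and the total number of samples used."""
    sums = np.zeros(n_features); sqs = np.zeros(n_features); cnt = np.zeros(n_features)

    def draw(j, m):
        for _ in range(m):
            v = sample_fn(j); sums[j] += v; sqs[j] += v * v; cnt[j] += 1

    for j in range(n_features):               # warm-up samples for every feature
        draw(j, batch)
    while cnt.sum() < max_samples:
        mean = sums / cnt
        std = np.sqrt(np.maximum(sqs / cnt - mean ** 2, 1e-12))
        ci = std * np.sqrt(2 * np.log(2 * n_features * cnt / delta) / cnt)
        order = np.argsort(-mean)
        top, rest = order[:k], order[k:]
        # Stop when the worst top-k lower bound clears the best non-top upper bound.
        if mean[top[-1]] - ci[top[-1]] >= mean[rest[0]] + ci[rest[0]] - eps:
            return top, int(cnt.sum())
        for j in (top[-1], rest[0]):           # greedy allocation at the boundary
            draw(j, batch)
    return np.argsort(-sums / cnt)[:k], int(cnt.sum())

true_vals = np.array([0.5, 0.45, 0.1, 0.05, 0.02, 0.0])
noisy = lambda j: true_vals[j] + 0.1 * np.random.randn()
print(topk_identification(noisy, len(true_vals), k=2))
```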

SigOpt Mulch: An Intelligent System for AutoML of Gradient Boosted Trees

  • paper_url: http://arxiv.org/abs/2307.04849
  • repo_url: None
  • paper_authors: Aleksei Sorokin, Xinran Zhu, Eric Hans Lee, Bolong Cheng
  • for: 提高梯度提升树(GBT)超参数调优的效率和用户体验,尤其是自动化超参数调优。
  • methods: 采用模型感知(model-aware)的超参数调优系统,结合元学习和多保真度优化技术,自动学习 GBT 模型的高性能超参数。
  • results: 与现有系统相比,SigOpt Mulch 能更高效地找到 GBT 模型的高性能超参数,并且更易用,不需要用户具备领域知识。
    Abstract Gradient boosted trees (GBTs) are ubiquitous models used by researchers, machine learning (ML) practitioners, and data scientists because of their robust performance, interpretable behavior, and ease-of-use. One critical challenge in training GBTs is the tuning of their hyperparameters. In practice, selecting these hyperparameters is often done manually. Recently, the ML community has advocated for tuning hyperparameters through black-box optimization and developed state-of-the-art systems to do so. However, applying such systems to tune GBTs suffers from two drawbacks. First, these systems are not \textit{model-aware}, rather they are designed to apply to a \textit{generic} model; this leaves significant optimization performance on the table. Second, using these systems requires \textit{domain knowledge} such as the choice of hyperparameter search space, which is an antithesis to the automatic experimentation that black-box optimization aims to provide. In this paper, we present SigOpt Mulch, a model-aware hyperparameter tuning system specifically designed for automated tuning of GBTs that provides two improvements over existing systems. First, Mulch leverages powerful techniques in metalearning and multifidelity optimization to perform model-aware hyperparameter optimization. Second, it automates the process of learning performant hyperparameters by making intelligent decisions about the optimization search space, thus reducing the need for user domain knowledge. These innovations allow Mulch to identify good GBT hyperparameters far more efficiently -- and in a more seamless and user-friendly way -- than existing black-box hyperparameter tuning systems.
    摘要 梯度提升树(GBT)因其稳健的性能、可解释的行为和易用性,被研究人员、机器学习从业者和数据科学家广泛使用。训练 GBT 的一个关键挑战在于超参数的调优,实践中这些超参数往往依靠手工选择。近年来,机器学习社区提倡通过黑盒优化来调优超参数,并为此开发了先进的系统。然而,用这类系统来调优 GBT 存在两个缺点。第一,这些系统不具备模型感知能力,而是为通用模型设计,因而损失了大量可以获得的优化性能。第二,使用这些系统需要领域知识(例如选择超参数搜索空间),这与黑盒优化所追求的自动化实验相违背。在本文中,我们提出 SigOpt Mulch,一个专为 GBT 自动超参数调优设计的模型感知系统,相比现有系统有两点改进:其一,Mulch 利用元学习和多保真度优化中的强大技术进行模型感知的超参数优化;其二,它通过对优化搜索空间做出智能决策,将学习高性能超参数的过程自动化,从而减少了对用户领域知识的依赖。这些创新使 Mulch 能够以更高效、更顺畅、更友好的方式找到好的 GBT 超参数,优于现有的黑盒超参数调优系统。

Dynamics of Temporal Difference Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04841
  • repo_url: https://github.com/pehlevan-group/td-rl-dynamics
  • paper_authors: Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan
  • for: 这 paper 的目的是研究 reinforcement learning 模型在缺少反馈情况下的学习Dynamic。
  • methods: 这 paper 使用 statistical physics 的概念来研究 temporal difference learning 的值函数学习曲线。
  • results: 研究发现,在 Gaussian equivalence hypothesis 下,学习过程中的抽象函数approximator 会导致学习Dynamic 存在板块,而这些板块与学习率、折扣因子、奖励函数等参数有关。此外,研究还发现,通过 adjusting 学习率和奖励函数,可以改变学习Dynamic 和板块。
    Abstract Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics, to study the typical case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages and we validate our assumptions on small scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools to open a new direction towards developing a theory of learning dynamics in reinforcement learning.
    摘要 强化学习已在多个应用中取得成功,这些应用要求智能体在反馈稀疏的环境中学会行动。然而,尽管有这些实证上的成功,我们仍缺乏对强化学习模型的参数与状态表示特征如何相互作用、进而控制学习动力学的理论理解。在这项工作中,我们借助统计物理的概念,研究用线性函数逼近进行时序差分(TD)学习值函数时的典型学习曲线。我们的理论建立在高斯等价假设之上,即把对随机轨迹的平均替换为具有时间相关性的高斯特征平均,并在小规模马尔可夫决策过程上验证了这些假设。我们发现,与传统梯度下降的动力学不同,由对可能回合空间进行子采样所产生的随机半梯度噪声会使值误差出现显著的平台期。我们研究了学习动力学和平台期如何依赖于特征结构、学习率、折扣因子和奖励函数,并进一步分析了学习率退火和奖励塑形等策略如何有利地改变学习动力学和平台期。总之,我们的工作引入了新的工具,为建立强化学习中学习动力学的理论开辟了新的方向。
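The object of study above, semi-gradient TD(0) with linear function approximation, fits in a few lines. The random-walk chain, feature dimension, and learning rate below are illustrative choices for producing a learning curve, not the paper's theoretical setup.

```python
import numpy as np

rng = np.random.default_rng(0)
S, d, gamma, lr = 20, 10, 0.9, 0.05
features = rng.normal(size=(S, d)) / np.sqrt(d)       # random Gaussian features

def step(s):
    """Random walk: move left/right uniformly, reward +1 on reaching the right end."""
    s_next = int(np.clip(s + rng.choice([-1, 1]), 0, S - 1))
    return s_next, (1.0 if s_next == S - 1 else 0.0)

# Monte-Carlo ground-truth values, used only to measure the TD estimate's error.
true_v = np.zeros(S)
for s0 in range(S):
    returns = []
    for _ in range(200):
        s, g, disc = s0, 0.0, 1.0
        for _ in range(200):
            s, r = step(s); g += disc * r; disc *= gamma
        returns.append(g)
    true_v[s0] = np.mean(returns)

w, s = np.zeros(d), 0
for t in range(1, 20001):
    s_next, r = step(s)
    td_error = r + gamma * features[s_next] @ w - features[s] @ w
    w += lr * td_error * features[s]                  # semi-gradient TD(0) update
    s = s_next
    if t % 5000 == 0:
        err = np.sqrt(np.mean((features @ w - true_v) ** 2))
        print(f"t={t:5d}  value RMSE={err:.3f}")
```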

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback

  • paper_url: http://arxiv.org/abs/2307.04749
  • repo_url: None
  • paper_authors: Jaskirat Singh, Liang Zheng
  • for: 本文旨在提出一种简单而有效的分解式方法,用于评估并改进文本到图像的对齐。
  • methods: 我们首先引入分解对齐分数(Decompositional-Alignment-Score):将复杂的提示分解为一组互不重叠的断言,再用 VQA 模型衡量每条断言与生成图像的对齐程度,最后事后(a posteriori)合并各断言的对齐分数,得到最终的文本到图像对齐分数。
  • results: 实验分析表明,所提出的对齐度量与人工评分的相关性显著高于传统的 CLIP、BLIP 分数;断言级别的对齐分数还能提供有用的反馈,可在简单的迭代流程中逐步增强各断言在最终图像中的表达。用户研究显示,该方法在整体文本到图像对齐准确率上比此前最优方法高出 8.7%。
    Abstract The field of text-conditioned image generation has made unparalleled progress with the recent advent of latent diffusion models. While remarkable, as the complexity of given text input increases, the state-of-the-art diffusion models may still fail in generating images which accurately convey the semantics of the given prompt. Furthermore, it has been observed that such misalignments are often left undetected by pretrained multi-modal models such as CLIP. To address these problems, in this paper we explore a simple yet effective decompositional approach towards both evaluation and improvement of text-to-image alignment. In particular, we first introduce a Decompositional-Alignment-Score which given a complex prompt decomposes it into a set of disjoint assertions. The alignment of each assertion with generated images is then measured using a VQA model. Finally, alignment scores for different assertions are combined aposteriori to give the final text-to-image alignment score. Experimental analysis reveals that the proposed alignment metric shows significantly higher correlation with human ratings as opposed to traditional CLIP, BLIP scores. Furthermore, we also find that the assertion level alignment scores provide a useful feedback which can then be used in a simple iterative procedure to gradually increase the expression of different assertions in the final image outputs. Human user studies indicate that the proposed approach surpasses previous state-of-the-art by 8.7% in overall text-to-image alignment accuracy. Project page for our paper is available at https://1jsingh.github.io/divide-evaluate-and-refine
    摘要 随着潜在扩散模型的出现,文本条件图像生成领域取得了空前的进展。尽管成果显著,当输入文本的复杂度提高时,最先进的扩散模型仍可能无法生成准确传达给定提示语义的图像,而且此类错位往往无法被 CLIP 等预训练多模态模型检测到。为了解决这些问题,本文探索了一种简单而有效的分解式方法,用于评估并改进文本到图像的对齐。具体而言,我们首先引入分解对齐分数:给定一个复杂提示,将其分解为一组互不重叠的断言,再用 VQA 模型衡量每条断言与生成图像的对齐程度,最后事后合并各断言的对齐分数,得到最终的文本到图像对齐分数。实验分析表明,与传统的 CLIP、BLIP 分数相比,该对齐度量与人工评分的相关性显著更高。此外,断言级别的对齐分数提供了有用的反馈,可用于一个简单的迭代流程,逐步增强各断言在最终生成图像中的表达。用户研究表明,该方法在整体文本到图像对齐准确率上比此前最优方法高出 8.7%。项目主页:https://1jsingh.github.io/divide-evaluate-and-refine
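The decompose-score-combine pipeline above can be sketched end to end with placeholders: in a real system the decomposition would come from an LLM and the per-assertion scores from a VQA model. Both the assertions and the scores below are fabricated so the example runs; the combination rule (mean score plus reporting the weakest assertion) is one illustrative choice, not the paper's exact formula.

```python
from statistics import mean

def decompose(prompt):
    """Hypothetical decomposition step (done with an LLM in the paper)."""
    return [
        "there is a red ball",
        "there is a blue cube",
        "the red ball is on top of the blue cube",
    ]

def vqa_yes_probability(image, question):
    """Placeholder for a VQA model's probability of answering 'yes'.
    Values are fabricated for illustration only."""
    fake_scores = {"there is a red ball": 0.95,
                   "there is a blue cube": 0.90,
                   "the red ball is on top of the blue cube": 0.40}
    return fake_scores.get(question, 0.5)

def decompositional_alignment_score(image, prompt):
    assertions = decompose(prompt)
    per_assertion = {a: vqa_yes_probability(image, a) for a in assertions}
    overall = mean(per_assertion.values())              # illustrative combination
    weakest = min(per_assertion, key=per_assertion.get) # feedback for refinement
    return overall, weakest, per_assertion

score, weakest, detail = decompositional_alignment_score(None, "a red ball on a blue cube")
print(f"alignment={score:.2f}; refine next: '{weakest}'")
```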

RoCo: Dialectic Multi-Robot Collaboration with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.04738
  • repo_url: https://github.com/MandiZhao/robot-collab
  • paper_authors: Zhao Mandi, Shreeya Jain, Shuran Song
  • for: 这篇论文是为了提出一种基于大语言模型(LLM)的多机器人协作方法,用于高级通信和低级路径规划。
  • methods: 论文使用了LLM来进行高级沟通和低级路径规划,并提供了环境反馈,如碰撞检查,以便提高计划和方向点的准确性。
  • results: 试验表明,该方法在多种多机器人协作场景中具有高成功率,并能够适应任务语义的变化。此外,对话设置具有高可读性和灵活性,可以在实际世界实验中与人类在一起完成任务。
    Abstract We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They then generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset for agent representation and reasoning. We experimentally demonstrate the effectiveness of our approach -- it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility -- in real world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. See project website https://project-roco.github.io for videos and code.
    摘要 我们提出了一种新的多机器人协作方法,利用预训练的大语言模型(LLM)同时实现高层沟通和低层路径规划。机器人配备 LLM,用于讨论并共同推理任务策略;随后它们生成子任务计划和任务空间路径点,供多臂运动规划器加速轨迹规划。我们还提供环境反馈(如碰撞检查),并提示 LLM 智能体在上下文中改进其计划和路径点。为了评估,我们提出了 RoCoBench,一个涵盖多种多机器人协作场景的 6 任务基准,并附带一个纯文本数据集用于智能体表示与推理。实验表明,该方法在 RoCoBench 的所有任务上都取得了很高的成功率,并能适应任务语义的变化。我们的对话式设置具有很高的可解释性和灵活性:在真实世界实验中,我们展示了 RoCo 可以轻松纳入人在回路,用户能够与机器人智能体沟通协作、共同完成任务。视频和代码请见项目网站 https://project-roco.github.io 。

Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04726
  • repo_url: None
  • paper_authors: Suzan Ece Ada, Erhan Oztop, Emre Ugur
  • for: 本研究旨在提高非线上强化学习(Offline Reinforcement Learning)方法的性能,使其能够更好地学习政策,并且能够处理不同模式的行为政策。
  • methods: 本研究使用了 conditional diffusion models 来获得表达性的政策,并且引入了状态重建特征学习来解决非线上状态分布偏移问题。
  • results: 本研究在一个新的 2D Multimodal Contextual Bandit 环境中展示了其性能,并在多个 D4RL 标准任务上达到了领先的成绩。
    Abstract Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for experience collection. In contrast to behavior cloning, which assumes the data is collected from expert demonstrations, offline RL can work with non-expert data and multimodal behavior policies. However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training. Prior work on offline RL uses conditional diffusion models to obtain expressive policies to represent multimodal behavior in the dataset. Nevertheless, they are not tailored toward alleviating the out-of-distribution state generalization. We introduce a novel method incorporating state reconstruction feature learning in the recent class of diffusion policies to address the out-of-distribution generalization problem. State reconstruction loss promotes more descriptive representation learning of states to alleviate the distribution shift incurred by the out-of-distribution states. We design a 2D Multimodal Contextual Bandit environment to demonstrate and evaluate our proposed model. We assess the performance of our model not only in this new environment but also on several D4RL benchmark tasks, achieving state-of-the-art results.
    摘要 离线强化学习(offline RL)方法利用以往的经验来学习比收集经验时的行为策略更好的策略。与假设数据来自专家示范的行为克隆不同,离线 RL 可以使用非专家数据以及多模态的行为策略。然而,由于训练期间缺乏在线交互,离线 RL 算法在处理分布偏移和有效表示策略方面面临挑战。此前的离线 RL 工作利用条件扩散模型来获得表达力强的策略,以表示数据集中的多模态行为,但这些方法并未针对分布外状态的泛化问题加以设计。我们提出了一种新方法,在最近的扩散策略中引入状态重建特征学习,以解决分布外泛化问题:状态重建损失促使模型学习更具描述性的状态表示,从而缓解分布外状态带来的分布偏移。我们设计了一个二维多模态上下文多臂老虎机环境来演示和评估所提模型,并不仅在这一新环境中、也在多个 D4RL 基准任务上评估了其性能,取得了最先进的结果。

Large Language Models as General Pattern Machines

  • paper_url: http://arxiv.org/abs/2307.04721
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Suvir Mirchandani, Fei Xia, Pete Florence, Brian Ichter, Danny Driess, Montserrat Gonzalez Arenas, Kanishka Rao, Dorsa Sadigh, Andy Zeng
  • for: 这篇论文是关于使用预训练的大语言模型(LLM)来完成复杂的token序列的研究。
  • methods: 该论文使用了随机 sampling tokens from vocabulary 来测试 LLM 的 pattern completion 能力,并研究了如何应用这种零学习能力到机器人控制问题。
  • results: 研究发现,无需任何额外训练,LLM 可以作为通用的序列模型,通过在上下文中学习来完成复杂的序列。这些结果提示了在机器人控制问题中使用 LLM 可能有可能。
    Abstract We observe that pre-trained large language models (LLMs) are capable of autoregressively completing complex token sequences -- from arbitrary ones procedurally generated by probabilistic context-free grammars (PCFG), to more rich spatial patterns found in the Abstract Reasoning Corpus (ARC), a general AI benchmark, prompted in the style of ASCII art. Surprisingly, pattern completion proficiency can be partially retained even when the sequences are expressed using tokens randomly sampled from the vocabulary. These results suggest that without any additional training, LLMs can serve as general sequence modelers, driven by in-context learning. In this work, we investigate how these zero-shot capabilities may be applied to problems in robotics -- from extrapolating sequences of numbers that represent states over time to complete simple motions, to least-to-most prompting of reward-conditioned trajectories that can discover and represent closed-loop policies (e.g., a stabilizing controller for CartPole). While difficult to deploy today for real systems due to latency, context size limitations, and compute costs, the approach of using LLMs to drive low-level control may provide an exciting glimpse into how the patterns among words could be transferred to actions.
    摘要 我们观察到,预训练的大语言模型(LLM)能够自回归地补全复杂的 token 序列,既包括由概率上下文无关文法(PCFG)程序化生成的任意序列,也包括以 ASCII 艺术形式呈现的、取自通用 AI 基准 Abstract Reasoning Corpus(ARC)的更丰富的空间模式。令人惊讶的是,即使序列中的符号被替换为从词表中随机采样的 token,这种模式补全能力也能部分保留。这些结果表明,LLM 无需任何额外训练,就可以凭借上下文学习充当通用的序列建模器。在这项工作中,我们研究了如何将这种零样本能力应用于机器人问题:从外推表示随时间变化状态的数字序列以补全简单动作,到对奖励条件轨迹进行由少到多的提示,从而发现并表示闭环策略(例如 CartPole 的稳定控制器)。尽管由于延迟、上下文长度限制和计算成本等原因,这种方法目前还难以部署到真实系统中,但用 LLM 驱动低层控制的思路,为"词语之间的模式如何迁移到动作上"提供了令人兴奋的一瞥。
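The random-token probe described above is easy to reproduce in outline: relabel the symbols of a simple pattern-completion task with tokens drawn at random from a vocabulary and hand the result to an LLM. The toy pattern and vocabulary below are invented, and the actual LLM call is deliberately left out; only the relabeling step is shown.

```python
import random

def remap_pattern(examples, vocabulary, seed=0):
    """Replace the symbols of a pattern-completion task with randomly sampled
    vocabulary tokens, mirroring the paper's test of whether in-context pattern
    completion survives arbitrary relabeling. No model is queried here."""
    rng = random.Random(seed)
    symbols = sorted({s for seq in examples for s in seq})
    mapping = dict(zip(symbols, rng.sample(vocabulary, len(symbols))))
    return [[mapping[s] for s in seq] for seq in examples], mapping

# An ARC-like toy pattern: each sequence alternates two symbols.
examples = [["A", "B", "A", "B"], ["C", "D", "C", "D"], ["E", "F", "E"]]
vocab = ["alpha", "omega", "zig", "zag", "blue", "crab", "nine", "fork"]
remapped, mapping = remap_pattern(examples, vocab)

prompt = "\n".join(" ".join(seq) for seq in remapped)
print(prompt)      # the last line is incomplete; an LLM would be asked for the
print(mapping)     # next token, which should map back to the symbol 'F'
```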

Understanding Real-World AI Planning Domains: A Conceptual Framework

  • paper_url: http://arxiv.org/abs/2307.04701
  • repo_url: None
  • paper_authors: Ebaa Alnazer, Ilche Georgievski
  • for: 这篇论文旨在为AI规划系统的开发提供支持,帮助开发人员更好地理解和处理实际应用领域的复杂因素。
  • methods: 本文提出了一个概念框架,用于识别和分类实际应用领域中的各种因素,包括规划域的不同级别和建筑领域中的可持续发展。
  • results: 本文采用了域的例子,如可持续建筑领域,以示出框架的应用性和可行性。这种框架有助于开发人员更好地设计和实现AI规划系统,并且可能对实际应用领域的规划做出贡献。
    Abstract Planning is a pivotal ability of any intelligent system being developed for real-world applications. AI planning is concerned with researching and developing planning systems that automatically compute plans that satisfy some user objective. Identifying and understanding the relevant and realistic aspects that characterise real-world application domains are crucial to the development of AI planning systems. This provides guidance to knowledge engineers and software engineers in the process of designing, identifying, and categorising resources required for the development process. To the best of our knowledge, such support does not exist. We address this research gap by developing a conceptual framework that identifies and categorises the aspects of real-world planning domains in varying levels of granularity. Our framework provides not only a common terminology but also a comprehensive overview of a broad range of planning aspects exemplified using the domain of sustainable buildings as a prominent application domain of AI planning. The framework has the potential to impact the design, development, and applicability of AI planning systems in real-world application domains.
    摘要 规划是任何面向真实世界应用的智能系统的关键能力。AI 规划致力于研究和开发能够自动计算满足用户目标的计划的规划系统。识别并理解刻画真实应用领域的相关且现实的方面,对于 AI 规划系统的开发至关重要,这可以在设计、识别和归类开发过程所需资源时为知识工程师和软件工程师提供指导。据我们所知,目前尚不存在这样的支持。我们通过构建一个概念框架来填补这一研究空白,该框架以不同粒度识别并归类真实世界规划领域的各种方面。我们的框架不仅提供了统一的术语,还以可持续建筑这一典型的 AI 规划应用领域为例,对广泛的规划方面给出了全面的概览。该框架有望影响 AI 规划系统在真实应用领域中的设计、开发和适用性。

COMEX: A Tool for Generating Customized Source Code Representations

  • paper_url: http://arxiv.org/abs/2307.04693
  • repo_url: https://github.com/ibm/tree-sitter-codeviews
  • paper_authors: Debeshee Das, Noble Saji Mathews, Alex Mathai, Srikanth Tamilselvam, Kranthi Sedamaki, Sridhar Chimalakonda, Atul Kumar
  • For: The paper aims to provide a tool for creating and combining multiple code-views that can be used by machine learning models for various software engineering tasks.* Methods: The tool uses tree-sitter, a widely used incremental parser that supports over 40 languages, to generate code-views such as Control Flow Graph (CFG), Data Flow Graph (DFG), and Abstract Syntax Tree (AST) directly from source code.* Results: The tool is easy to use and can be applied to various programming languages, including Java and C#. It supports both intra-procedural and inter-procedural analysis, and can be used to analyze both method-level snippets and program-level snippets.
    Abstract Learning effective representations of source code is critical for any Machine Learning for Software Engineering (ML4SE) system. Inspired by natural language processing, large language models (LLMs) like Codex and CodeGen treat code as generic sequences of text and are trained on huge corpora of code data, achieving state of the art performance on several software engineering (SE) tasks. However, valid source code, unlike natural language, follows a strict structure and pattern governed by the underlying grammar of the programming language. Current LLMs do not exploit this property of the source code as they treat code like a sequence of tokens and overlook key structural and semantic properties of code that can be extracted from code-views like the Control Flow Graph (CFG), Data Flow Graph (DFG), Abstract Syntax Tree (AST), etc. Unfortunately, the process of generating and integrating code-views for every programming language is cumbersome and time consuming. To overcome this barrier, we propose our tool COMEX - a framework that allows researchers and developers to create and combine multiple code-views which can be used by machine learning (ML) models for various SE tasks. Some salient features of our tool are: (i) it works directly on source code (which need not be compilable), (ii) it currently supports Java and C#, (iii) it can analyze both method-level snippets and program-level snippets by using both intra-procedural and inter-procedural analysis, and (iv) it is easily extendable to other languages as it is built on tree-sitter - a widely used incremental parser that supports over 40 languages. We believe this easy-to-use code-view generation and customization tool will give impetus to research in source code representation learning methods and ML4SE. Tool: https://pypi.org/project/comex - GitHub: https://github.com/IBM/tree-sitter-codeviews - Demo: https://youtu.be/GER6U87FVbU
    摘要 学习有效的源代码表示对任何"机器学习用于软件工程"(ML4SE)系统都至关重要。受自然语言处理启发,Codex、CodeGen 等大语言模型(LLM)将代码视为一般的文本序列,并在庞大的代码语料上训练,在多项软件工程(SE)任务上取得了最先进的性能。然而,与自然语言不同,合法的源代码遵循由编程语言文法所规定的严格结构和模式。当前的 LLM 并未利用源代码的这一特性:它们将代码当作 token 序列处理,忽略了可以从控制流图(CFG)、数据流图(DFG)、抽象语法树(AST)等代码视图中提取的关键结构与语义信息。遗憾的是,为每种编程语言生成并整合代码视图的过程既繁琐又耗时。为克服这一障碍,我们提出了 COMEX:一个让研究人员和开发者能够创建并组合多种代码视图的框架,这些视图可供机器学习(ML)模型用于各类 SE 任务。COMEX 的一些特点包括:(i) 直接在源代码上工作(代码无需可编译);(ii) 目前支持 Java 和 C#;(iii) 借助过程内与过程间分析,既能分析方法级代码片段,也能分析程序级代码片段;(iv) 基于支持 40 多种语言、被广泛使用的增量解析器 tree-sitter 构建,因而易于扩展到其他语言。我们相信这个易用的代码视图生成与定制工具将推动源代码表示学习方法和 ML4SE 的研究。工具:https://pypi.org/project/comex GitHub:https://github.com/IBM/tree-sitter-codeviews 演示:https://youtu.be/GER6U87FVbU

VampNet: Music Generation via Masked Acoustic Token Modeling

  • paper_url: http://arxiv.org/abs/2307.04686
  • repo_url: https://github.com/hugofloresgarcia/vampnet
  • paper_authors: Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo
  • for: 这个论文主要用于音乐生成、压缩、缺失填充、变换等任务。
  • methods: 该方法使用masked acoustic token modeling Approach,使用可变的masking schedule进行训练,并且使用bidirectional transformer架构来进行非 autoregressive生成。
  • results: 通过不同的prompting方法,VampNet可以应用于音乐压缩、缺失填充、outpainting、continuation和looping等任务,并且可以保持音乐的style、genre、乐器等高级特征。
    Abstract We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
    摘要 我们介绍 VampNet,一种基于掩码声学 token 建模的方法,可用于音乐合成、压缩、修补(inpainting)与变奏。我们在训练时采用可变的掩码比例调度,从而可以在推断阶段通过施加各种掩码方式(称为提示,prompt)从模型中采样出连贯的音乐。VampNet 是非自回归的,采用双向 Transformer 架构,在一次前向传播中关注序列中的所有 token。仅需 36 次采样,VampNet 就能生成连贯的高保真音乐波形。我们展示了通过不同方式提示 VampNet,可以将其应用于音乐压缩、修补、外延(outpainting)、续写以及带变化的循环(vamping)等任务。在合适的提示下,VampNet 能够保持音乐的风格、流派、配器等高层特征。这种灵活的提示能力使 VampNet 成为一个强大的音乐共创工具。代码和音频示例可在线获取。
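One training step of the masked acoustic token modeling objective above looks roughly like the following: mask a randomly chosen fraction of discrete codec tokens, run a bidirectional transformer over the whole sequence, and train it to recover the masked positions. The vocabulary size, model dimensions, and masking schedule are made up for the sketch; this is not VampNet's actual architecture or codec.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, mask_id, d_model, T, B = 1024, 1024, 256, 200, 4

embed = nn.Embedding(vocab + 1, d_model)              # +1 slot for the MASK token
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=4)
head = nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (B, T))              # stand-in codec tokens
ratio = 0.1 + 0.9 * torch.rand(1).item()              # variable masking schedule
mask = torch.rand(B, T) < ratio
inputs = tokens.masked_fill(mask, mask_id)

logits = head(encoder(embed(inputs)))                 # attends to all positions at once
loss = F.cross_entropy(logits[mask], tokens[mask])    # loss only on masked tokens
loss.backward()
print(f"mask ratio={ratio:.2f}  loss={loss.item():.3f}")
```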

Quantifying the Echo Chamber Effect: An Embedding Distance-based Approach

  • paper_url: http://arxiv.org/abs/2307.04668
  • repo_url: https://github.com/faalatawi/echo-chamber-score
  • paper_authors: Faisal Alatawi, Paras Sheth, Huan Liu
  • for: This paper aims to develop a novel metric for quantifying echo chambers in online social media platforms.
  • methods: The proposed method, called Echo Chamber Score (ECS), uses a self-supervised graph autoencoder-based user embedding model (EchoGAE) to measure distances between users in the embedding space without making assumptions about the structure of the interaction graph or requiring labels for user ideologies.
  • results: The proposed method was tested on a Twitter dataset consisting of four topics, and the results showcased the effectiveness of ECS in quantifying echo chambers and shedding light on the dynamics of online discourse.
    Abstract The rise of social media platforms has facilitated the formation of echo chambers, which are online spaces where users predominantly encounter viewpoints that reinforce their existing beliefs while excluding dissenting perspectives. This phenomenon significantly hinders information dissemination across communities and fuels societal polarization. Therefore, it is crucial to develop methods for quantifying echo chambers. In this paper, we present the Echo Chamber Score (ECS), a novel metric that assesses the cohesion and separation of user communities by measuring distances between users in the embedding space. In contrast to existing approaches, ECS is able to function without labels for user ideologies and makes no assumptions about the structure of the interaction graph. To facilitate measuring distances between users, we propose EchoGAE, a self-supervised graph autoencoder-based user embedding model that leverages users' posts and the interaction graph to embed them in a manner that reflects their ideological similarity. To assess the effectiveness of ECS, we use a Twitter dataset consisting of four topics - two polarizing and two non-polarizing. Our results showcase ECS's effectiveness as a tool for quantifying echo chambers and shedding light on the dynamics of online discourse.
    摘要 社交媒体平台的兴起促成了回音室的形成:在这些线上空间中,用户接触到的观点主要强化其既有信念,而异议观点则被排除在外。这一现象严重阻碍了信息在社区间的传播,并加剧了社会极化。因此,发展量化回音室的方法至关重要。本文提出回音室分数(Echo Chamber Score, ECS),这是一种通过度量用户在嵌入空间中的距离来评估用户社区内聚性与分离度的新指标。与现有方法不同,ECS 不需要用户意识形态标签,也不对交互图的结构做任何假设。为了便于度量用户间的距离,我们提出 EchoGAE,一种基于自监督图自编码器的用户嵌入模型,它利用用户的帖子和交互图,将用户嵌入到能反映其意识形态相似性的空间中。为评估 ECS 的有效性,我们使用了一个包含四个主题(两个极化、两个非极化)的 Twitter 数据集。结果表明,ECS 是量化回音室、揭示线上话语动态的有效工具。
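One plausible instantiation of an embedding-distance echo-chamber measure like the one above: compare average distances between users inside each community (cohesion) with distances across communities (separation). The formula below and the random vectors standing in for EchoGAE embeddings are assumptions for illustration, not the paper's exact ECS definition.

```python
import numpy as np
from scipy.spatial.distance import cdist

def echo_chamber_score(embeddings, communities):
    """Ratio of mean inter-community distance to mean intra-community distance.
    Higher values indicate tight, well-separated groups, i.e. stronger
    echo-chamber structure under this illustrative definition."""
    labels = np.asarray(communities)
    dists = cdist(embeddings, embeddings)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    cohesion = dists[same & off_diag].mean()
    separation = dists[~same].mean()
    return separation / (cohesion + 1e-9)

rng = np.random.default_rng(0)
polarized = np.vstack([rng.normal(-3, 1, (50, 16)), rng.normal(3, 1, (50, 16))])
mixed = rng.normal(0, 1, (100, 16))
labels = [0] * 50 + [1] * 50
print("polarized topic:", round(echo_chamber_score(polarized, labels), 2))
print("non-polarized topic:", round(echo_chamber_score(mixed, labels), 2))
```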

cs.CL - 2023-07-11

GujiBERT and GujiGPT: Construction of Intelligent Information Processing Foundation Language Models for Ancient Texts

  • paper_url: http://arxiv.org/abs/2307.05354
  • repo_url: None
  • paper_authors: Dongbo Wang, Chang Liu, Zhixiao Zhao, Si Shen, Liu Liu, Bin Li, Haotian Hu, Mengcheng Wu, Litao Lin, Xue Zhao, Xiyu Wang
  • for: 这篇论文是为了推介一种基于古籍的语言模型,用于智能处理古文献。
  • methods: 这篇论文使用了自动学习的方法,通过训练在大量的古文献数据集上,使得模型能够有效地处理各种自然语言处理任务,如自动句 segmentation、标点符号、词语分词、部首标注、实体识别等。
  • results: 这篇论文的研究发现,通过使用自然语言处理任务的公共数据集进行自我超vised学习,可以更好地提高模型的下游任务能力。此外,研究还发现了字体选择、数据集规模和初始模型选择等因素对实验结果的影响。
    Abstract In the context of the rapid development of large language models, we have meticulously trained and introduced the GujiBERT and GujiGPT language models, which are foundational models specifically designed for intelligent information processing of ancient texts. These models have been trained on an extensive dataset that encompasses both simplified and traditional Chinese characters, allowing them to effectively handle various natural language processing tasks related to ancient books, including but not limited to automatic sentence segmentation, punctuation, word segmentation, part-of-speech tagging, entity recognition, and automatic translation. Notably, these models have exhibited exceptional performance across a range of validation tasks using publicly available datasets. Our research findings highlight the efficacy of employing self-supervised methods to further train the models using classical text corpora, thus enhancing their capability to tackle downstream tasks. Moreover, it is worth emphasizing that the choice of font, the scale of the corpus, and the initial model selection all exert significant influence over the ultimate experimental outcomes. To cater to the diverse text processing preferences of researchers in digital humanities and linguistics, we have developed three distinct categories comprising a total of nine model variations. We believe that by sharing these foundational language models specialized in the domain of ancient texts, we can facilitate the intelligent processing and scholarly exploration of ancient literary works and, consequently, contribute to the global dissemination of China's rich and esteemed traditional culture in this new era.
    摘要 在大语言模型快速发展的背景下,我们精心训练并推出了 GujiBERT 和 GujiGPT,这是专为古籍智能信息处理设计的基础语言模型。模型在涵盖简体与繁体字的大规模语料上训练,能够有效处理古籍相关的各类自然语言处理任务,包括自动断句、标点、分词、词性标注、实体识别与自动翻译等,并在多个公开数据集的验证任务中表现出色。研究结果表明,利用古典文本语料对模型进行进一步自监督训练,可增强其处理下游任务的能力;同时,字形选择、语料规模和初始模型的选择都会对最终实验结果产生显著影响。为满足数字人文与语言学研究者多样化的文本处理需求,我们开发了三大类共九个模型变体。我们相信,分享这些面向古籍领域的基础语言模型,有助于古代文献的智能处理与学术研究,并在新时代促进中国优秀传统文化的全球传播。

Explaining Competitive-Level Programming Solutions using LLMs

  • paper_url: http://arxiv.org/abs/2307.05337
  • repo_url: None
  • paper_authors: Jierui Li, Szymon Tworkowski, Yingying Wu, Raymond Mooney
  • for: 本研究将竞赛级编程问题的求解视为推理与代码生成的复合任务。
  • methods: 我们提出了一种新方法,可自动为〈问题, 解决方案〉对标注自然语言解释。我们发现,尽管当前的 LLM 在求解竞赛级编程问题上表现不佳,但它们在描述和分析解决方案方面具有很强的能力。
  • results: 我们的解释生成方法可以为问题提供结构化的解释,包括描述和分析。我们对CodeContests dataset进行实验,结果显示,虽然GPT3.5和GPT-4在描述解决方案方面表现相似,但GPT-4更好地理解解决方案的核心思想。
    Abstract In this paper, we approach competitive-level programming problem-solving as a composite task of reasoning and code generation. We propose a novel method to automatically annotate natural language explanations to \textit{⟨problem, solution⟩} pairs. We show that despite poor performance in solving competitive-level programming problems, state-of-the-art LLMs exhibit a strong capacity in describing and explaining solutions. Our explanation generation methodology can generate a structured solution explanation for the problem containing descriptions and analysis. To evaluate the quality of the annotated explanations, we examine their effectiveness in two aspects: 1) satisfying the human programming expert who authored the oracle solution, and 2) aiding LLMs in solving problems more effectively. The experimental results on the CodeContests dataset demonstrate that while LLM GPT3.5's and GPT-4's abilities in describing the solution are comparable, GPT-4 shows a better understanding of the key idea behind the solution.
    摘要 在这篇论文中,我们将竞赛水平编程问题的解决视为一种复合任务,包括理解和代码生成。我们提出了一种新的方法,用于自动标注自然语言说明到\(\textit{<问题, 解决方案>}\)对。我们发现,当前的LLMs,虽然在竞赛水平编程问题上表现不佳,但它们在描述和解释解决方案方面具有强大的能力。我们的说明生成方法可以生成一个结构化的解决方案说明,包括描述和分析。为评估注解的质量,我们对两个方面进行评估:1)满足由人工编程专家撰写的oracle解决方案作者的期望,2)帮助LLMs更好地解决问题。我们在CodeContests数据集上进行了实验,结果表明,虽然GPT3.5和GPT-4在描述解决方案方面的能力相似,但GPT-4更好地理解解决方案的关键想法。

Attribute Controlled Dialogue Prompting

  • paper_url: http://arxiv.org/abs/2307.05228
  • repo_url: None
  • paper_authors: Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
  • for: 这篇论文研究如何以参数高效的方式使大型预训练语言模型适配下游任务。传统的离散提示和连续提示都假设同一任务内所有样本共用固定的提示,忽略了某些任务(如开放域对话生成)中输入差异很大的事实。
  • methods: 这篇论文提出了一种新的、实例特定的提示调教算法,具体来说是根据实例级控制代码而不是对话历史来生成提示,以探索它们对控制对话生成的影响。
  • results: 在流行的开放域对话数据集上进行实验,并结合自动评价指标与人工评估,结果表明我们的方法优于提示基线,并且在仅使用约 5%-6% 总参数的情况下可与微调相当。
    Abstract Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks. However, both discrete prompting and continuous prompting assume fixed prompts for all data samples within a task, neglecting the fact that inputs vary greatly in some tasks such as open-domain dialogue generation. In this paper, we present a novel, instance-specific prompt-tuning algorithm for dialogue generation. Specifically, we generate prompts based on instance-level control code, rather than the conversation history, to explore their impact on controlled dialogue generation. Experiments on popular open-domain dialogue datasets, evaluated on both automated metrics and human evaluation, demonstrate that our method is superior to prompting baselines and comparable to fine-tuning with only 5%-6% of total parameters.
    摘要 提示调优(prompt-tuning)已成为使大型预训练语言模型适配下游任务的一种参数高效方法。然而,无论离散提示还是连续提示,都假设同一任务内所有样本使用固定的提示,忽略了开放域对话生成等任务中输入差异很大的事实。本文提出一种新的、面向单个实例的提示调优算法:我们基于实例级控制码(而非对话历史)生成提示,以探究其对可控对话生成的影响。在流行的开放域对话数据集上的实验(包括自动指标与人工评估)表明,我们的方法优于提示基线,并且在仅使用 5%-6% 总参数的情况下可与微调相当。
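
A minimal sketch of the general idea of instance-specific soft prompts follows. It maps a discrete control code to a few "soft prompt" vectors prepended to a frozen LM's token embeddings; the class name, dimensions, and prompt length are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ControlCodePrompt(nn.Module):
    """Toy instance-specific soft-prompt generator conditioned on a control code."""
    def __init__(self, n_codes: int, prompt_len: int = 8, d_model: int = 768):
        super().__init__()
        self.code_emb = nn.Embedding(n_codes, d_model)
        self.to_prompt = nn.Linear(d_model, prompt_len * d_model)
        self.prompt_len, self.d_model = prompt_len, d_model

    def forward(self, code_ids: torch.Tensor, token_embeds: torch.Tensor) -> torch.Tensor:
        # code_ids: (batch,), token_embeds: (batch, seq, d_model) from the frozen LM
        p = self.to_prompt(self.code_emb(code_ids))        # (batch, prompt_len * d_model)
        p = p.view(-1, self.prompt_len, self.d_model)      # (batch, prompt_len, d_model)
        return torch.cat([p, token_embeds], dim=1)         # prepend the soft prompt

# usage with dummy tensors: only the prompt generator would be trained
gen = ControlCodePrompt(n_codes=4)
out = gen(torch.tensor([1, 3]), torch.randn(2, 20, 768))
print(out.shape)  # torch.Size([2, 28, 768])
```

Because only the small prompt generator carries trainable weights, this style of approach touches a few percent of the parameters that full fine-tuning would update.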

Better Handling Coreference Resolution in Aspect Level Sentiment Classification by Fine-Tuning Language Models

  • paper_url: http://arxiv.org/abs/2307.05646
  • repo_url: None
  • paper_authors: Dhruv Mullick, Bilal Ghanem, Alona Fyshe
  • for: 这项研究的目的是提升大型语言模型(LLM)在涉及指代消解(CR)的方面级情感分类场景中的性能。
  • methods: 通过在高推理性任务上微调 LLM,提升其处理含指代现象的评论的能力。
  • results: 研究发现该方法能够提升 LLM 的指代消解能力,并发布了一个专门关注 ALSC 中指代消解问题的新数据集。
    Abstract Customer feedback is invaluable to companies as they refine their products. Monitoring customer feedback can be automated with Aspect Level Sentiment Classification (ALSC) which allows us to analyse specific aspects of the products in reviews. Large Language Models (LLMs) are the heart of many state-of-the-art ALSC solutions, but they perform poorly in some scenarios requiring Coreference Resolution (CR). In this work, we propose a framework to improve an LLM's performance on CR-containing reviews by fine tuning on highly inferential tasks. We show that the performance improvement is likely attributed to the improved model CR ability. We also release a new dataset that focuses on CR in ALSC.
    摘要 客户反馈是企业改进产品的重要来源。借助方面级情感分类(ALSC),可以自动化地监测评论中针对产品具体方面的反馈。大型语言模型(LLM)是许多先进 ALSC 方案的核心,但在需要指代消解(CR)的场景中表现不佳。本文提出一个框架,通过在高推理性任务上微调来提升 LLM 在含指代现象评论上的表现,并表明性能提升很可能源于模型指代消解能力的改进。我们还发布了一个专注于 ALSC 中指代消解问题的新数据集。

Writer adaptation for offline text recognition: An exploration of neural network-based methods

  • paper_url: http://arxiv.org/abs/2307.15071
  • repo_url: https://github.com/tobiasvanderwerff/master-thesis
  • paper_authors: Tobias van der Werff, Maruf A. Dhali, Lambert Schomaker
  • for: 这 paper 的目的是提高手写文本识别(HTR)模型的适应性,使其能够更好地处理新的写作风格。
  • methods: 这篇论文使用两种方法使 HTR 模型具备作者自适应能力:1)模型无关元学习(MAML),一种常用于小样本分类等任务的算法;2)作者码(writer codes),一种源自自动语音识别领域的思路。
  • results: 结果表明,使用面向 HTR 的 MAML 变体(MetaHTR)可将词错误率(WER)较基线降低 1.4 到 2.0;其中作者自适应带来的改进为 0.2 到 0.7 WER,且较深的模型似乎更适合用 MetaHTR 进行适应。但将 MetaHTR 应用于更大的 HTR 模型或句子级 HTR 时,计算和内存开销可能过高。最后,基于学习特征或 Hinge 统计特征的作者码并未带来识别性能的提升。
    Abstract Handwriting recognition has seen significant success with the use of deep learning. However, a persistent shortcoming of neural networks is that they are not well-equipped to deal with shifting data distributions. In the field of handwritten text recognition (HTR), this shows itself in poor recognition accuracy for writers that are not similar to those seen during training. An ideal HTR model should be adaptive to new writing styles in order to handle the vast amount of possible writing styles. In this paper, we explore how HTR models can be made writer adaptive by using only a handful of examples from a new writer (e.g., 16 examples) for adaptation. Two HTR architectures are used as base models, using a ResNet backbone along with either an LSTM or Transformer sequence decoder. Using these base models, two methods are considered to make them writer adaptive: 1) model-agnostic meta-learning (MAML), an algorithm commonly used for tasks such as few-shot classification, and 2) writer codes, an idea originating from automatic speech recognition. Results show that an HTR-specific version of MAML known as MetaHTR improves performance compared to the baseline with a 1.4 to 2.0 improvement in word error rate (WER). The improvement due to writer adaptation is between 0.2 and 0.7 WER, where a deeper model seems to lend itself better to adaptation using MetaHTR than a shallower model. However, applying MetaHTR to larger HTR models or sentence-level HTR may become prohibitive due to its high computational and memory requirements. Lastly, writer codes based on learned features or Hinge statistical features did not lead to improved recognition performance.
    摘要 借助深度学习,手写识别已取得显著成功。然而,神经网络难以应对数据分布的变化:在手写文本识别(HTR)中,这表现为对训练中未见过的书写风格识别精度较差。理想的 HTR 模型应能适应新的书写风格。本文探讨如何仅用新作者的少量样本(如 16 个)使 HTR 模型实现作者自适应。我们以 ResNet 骨干网络搭配 LSTM 或 Transformer 序列解码器的两种 HTR 架构作为基础模型,并考察两种自适应方法:1)模型无关元学习(MAML),常用于小样本分类等任务;2)源自自动语音识别的作者码。结果表明,面向 HTR 的 MAML 变体 MetaHTR 可将词错误率(WER)较基线降低 1.4 到 2.0,其中作者自适应带来的改进为 0.2 到 0.7 WER,且较深的模型更适合用 MetaHTR 进行适应。不过,将 MetaHTR 应用到更大的 HTR 模型或句子级 HTR 时,其计算和内存需求可能过高。最后,基于学习特征或 Hinge 统计特征的作者码并未提升识别性能。
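
The writer-adaptation loop can be sketched as below. This is only the inner-loop, test-time adaptation step of a generic first-order MAML-style procedure (MetaHTR's meta-training additionally backpropagates through this loop); the model, loss, step size, and number of steps are placeholders, not the paper's values.

```python
import copy
import torch
import torch.nn as nn

def adapt_to_writer(model: nn.Module, support_x, support_y, loss_fn,
                    inner_lr: float = 1e-3, steps: int = 3) -> nn.Module:
    """Return a copy of `model` adapted to one writer's few labelled examples."""
    adapted = copy.deepcopy(model)                       # keep the meta-model untouched
    opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
    for _ in range(steps):                               # a handful of gradient steps
        opt.zero_grad()
        loss = loss_fn(adapted(support_x), support_y)
        loss.backward()
        opt.step()
    return adapted

# toy usage: a linear "recognizer" adapted on 16 support examples from a new writer
model = nn.Linear(32, 10)
x, y = torch.randn(16, 32), torch.randint(0, 10, (16,))
writer_model = adapt_to_writer(model, x, y, nn.CrossEntropyLoss())
```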

Mao-Zedong At SemEval-2023 Task 4: Label Represention Multi-Head Attention Model With Contrastive Learning-Enhanced Nearest Neighbor Mechanism For Multi-Label Text Classification

  • paper_url: http://arxiv.org/abs/2307.05174
  • repo_url: https://github.com/peterlau0626/semeval2023-task4-humanvalue
  • paper_authors: Che Zhang, Ping’an Liu, Zhenyang Xiao, Haojun Fei
  • for: 这个论文的目的是提出一种基于Roberta模型和对比学习的方法来自动识别人类价值观。
  • methods: 该方法使用 Roberta 模型获得文档的词向量编码,并提出一种多头注意力机制,在特定标签与语义成分之间建立联系。此外,该方法还使用对比学习增强的 K 近邻机制,利用已有实例信息辅助预测。
  • results: 该方法在测试集上 achievied an F1 score of 0.533,并在领先者榜单上排名第四。
    Abstract The study of human values is essential in both practical and theoretical domains. With the development of computational linguistics, the creation of large-scale datasets has made it possible to automatically recognize human values accurately. SemEval 2023 Task 4\cite{kiesel:2023} provides a set of arguments and 20 types of human values that are implicitly expressed in each argument. In this paper, we present our team's solution. We use the Roberta\cite{liu_roberta_2019} model to obtain the word vector encoding of the document and propose a multi-head attention mechanism to establish connections between specific labels and semantic components. Furthermore, we use a contrastive learning-enhanced K-nearest neighbor mechanism\cite{su_contrastive_2022} to leverage existing instance information for prediction. Our approach achieved an F1 score of 0.533 on the test set and ranked fourth on the leaderboard.
    摘要 人类价值观研究在实践和理论领域都十分重要。随着计算语言学的发展,大规模数据集的构建使得自动、准确地识别人类价值观成为可能。SemEval 2023 Task 4 提供了一组论辩文本以及每条论辩中隐含表达的 20 种人类价值观。本文介绍我们团队的解决方案:使用 Roberta 模型获取文档的词向量编码,提出多头注意力机制在特定标签与语义成分之间建立联系,并利用对比学习增强的 K 近邻机制借助已有实例信息进行预测。我们的方法在测试集上取得 0.533 的 F1 分数,在排行榜上名列第四。
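
The label-representation attention idea can be sketched with standard PyTorch modules. Here each label embedding acts as a query attending over token states from an encoder such as RoBERTa; the sizes, head count, and scoring layer are illustrative assumptions rather than the team's exact architecture, and the contrastive kNN component is omitted.

```python
import torch
import torch.nn as nn

class LabelAttentionHead(nn.Module):
    """Toy multi-label classifier: each label embedding attends over token states."""
    def __init__(self, n_labels: int = 20, d_model: int = 768, n_heads: int = 8):
        super().__init__()
        self.label_emb = nn.Parameter(torch.randn(n_labels, d_model))
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.score = nn.Linear(d_model, 1)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq, d_model), e.g. RoBERTa's last hidden states
        batch = token_states.size(0)
        q = self.label_emb.unsqueeze(0).expand(batch, -1, -1)   # (batch, n_labels, d)
        ctx, _ = self.attn(q, token_states, token_states)        # label-specific contexts
        return self.score(ctx).squeeze(-1)                       # (batch, n_labels) logits

head = LabelAttentionHead()
logits = head(torch.randn(4, 128, 768))
print(logits.shape)  # torch.Size([4, 20])
```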

Overview of BioASQ 2023: The eleventh BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

  • paper_url: http://arxiv.org/abs/2307.05131
  • repo_url: None
  • paper_authors: Anastasios Nentidis, Georgios Katsimpras, Anastasia Krithara, Salvador Lima López, Eulália Farré-Maduell, Luis Gasco, Martin Krallinger, Georgios Paliouras
  • for: 本文概述了 BioASQ 挑战赛的第十一届活动,该国际挑战赛旨在推动大规模生物医学语义索引与问答技术的发展。
  • methods: 本届挑战包括既有任务 b 和 Synergy 的新一轮评测,以及一项新任务 MedProcNER,即对西班牙语临床文本中的医疗操作进行语义标注,这些操作在医疗实践中具有关键作用。
  • results: 共有 28 支队伍为三项共享任务提交了超过 150 个不同系统的结果;与往届类似,多数参赛系统表现具有竞争力,表明该领域的技术水平在持续进步。
    Abstract This is an overview of the eleventh edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ is a series of international challenges promoting advances in large-scale biomedical semantic indexing and question answering. This year, BioASQ consisted of new editions of the two established tasks b and Synergy, and a new task (MedProcNER) on semantic annotation of clinical content in Spanish with medical procedures, which have a critical role in medical practice. In this edition of BioASQ, 28 competing teams submitted the results of more than 150 distinct systems in total for the three different shared tasks of the challenge. Similarly to previous editions, most of the participating systems achieved competitive performance, suggesting the continuous advancement of the state-of-the-art in the field.
    摘要 这是 BioASQ 挑战的第十一版简介,发生在 Conference and Labs of the Evaluation Forum (CLEF) 2023 年。BioASQ 是一系列国际挑战,推动大规模生物医学semantic indexing和问答技术的发展。本年的 BioASQ 包括两个已有任务(b 和 Synergy)的新版本,以及一个新任务(MedProcNER),涉及医学语言描述中的医疗程序,这些程序在医学实践中具有重要作用。本年的 BioASQ 有 28 支参赛队伍提交了超过 150 个不同的系统,共同参与三个不同的共同任务。与过去的版本一样,大多数参与的系统表现竞争力强,这表明了该领域技术的不断发展。

Go Beyond The Obvious: Probing the gap of INFORMAL reasoning ability between Humanity and LLMs by Detective Reasoning Puzzle Benchmark

  • paper_url: http://arxiv.org/abs/2307.05113
  • repo_url: None
  • paper_authors: Zhouhon Gu, Zihan Li, Lin Zhang, Zhuozhi Xiong, Haoning Ye, Yikai Zhang, Wenhao Huang, Xiaoxuan Zhu, Qianyu He, Rui Xu, Sihang Jiang, Shusen Wang, Zili Wang, Hongwei Feng, Zhixu Li, Yanghua Xiao
  • for: 本研究旨在探讨人工智能语言模型在日常生活中的非正式思维能力是否具备人类水平。
  • methods: 研究人员构建了一个侦探推理基准,收集了 1,200 道题目,用于评估语言模型在真实生活情境中的非正式推理能力。此外,还提出了一种模拟人类思维方式的自我提问提示框架,以增强语言模型的非正式推理能力。
  • results: 实验结果显示,人类在侦探推理基准上的表现大幅超越当前最强的语言模型;同时,自我提问框架被证明是提升 GPT-4 非正式推理能力最有效的提示工程方法,但其得分仍未超过人类参与者的最低分。
    Abstract Informal reasoning ability is the ability to reason based on common sense, experience, and intuition. Humans use informal reasoning every day to extract the most influential elements for their decision-making from a large amount of life-like information. With the rapid development of language models, the realization of general artificial intelligence has emerged with hope. Given the outstanding informal reasoning ability of humans, how much informal reasoning ability language models have has not been well studied by scholars. In order to explore the gap between humans and language models in informal reasoning ability, this paper constructs a Detective Reasoning Benchmark, which is an assembly of 1,200 questions gathered from accessible online resources, aims at evaluating the model's informal reasoning ability in real-life context. Considering the improvement of the model's informal reasoning ability restricted by the lack of benchmark, we further propose a Self-Question Prompt Framework that mimics human thinking to enhance the model's informal reasoning ability. The goals of self-question are to find key elements, deeply investigate the connections between these elements, encourage the relationship between each element and the problem, and finally, require the model to reasonably answer the problem. The experimental results show that human performance greatly outperforms the SoTA Language Models in Detective Reasoning Benchmark. Besides, Self-Question is proven to be the most effective prompt engineering in improving GPT-4's informal reasoning ability, but it still does not even surpass the lowest score made by human participants. Upon acceptance of the paper, the source code for the benchmark will be made publicly accessible.
    摘要 人类日常生活中使用非正式逻辑能力来解决问题,包括基于经验、直觉和常识的推理。随着人工智能语言模型的快速发展,实现普通人工智能的希望也在上升。然而,学者们对语言模型的非正式逻辑能力的研究尚未得到充分的关注。为了探讨人类和语言模型在非正式逻辑能力方面的差距,本文提出了探测推理benchmark,该benchmark包括1,200道问题,从可达性的在线资源中收集而来,用于评估模型的非正式逻辑能力在实际生活中的表现。由于模型的非正式逻辑能力受限于缺乏benchmark,我们进一步提出了基于人类思维的自我问题推动框架,以提高模型的非正式逻辑能力。自我问题的目标是找到关键元素,深入探究这些元素之间的连接,促使每个元素与问题之间的关系,并最后做出合理的答案。实验结果显示,人类参与者在探测推理benchmark中表现出色,而SoTA语言模型则表现落后。此外,我们发现自我问题推动框架可以有效地提高GPT-4的非正式逻辑能力,但它仍然不足以超越人类参与者的最低得分。接受本文后,benchmark的源代码将公开发布。

Vacaspati: A Diverse Corpus of Bangla Literature

  • paper_url: http://arxiv.org/abs/2307.05083
  • repo_url: None
  • paper_authors: Pramit Bhattacharyya, Joydeep Mondal, Subhadip Maji, Arnab Bhattacharya
  • for: The paper aims to address the lack of a diverse and high-quality corpus for Bangla NLP tasks, which has hindered the development of state-of-the-art NLP models for the language.
  • methods: The authors build a corpus of Bangla literature, called Vacaspati, by collecting literary works from various websites and leveraging the public availability of these works without copyright violations or restrictions. They also build a word embedding model, Vac-FT, using FastText, and an Electra model, Vac-BERT, using the corpus.
  • results: The authors show that Vac-FT outperforms other FastText-based models on multiple downstream tasks, and Vac-BERT performs either better or similar to other state-of-the-art transformer models, despite having far fewer parameters and requiring fewer resources. They also demonstrate the efficacy of Vacaspati as a corpus by showing that models built from other corpora are not as effective.
    Abstract Bangla (or Bengali) is the fifth most spoken language globally; yet, the state-of-the-art NLP in Bangla is lagging for even simple tasks such as lemmatization, POS tagging, etc. This is partly due to lack of a varied quality corpus. To alleviate this need, we build Vacaspati, a diverse corpus of Bangla literature. The literary works are collected from various websites; only those works that are publicly available without copyright violations or restrictions are collected. We believe that published literature captures the features of a language much better than newspapers, blogs or social media posts which tend to follow only a certain literary pattern and, therefore, miss out on language variety. Our corpus Vacaspati is varied from multiple aspects, including type of composition, topic, author, time, space, etc. It contains more than 11 million sentences and 115 million words. We also built a word embedding model, Vac-FT, using FastText from Vacaspati as well as trained an Electra model, Vac-BERT, using the corpus. Vac-BERT has far fewer parameters and requires only a fraction of resources compared to other state-of-the-art transformer models and yet performs either better or similar on various downstream tasks. On multiple downstream tasks, Vac-FT outperforms other FastText-based models. We also demonstrate the efficacy of Vacaspati as a corpus by showing that similar models built from other corpora are not as effective. The models are available at https://bangla.iitk.ac.in/.
    摘要 孟加拉语(Bangla/Bengali)是全球使用人数第五多的语言,但其自然语言处理水平仍然落后,即便是词形还原、词性标注等基础任务也是如此,部分原因在于缺乏多样化的高质量语料。为此,我们构建了孟加拉文学语料库 Vacaspati。文学作品采集自多个网站,且仅收录可公开获取、不涉及版权问题的作品。我们认为,已出版的文学作品比报纸、博客或社交媒体帖子更能体现语言的多样性,因为后者往往遵循单一的文体模式。Vacaspati 在作品类型、主题、作者、时间、地域等多个维度上都具有多样性,包含超过 1100 万个句子和 1.15 亿个词。我们还基于该语料用 FastText 训练了词向量模型 Vac-FT,并训练了 Electra 模型 Vac-BERT。Vac-BERT 的参数量远小于其他先进的 Transformer 模型,所需资源也只是其一小部分,但在多项下游任务上表现相当甚至更好;在多项下游任务上,Vac-FT 也优于其他基于 FastText 的模型。我们还通过实验证明,用其他语料训练的类似模型效果不及基于 Vacaspati 的模型,从而说明了该语料库的价值。模型可在 https://bangla.iitk.ac.in/ 获取。
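
Training FastText word vectors on a raw-text corpus, as done for Vac-FT, can be sketched with gensim. The file name and all hyperparameters below are placeholders, not the values used for Vacaspati, and the simple whitespace-style preprocessing is only illustrative.

```python
from gensim.models import FastText
from gensim.utils import simple_preprocess

# corpus.txt is a placeholder: one sentence of raw text per line
with open("corpus.txt", encoding="utf-8") as f:
    sentences = [simple_preprocess(line) for line in f if line.strip()]

model = FastText(
    sentences,          # tokenized sentences
    vector_size=300,    # embedding dimension (illustrative)
    window=5,
    min_count=5,
    sg=1,               # skip-gram
    epochs=5,
)
model.save("vac_ft_like.model")
print(model.wv.most_similar(sentences[0][0], topn=5))
```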

  • paper_url: http://arxiv.org/abs/2307.05081
  • repo_url: None
  • paper_authors: Huihui Xu, Kevin Ashley
  • for: 这项研究的目的是开发一种法律批判分类技术,帮助法律专业人员更好地理解和分析法律案例。
  • methods: 该研究使用了结构化辩论分割和法律辩论模式,并使用GPT-3.5生成辩论摘要。
  • results: 研究表明,使用该技术可以生成更高质量的辩论摘要,同时忽略不重要的背景信息。
    Abstract We use the combination of argumentative zoning [1] and a legal argumentative scheme to create legal argumentative segments. Based on the argumentative segmentation, we propose a novel task of classifying argumentative segments of legal case decisions. GPT-3.5 is used to generate summaries based on argumentative segments. In terms of automatic evaluation metrics, our method generates higher quality argumentative summaries while leaving out less relevant context as compared to GPT-4 and non-GPT models.
    摘要 我们将论辩分区(argumentative zoning)[1] 与一种法律论辩图式相结合,以切分出法律论辩片段,并在此基础上提出一个新任务:对法律判决书中的论辩片段进行分类。我们使用 GPT-3.5 基于论辩片段生成摘要。在自动评估指标上,与 GPT-4 和非 GPT 模型相比,我们的方法生成的论辩摘要质量更高,同时剔除了更多不相关的背景内容。

Separate-and-Aggregate: A Transformer-based Patch Refinement Model for Knowledge Graph Completion

  • paper_url: http://arxiv.org/abs/2307.05627
  • repo_url: None
  • paper_authors: Chen Chen, Yufei Wang, Yang Zhang, Quan Z. Sheng, Kwok-Yan Lam
  • for: 本研究提出一种新的基于 Transformer 的补丁精炼模型 PatReFormer,用于知识图谱补全(KGC)任务中缺失实体的推断。
  • methods: PatReFormer 先将实体和关系的嵌入切分成一系列补丁(patch),再利用交叉注意力模块实现实体与关系嵌入特征之间的双向交互,从而更好地理解底层知识图谱。
  • results: 我们在四个常用的 KGC 基准上进行实验,结果显示 PatReFormer 在标准评估指标(如 MRR 和 H@n)上显著优于现有 KGC 方法。分析验证了 PatReFormer 设计选择的有效性,并发现它能在较大的关系嵌入维度下更好地捕捉知识图谱信息;与其他 KGC 模型相比,其优势尤其体现在复杂关系类型上。
    Abstract Knowledge graph completion (KGC) is the task of inferencing missing facts from any given knowledge graphs (KG). Previous KGC methods typically represent knowledge graph entities and relations as trainable continuous embeddings and fuse the embeddings of the entity $h$ (or $t$) and relation $r$ into hidden representations of query $(h, r, ?)$ (or $(?, r, t)$) to approximate the missing entities. To achieve this, they either use shallow linear transformations or deep convolutional modules. However, the linear transformations suffer from the expressiveness issue while the deep convolutional modules introduce unnecessary inductive bias, which could potentially degrade the model performance. Thus, we propose a novel Transformer-based Patch Refinement Model (PatReFormer) for KGC. PatReFormer first segments the embedding into a sequence of patches and then employs cross-attention modules to allow bi-directional embedding feature interaction between the entities and relations, leading to a better understanding of the underlying KG. We conduct experiments on four popular KGC benchmarks, WN18RR, FB15k-237, YAGO37 and DB100K. The experimental results show significant performance improvement from existing KGC methods on standard KGC evaluation metrics, e.g., MRR and H@n. Our analysis first verifies the effectiveness of our model design choices in PatReFormer. We then find that PatReFormer can better capture KG information from a large relation embedding dimension. Finally, we demonstrate that the strength of PatReFormer is at complex relation types, compared to other KGC models.
    摘要 知识图谱补全(KGC)旨在从给定的知识图谱中推断缺失的事实。以往的 KGC 方法通常将实体和关系表示为可训练的连续嵌入,并将实体 $h$(或 $t$)与关系 $r$ 的嵌入融合为查询 $(h, r, ?)$(或 $(?, r, t)$)的隐表示,以逼近缺失的实体;为此,它们要么使用浅层线性变换,要么使用深度卷积模块。然而,线性变换存在表达能力不足的问题,而深度卷积模块会引入不必要的归纳偏置,可能损害模型性能。为此,我们提出一种新的基于 Transformer 的补丁精炼模型 PatReFormer:它先将嵌入切分为一系列补丁,再通过交叉注意力模块实现实体与关系嵌入特征之间的双向交互,从而更好地理解底层知识图谱。我们在 WN18RR、FB15k-237、YAGO37 和 DB100K 四个常用 KGC 基准上进行实验,结果显示 PatReFormer 在 MRR、H@n 等标准评估指标上显著优于现有 KGC 方法。分析首先验证了 PatReFormer 设计选择的有效性,其次发现它能在较大的关系嵌入维度下更好地捕捉知识图谱信息;最后表明,与其他 KGC 模型相比,PatReFormer 的优势尤其体现在复杂关系类型上。

Synthetic Dataset for Evaluating Complex Compositional Knowledge for Natural Language Inference

  • paper_url: http://arxiv.org/abs/2307.05034
  • repo_url: https://github.com/sushmaakoju/natural-logic
  • paper_authors: Sushma Anand Akoju, Robert Vacareanu, Haris Riaz, Eduardo Blanco, Mihai Surdeanu
  • for: 本研究构建合成数据集并利用自然语言推理(NLI)模型,考察模型对逻辑组合性(compositionality)的理解能力。
  • methods: 通过修改 SICK 数据集(Marelli et al., 2014)中的 15 个示例生成 1,304 个句子对:使用与全称量词、存在量词、否定及其他概念修饰语对应的短语修改前提与假设句的主语、谓语和宾语部分,并按自然逻辑(NL)规则进行标注。
  • results: 发现 zero-shot 设置下 NLI 模型表现不佳,特别是对修改后句子中的否定和universal modifier 的表现较差。经过 fine-tuning 后,模型仍然表现不佳于否定、universal 和existential modifier。
    Abstract We introduce a synthetic dataset called Sentences Involving Complex Compositional Knowledge (SICCK) and a novel analysis that investigates the performance of Natural Language Inference (NLI) models to understand compositionality in logic. We produce 1,304 sentence pairs by modifying 15 examples from the SICK dataset (Marelli et al., 2014). To this end, we modify the original texts using a set of phrases - modifiers that correspond to universal quantifiers, existential quantifiers, negation, and other concept modifiers in Natural Logic (NL) (MacCartney, 2009). We use these phrases to modify the subject, verb, and object parts of the premise and hypothesis. Lastly, we annotate these modified texts with the corresponding entailment labels following NL rules. We conduct a preliminary verification of how well the change in the structural and semantic composition is captured by neural NLI models, in both zero-shot and fine-tuned scenarios. We found that the performance of NLI models under the zero-shot setting is poor, especially for modified sentences with negation and existential quantifiers. After fine-tuning this dataset, we observe that models continue to perform poorly over negation, existential and universal modifiers.
    摘要 我们构建了一个名为 SICCK(Sentences Involving Complex Compositional Knowledge)的合成数据集,并提出一种新的分析方法,用于考察自然语言推理(NLI)模型对逻辑组合性的理解能力。我们通过修改 SICK 数据集(Marelli et al., 2014)中的 15 个示例,生成了 1,304 个句子对:使用与自然逻辑(NL)中全称量词、存在量词、否定及其他概念修饰语相对应的一组短语,分别修改前提与假设句的主语、谓语和宾语部分,并按照 NL 规则为修改后的文本标注蕴含标签。我们初步验证了神经 NLI 模型在零样本与微调两种设置下能否捕捉这些结构与语义组合上的变化。结果发现,零样本设置下 NLI 模型表现很差,对含否定和存在量词的修改句子尤其如此;即便在该数据集上微调之后,模型在否定、存在量词和全称量词修饰语上的表现仍然不佳。
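
The sentence-modification step can be illustrated with a small sketch. The modifier lists and the subject/verb/object decomposition below are illustrative stand-ins for the paper's natural-logic phrase set, and the sketch deliberately leaves entailment labelling (done with NL rules in the paper) unimplemented.

```python
from itertools import product

# Illustrative modifiers in the spirit of natural logic; not the paper's exact lists.
SUBJECT_MODIFIERS = ["all", "some", "no"]
VERB_MODIFIERS = ["", "did not"]

def modify(parts: dict) -> list:
    """Produce modified variants of a simple subject-verb-object sentence.

    parts: {"subject": "children", "verb": "play", "object": "soccer"}
    Entailment labels would still be assigned by natural-logic rules or annotators.
    """
    variants = []
    for subj_mod, verb_mod in product(SUBJECT_MODIFIERS, VERB_MODIFIERS):
        verb = f"{verb_mod} {parts['verb']}".strip()
        text = f"{subj_mod} {parts['subject']} {verb} {parts['object']}"
        variants.append({"text": text, "subject_mod": subj_mod, "negated": bool(verb_mod)})
    return variants

for v in modify({"subject": "children", "verb": "play", "object": "soccer"}):
    print(v["text"])
```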

Improving RNN-Transducers with Acoustic LookAhead

  • paper_url: http://arxiv.org/abs/2307.05006
  • repo_url: None
  • paper_authors: Vinit S. Unni, Ashish Mittal, Preethi Jyothi, Sunita Sarawagi
  • for: 这篇论文旨在提高语音转文本(RNN-T)模型的准确率,同时保持其流式识别能力。
  • methods: 论文提出了一种名为 LookAhead 的技术,通过提前查看音频输入中的未来帧,使文本表示更充分地依赖声学证据。
  • results: 结果显示,使用 LookAhead 技术可在域内与域外评估集上将词错误率相对降低 5%-20%。
    Abstract RNN-Transducers (RNN-Ts) have gained widespread acceptance as an end-to-end model for speech to text conversion because of their high accuracy and streaming capabilities. A typical RNN-T independently encodes the input audio and the text context, and combines the two encodings by a thin joint network. While this architecture provides SOTA streaming accuracy, it also makes the model vulnerable to strong LM biasing which manifests as multi-step hallucination of text without acoustic evidence. In this paper we propose LookAhead that makes text representations more acoustically grounded by looking ahead into the future within the audio input. This technique yields a significant 5%-20% relative reduction in word error rate on both in-domain and out-of-domain evaluation sets.
    摘要 RNN-Transducer(RNN-T)凭借高准确率和流式识别能力,已被广泛用作端到端语音转文本模型。典型的 RNN-T 分别编码输入音频和文本上下文,再通过一个轻量的联合网络融合两种编码。这一架构虽然具有最先进的流式识别精度,但也使模型容易受到较强的语言模型偏置影响,表现为在缺乏声学证据的情况下产生多步文本幻觉。本文提出 LookAhead 技术,通过提前查看音频输入中的未来帧,使文本表示更充分地依赖声学证据。该技术在域内和域外评估集上带来 5%-20% 的相对词错误率下降。

DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization

  • paper_url: http://arxiv.org/abs/2307.04963
  • repo_url: None
  • paper_authors: Simin Chen, Shiyi Wei, Cong Liu, Wei Yang
  • for: 该论文的目的是解决现有的深度学习(DL)编译器无法正确编译动态神经网络(DyNNs)的问题。
  • methods: 该论文提出了一种通用方法,使任何现有的 DL 编译器都能成功编译动态神经网络(DyNN)。该方法利用程序分析与程序改写技术,将动态神经网络拆分为多个不含条件语句、可独立编译的子神经网络,并合成一个主机模块来模拟动态神经网络的控制流,负责调用各个子神经网络。
  • results: 实验表明,该方法可成功编译所有被测动态神经网络,且生成的可执行代码性能显著提升,运行速度可达原 DyNN 在通用 DL 框架上执行速度的 1.12 倍至 20.21 倍。
    Abstract DL compiler's primary function is to translate DNN programs written in high-level DL frameworks such as PyTorch and TensorFlow into portable executables. These executables can then be flexibly executed by the deployed host programs. However, existing DL compilers rely on a tracing mechanism, which involves feeding a runtime input to a neural network program and tracing the program execution paths to generate the computational graph necessary for compilation. Unfortunately, this mechanism falls short when dealing with modern dynamic neural networks (DyNNs) that possess varying computational graphs depending on the inputs. Consequently, conventional DL compilers struggle to accurately compile DyNNs into executable code. To address this limitation, we propose DyCL, a general approach that enables any existing DL compiler to successfully compile DyNNs. DyCL tackles the dynamic nature of DyNNs by introducing a compilation mechanism that redistributes the control and data flow of the original DNN programs during the compilation process. Specifically, DyCL develops program analysis and program transformation techniques to convert a dynamic neural network into multiple sub-neural networks. Each sub-neural network is devoid of conditional statements and is compiled independently. Furthermore, DyCL synthesizes a host module that models the control flow of the DyNNs and facilitates the invocation of the sub-neural networks. Our evaluation demonstrates the effectiveness of DyCL, achieving a 100% success rate in compiling all dynamic neural networks. Moreover, the compiled executables generated by DyCL exhibit significantly improved performance, running between $1.12\times$ and $20.21\times$ faster than the original DyNNs executed on general-purpose DL frameworks.
    摘要 DL 编译器的主要功能是将以 PyTorch、TensorFlow 等高级框架编写的 DNN 程序转换为可移植的可执行文件,由部署端的宿主程序灵活执行。然而,现有 DL 编译器依赖跟踪机制:向神经网络程序馈入一个运行时输入并跟踪其执行路径,以生成编译所需的计算图。这一机制无法正确处理现代动态神经网络(DyNN),因为 DyNN 的计算图会随输入而变化,导致传统 DL 编译器难以将其正确编译为可执行代码。为解决这一限制,我们提出 DyCL,一种使任何现有 DL 编译器都能成功编译 DyNN 的通用方法。DyCL 通过在编译过程中重新分配原 DNN 程序的控制流与数据流来应对其动态性:它利用程序分析与程序变换技术,将动态神经网络转换为多个不含条件语句、可独立编译的子神经网络,并合成一个模拟 DyNN 控制流、负责调用各子神经网络的宿主模块。评估表明,DyCL 能够以 100% 的成功率编译全部动态神经网络,且生成的可执行文件性能显著提升,运行速度为在通用 DL 框架上执行原 DyNN 的 1.12 倍至 20.21 倍。
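
The split-and-dispatch idea can be illustrated with a toy sketch. Here TorchScript tracing stands in for a DL compiler that only handles static graphs, and a small host function reproduces the original control flow; the example network and branch condition are assumptions, not DyCL's actual pipeline.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """Toy dynamic network: which sub-graph runs depends on the input."""
    def __init__(self):
        super().__init__()
        self.small = nn.Linear(8, 2)
        self.large = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
    def forward(self, x):
        return self.small(x) if x.abs().mean() < 0.5 else self.large(x)

net = DynamicNet()
# Compile each branch as a static sub-network (tracing plays the role of a compiler here).
sub_small = torch.jit.trace(net.small, torch.randn(1, 8))
sub_large = torch.jit.trace(net.large, torch.randn(1, 8))

def host(x):
    """Host module mimicking DynamicNet.forward's control flow and dispatching."""
    return sub_small(x) if x.abs().mean() < 0.5 else sub_large(x)

x = torch.randn(1, 8)
print(torch.allclose(net(x), host(x)))  # True: same behaviour, static sub-graphs
```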

SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation

  • paper_url: http://arxiv.org/abs/2307.04907
  • repo_url: None
  • paper_authors: Bhathiya Hemanthage, Christian Dondrup, Phil Bartie, Oliver Lemon
  • for: 这个论文主要针对 Multimodal task-oriented dialogues 的问题,旨在提出一种简单的语言模型 SimpleMTOD,可以有效地在多模式对话中完成各种任务。
  • methods: 该论文采用大规模 Transformer 自回归架构,并利用预训练 GPT-2 进行迁移学习,将多模态对话中的各个子任务统一为序列预测任务。此外,论文还引入场景内对象的本地化 token 与去本地化 token,以刻画视觉场景中对象的语义。
  • results: SimpleMTOD 在 SIMMC 2.0 test-std 数据集的回复生成子任务上取得了最先进的 BLEU 分数(0.327),并在其他多模态子任务(歧义消解、指代消解、对话状态跟踪)上表现相当。值得注意的是,该方法对视觉(及非视觉)信息仅采用极简的提取方式,也没有使用分类头等任务特定的结构改动。
    Abstract SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven to be successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning from pre-trained GPT-2. In-order to capture the semantics of visual scenes, we introduce both local and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach for extracting visual (and non-visual) information. In addition the model does not rely on task-specific architectural changes such as classification heads.
    摘要 SimpleMTOD 是一种简单的语言模型,它将多模态任务型对话中的多个子任务统一转化为序列预测任务。SimpleMTOD 基于大规模 Transformer 自回归架构(该架构已在单模态任务型对话中被证明有效),并有效利用了预训练 GPT-2 的迁移学习。为了刻画视觉场景的语义,我们为场景中的对象引入本地化 token 与去本地化 token:去本地化 token 表示对象的类型而非具体对象,因而在整个数据集中具有一致的含义。SimpleMTOD 在 SIMMC 2.0 test-std 数据集的回复生成子任务上取得了最先进的 BLEU 分数(0.327),并在歧义消解、指代消解和对话状态跟踪等其他多模态子任务上表现相当;尽管如此,该方法对视觉(及非视觉)信息仅采用极简的提取方式,且不依赖分类头等任务特定的结构改动。

Entity Identifier: A Natural Text Parsing-based Framework For Entity Relation Extraction

  • paper_url: http://arxiv.org/abs/2307.04892
  • repo_url: None
  • paper_authors: El Mehdi Chouham, Jessica López Espejel, Mahaman Sanoussi Yahaya Alassan, Walid Dahhane, El Hassane Ettifouri
  • for: 本研究旨在提高对象对象编程中代码生成的效率和质量,通过使用自然语言处理技术自动生成CRUD(创建、读取、更新、删除)类代码。
  • methods: 本研究使用自然语言处理技术提取需求描述中的结构化信息,并使用一种名为“实体树”的表示方式来模型这些信息。同时,我们还创建了一个评估数据集来评估我们的方法的效果。
  • results: 我们的研究表明,使用自然语言处理技术可以高效地提取需求描述中的结构化信息,并生成高质量的CRUD类代码。
    Abstract The field of programming has a diversity of paradigms that are used according to the working framework. While current neural code generation methods are able to learn and generate code directly from text, we believe that this approach is not optimal for certain code tasks, particularly the generation of classes in an object-oriented project. Specifically, we use natural language processing techniques to extract structured information from requirements descriptions, in order to automate the generation of CRUD (Create, Read, Update, Delete) class code. To facilitate this process, we introduce a pipeline for extracting entity and relation information, as well as a representation called an "Entity Tree" to model this information. We also create a dataset to evaluate the effectiveness of our approach.
    摘要 programming的领域有多种程序模式,根据工作框架进行使用。现有的神经网络代码生成方法可以直接从文本学习和生成代码,但我们认为这种方法不适用于某些代码任务,特别是对象封装项目中的类生成。我们使用自然语言处理技术来提取需求描述中的结构化信息,以自动生成CRUD(创建、读取、更新、删除)类代码。为此,我们提出了一个抽取实体和关系信息的管道,以及一种叫“实体树”的表示方式。我们还创建了一个评估效果的数据集。

SITTA: A Semantic Image-Text Alignment for Image Captioning

  • paper_url: http://arxiv.org/abs/2307.05591
  • repo_url: https://github.com/ml-jku/semantic-image-text-alignment
  • paper_authors: Fabian Paischer, Thomas Adler, Markus Hofmarcher, Sepp Hochreiter
  • for: 本研究旨在使用预训练语言模型(LM)和预训练多Modal(image-text)模型来实现图像描述的语言能力。
  • methods: 本研究使用了两种新的建构方法来将视觉模型的语义传递到LM中,包括使用token对应关系来将多Modal语言encoder的embedding空间与预训练LM的embedding空间相对应,以及使用额外数据来直接从视觉空间到语言空间的映射。
  • results: 借助本文的语义映射,语言模型无需梯度信息即可完成图像描述,并在 MS-COCO 和 Flickr30k 数据集上取得了较强的描述性能;即便数据有限,该方法也能部分超越其他零样本方法乃至经过微调的对比方法。
    Abstract Textual and semantic comprehension of images is essential for generating proper captions. The comprehension requires detection of objects, modeling of relations between them, an assessment of the semantics of the scene and, finally, representing the extracted knowledge in a language space. To achieve rich language capabilities while ensuring good image-language mappings, pretrained language models (LMs) were conditioned on pretrained multi-modal (image-text) models that allow for image inputs. This requires an alignment of the image representation of the multi-modal model with the language representations of a generative LM. However, it is not clear how to best transfer semantics detected by the vision encoder of the multi-modal model to the LM. We introduce two novel ways of constructing a linear mapping that successfully transfers semantics between the embedding spaces of the two pretrained models. The first aligns the embedding space of the multi-modal language encoder with the embedding space of the pretrained LM via token correspondences. The latter leverages additional data that consists of image-text pairs to construct the mapping directly from vision to language space. Using our semantic mappings, we unlock image captioning for LMs without access to gradient information. By using different sources of data we achieve strong captioning performance on MS-COCO and Flickr30k datasets. Even in the face of limited data, our method partly exceeds the performance of other zero-shot and even finetuned competitors. Our ablation studies show that even LMs at a scale of merely 250M parameters can generate decent captions employing our semantic mappings. Our approach makes image captioning more accessible for institutions with restricted computational resources.
    摘要 对图像的文本与语义理解是生成恰当描述的基础:需要检测对象、建模对象之间的关系、评估场景语义,并将提取的知识表示到语言空间中。为了在保证良好图文映射的同时获得丰富的语言能力,人们将预训练语言模型(LM)与允许图像输入的预训练多模态(图文)模型相结合,这要求将多模态模型的图像表示与生成式 LM 的语言表示对齐;然而,如何最好地把多模态模型视觉编码器检测到的语义迁移到 LM 中尚不清楚。我们提出两种构建线性映射的新方式,能够在两个预训练模型的嵌入空间之间成功传递语义:第一种通过 token 对应关系,将多模态语言编码器的嵌入空间与预训练 LM 的嵌入空间对齐;第二种利用由图文对组成的额外数据,直接构建从视觉空间到语言空间的映射。借助这些语义映射,我们无需梯度信息即可让 LM 完成图像描述。通过使用不同来源的数据,我们在 MS-COCO 和 Flickr30k 数据集上取得了较强的描述性能;即便数据有限,我们的方法也能部分超越其他零样本乃至微调的对比方法。消融实验表明,即使参数量仅约 2.5 亿的 LM,借助我们的语义映射也能生成不错的描述。该方法使计算资源受限的机构更容易开展图像描述研究。
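
Fitting a linear semantic mapping between two embedding spaces from paired token embeddings can be sketched with ordinary least squares. The random matrices below are placeholders for, e.g., a multimodal encoder's token embeddings and an LM's input embeddings over shared tokens; the dimensions and setup are assumptions, not SITTA's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d_vision, d_lm, n_pairs = 512, 768, 5000
V = rng.normal(size=(n_pairs, d_vision))   # multimodal-side embeddings of matched tokens
L = rng.normal(size=(n_pairs, d_lm))       # LM-side embeddings of the same tokens

# Linear mapping W minimising ||V W - L||^2 (ordinary least squares).
W, *_ = np.linalg.lstsq(V, L, rcond=None)

def to_language_space(vision_vec: np.ndarray) -> np.ndarray:
    """Project a vision-side vector into the LM embedding space."""
    return vision_vec @ W

print(to_language_space(V[0]).shape)  # (768,)
```

Once such a mapping exists, image features can be projected into the LM's input space and consumed by the frozen LM without any gradient updates, which is the property the paper exploits.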

Retrieval of phonemes and Kohonen algorithm

  • paper_url: http://arxiv.org/abs/2307.07407
  • repo_url: None
  • paper_authors: Brunello Tirozzi, Orchidea Maria Lecian
  • for: 这种音素检索(phoneme-retrieval)技术是针对特定类型的数据结构而设计的。
  • methods: 该网络使用一组初始神经元,其数量约等于数据集中典型结构的数量。
  • results: 该网络可以检索特定类型的数据,但其学习过程可能严重依赖训练样本,且只能检索特定类型的数据。
    Abstract A phoneme-retrieval technique is proposed, which is due to the particular way of the construction of the network. An initial set of neurons is given. The number of these neurons is approximately equal to the number of typical structures of the data. For example if the network is built for voice retrieval then the number of neurons must be equal to the number of characteristic phonemes of the alphabet of the language spoken by the social group to which the particular person belongs. Usually this task is very complicated and the network can depend critically on the samples used for the learning. If the network is built for image retrieval then it works only if the data to be retrieved belong to a particular set of images. If the network is built for voice recognition it works only for some particular set of words. A typical example is the words used for the flight of airplanes. For example a command like the "airplane should make a turn of 120 degrees towards the east" can be easily recognized by the network if a suitable learning procedure is used.
    摘要 本文提出一种音素检索技术,其特点源于网络的特定构造方式:给定一组初始神经元,其数量约等于数据的典型结构数量。例如,若网络用于语音检索,则神经元数量应等于说话者所属群体语言字母系统中特征音素的数量。这一任务通常非常复杂,网络的效果可能严重依赖用于学习的样本。若网络用于图像检索,则只有当待检索数据属于特定图像集合时才有效;若用于语音识别,则只对特定词汇集合有效,典型例子是飞行指挥用语——例如在采用合适的学习过程后,网络可以较容易地识别"飞机应向东转 120 度"这类指令。

cs.LG - 2023-07-11

Sports Betting: an application of neural networks and modern portfolio theory to the English Premier League

  • paper_url: http://arxiv.org/abs/2307.13807
  • repo_url: None
  • paper_authors: Vélez Jiménez, Román Alberto, Lecuanda Ontiveros, José Manuel, Edgar Possani
  • for: 这篇研究旨在优化体育博彩策略,综合运用冯·诺依曼-摩根斯特恩期望效用理论、深度学习技术以及凯利公式的改进形式。
  • methods: 研究将神经网络模型与投资组合优化相结合,在 20/21 赛季英超联赛的后半段实现了相对初始资金 135.8% 的收益。
  • results: 研究得到了可用于预测足球赛果的神经网络模型,并评估了完整与受限策略的收益表现、风险管理与分散化。
    Abstract This paper presents a novel approach for optimizing betting strategies in sports gambling by integrating Von Neumann-Morgenstern Expected Utility Theory, deep learning techniques, and advanced formulations of the Kelly Criterion. By combining neural network models with portfolio optimization, our method achieved remarkable profits of 135.8% relative to the initial wealth during the latter half of the 20/21 season of the English Premier League. We explore complete and restricted strategies, evaluating their performance, risk management, and diversification. A deep neural network model is developed to forecast match outcomes, addressing challenges such as limited variables. Our research provides valuable insights and practical applications in the field of sports betting and predictive modeling.
    摘要 本文提出一种优化体育博彩策略的新方法,将冯·诺依曼-摩根斯特恩期望效用理论、深度学习技术与凯利公式的改进形式相结合。通过将神经网络模型与投资组合优化相结合,该方法在 20/21 赛季英超联赛的后半段相对初始资金获得了 135.8% 的收益。我们探讨了完整策略与受限策略,并评估其收益表现、风险管理与分散化;同时构建了用于预测比赛结果的深度神经网络模型,以应对可用变量有限等挑战。该研究为体育博彩与预测建模领域提供了有价值的见解和实际应用。
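
The paper combines neural forecasts with portfolio optimization and advanced Kelly formulations; for intuition only, the sketch below shows the textbook single-bet Kelly fraction applied to a model-estimated win probability. The probability and odds are made-up example values.

```python
def kelly_fraction(p_win: float, decimal_odds: float) -> float:
    """Classic single-outcome Kelly stake as a fraction of current wealth.

    p_win: model-estimated probability of the bet winning.
    decimal_odds: bookmaker decimal odds (total payout per unit staked).
    Returns 0 when the bet has no positive expected value.
    """
    b = decimal_odds - 1.0                  # net odds
    f = (p_win * b - (1.0 - p_win)) / b     # Kelly formula f* = (p*b - q) / b
    return max(f, 0.0)

# Example: the model gives a 55% win probability at decimal odds of 2.10
stake = kelly_fraction(0.55, 2.10)
print(f"bet {stake:.1%} of bankroll")  # about 14.1% of bankroll
```

In practice a fractional Kelly stake or a portfolio-level formulation, as used in the paper, reduces variance relative to this naive full-Kelly sizing.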

Stochastic Nested Compositional Bi-level Optimization for Robust Feature Learning

  • paper_url: http://arxiv.org/abs/2307.05384
  • repo_url: None
  • paper_authors: Xuxing Chen, Krishnakumar Balasubramanian, Saeed Ghadimi
  • for: 求解嵌套复合函数的双层优化问题
  • methods: 使用随机近似算法,无需矩阵求逆或 mini-batch
  • results: 可以求得 $\epsilon$-驻点解,oracle 复杂度约为 $\tilde{O}_T(1/\epsilon^2)$
    Abstract We develop and analyze stochastic approximation algorithms for solving nested compositional bi-level optimization problems. These problems involve a nested composition of $T$ potentially non-convex smooth functions in the upper-level, and a smooth and strongly convex function in the lower-level. Our proposed algorithm does not rely on matrix inversions or mini-batches and can achieve an $\epsilon$-stationary solution with an oracle complexity of approximately $\tilde{O}_T(1/\epsilon^{2})$, assuming the availability of stochastic first-order oracles for the individual functions in the composition and the lower-level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$ hides polylog factors and constants that depend on $T$. The key challenge we address in establishing this result relates to handling three distinct sources of bias in the stochastic gradients. The first source arises from the compositional nature of the upper-level, the second stems from the bi-level structure, and the third emerges due to the utilization of Neumann series approximations to avoid matrix inversion. To demonstrate the effectiveness of our approach, we apply it to the problem of robust feature learning for deep neural networks under covariate shift, showcasing the benefits and advantages of our methodology in that context.
    摘要 我们提出并分析了用于求解嵌套复合双层优化问题的随机近似算法。此类问题的上层为 $T$ 个可能非凸的光滑函数的嵌套复合,下层为一个光滑且强凸的函数。所提算法无需矩阵求逆或 mini-batch;在假设可以获得复合中各函数及下层函数的无偏且矩有界的随机一阶 oracle 的情况下,能以约 $\tilde{O}_T(1/\epsilon^{2})$ 的 oracle 复杂度求得 $\epsilon$-驻点解,其中 $\tilde{O}_T$ 隐藏了依赖于 $T$ 的多项对数因子和常数。建立该结果的关键挑战在于处理随机梯度中的三类偏差:其一来自上层的复合结构,其二来自双层结构,其三来自为避免矩阵求逆而使用的 Neumann 级数近似。为展示方法的有效性,我们将其应用于协变量偏移下深度神经网络的鲁棒特征学习问题,说明了该方法在此场景中的优势。

Combating Data Imbalances in Federated Semi-supervised Learning with Dual Regulators

  • paper_url: http://arxiv.org/abs/2307.05358
  • repo_url: None
  • paper_authors: Sikai Bai, Shuaicheng Li, Weiming Zhuang, Jie Zhang, Song Guo, Kunlin Yang, Jun Hou, Shuai Zhang, Junyu Gao, Shuai Yi
  • for: 该论文针对去中心化异构数据下的联邦半监督学习(FSSL),重点解决客户端之间以及客户端内部数据分布不一致的挑战。
  • methods: 论文提出了一种新的 FSSL 框架 FedDure,利用双调节器(C-reg 与 F-reg)摆脱"各客户端有标签数据独立同分布(IID)、客户端内有标签与无标签数据类别分布一致"的假设。
  • results: 该论文的实验表明,FedDure比现有方法在各种场景中表现出色,尤其是在CIFAR-10和CINIC-10 datasets上,提高了更多于11%的性能。
    Abstract Federated learning has become a popular method to learn from decentralized heterogeneous data. Federated semi-supervised learning (FSSL) emerges to train models from a small fraction of labeled data due to label scarcity on decentralized clients. Existing FSSL methods assume independent and identically distributed (IID) labeled data across clients and consistent class distribution between labeled and unlabeled data within a client. This work studies a more practical and challenging scenario of FSSL, where data distribution is different not only across clients but also within a client between labeled and unlabeled data. To address this challenge, we propose a novel FSSL framework with dual regulators, FedDure.} FedDure lifts the previous assumption with a coarse-grained regulator (C-reg) and a fine-grained regulator (F-reg): C-reg regularizes the updating of the local model by tracking the learning effect on labeled data distribution; F-reg learns an adaptive weighting scheme tailored for unlabeled instances in each client. We further formulate the client model training as bi-level optimization that adaptively optimizes the model in the client with two regulators. Theoretically, we show the convergence guarantee of the dual regulators. Empirically, we demonstrate that FedDure is superior to the existing methods across a wide range of settings, notably by more than 11% on CIFAR-10 and CINIC-10 datasets.
    摘要 联邦学习已成为从去中心化异构数据中学习的常用方法。由于各客户端标签稀缺,联邦半监督学习(FSSL)应运而生,以利用少量有标签数据训练模型。现有 FSSL 方法假设各客户端的有标签数据独立同分布,且同一客户端内有标签与无标签数据的类别分布一致。本文研究一种更实际、更具挑战性的 FSSL 场景:数据分布不仅在客户端之间不同,而且在同一客户端内部的有标签与无标签数据之间也不同。为此,我们提出带有双调节器的全新 FSSL 框架 FedDure:粗粒度调节器(C-reg)通过跟踪模型在有标签数据分布上的学习效果来约束本地模型更新;细粒度调节器(F-reg)为每个客户端的无标签样本学习自适应加权方案。我们进一步将客户端模型训练形式化为双层优化,在两个调节器的作用下自适应地优化模型。理论上,我们给出了双调节器的收敛性保证;实验上,FedDure 在多种设置下显著优于现有方法,在 CIFAR-10 和 CINIC-10 数据集上提升超过 11%。

Tracking Most Significant Shifts in Nonparametric Contextual Bandits

  • paper_url: http://arxiv.org/abs/2307.05341
  • repo_url: None
  • paper_authors: Joe Suk, Samory Kpotufe
  • for: 这篇论文主要研究非参数上下文多臂赌博机(contextual bandits)问题,其中 Lipschitz 均值回报函数可能随时间变化。
  • methods: 作者首先刻画了该设定下的极小极大动态遗憾(dynamic regret)率,并证明现有方法在此设定下并非最优;随后提出了一种更关注局部性的新变化概念,即"经历到的显著变化"(experienced significant shifts)。
  • results: 主要结果是证明在这种更宽容的变化概念下,无需预知变化信息的自适应算法仍可达到相应的极小极大率。
    Abstract We study nonparametric contextual bandits where Lipschitz mean reward functions may change over time. We first establish the minimax dynamic regret rate in this less understood setting in terms of number of changes $L$ and total-variation $V$, both capturing all changes in distribution over context space, and argue that state-of-the-art procedures are suboptimal in this setting. Next, we tend to the question of an adaptivity for this setting, i.e. achieving the minimax rate without knowledge of $L$ or $V$. Quite importantly, we posit that the bandit problem, viewed locally at a given context $X_t$, should not be affected by reward changes in other parts of context space $\cal X$. We therefore propose a notion of change, which we term experienced significant shifts, that better accounts for locality, and thus counts considerably less changes than $L$ and $V$. Furthermore, similar to recent work on non-stationary MAB (Suk & Kpotufe, 2022), experienced significant shifts only count the most significant changes in mean rewards, e.g., severe best-arm changes relevant to observed contexts. Our main result is to show that this more tolerant notion of change can in fact be adapted to.
    摘要 我们研究非参数上下文多臂赌博机问题,其中 Lipschitz 均值回报函数可能随时间变化。我们首先以变化次数 $L$ 和总变差 $V$(二者刻画了上下文空间上分布的全部变化)给出该设定下的极小极大动态遗憾率,并论证现有最先进方法在此设定下并非最优。接下来我们考虑该设定下的自适应性问题,即在不知道 $L$ 或 $V$ 的情况下达到极小极大率。我们认为,在某个上下文 $X_t$ 处局部地看,赌博机问题不应受到上下文空间 $\cal X$ 中其他区域回报变化的影响。因此我们提出一种更关注局部性的变化概念,称为"经历到的显著变化",其计数的变化远少于 $L$ 和 $V$;与近期关于非平稳多臂赌博机的工作(Suk & Kpotufe, 2022)类似,它只统计均值回报中最显著的变化,例如与观察到的上下文相关的严重最优臂变化。我们的主要结果是证明可以对这种更宽容的变化概念实现自适应。

Predicting small molecules solubilities on endpoint devices using deep ensemble neural networks

  • paper_url: http://arxiv.org/abs/2307.05318
  • repo_url: https://github.com/ur-whitelab/mol.dev
  • paper_authors: Mayk Caldas Ramos, Andrew D. White
  • for: 这个研究旨在提高水溶性预测的精度和计算效率,同时兼顾计算方法的易用性(这也是基团贡献法长期流行的原因)。
  • methods: 该研究使用深度学习模型预测溶解度,并给出预测不确定性(predictive uncertainty)来量化模型的不确定程度,模型可直接在无需服务器的静态网页上运行。
  • results: 研究结果表明,该模型可以得到满意的溶解性预测结果,并且可以帮助创建溶解性预测模型,既能够考虑不确定性,又能够使用者友好。
    Abstract Aqueous solubility is a valuable yet challenging property to predict. Computing solubility using first-principles methods requires accounting for the competing effects of entropy and enthalpy, resulting in long computations for relatively poor accuracy. Data-driven approaches, such as deep learning, offer improved accuracy and computational efficiency but typically lack uncertainty quantification. Additionally, ease of use remains a concern for any computational technique, resulting in the sustained popularity of group-based contribution methods. In this work, we addressed these problems with a deep learning model with predictive uncertainty that runs on a static website (without a server). This approach moves computing needs onto the website visitor without requiring installation, removing the need to pay for and maintain servers. Our model achieves satisfactory results in solubility prediction. Furthermore, we demonstrate how to create molecular property prediction models that balance uncertainty and ease of use. The code is available at \url{https://github.com/ur-whitelab/mol.dev}, and the model is usable at \url{https://mol.dev}.
    摘要 溶解性是一个有价值又具有挑战性的性质,计算溶解性使用基于原理方法需要考虑竞争的 entropy 和 enthalpy 效应,导致计算效率低下,准确性也不高。数据驱动方法,如深度学习,可以提高准确性和计算效率,但通常缺乏 uncertainty 评估。此外,使用容易性也是一个关键问题,导致群组基于的贡献方法仍然具有广泛的应用。在这种情况下,我们采用了一种深度学习模型,具有预测 uncertainty,运行在静态网站(无需服务器)上。这种方法将计算需求卷积到网站访问者身上,不需要安装和维护服务器。我们的模型可以达到溶解性预测的满意结果。此外,我们还示出了如何创建分子性质预测模型,既具有 uncertainty 也具有容易使用性。代码可以在 \url{https://github.com/ur-whitelab/mol.dev} 上获取,模型可以在 \url{https://mol.dev} 上使用。
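
The deep-ensemble style of predictive uncertainty mentioned above can be sketched with a toy regression problem: train several independently seeded members and report the mean prediction with the across-member standard deviation as the uncertainty estimate. The architecture, features, and training loop below are placeholders, not the paper's model or featurization.

```python
import torch
import torch.nn as nn

def train_member(x, y, seed: int) -> nn.Module:
    """Train one ensemble member; architecture and epochs are illustrative."""
    torch.manual_seed(seed)
    net = nn.Sequential(nn.Linear(x.shape[1], 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(200):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
    return net

# Toy regression stand-in for solubility prediction from molecular descriptors.
x = torch.randn(256, 16)
y = x[:, :3].sum(dim=1, keepdim=True) + 0.1 * torch.randn(256, 1)

ensemble = [train_member(x, y, seed=s) for s in range(5)]
with torch.no_grad():
    preds = torch.stack([m(x[:4]) for m in ensemble])   # (members, 4, 1)
mean, std = preds.mean(dim=0), preds.std(dim=0)          # prediction and uncertainty
print(mean.squeeze().tolist(), std.squeeze().tolist())
```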

Discovering Symbolic Laws Directly from Trajectories with Hamiltonian Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05299
  • repo_url: None
  • paper_authors: Suresh Bishnoi, Ravinder Bhattoo, Jayadeva, Sayan Ranu, N M Anoop Krishnan
  • for: 本研究旨在通过数学方法发现自然系统的互动规律。
  • methods: 本研究使用哈密顿图神经网络(HGNN)学习物理系统的动力学。HGNN 是一种满足物理约束的图神经网络,可以直接从系统轨迹中学习动力学。
  • results: 研究发现,HGNN 能很好地适用于不同的物理系统,仅用少量数据即可学到与真实情况高度一致的动力学;此外,HGNN 还能泛化到更大的系统规模和混合系统,并可结合符号回归推断出相互作用方程。
    Abstract The time evolution of physical systems is described by differential equations, which depend on abstract quantities like energy and force. Traditionally, these quantities are derived as functionals based on observables such as positions and velocities. Discovering these governing symbolic laws is the key to comprehending the interactions in nature. Here, we present a Hamiltonian graph neural network (HGNN), a physics-enforced GNN that learns the dynamics of systems directly from their trajectory. We demonstrate the performance of HGNN on n-springs, n-pendulums, gravitational systems, and binary Lennard Jones systems; HGNN learns the dynamics in excellent agreement with the ground truth from small amounts of data. We also evaluate the ability of HGNN to generalize to larger system sizes, and to hybrid spring-pendulum system that is a combination of two original systems (spring and pendulum) on which the models are trained independently. Finally, employing symbolic regression on the learned HGNN, we infer the underlying equations relating the energy functionals, even for complex systems such as the binary Lennard-Jones liquid. Our framework facilitates the interpretable discovery of interaction laws directly from physical system trajectories. Furthermore, this approach can be extended to other systems with topology-dependent dynamics, such as cells, polydisperse gels, or deformable bodies.
    摘要 物理系统的时间演化由微分方程描述,这些方程依赖于能量、力等抽象量;传统上,这些量是作为位置、速度等可观测量的泛函推导出来的。发现这些支配性的符号规律是理解自然界相互作用的关键。本文提出哈密顿图神经网络(HGNN),一种满足物理约束的图神经网络,可直接从系统轨迹中学习动力学。我们在 n 弹簧、n 摆、引力系统和二元 Lennard-Jones 系统上验证了 HGNN:它仅用少量数据即可学到与真实情况高度一致的动力学。我们还评估了 HGNN 向更大系统规模的泛化能力,以及向弹簧-单摆混合系统(由分别独立训练的两个原始系统组合而成)的泛化能力。最后,对学得的 HGNN 进行符号回归,即使对于二元 Lennard-Jones 液体这样的复杂系统,也能推断出能量泛函所满足的方程。该框架可以直接从物理系统轨迹中以可解释的方式发现相互作用规律,并可推广到细胞、多分散凝胶或可变形体等具有拓扑依赖动力学的系统。
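
The Hamiltonian-learning idea can be sketched in a few lines: parameterize H(q, p) with a small network and obtain the dynamics from Hamilton's equations via automatic differentiation. The MLP, the plain Euler step, and the non-graph setting are simplifying assumptions; the actual HGNN operates on graph-structured systems.

```python
import torch
import torch.nn as nn

class LearnedHamiltonian(nn.Module):
    """Small MLP standing in for the Hamiltonian H(q, p) learned by an HGNN."""
    def __init__(self, dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 1))
    def forward(self, q, p):
        return self.net(torch.cat([q, p], dim=-1)).sum()

def hamiltonian_step(H: nn.Module, q, p, dt: float = 0.01):
    """One Euler step of Hamilton's equations dq/dt = dH/dp, dp/dt = -dH/dq."""
    q = q.clone().requires_grad_(True)
    p = p.clone().requires_grad_(True)
    dHdq, dHdp = torch.autograd.grad(H(q, p), (q, p))
    return (q + dt * dHdp).detach(), (p - dt * dHdq).detach()

H = LearnedHamiltonian(dim=2)
q, p = torch.randn(1, 2), torch.randn(1, 2)
q, p = hamiltonian_step(H, q, p)
print(q, p)
```

Training would fit H so that integrating these equations reproduces observed trajectories, after which symbolic regression can be run on the learned energy function.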

On the Need for a Language Describing Distribution Shifts: Illustrations on Tabular Datasets

  • paper_url: http://arxiv.org/abs/2307.05284
  • repo_url: https://github.com/namkoong-lab/whyshift
  • paper_authors: Jiashuo Liu, Tianyu Wang, Peng Cui, Hongseok Namkoong
  • for: This paper aims to investigate natural shifts in tabular datasets and the impact of these shifts on algorithmic performance.
  • methods: The authors use a thorough investigation of 5 tabular datasets and 86,000 model configurations to identify the most prevalent types of distribution shifts, specifically $Y|X$-shifts. They also build an empirical testbed called WhyShift to characterize and benchmark performance over different types of shifts.
  • results: The authors find that $Y|X$-shifts are the most prevalent type of shift in tabular settings, and they identify covariate regions that suffer the biggest $Y|X$-shifts. They discuss the implications of these shifts for algorithmic and data-based interventions.
    Abstract Different distribution shifts require different algorithmic and operational interventions. Methodological research must be grounded by the specific shifts they address. Although nascent benchmarks provide a promising empirical foundation, they implicitly focus on covariate shifts, and the validity of empirical findings depends on the type of shift, e.g., previous observations on algorithmic performance can fail to be valid when the $Y|X$ distribution changes. We conduct a thorough investigation of natural shifts in 5 tabular datasets over 86,000 model configurations, and find that $Y|X$-shifts are most prevalent. To encourage researchers to develop a refined language for distribution shifts, we build WhyShift, an empirical testbed of curated real-world shifts where we characterize the type of shift we benchmark performance over. Since $Y|X$-shifts are prevalent in tabular settings, we identify covariate regions that suffer the biggest $Y|X$-shifts and discuss implications for algorithmic and data-based interventions. Our testbed highlights the importance of future research that builds an understanding of how distributions differ.
    摘要 不同的分布偏移需要不同的算法和操作干预。方法研究必须基于具体的偏移来定制。虽然初始的标准模型提供了一个有前途的基础,但它们默认地关注 covariate 偏移,并且研究成果的有效性取决于偏移的类型,例如,之前对算法性能的评估可能无法在 $Y|X$ 分布变化时保持有效。我们对 5 个表格数据集进行了全面的调查,并发现了 $Y|X$-偏移是最普遍的。为促进研究人员开发更加细化的分布偏移语言,我们建立了 WhyShift,一个基于实际情况的偏移测试床,我们在这里characterize了我们测试性能的类型的偏移。由于 tabular 设置中 $Y|X$-偏移最普遍,我们将covariate 区域分析出最大的 $Y|X$-偏移,并讨论了对算法和数据基于干预的影响。我们的测试床表明未来研究应该建立一个对分布的不同而建立的理解,以便更好地适应具体的应用场景。

CareFall: Automatic Fall Detection through Wearable Devices and AI Methods

  • paper_url: http://arxiv.org/abs/2307.05275
  • repo_url: None
  • paper_authors: Juan Carlos Ruiz-Garcia, Ruben Tolosana, Ruben Vera-Rodriguez, Carlos Moro
  • for: 这篇研究旨在开发一个自动检测跌倒的系统,以减轻老年人跌倒所带来的负面影响。
  • methods: 这篇研究使用了智能手表上的加速度和陀螺仪时间信号,并使用人工智能方法进行特征提取和类别。
  • results: 实验结果显示,使用机器学习方法结合加速度和陀螺仪信息的方法,在准确性、敏感度和特异度方面都高于阈值基本方法。
    Abstract The aging population has led to a growing number of falls in our society, affecting global public health worldwide. This paper presents CareFall, an automatic Fall Detection System (FDS) based on wearable devices and Artificial Intelligence (AI) methods. CareFall considers the accelerometer and gyroscope time signals extracted from a smartwatch. Two different approaches are used for feature extraction and classification: i) threshold-based, and ii) machine learning-based. Experimental results on two public databases show that the machine learning-based approach, which combines accelerometer and gyroscope information, outperforms the threshold-based approach in terms of accuracy, sensitivity, and specificity. This research contributes to the design of smart and user-friendly solutions to mitigate the negative consequences of falls among older people.
    摘要 人口老龄化使跌倒事件日益增多,已成为全球性的公共卫生问题。本文介绍 CareFall,一个基于可穿戴设备和人工智能方法的自动跌倒检测系统(FDS)。CareFall 利用智能手表采集的加速度计和陀螺仪时间信号,采用两种方法进行特征提取与分类:一是基于阈值的方法,二是基于机器学习的方法。在两个公开数据库上的实验表明,结合加速度计与陀螺仪信息的机器学习方法在准确率、灵敏度和特异度方面均优于基于阈值的方法。该研究有助于设计智能且易用的解决方案,以减轻老年人跌倒带来的负面影响。
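
The threshold-based baseline mentioned in the abstract can be illustrated with a toy detector: flag a fall when a large impact peak in the acceleration magnitude is followed shortly by a near-stationary period. The thresholds, sampling rate, and rule are illustrative assumptions, not CareFall's configuration.

```python
import numpy as np

def detect_fall(acc: np.ndarray, fs: float = 50.0,
                impact_g: float = 2.5, still_g: float = 0.3) -> bool:
    """Toy threshold-based fall detector on a 3-axis accelerometer window (units of g)."""
    mag = np.linalg.norm(acc, axis=1)
    peaks = np.flatnonzero(mag > impact_g)            # candidate impact samples
    for i in peaks:
        window = mag[i + int(0.5 * fs): i + int(2.0 * fs)]
        if window.size and np.abs(window - 1.0).mean() < still_g:   # ~1 g at rest
            return True
    return False

# simulated signal: lying at rest, one impact spike, then stillness
signal = np.tile([0.0, 0.0, 1.0], (200, 1))
signal[100] = [2.0, 2.0, 1.5]        # impact with magnitude above 2.5 g
print(detect_fall(signal))           # True
```

A machine-learning detector would instead extract features (or raw windows) from both accelerometer and gyroscope streams and train a classifier, which is what the paper finds to be more accurate.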

U-CREAT: Unsupervised Case Retrieval using Events extrAcTion

  • paper_url: http://arxiv.org/abs/2307.05260
  • repo_url: https://github.com/exploration-lab/il-pcr
  • paper_authors: Abhinav Joshi, Akshat Sharma, Sai Kiran Tanikella, Ashutosh Modi
  • for: The paper addresses the task of Prior Case Retrieval (PCR) in the legal domain, proposing a new large benchmark (the IL-PCR corpus) and exploring the role of events in legal case retrieval.
  • methods: The paper proposes an unsupervised retrieval pipeline, U-CREAT (Unsupervised Case Retrieval using Events Extraction), which significantly outperforms BM25 and makes retrieval considerably faster, making it applicable to real-time case retrieval systems.
  • results: The proposed system is generic and shows state-of-the-art performance on benchmarks for both the Indian and Canadian legal systems (the IL-PCR and COLIEE corpora).
    Abstract The task of Prior Case Retrieval (PCR) in the legal domain is about automatically citing relevant (based on facts and precedence) prior legal cases in a given query case. To further promote research in PCR, in this paper, we propose a new large benchmark (in English) for the PCR task: IL-PCR (Indian Legal Prior Case Retrieval) corpus. Given the complex nature of case relevance and the long size of legal documents, BM25 remains a strong baseline for ranking the cited prior documents. In this work, we explore the role of events in legal case retrieval and propose an unsupervised retrieval method-based pipeline U-CREAT (Unsupervised Case Retrieval using Events Extraction). We find that the proposed unsupervised retrieval method significantly increases performance compared to BM25 and makes retrieval faster by a considerable margin, making it applicable to real-time case retrieval systems. Our proposed system is generic, we show that it generalizes across two different legal systems (Indian and Canadian), and it shows state-of-the-art performance on the benchmarks for both the legal systems (IL-PCR and COLIEE corpora).
    摘要 法律领域的先例检索(PCR)任务旨在针对给定的查询案件,自动引用(基于事实与判例的)相关先例。为进一步推动PCR研究,本文提出了一个新的大规模英文基准:IL-PCR(Indian Legal Prior Case Retrieval)语料库。鉴于案件相关性的复杂性以及法律文书的篇幅,BM25仍是对被引先例进行排序的强基线。在这项工作中,我们探讨了事件在法律案例检索中的作用,并提出了一条无监督检索流水线 U-CREAT(Unsupervised Case Retrieval using Events Extraction)。我们发现,该无监督检索方法相比BM25显著提升了性能,并大幅加快了检索速度,使其适用于实时案例检索系统。所提出的系统具有通用性:它能够泛化到印度和加拿大两个不同的法律体系,并在两者的基准(IL-PCR 和 COLIEE 语料库)上均达到最先进的性能。
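
To make the event-based retrieval idea concrete, here is a hypothetical sketch that represents each case as a set of extracted event triples and ranks prior cases by event overlap; U-CREAT's actual event extraction and scoring functions are not reproduced here.

```python
def rank_by_event_overlap(query_events, candidate_events):
    """Rank prior cases by overlap between their event sets and the query's.
    query_events: set of hashable event representations, e.g. (subject, verb, object) triples.
    candidate_events: dict mapping case id -> set of events. Names are illustrative."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if (a or b) else 0.0
    scores = {cid: jaccard(query_events, events) for cid, events in candidate_events.items()}
    return sorted(scores, key=scores.get, reverse=True)

query = {("party", "breach", "contract"), ("court", "award", "damages")}
corpus = {
    "case_A": {("party", "breach", "contract"), ("court", "dismiss", "appeal")},
    "case_B": {("driver", "cause", "accident")},
}
print(rank_by_event_overlap(query, corpus))  # ['case_A', 'case_B']
```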

MAP- and MLE-Based Teaching

  • paper_url: http://arxiv.org/abs/2307.05252
  • repo_url: None
  • paper_authors: Hans Ulrich Simon, Jan Arne Telle
  • for: 这个论文主要研究的是学习概率的概念推理问题,具体来说是learner L从一个观察集Z中INFER一个隐藏的概念。
  • methods: 该论文基于 Ferri et al.的工作,假设learner L被参数化为 prior P(c)和c-conditioned likelihoods P(z|c),其中c是一个给定的概念集C中的一个概念,z是一个观察集Z中的一个观察。learner L被称为MAP-learner(resp. MLE-learner),如果它将观察集S视为一个随机抽样,并返回最大a posteriori probabilities(resp.最大c-conditional likelihood)。
  • results: 该论文的主要结果是,这种教学模型具有一些愉悦的 monotonicity 性质,并且可以通过不同的抽样方式来关联ogether。在特定的情况下(即概念是集合,观察是0,1-标记的示例), authors 还得到了一些额外的结果,例如,MAP-和MLE-教学维度可以被图 theoretically characterize,并且可以通过VC-dimension和其他 combinatorial parameters来Upper bound。
    Abstract Imagine a learner L who tries to infer a hidden concept from a collection of observations. Building on the work [4] of Ferri et al., we assume the learner to be parameterized by priors P(c) and by c-conditional likelihoods P(z|c) where c ranges over all concepts in a given class C and z ranges over all observations in an observation set Z. L is called a MAP-learner (resp. an MLE-learner) if it thinks of a collection S of observations as a random sample and returns the concept with the maximum a-posteriori probability (resp. the concept which maximizes the c-conditional likelihood of S). Depending on whether L assumes that S is obtained from ordered or unordered sampling resp. from sampling with or without replacement, we can distinguish four different sampling modes. Given a target concept c in C, a teacher for a MAP-learner L aims at finding a smallest collection of observations that causes L to return c. This approach leads in a natural manner to various notions of a MAP- or MLE-teaching dimension of a concept class C. Our main results are: We show that this teaching model has some desirable monotonicity properties. We clarify how the four sampling modes are related to each other. As for the (important!) special case, where concepts are subsets of a domain and observations are 0,1-labeled examples, we obtain some additional results. First of all, we characterize the MAP- and MLE-teaching dimension associated with an optimally parameterized MAP-learner graph-theoretically. From this central result, some other ones are easy to derive. It is shown, for instance, that the MLE-teaching dimension is either equal to the MAP-teaching dimension or exceeds the latter by 1. It is shown furthermore that these dimensions can be bounded from above by the so-called antichain number, the VC-dimension and related combinatorial parameters. Moreover they can be computed in polynomial time.
    摘要 学生L将尝试对一个集合观察做推理,建立在[4] Ferri等人的工作之上。我们假设学生L是受到先前知识P(c)和c- conditional likelihoods P(z|c)的参数化学生,其中c是所有概念集C中的一个概念,z是所有观察集Z中的一个观察。L被称为MAP-学生(resp. MLE-学生),如果它视观察集S为一个随机抽样,并返回概念中的最大a posteriori probabilities(resp. 最大c-conditional likelihood)。对于L是否假设S是顺序或无顺序抽样,或者是否从抽样中删除或不删除某些观察,我们可以区别出四种抽样模式。对于目标概念c在C中,教师 дляMAP-学生L的目标是找到一个最小的观察集,使L返回c。这个模型具有一些愉悦的弹性性质。我们还证明了这些抽样模式之间的相关性。对于特别的情况,其中概念是域中的子集和观察是0,1-标注的例子,我们获得了一些额外的结果。例如,我们可以Characterize MAP-和MLE-教育dimension associated with an optimally parameterized MAP-learner graph-theoretically。从这个中央结果,一些其他的结果是容易 derivable。例如,我们可以证明MAP-和MLE-教育dimension是等于或高于对方,并且可以通过 antichain number、VC-dimension和相关的 combinatorial parameters bound from above。此外,这些教育dimension可以在 polynomial time 内计算。
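
A small sketch of the MAP- and MLE-learners described above, on a toy concept class where concepts are subsets of a domain and observations are 0/1-labeled examples; the uniform-sampling likelihood below is an illustrative assumption.

```python
import math

def map_learner(observations, concepts, prior, likelihood):
    """Return the concept with maximum a-posteriori probability for an i.i.d. sample S.
    prior: dict c -> P(c); likelihood: function (z, c) -> P(z | c). Names are illustrative."""
    def log_posterior(c):
        return math.log(prior[c]) + sum(math.log(likelihood(z, c)) for z in observations)
    return max(concepts, key=log_posterior)

def mle_learner(observations, concepts, likelihood):
    """Return the concept maximizing the c-conditional likelihood of S (i.e. a flat prior)."""
    def log_lik(c):
        return sum(math.log(likelihood(z, c)) for z in observations)
    return max(concepts, key=log_lik)

# Toy example: concepts are subsets of a 3-element domain, observations are labeled examples.
domain = {1, 2, 3}
concepts = [frozenset({1}), frozenset({1, 2})]
prior = {concepts[0]: 0.5, concepts[1]: 0.5}
def likelihood(z, c):          # uniform sampling of examples whose label is consistent with c
    x, y = z
    return (1.0 / len(domain)) if (x in c) == bool(y) else 1e-12
print(map_learner([(1, 1), (2, 1)], concepts, prior, likelihood))  # frozenset({1, 2})
```

A teacher for such a learner then searches for the smallest observation set that makes this argmax return the target concept.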

DRMC: A Generalist Model with Dynamic Routing for Multi-Center PET Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.05249
  • repo_url: None
  • paper_authors: Zhiwen Yang, Yang Zhou, Hui Zhang, Bingzheng Wei, Yubo Fan, Yan Xu
  • for: Multi-center positron emission tomography (PET) image synthesis aims to recover low-dose PET images acquired at multiple different centers.
  • methods: The authors develop a generalist model that shares its architecture and parameters across centers in order to exploit their common knowledge. Because the gradient directions of different centers can be inconsistent or even opposite owing to non-identical data distributions (the center interference issue), they introduce a novel dynamic routing strategy with cross-layer connections that routes data from different centers to different experts.
  • results: The generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers.
    Abstract Multi-center positron emission tomography (PET) image synthesis aims at recovering low-dose PET images from multiple different centers. The generalizability of existing methods can still be suboptimal for a multi-center study due to domain shifts, which result from non-identical data distribution among centers with different imaging systems/protocols. While some approaches address domain shifts by training specialized models for each center, they are parameter inefficient and do not well exploit the shared knowledge across centers. To address this, we develop a generalist model that shares architecture and parameters across centers to utilize the shared knowledge. However, the generalist model can suffer from the center interference issue, \textit{i.e.} the gradient directions of different centers can be inconsistent or even opposite owing to the non-identical data distribution. To mitigate such interference, we introduce a novel dynamic routing strategy with cross-layer connections that routes data from different centers to different experts. Experiments show that our generalist model with dynamic routing (DRMC) exhibits excellent generalizability across centers. Code and data are available at: https://github.com/Yaziwel/Multi-Center-PET-Image-Synthesis.
    摘要 多中心正电子发射断层扫描(PET)图像合成旨在从多个不同中心恢复低剂量PET图像。由于各中心成像系统和协议不同导致数据分布不一致(域偏移),现有方法在多中心研究中的泛化能力仍不理想。一些方法为每个中心训练专用模型来应对域偏移,但参数效率低,且难以利用各中心之间的共享知识。为此,我们开发了一个在各中心间共享结构和参数的通用模型。然而,通用模型可能受到中心间干扰问题的影响,即由于数据分布不一致,不同中心的梯度方向可能不一致甚至相反。为缓解这种干扰,我们提出了一种带有跨层连接的动态路由策略,将来自不同中心的数据路由到不同的专家。实验表明,我们的带动态路由的通用模型(DRMC)在各中心间表现出优秀的泛化能力。代码和数据见:https://github.com/Yaziwel/Multi-Center-PET-Image-Synthesis

A Survey From Distributed Machine Learning to Distributed Deep Learning

  • paper_url: http://arxiv.org/abs/2307.05232
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Mohammad Dehghani, Zahra Yazdanparast
  • for: 本文总结了当前分布式机器学习领域的最新进展,包括分类和聚类(传统机器学习)、深度学习和深度强化学习等方法。
  • methods: 本文对分布式机器学习算法进行了详细的概述,并将其分为分类和聚类(传统机器学习)、深度学习和深度强化学习等类别。
  • results: 本文对各种分布式机器学习算法进行了评估,并将其分为深度学习和传统机器学习两类。深度学习在分布式机器学习中占据了主导地位,大多数研究都集中在这一方面。
    Abstract Artificial intelligence has achieved significant success in handling complex tasks in recent years. This success is due to advances in machine learning algorithms and hardware acceleration. In order to obtain more accurate results and solve more complex problems, algorithms must be trained with more data. This huge amount of data could be time-consuming to process and require a great deal of computation. This solution could be achieved by distributing the data and algorithm across several machines, which is known as distributed machine learning. There has been considerable effort put into distributed machine learning algorithms, and different methods have been proposed so far. In this article, we present a comprehensive summary of the current state-of-the-art in the field through the review of these algorithms. We divide this algorithms in classification and clustering (traditional machine learning), deep learning and deep reinforcement learning groups. Distributed deep learning has gained more attention in recent years and most of studies worked on this algorithms. As a result, most of the articles we discussed here belong to this category. Based on our investigation of algorithms, we highlight limitations that should be addressed in future research.
    摘要 人工智能在最近几年内已经取得了重要的成功,这种成功主要归功于机器学习算法和硬件加速。为了获得更高准确率和解决更复杂的问题,算法需要更多的数据进行训练。这些庞大数据可能需要很长时间来处理,并需要巨量的计算资源。为了解决这个问题,人们提出了分布式机器学习的想法。在这篇文章中,我们对当前领域的状况做了全面的概述,包括分类和聚类(传统机器学习)、深度学习和深度强化学习等方法。分布式深度学习在最近几年内得到了更多的关注,因此大多数研究都是在这个领域进行的。根据我们对算法的调查,我们指出了未来研究中应该解决的一些限制。

Attribute Controlled Dialogue Prompting

  • paper_url: http://arxiv.org/abs/2307.05228
  • repo_url: None
  • paper_authors: Runcheng Liu, Ahmad Rashid, Ivan Kobyzev, Mehdi Rezagholizadeh, Pascal Poupart
  • for: 这篇研究提出了一种新的、实例级的提示调优(prompt-tuning)算法,用于可控对话生成。
  • methods: 该方法基于实例级控制代码生成提示,而不是基于对话历史。
  • results: 实验结果显示,该方法优于提示基线,且仅调优约5%-6%的总参数即可达到与完整微调相当的效果。
    Abstract Prompt-tuning has become an increasingly popular parameter-efficient method for adapting large pretrained language models to downstream tasks. However, both discrete prompting and continuous prompting assume fixed prompts for all data samples within a task, neglecting the fact that inputs vary greatly in some tasks such as open-domain dialogue generation. In this paper, we present a novel, instance-specific prompt-tuning algorithm for dialogue generation. Specifically, we generate prompts based on instance-level control code, rather than the conversation history, to explore their impact on controlled dialogue generation. Experiments on popular open-domain dialogue datasets, evaluated on both automated metrics and human evaluation, demonstrate that our method is superior to prompting baselines and comparable to fine-tuning with only 5%-6% of total parameters.
    摘要 提示调优(prompt-tuning)已成为将大型预训练语言模型适配到下游任务的一种日益流行的参数高效方法。然而,无论是离散提示还是连续提示,现有方法都假设同一任务内所有样本使用固定的提示,忽视了在开放域对话生成等任务中输入差异很大的事实。本文提出了一种新的、实例特定的提示调优算法用于对话生成:我们基于实例级控制代码而非对话历史来生成提示,以探究其对可控对话生成的影响。在主流开放域对话数据集上的实验(包括自动指标和人工评价)表明,我们的方法优于提示基线,且仅调优约5%-6%的总参数即可取得与完整微调相当的效果。
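
A hypothetical sketch of instance-specific prompting: a small trainable module maps an instance-level control code to a soft prompt that is prepended to the frozen language model's input embeddings, so only the prompt module's parameters are updated. The module and parameter names below are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class ControlCodePrompt(nn.Module):
    """Map an instance-level control code to `prompt_len` soft-prompt vectors
    that are prepended to the frozen LM's input embeddings (illustrative sketch)."""
    def __init__(self, num_codes, prompt_len, hidden_dim):
        super().__init__()
        self.code_emb = nn.Embedding(num_codes, hidden_dim)
        self.to_prompt = nn.Linear(hidden_dim, prompt_len * hidden_dim)
        self.prompt_len, self.hidden_dim = prompt_len, hidden_dim

    def forward(self, code_ids, input_embeds):
        # code_ids: (B,) attribute labels; input_embeds: (B, T, H) frozen LM embeddings
        prompt = self.to_prompt(self.code_emb(code_ids))
        prompt = prompt.view(-1, self.prompt_len, self.hidden_dim)
        return torch.cat([prompt, input_embeds], dim=1)   # (B, prompt_len + T, H)

module = ControlCodePrompt(num_codes=4, prompt_len=8, hidden_dim=16)
out = module(torch.tensor([2, 0]), torch.randn(2, 5, 16))
print(out.shape)  # torch.Size([2, 13, 16])
```

Only this prompt module would be trained, which is consistent with tuning a few percent of the total parameters while the base LM stays frozen.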

Supervised Attention Using Homophily in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05217
  • repo_url: None
  • paper_authors: Michail Chatzianastasis, Giannis Nikolentzos, Michalis Vazirgiannis
  • for: 本研究旨在改进图学习中的Graph Attention Networks(GATs),以提升节点分类等任务的效果。
  • methods: 我们提出了一种可与任何图注意力模型结合使用的新技术,用于鼓励同类节点之间获得更高的注意力分数。
  • results: 我们在多个节点分类数据集上进行了评估,结果表明该方法优于标准基线模型。
    Abstract Graph neural networks have become the standard approach for dealing with learning problems on graphs. Among the different variants of graph neural networks, graph attention networks (GATs) have been applied with great success to different tasks. In the GAT model, each node assigns an importance score to its neighbors using an attention mechanism. However, similar to other graph neural networks, GATs aggregate messages from nodes that belong to different classes, and therefore produce node representations that are not well separated with respect to the different classes, which might hurt their performance. In this work, to alleviate this problem, we propose a new technique that can be incorporated into any graph attention model to encourage higher attention scores between nodes that share the same class label. We evaluate the proposed method on several node classification datasets demonstrating increased performance over standard baseline models.
    摘要 图神经网络已经成为处理图上学习问题的标准方法。在各类图神经网络变体中,图注意力网络(GAT)已被成功应用于多种任务。在GAT模型中,每个节点通过注意力机制为其邻居分配重要性分数。然而,与其他图神经网络一样,GAT会聚合来自不同类别节点的消息,因此生成的节点表示在不同类别之间可能区分不佳,从而影响性能。为缓解这一问题,我们提出了一种可嵌入任何图注意力模型的新技术,鼓励同类节点之间获得更高的注意力分数。我们在多个节点分类数据集上评估了该方法,结果表明其性能优于标准基线模型。
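
One plausible way to encourage higher attention between same-class nodes is an auxiliary supervision term on the attention coefficients; the sketch below uses a binary cross-entropy between edge attention and a same-label indicator, which is an assumption rather than the paper's exact loss.

```python
import torch

def homophily_attention_loss(att_scores, edge_index, labels):
    """Auxiliary loss pushing attention mass toward edges whose endpoints share a label.
    att_scores: (E,) attention coefficients in [0, 1] for each directed edge.
    edge_index: (2, E) tensor of (source, target) node indices.
    labels:     (N,) ground-truth class labels of the nodes (training split)."""
    same_class = (labels[edge_index[0]] == labels[edge_index[1]]).float()
    return torch.nn.functional.binary_cross_entropy(att_scores, same_class)

# Toy usage: 4 nodes, 3 edges; only edge 0 connects two same-class nodes.
edge_index = torch.tensor([[0, 1, 2], [1, 2, 3]])
labels = torch.tensor([0, 0, 1, 0])
att = torch.tensor([0.9, 0.2, 0.4])
print(homophily_attention_loss(att, edge_index, labels))
```

In practice this term would be added (with a weight) to the usual node-classification loss of the attention model.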

Score Function Gradient Estimation to Widen the Applicability of Decision-Focused Learning

  • paper_url: http://arxiv.org/abs/2307.05213
  • repo_url: None
  • paper_authors: Mattia Silvestri, Senne Berden, Jayanta Mandi, Ali İrfan Mahmutoğulları, Maxime Mulamba, Allegra De Filippo, Tias Guns, Michele Lombardi
  • for: 这篇论文提出一种面向决策的学习(decision-focused learning)方法,用于求解含未知参数的实际优化问题。
  • methods: 该方法通过直接最小化下游任务损失来训练机器学习模型,而不是单纯最大化预测精度;它预测参数的分布,并采用分数函数梯度估计(SFGE)来计算面向决策的更新,从而扩展了该范式的适用范围。
  • results: 实验表明,该方法可以:(1)处理预测同时出现在目标函数和约束中的情况;(2)有效求解两阶段随机优化问题。
    Abstract Many real-world optimization problems contain unknown parameters that must be predicted prior to solving. To train the predictive machine learning (ML) models involved, the commonly adopted approach focuses on maximizing predictive accuracy. However, this approach does not always lead to the minimization of the downstream task loss. Decision-focused learning (DFL) is a recently proposed paradigm whose goal is to train the ML model by directly minimizing the task loss. However, state-of-the-art DFL methods are limited by the assumptions they make about the structure of the optimization problem (e.g., that the problem is linear) and by the fact that can only predict parameters that appear in the objective function. In this work, we address these limitations by instead predicting \textit{distributions} over parameters and adopting score function gradient estimation (SFGE) to compute decision-focused updates to the predictive model, thereby widening the applicability of DFL. Our experiments show that by using SFGE we can: (1) deal with predictions that occur both in the objective function and in the constraints; and (2) effectively tackle two-stage stochastic optimization problems.
    摘要 许多实际优化问题中含有未知参数,需要预测才能解决。现有的方法通常是通过提高预测精度来训练预测机器学习(ML)模型。然而,这种方法不一定能够最小化下游任务损失。决策驱动学习(DFL)是一种最近提出的方法,其目标是通过直接最小化任务损失来训练 ML 模型。然而,现有的 DFL 方法受到问题结构假设(例如,问题是线性的)和仅能预测出现在目标函数中的参数的限制。在这项工作中,我们解决这些限制,而不是直接预测参数,而是预测参数的分布,并采用分数函数梯度估计(SFGE)来计算决策关注更新,从而扩展 DFL 的应用范围。我们的实验表明,通过使用 SFGE,我们可以:(1)处理目标函数中的预测和约束中的预测;(2)有效地解决两阶段随机优化问题。
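
The core of score function gradient estimation is the identity ∇_θ E_{y∼p_θ}[c(y)] = E_{y∼p_θ}[c(y) ∇_θ log p_θ(y)], which lets the downstream optimization be treated as a black box. Below is a minimal sketch with a Gaussian distribution over the unknown parameter; the distribution choice and all names are illustrative.

```python
import torch

def sfge_gradient(theta, x, task_cost, n_samples=64):
    """Score-function (REINFORCE) estimate of
        grad_theta E_{y ~ N(x.theta, 1)} [ task_cost(y) ]
          ~ mean_i [ task_cost(y_i) * grad_theta log p_theta(y_i) ].
    `task_cost` stands in for solving the downstream optimization on the sampled
    parameter and returning its loss; no gradient flows through it."""
    mean = x @ theta                                     # predicted parameter (scalar here)
    dist = torch.distributions.Normal(mean, 1.0)
    y = dist.sample((n_samples,))                        # samples carry no gradient
    costs = torch.stack([task_cost(yi) for yi in y]).detach()
    surrogate = (costs * dist.log_prob(y)).mean()        # differentiable w.r.t. theta
    return torch.autograd.grad(surrogate, theta)[0]

theta = torch.zeros(3, requires_grad=True)
x = torch.randn(3)
task_cost = lambda y: torch.relu(1.0 - y)                # stand-in for "solve, report regret"
print(sfge_gradient(theta, x, task_cost))
```

Because the estimator only needs samples and their costs, the same recipe applies when predictions enter the constraints or when the problem is a two-stage stochastic program.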

Contextual Pre-Planning on Reward Machine Abstractions for Enhanced Transfer in Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05209
  • repo_url: None
  • paper_authors: Guy Azran, Mohamad H. Danesh, Stefano V. Albrecht, Sarah Keren
  • for: 提高深度强化学习代理的适应能力和学习效率,使其能够更好地适应未看过的任务和环境变化。
  • methods: 使用奖机器(RM)来表示当前任务,通过生成符号表示法提供代理 Symbolic 表示当前任务的优质转移,并在多个任务之间共享这些表示,使代理可以利用已经遇到的符号和转移来增强转移。
  • results: 在多个领域中进行了实验,证明了我们的方法可以提高代理的样本效率和几个shot转移,从而提高深度强化学习代理的适应能力和学习效率。
    Abstract Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RM), state machine abstractions that induce subtasks based on the current task's rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Our empirical evaluation shows that our representations improve sample efficiency and few-shot transfer in a variety of domains.
    摘要 最近的研究表明,深度强化学习(DRL)代理往往过拟合其训练任务,难以适应环境的细微变化。为了在迁移到未见任务时加速学习,我们提出一种利用奖励机(reward machine, RM)表示当前任务的新方法:RM是一种状态机抽象,依据当前任务的奖励和动态将任务划分为子任务。该方法为代理提供从其当前抽象状态出发的最优转移的符号表示,并在代理完成这些转移时给予奖励。这些表示在不同任务间共享,使代理能够利用此前遇到的符号和转移的知识,从而增强迁移能力。实证评估表明,该表示方法在多个领域中提升了样本效率和小样本迁移能力。
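
A toy sketch of a reward machine as used in this line of work: a small automaton over abstract symbols whose transitions define subtasks and rewards. The API and the example task below are illustrative, not the paper's implementation.

```python
class RewardMachine:
    """Tiny reward-machine sketch: transitions[(u, symbol)] = (u_next, reward).
    The agent is rewarded for achieving transitions out of its current abstract
    state, and the symbolic structure can be shared and reused across tasks."""
    def __init__(self, transitions, initial=0):
        self.transitions, self.u = transitions, initial

    def step(self, symbol):
        self.u, reward = self.transitions.get((self.u, symbol), (self.u, 0.0))
        return self.u, reward

# Toy task "get key, then open door" expressed as two subtasks.
rm = RewardMachine({(0, "key"): (1, 0.1), (1, "door"): (2, 1.0)})
for event in ["door", "key", "door"]:
    print(event, rm.step(event))
# door (0, 0.0) -> no progress; key (1, 0.1); door (2, 1.0) -> task complete
```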

Reject option models comprising out-of-distribution detection

  • paper_url: http://arxiv.org/abs/2307.05199
  • repo_url: None
  • paper_authors: Vojtech Franc, Daniel Prusa, Jakub Paplham
  • for: 本研究旨在解决机器学习中的 OUT-OF-DISTRIBUTION(OOD)设置问题,提出了多种贡献。
  • methods: 本文为OOD设置提出了三种拒绝选项模型:基于代价的模型(Cost-based)、Bounded TPR-FPR模型和Bounded Precision-Recall模型。这些模型将非OOD设置中使用的标准拒绝选项模型进行了扩展,并定义了最优OOD选择性分类器的概念。我们证明,尽管形式化方式不同,所有提出的模型都共享同一类最优策略。
  • results: 实验结果表明,使用两个选择的 OOD 检测器的uncertainty 分数来进行双重分数 OOD 方法,具有较高的性能。此外,我们提出了基于定义优化策略的新评价指标,以提供全面和可靠地评价 OOD 方法。
    Abstract The optimal prediction strategy for out-of-distribution (OOD) setups is a fundamental question in machine learning. In this paper, we address this question and present several contributions. We propose three reject option models for OOD setups: the Cost-based model, the Bounded TPR-FPR model, and the Bounded Precision-Recall model. These models extend the standard reject option models used in non-OOD setups and define the notion of an optimal OOD selective classifier. We establish that all the proposed models, despite their different formulations, share a common class of optimal strategies. Motivated by the optimal strategy, we introduce double-score OOD methods that leverage uncertainty scores from two chosen OOD detectors: one focused on OOD/ID discrimination and the other on misclassification detection. The experimental results consistently demonstrate the superior performance of this simple strategy compared to state-of-the-art methods. Additionally, we propose novel evaluation metrics derived from the definition of the optimal strategy under the proposed OOD rejection models. These new metrics provide a comprehensive and reliable assessment of OOD methods without the deficiencies observed in existing evaluation approaches.
    摘要 针对分布外(OOD)设置的最优预测策略是机器学习中的一个基本问题。本文围绕这一问题提出了多项贡献。我们为OOD设置提出了三种拒绝选项模型:基于代价的模型、Bounded TPR-FPR模型和Bounded Precision-Recall模型。这些模型将非OOD设置中使用的标准拒绝选项模型推广到OOD设置,并定义了最优OOD选择性分类器的概念。我们证明,尽管形式化方式不同,所有提出的模型都共享同一类最优策略。受最优策略的启发,我们提出了双分数OOD方法,利用两个OOD检测器的不确定性分数:一个专注于OOD/ID判别,另一个专注于误分类检测。实验结果一致表明,这一简单策略优于当前最先进的方法。此外,我们还依据所提OOD拒绝模型下最优策略的定义,提出了新的评价指标,能够对OOD方法进行全面、可靠的评估,避免了现有评价方式的缺陷。
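
A minimal sketch of the double-score idea: combine an OOD/ID score and a misclassification score, and reject whenever either exceeds its threshold. The particular scores and threshold values below are placeholders; in practice the thresholds would be tuned to a target operating point.

```python
import numpy as np

def double_score_reject(ood_score, misclf_score, tau_ood, tau_err):
    """Double-score selective classification: accept a sample only if both the
    OOD/ID score and the misclassification score fall below their thresholds."""
    accept = (ood_score <= tau_ood) & (misclf_score <= tau_err)
    return accept   # True -> emit the classifier's prediction, False -> reject

# Toy usage, e.g. ood_score from an OOD detector, misclf_score = 1 - max softmax prob.
ood = np.array([0.1, 0.9, 0.3])
err = np.array([0.2, 0.1, 0.8])
print(double_score_reject(ood, err, tau_ood=0.5, tau_err=0.5))  # [ True False False]
```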

Differentially Private Statistical Inference through $β$-Divergence One Posterior Sampling

  • paper_url: http://arxiv.org/abs/2307.05194
  • repo_url: None
  • paper_authors: Jack Jewson, Sahra Ghalebikesabi, Chris Holmes
  • for: 本研究旨在提供一种可靠地保护敏感数据的统计分析结果的隐私保护机制,无需改变数据生成过程。
  • methods: 本研究使用 Bayesian posterior sampling 方法,通过采样 Bayesian posterior distribution 来生成私有的估计结果,而不需要人工添加噪声。
  • results: 本研究表明,使用 $\beta$D-Bayes 方法可以实现更高精度的隐私保护,同时可以用于复杂的分类器和连续回归模型,如神经网络。
    Abstract Differential privacy guarantees allow the results of a statistical analysis involving sensitive data to be released without compromising the privacy of any individual taking part. Achieving such guarantees generally requires the injection of noise, either directly into parameter estimates or into the estimation process. Instead of artificially introducing perturbations, sampling from Bayesian posterior distributions has been shown to be a special case of the exponential mechanism, producing consistent, and efficient private estimates without altering the data generative process. The application of current approaches has, however, been limited by their strong bounding assumptions which do not hold for basic models, such as simple linear regressors. To ameliorate this, we propose $\beta$D-Bayes, a posterior sampling scheme from a generalised posterior targeting the minimisation of the $\beta$-divergence between the model and the data generating process. This provides private estimation that is generally applicable without requiring changes to the underlying model and consistently learns the data generating parameter. We show that $\beta$D-Bayes produces more precise inference estimation for the same privacy guarantees, and further facilitates differentially private estimation via posterior sampling for complex classifiers and continuous regression models such as neural networks for the first time.
    摘要 差分隐私保证使得涉及敏感数据的统计分析结果可以在不泄露任何参与个体隐私的情况下发布。实现这类保证通常需要注入噪声,或直接加在参数估计上,或加在估计过程之中。与人为引入扰动不同,从贝叶斯后验分布中采样已被证明是指数机制的一个特例,能够在不改变数据生成过程的前提下给出一致且高效的隐私估计。然而,现有方法的应用受限于其较强的有界性假设,这些假设对简单线性回归等基本模型并不成立。为此,我们提出 $\beta$D-Bayes,一种从广义后验中采样的方案,其目标是最小化模型与数据生成过程之间的 $\beta$-散度。该方法无需修改底层模型即可提供普遍适用的隐私估计,并能一致地学习数据生成参数。我们证明,在相同的隐私保证下,$\beta$D-Bayes 能给出更精确的推断估计,并首次使神经网络等复杂分类器和连续回归模型能够通过后验采样实现差分隐私估计。
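
For orientation, one common form of the β-divergence loss and the resulting general ("βD") posterior that such a sampler targets is shown below; the notation and normalization are illustrative and may differ from the paper. As β → 0 the loss recovers the negative log-likelihood and hence the standard posterior.

```latex
% Density-power (beta-divergence) loss and the corresponding general posterior
\ell_\beta(\theta; x) \;=\; -\frac{1}{\beta}\, f(x \mid \theta)^{\beta}
  \;+\; \frac{1}{\beta + 1} \int f(z \mid \theta)^{\beta + 1}\, dz ,
\qquad
\pi_\beta(\theta \mid x_{1:n}) \;\propto\; \pi(\theta)\,
  \exp\!\Big( - \sum_{i=1}^{n} \ell_\beta(\theta; x_i) \Big).
```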

Membership Inference Attacks on DNNs using Adversarial Perturbations

  • paper_url: http://arxiv.org/abs/2307.05193
  • repo_url: https://github.com/hassanalikhatim/amia
  • paper_authors: Hassan Ali, Adnan Qayyum, Ala Al-Fuqaha, Junaid Qadir
  • for: 这个论文主要目标是提出一种高置信度成员检测算法,以便在深度神经网络(DNN)训练后进行成员检测。
  • methods: 这个论文使用了现有的成员检测攻击(MI attack),并提出了两种新的攻击方法: adversarial membership inference attack(AMIA)和 enhance AMIA(E-AMIA)。这些攻击方法利用了subject的会员和非会员信息,并在一定的散度范围内进行了对准损失函数的最小化。
  • results: 论文的实验结果表明,AMIA和E-AMIA在Fashion-MNIST和MNIST datasets上的真正阳性率分别达到6%和8%,而现有的LiRA和EMIA在这些 datasets上的真正阳性率几乎为0。此外,这些攻击方法还能够在不同的训练方法和环境下进行 Transfer Learning,并且比现有的攻击方法更加稳定和可靠。
    Abstract Several membership inference (MI) attacks have been proposed to audit a target DNN. Given a set of subjects, MI attacks tell which subjects the target DNN has seen during training. This work focuses on the post-training MI attacks emphasizing high confidence membership detection -- True Positive Rates (TPR) at low False Positive Rates (FPR). Current works in this category -- likelihood ratio attack (LiRA) and enhanced MI attack (EMIA) -- only perform well on complex datasets (e.g., CIFAR-10 and Imagenet) where the target DNN overfits its train set, but perform poorly on simpler datasets (0% TPR by both attacks on Fashion-MNIST, 2% and 0% TPR respectively by LiRA and EMIA on MNIST at 1% FPR). To address this, firstly, we unify current MI attacks by presenting a framework divided into three stages -- preparation, indication and decision. Secondly, we utilize the framework to propose two novel attacks: (1) Adversarial Membership Inference Attack (AMIA) efficiently utilizes the membership and the non-membership information of the subjects while adversarially minimizing a novel loss function, achieving 6% TPR on both Fashion-MNIST and MNIST datasets; and (2) Enhanced AMIA (E-AMIA) combines EMIA and AMIA to achieve 8% and 4% TPRs on Fashion-MNIST and MNIST datasets respectively, at 1% FPR. Thirdly, we introduce two novel augmented indicators that positively leverage the loss information in the Gaussian neighborhood of a subject. This improves TPR of all four attacks on average by 2.5% and 0.25% respectively on Fashion-MNIST and MNIST datasets at 1% FPR. Finally, we propose simple, yet novel, evaluation metric, the running TPR average (RTA) at a given FPR, that better distinguishes different MI attacks in the low FPR region. We also show that AMIA and E-AMIA are more transferable to the unknown DNNs (other than the target DNN) and are more robust to DP-SGD training as compared to LiRA and EMIA.
    摘要 多种会员推测(MI)攻击已经提议用于审核目标神经网络(DNN)。给定一组主题,MI攻击可以确定目标DNN在训练过程中训练过的主题。这项工作专注于增强后期MI攻击,强调高置信度会员检测——准确率(TPR)在低 FALSE POSITIVE RATE(FPR)下。现有的工作在这个类别——概率比例攻击(LiRA)和增强MI攻击(EMIA)——只在复杂的数据集(如CIFAR-10和Imagenet)上表现出色,但在简单的数据集(Fashion-MNIST中的0% TPR,MNIST中的2%和0% TPR)上表现不佳。为解决这个问题,我们首先:一、将当前MI攻击统一为一个框架,分为三个阶段:准备阶段、指示阶段和决策阶段。二、使用框架提出两种新的攻击:(1)敌意会员推测攻击(AMIA)利用会员和非会员主题的信息,同时 adversarially 最小化一个新的损失函数,在Fashion-MNIST和MNIST数据集上达到6%的TPR;(2)增强AMIA(E-AMIA)结合EMIA和AMIA,在Fashion-MNIST和MNIST数据集上达到8%和4%的TPR,即1%的FPR。三、我们介绍两种新的扩展指标,利用损失函数的梯度信息在Gaussian neighborhood中提高TPR的平均值。这将在1% FPR下提高Fashion-MNIST和MNIST数据集的TPR平均值 by 2.5%和0.25%。最后,我们提出一个简单 yet novel的评价指标——运行TPR平均值(RTA)——更好地 отлича出不同的MI攻击在低FPR区域。我们还证明AMIA和E-AMIA在未知DNN上(与目标DNN不同)更加可转移和DP-SGD训练中更加稳定。

Using Linear Regression for Iteratively Training Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05189
  • repo_url: https://github.com/sayantann11/all-classification-templetes-for-ML
  • paper_authors: Harshad Khadilkar
  • for: 这篇论文提出一种基于线性回归的神经网络权重与偏置学习方法,作为标准基于梯度的反向传播的替代方案。
  • methods: 方法的关键在于:每个神经元的输入是上一层神经元激活值与该层参数(权重和偏置)的线性组合。作者从输出反向推算每个神经元理想的总输入值,将学习问题转化为线性最小二乘问题,并在更新参数与更新激活值之间交替迭代。
  • results: 作者表明,至少对于小规模问题,该方法比基于梯度的方法更稳定、更快,并且该思路有望扩展到更大、更复杂的网络结构。
    Abstract We present a simple linear regression based approach for learning the weights and biases of a neural network, as an alternative to standard gradient based backpropagation. The present work is exploratory in nature, and we restrict the description and experiments to (i) simple feedforward neural networks, (ii) scalar (single output) regression problems, and (iii) invertible activation functions. However, the approach is intended to be extensible to larger, more complex architectures. The key idea is the observation that the input to every neuron in a neural network is a linear combination of the activations of neurons in the previous layer, as well as the parameters (weights and biases) of the layer. If we are able to compute the ideal total input values to every neuron by working backwards from the output, we can formulate the learning problem as a linear least squares problem which iterates between updating the parameters and the activation values. We present an explicit algorithm that implements this idea, and we show that (at least for small problems) the approach is more stable and faster than gradient-based methods.
    摘要 我们提出了一种基于线性回归的方法来学习神经网络的权重和偏置,作为标准基于梯度的反向传播的替代方案。这项工作属于探索性质,我们将描述和实验限定在:(i)简单的前馈神经网络,(ii)标量(单输出)回归问题,以及(iii)可逆激活函数,但该方法旨在可扩展到更大、更复杂的结构。其关键思想是:神经网络中每个神经元的输入都是上一层神经元激活值以及该层参数(权重和偏置)的线性组合。如果我们能够从输出反向推算出每个神经元理想的总输入值,就可以把学习问题表述为一个线性最小二乘问题,并在更新参数与更新激活值之间交替迭代。我们给出了实现这一思想的具体算法,并证明(至少对小规模问题)该方法比基于梯度的方法更稳定、更快。
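
A rough sketch of the idea for a one-hidden-layer scalar regressor with an invertible activation: alternate between solving the output layer by least squares and, after backing out ideal pre-activations through arctanh, re-fitting the first layer by least squares. The target construction and damping factor below are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def fit_two_layer_lsq(X, y, hidden=8, iters=20, seed=0):
    """Least-squares alternative to backprop for y ~ w2.tanh(W1 x + b1) + b2."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(0, 0.5, (hidden, X.shape[1])), np.zeros(hidden)
    Xb = np.hstack([X, np.ones((len(X), 1))])                 # inputs with bias column
    for _ in range(iters):
        H = np.tanh(X @ W1.T + b1)                            # hidden activations
        Hb = np.hstack([H, np.ones((len(H), 1))])
        w2 = np.linalg.lstsq(Hb, y, rcond=None)[0]            # output layer by LSQ
        resid = y - Hb @ w2                                   # work backwards from the output
        H_target = np.clip(H + 0.1 * np.outer(resid, w2[:-1]), -0.99, 0.99)
        Z_target = np.arctanh(H_target)                       # ideal pre-activations
        W1b = np.linalg.lstsq(Xb, Z_target, rcond=None)[0].T  # first layer by LSQ
        W1, b1 = W1b[:, :-1], W1b[:, -1]
    return W1, b1, w2

X = np.random.default_rng(1).uniform(-1, 1, (200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]
W1, b1, w2 = fit_two_layer_lsq(X, y)
H = np.tanh(X @ W1.T + b1)
pred = np.hstack([H, np.ones((len(H), 1))]) @ w2
print(np.mean((pred - y) ** 2))   # training MSE typically decreases over the iterations
```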

Decorrelation using Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.05187
  • repo_url: https://github.com/malteal/ot-decorrelation
  • paper_authors: Malte Algren, John Andrew Raine, Tobias Golling
  • for: decorrelate a continuous feature space against protected attributes with optimal transport
  • methods: Convex Neural Optimal Transport Solvers (Cnots)
  • results: achieved state-of-the-art performance in binary classification, and significantly better performance in multiclass outputs compared to the state-of-the-art.
    Abstract Being able to decorrelate a feature space from protected attributes is an area of active research and study in ethics, fairness, and also natural sciences. We introduce a novel decorrelation method using Convex Neural Optimal Transport Solvers (Cnots) that is able to decorrelate a continuous feature space against protected attributes with optimal transport. We demonstrate how well it performs in the context of jet classification in high energy physics, where classifier scores are desired to be decorrelated from the mass of a jet. The decorrelation achieved in binary classification approaches the levels achieved by the state-of-the-art using conditional normalising flows. When moving to multiclass outputs the optimal transport approach performs significantly better than the state-of-the-art, suggesting substantial gains at decorrelating multidimensional feature spaces.
    摘要 将特征空间与受保护属性去相关,是伦理、公平性乃至自然科学中的一个活跃研究方向。我们提出了一种基于凸神经最优传输求解器(Cnots)的去相关方法,能够利用最优传输将连续特征空间与受保护属性去相关。我们在高能物理的喷注(jet)分类任务中展示了该方法的效果,其中分类器得分需要与喷注质量去相关。在二分类设置下,其去相关程度接近使用条件归一化流的最先进方法;在多类输出情况下,最优传输方法显著优于最先进方法,表明其在多维特征空间去相关方面具有可观的收益。

A Mapping Study of Machine Learning Methods for Remaining Useful Life Estimation of Lead-Acid Batteries

  • paper_url: http://arxiv.org/abs/2307.05163
  • repo_url: None
  • paper_authors: Sérgio F Chevtchenko, Elisson da Silva Rocha, Bruna Cruz, Ermeson Carneiro de Andrade, Danilo Ricardo Barbosa de Araújo
  • for: 这篇论文主要针对铅酸电池健康状态(SoH)与剩余使用寿命(RUL)估计的机器学习方法进行综述。
  • methods: 本论文梳理了用于估计铅酸电池SoH和RUL的多种机器学习算法,并从准确率和推理时间两方面评估其性能。
  • results: 本论文分析了不同应用(如车载电池)中常用的传感器组合及其性能,并指出了未来研究的空白与机遇。
    Abstract Energy storage solutions play an increasingly important role in modern infrastructure and lead-acid batteries are among the most commonly used in the rechargeable category. Due to normal degradation over time, correctly determining the battery's State of Health (SoH) and Remaining Useful Life (RUL) contributes to enhancing predictive maintenance, reliability, and longevity of battery systems. Besides improving the cost savings, correct estimation of the SoH can lead to reduced pollution though reuse of retired batteries. This paper presents a mapping study of the state-of-the-art in machine learning methods for estimating the SoH and RUL of lead-acid batteries. These two indicators are critical in the battery management systems of electric vehicles, renewable energy systems, and other applications that rely heavily on this battery technology. In this study, we analyzed the types of machine learning algorithms employed for estimating SoH and RUL, and evaluated their performance in terms of accuracy and inference time. Additionally, this mapping identifies and analyzes the most commonly used combinations of sensors in specific applications, such as vehicular batteries. The mapping concludes by highlighting potential gaps and opportunities for future research, which lays the foundation for further advancements in the field.
    摘要 储能方案在现代基础设施中扮演着日益重要的角色,而铅酸电池是最常用的可充电电池之一。由于电池会随时间正常老化,准确判断电池的健康状态(SoH)和剩余使用寿命(RUL)有助于提升预测性维护、可靠性和电池系统的寿命。除了节约成本之外,准确估计SoH还能通过退役电池的再利用减少污染。本文对用于估计铅酸电池SoH和RUL的机器学习方法的最新进展进行了映射研究。这两项指标对电动汽车、可再生能源系统等高度依赖该电池技术的应用中的电池管理系统至关重要。我们分析了用于估计SoH和RUL的机器学习算法类型,并从准确率和推理时间两方面评估其性能;此外,还梳理了特定应用(如车载电池)中最常用的传感器组合。最后,本文指出了潜在的研究空白与机遇,为该领域的进一步发展奠定基础。

SuryaKiran at MEDIQA-Sum 2023: Leveraging LoRA for Clinical Dialogue Summarization

  • paper_url: http://arxiv.org/abs/2307.05162
  • repo_url: None
  • paper_authors: Kunal Suri, Prakhar Mishra, Saumajit Saha, Atul Singh
  • for: 本研究旨在探讨参数高效微调(Parameter Efficient Fine Tuning, PEFT)方法能否提升语言模型在领域特定应用场景中的性能。
  • methods: 本研究使用的是 Low Rank Adaptation (LoRA) 方法,它在保持大语言模型为固定基础之前,添加额外层次,并使用 PEFT 方法进行微调。
  • results: 实验结果显示,LoRA 方法在临床对话概要SUMMARIZATION 任务上达到了与端到端微调相当的性能水平。
    Abstract Finetuning Large Language Models helps improve the results for domain-specific use cases. End-to-end finetuning of large language models is time and resource intensive and has high storage requirements to store the finetuned version of the large language model. Parameter Efficient Fine Tuning (PEFT) methods address the time and resource challenges by keeping the large language model as a fixed base and add additional layers, which the PEFT methods finetune. This paper demonstrates the evaluation results for one such PEFT method Low Rank Adaptation (LoRA), for Clinical Dialogue Summarization. The evaluation results show that LoRA works at par with end-to-end finetuning for a large language model. The paper presents the evaluations done for solving both the Subtask A and B from ImageCLEFmedical {https://www.imageclef.org/2023/medical}
    摘要 针对领域特定应用场景微调大语言模型可以提升效果。端到端微调大语言模型既耗时又耗资源,并且需要大量存储空间来保存微调后的模型。参数高效微调(PEFT)方法通过把大语言模型作为固定的基座、在其上添加并仅微调少量额外层,来解决时间与资源方面的挑战。本文给出了其中一种PEFT方法——Low Rank Adaptation(LoRA)在临床对话摘要任务上的评估结果。结果表明,LoRA的表现与对大语言模型进行端到端微调相当。文章还给出了针对 ImageCLEFmedical {https://www.imageclef.org/2023/medical} 中子任务A和B的评估结果。
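
A minimal sketch of the LoRA idea applied to a single linear layer: the pretrained weight stays frozen and only a low-rank update B·A is trained, which is why the trainable-parameter fraction stays in the few-percent range. The rank and scaling values below are illustrative defaults, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update:
    effective weight = W + (alpha / r) * B @ A, with only A and B trained."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                       # keep pretrained weights fixed
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable fraction: {trainable / total:.3%}")     # about 2% for r=8 on a 768x768 layer
```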

Multiobjective Hydropower Reservoir Operation Optimization with Transformer-Based Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05643
  • repo_url: None
  • paper_authors: Rixin Wu, Ran Wang, Jie Hao, Qiang Wu, Ping Wang
  • for: 这项研究旨在对多座水库系统进行联合调度,以平衡发电、生态保护和居民供水。
  • methods: 研究提出一种结合 transformer 框架的深度强化学习方法:编码器的多头注意力机制有效提取水库与居民区信息,解码器的多水库注意力网络生成合适的调度决策。
  • results: 实验结果显示,该方法能够生成合适的调度方案:与现有方法相比,发电量提高10.11%,修正后的年比例流量偏差减少39.69%,供水收益增加4.10%。因此,该方法可对多座水库系统进行有效的多目标联合调度。
    Abstract Due to shortage of water resources and increasing water demands, the joint operation of multireservoir systems for balancing power generation, ecological protection, and the residential water supply has become a critical issue in hydropower management. However, the numerous constraints and nonlinearity of multiple reservoirs make solving this problem time-consuming. To address this challenge, a deep reinforcement learning approach that incorporates a transformer framework is proposed. The multihead attention mechanism of the encoder effectively extracts information from reservoirs and residential areas, and the multireservoir attention network of the decoder generates suitable operational decisions. The proposed method is applied to Lake Mead and Lake Powell in the Colorado River Basin. The experimental results demonstrate that the transformer-based deep reinforcement learning approach can produce appropriate operational outcomes. Compared to a state-of-the-art method, the operation strategies produced by the proposed approach generate 10.11% more electricity, reduce the amended annual proportional flow deviation by 39.69%, and increase water supply revenue by 4.10%. Consequently, the proposed approach offers an effective method for the multiobjective operation of multihydropower reservoir systems.
    摘要 由于水资源短缺和用水需求不断增长,多水库系统的联合调度以平衡发电、生态保护与居民供水,已成为水电管理中的关键问题。然而,多座水库的众多约束与非线性特性使求解该问题十分耗时。为此,本文提出一种融合 transformer 框架的深度强化学习方法:编码器的多头注意力机制能有效提取水库与居民区的信息,解码器的多水库注意力网络则生成合适的调度决策。该方法被应用于科罗拉多河流域的米德湖(Lake Mead)和鲍威尔湖(Lake Powell)。实验结果表明,基于 transformer 的深度强化学习方法能够给出合适的调度方案:与最先进的方法相比,其调度策略多发电10.11%,将修正后的年比例流量偏差降低39.69%,并使供水收益提高4.10%。因此,该方法为多目标、多水库水电系统的调度提供了一种有效手段。

On the Effectiveness of Speech Self-supervised Learning for Music

  • paper_url: http://arxiv.org/abs/2307.05161
  • repo_url: None
  • paper_authors: Yinghao Ma, Ruibin Yuan, Yizhi Li, Ge Zhang, Xingran Chen, Hanzhi Yin, Chenghua Lin, Emmanouil Benetos, Anton Ragni, Norbert Gyenge, Ruibo Liu, Gus Xia, Roger Dannenberg, Yike Guo, Jie Fu
  • for: 本研究探讨了自主学习(SSL)在音乐信息检索(MIR)领域的应用,并评估了两种不同的speech相关模型在音乐数据上的适用性。
  • methods: 本研究使用了两种不同的speech相关模型,namely data2vec1.0和Hubert,并在不同的预训练配置下训练了12个SSL模型,共计95M个参数。
  • results: 研究发现,通过训练音乐数据可以改善MIR任务的性能,即使使用了speech相关的模型。然而,研究发现现有的speech导向的设计在处理多重音频信息方面存在局限性。基于实验结果,本研究还提出了未来音乐SSL策略和模式的设计建议。
    Abstract Self-supervised learning (SSL) has shown promising results in various speech and natural language processing applications. However, its efficacy in music information retrieval (MIR) still remains largely unexplored. While previous SSL models pre-trained on music recordings may have been mostly closed-sourced, recent speech models such as wav2vec2.0 have shown promise in music modelling. Nevertheless, research exploring the effectiveness of applying speech SSL models to music recordings has been limited. We explore the music adaption of SSL with two distinctive speech-related models, data2vec1.0 and Hubert, and refer to them as music2vec and musicHuBERT, respectively. We train $12$ SSL models with 95M parameters under various pre-training configurations and systematically evaluate the MIR task performances with 13 different MIR tasks. Our findings suggest that training with music data can generally improve performance on MIR tasks, even when models are trained using paradigms designed for speech. However, we identify the limitations of such existing speech-oriented designs, especially in modelling polyphonic information. Based on the experimental results, empirical suggestions are also given for designing future musical SSL strategies and paradigms.
    摘要 自监督学习(SSL)已在多种语音和自然语言处理应用中展现出良好效果,但其在音乐信息检索(MIR)中的作用仍在很大程度上未被探索。此前在音乐录音上预训练的SSL模型多为闭源,而 wav2vec2.0 等语音模型已显示出建模音乐的潜力,不过将语音SSL模型应用于音乐录音有效性的研究仍然有限。我们基于两种具有代表性的语音模型 data2vec1.0 和 HuBERT 探索SSL在音乐上的适配,并分别称之为 music2vec 和 musicHuBERT。我们在不同的预训练配置下训练了12个参数量为95M的SSL模型,并在13个MIR任务上系统评估其表现。结果表明,即使模型采用为语音设计的范式进行训练,使用音乐数据训练通常也能提升MIR任务的表现;但我们也发现了这类面向语音的设计的局限性,尤其是在建模复音信息方面。基于实验结果,我们还为未来音乐SSL策略与范式的设计给出了经验性建议。

Fast Neural Network Inference on FPGAs for Triggering on Long-Lived Particles at Colliders

  • paper_url: http://arxiv.org/abs/2307.05152
  • repo_url: None
  • paper_authors: Andrea Coccaro, Francesco Armando Di Bello, Stefano Giagu, Lucrezia Rambelli, Nicola Stocchetti
  • for: 这项研究是为了开发一种高效的触发和获取系统,以便更好地处理高能物理实验中的碰撞事件。
  • methods: 这项研究使用了FPGA卡来实现不同的计算方法,以提高触发策略的效率。
  • results: 研究发现,使用FPGA卡加速的机器学习算法可以保持高度的精度,而且加速时间较短。此外,对比CPU和GPU硬件设置,FPGA卡的加速效果更好。
    Abstract Experimental particle physics demands a sophisticated trigger and acquisition system capable to efficiently retain the collisions of interest for further investigation. Heterogeneous computing with the employment of FPGA cards may emerge as a trending technology for the triggering strategy of the upcoming high-luminosity program of the Large Hadron Collider at CERN. In this context, we present two machine-learning algorithms for selecting events where neutral long-lived particles decay within the detector volume studying their accuracy and inference time when accelerated on commercially available Xilinx FPGA accelerator cards. The inference time is also confronted with a CPU- and GPU-based hardware setup. The proposed new algorithms are proven efficient for the considered benchmark physics scenario and their accuracy is found to not degrade when accelerated on the FPGA cards. The results indicate that all tested architectures fit within the latency requirements of a second-level trigger farm and that exploiting accelerator technologies for real-time processing of particle-physics collisions is a promising research field that deserves additional investigations, in particular with machine-learning models with a large number of trainable parameters.
    摘要 实验粒子物理需要先进的触发与采集系统,以高效保留值得进一步研究的对撞事件。采用FPGA卡的异构计算有望成为CERN大型强子对撞机高亮度计划触发策略中的一项重要技术。在此背景下,我们提出两种机器学习算法,用于筛选中性长寿命粒子在探测器体积内衰变的事件,并研究其在市售 Xilinx FPGA 加速卡上加速时的精度与推理时间,同时与基于CPU和GPU的硬件方案进行对比。实验表明,所提出的新算法在所考虑的基准物理场景中是有效的,且在FPGA卡上加速后精度不会下降。结果显示,所有被测架构均满足二级触发农场的延迟要求;利用加速器技术对粒子物理对撞进行实时处理是一个值得进一步研究的有前景方向,特别是对于可训练参数较多的机器学习模型。

ConFL: Constraint-guided Fuzzing for Machine Learning Framework

  • paper_url: http://arxiv.org/abs/2307.05642
  • repo_url: None
  • paper_authors: Zhao Liu, Quanchen Zou, Tian Yu, Xuan Wang, Guozhu Meng, Kai Chen, Deyue Zhang
  • for: The paper proposes a constraint-guided fuzzer for machine learning (ML) frameworks to improve the efficiency and effectiveness of fuzzing.
  • methods: The proposed ConFL fuzzer automatically extracts constraints from kernel code without prior knowledge, and uses these constraints to generate valid inputs that pass verification and explore deeper paths of the kernel code. The paper also designs a grouping technique to boost fuzzing efficiency.
  • results: Evaluated mainly on Tensorflow, ConFL covers more code lines and generates more valid inputs than state-of-the-art fuzzers, and it discovered 84 previously unknown vulnerabilities in different versions of Tensorflow (3 critical-severity and 13 high-severity, all assigned new CVE ids). Extending ConFL to PyTorch and Paddle has uncovered 7 vulnerabilities to date.
    Abstract As machine learning gains prominence in various sectors of society for automated decision-making, concerns have risen regarding potential vulnerabilities in machine learning (ML) frameworks. Nevertheless, testing these frameworks is a daunting task due to their intricate implementation. Previous research on fuzzing ML frameworks has struggled to effectively extract input constraints and generate valid inputs, leading to extended fuzzing durations for deep execution or revealing the target crash. In this paper, we propose ConFL, a constraint-guided fuzzer for ML frameworks. ConFL automatically extracting constraints from kernel codes without the need for any prior knowledge. Guided by the constraints, ConFL is able to generate valid inputs that can pass the verification and explore deeper paths of kernel codes. In addition, we design a grouping technique to boost the fuzzing efficiency. To demonstrate the effectiveness of ConFL, we evaluated its performance mainly on Tensorflow. We find that ConFL is able to cover more code lines, and generate more valid inputs than state-of-the-art (SOTA) fuzzers. More importantly, ConFL found 84 previously unknown vulnerabilities in different versions of Tensorflow, all of which were assigned with new CVE ids, of which 3 were critical-severity and 13 were high-severity. We also extended ConFL to test PyTorch and Paddle, 7 vulnerabilities are found to date.
    摘要 Machine learning 在不同领域的自动化决策中受到推广,但是有关机器学习(ML)架构的可能漏洞问题却愈来愈严重。实际上,测试这些架构是一个艰辛的任务,因为它们的实现非常复杂。在这篇论文中,我们提出了 ConFL,一个基于条件的对ML架构的搜寻器。ConFL可以自动从核心代码中提取约束,不需要任何先前知识。根据这些约束,ConFL能够产生有效的输入,并让核心代码进行更深入的测试。此外,我们设计了一种分组技术,以提高搜寻效率。为证明 ConFL 的有效性,我们主要对 Tensorflow 进行评估。我们发现 ConFL 能够覆盖更多的代码行数,并产生更多的有效的输入,比起现有的 SOTA 搜寻器。更重要的是,ConFL 发现了 Tensorflow 不同版本中的84个未知漏洞,其中3个是严重性高的漏洞。我们还将 ConFL 扩展到 PyTorch 和 Paddle,发现了7个漏洞。

Learning Active Subspaces and Discovering Important Features with Gaussian Radial Basis Functions Neural Networks

  • paper_url: http://arxiv.org/abs/2307.05639
  • repo_url: https://github.com/dannyzx/grbf-nns
  • paper_authors: Danny D’Agostino, Ilija Ilievski, Christine Annette Shoemaker
  • for: 提出了一种可以同时实现强预测性和人类可读性的机器学习模型,以解决机器学习研究中最大化预测性和人类可读性之间的矛盾。
  • methods: 提出了一种对径向基函数神经网络模型的改进,在其高斯核中引入可学习的精度矩阵。模型训练完成后,可以通过分析精度矩阵的谱来提取有价值的信息,包括模型最敏感的方向以及输入变量的重要性排序。
  • results: 通过对回归、分类和特征选择任务进行数值实验,与其他机器学习模型和深度学习基于嵌入特征选择技术进行比较,结果显示该模型不仅在预测性方面与竞争者具有优异性,还提供了可读性高的结果,可能为实际应用中决策过程提供帮助。
    Abstract Providing a model that achieves a strong predictive performance and at the same time is interpretable by humans is one of the most difficult challenges in machine learning research due to the conflicting nature of these two objectives. To address this challenge, we propose a modification of the Radial Basis Function Neural Network model by equipping its Gaussian kernel with a learnable precision matrix. We show that precious information is contained in the spectrum of the precision matrix that can be extracted once the training of the model is completed. In particular, the eigenvectors explain the directions of maximum sensitivity of the model revealing the active subspace and suggesting potential applications for supervised dimensionality reduction. At the same time, the eigenvectors highlight the relationship in terms of absolute variation between the input and the latent variables, thereby allowing us to extract a ranking of the input variables based on their importance to the prediction task enhancing the model interpretability. We conducted numerical experiments for regression, classification, and feature selection tasks, comparing our model against popular machine learning models and the state-of-the-art deep learning-based embedding feature selection techniques. Our results demonstrate that the proposed model does not only yield an attractive prediction performance with respect to the competitors but also provides meaningful and interpretable results that potentially could assist the decision-making process in real-world applications. A PyTorch implementation of the model is available on GitHub at the following link. https://github.com/dannyzx/GRBF-NNs
    摘要 提供一个具有强预测性能并且可以被人类理解的模型是机器学习研究中最大的挑战,这是因为这两个目标之间存在矛盾。为解决这个挑战,我们提议对卷积函数神经网络模型进行修改,并在其核函数中添加学习型精度矩阵。我们发现在训练过程完成后,模型的准确矩阵的谱有着极大的信息价值。特别是,准确矩阵的特征向量可以解释模型的最大敏感方向,披露活动子空间和可能的维度减少任务。同时,特征向量还可以描述输入和隐藏变量之间的绝对变化关系,从而提供输入变量的重要性排名,这有助于提高模型的解释性。我们对回归、分类和特征选择任务进行了数学实验,与流行的机器学习模型和深度学习基于嵌入特征选择技术进行比较。我们的结果表明,我们的模型不仅在竞争对手中具有吸引人的预测性能,而且提供了可读ible和可解释的结果,这些结果可能在实际应用中帮助决策过程。PyTorch实现的模型可以在GitHub上找到,请参考以下链接:https://github.com/dannyzx/GRBF-NNs。
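
A small sketch of a Gaussian RBF network whose kernel carries a learnable precision matrix; after training, an eigendecomposition of the precision matrix exposes the most sensitive directions (the active subspace) and a feature-importance ranking. Sharing one precision matrix across all centers and the other details are simplifying assumptions, not the paper's exact model.

```python
import torch
import torch.nn as nn

class GaussianRBFNet(nn.Module):
    """RBF network with a learnable precision matrix P = L L^T in the kernel:
        phi_k(x) = exp(-0.5 * (x - c_k)^T P (x - c_k))."""
    def __init__(self, in_dim, n_centers):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(n_centers, in_dim))
        self.L = nn.Parameter(torch.eye(in_dim))          # factor of the precision matrix
        self.out = nn.Linear(n_centers, 1)

    def precision(self):
        return self.L @ self.L.T                          # symmetric positive semi-definite

    def forward(self, x):
        diff = x.unsqueeze(1) - self.centers              # (B, K, D)
        q = torch.einsum('bkd,de,bke->bk', diff, self.precision(), diff)
        return self.out(torch.exp(-0.5 * q)).squeeze(-1)

net = GaussianRBFNet(in_dim=5, n_centers=16)
y = net(torch.randn(32, 5))
evals, evecs = torch.linalg.eigh(net.precision())
print(y.shape, evals)   # largest eigenvalues -> most sensitive ("active") directions
```

After fitting, the eigenvectors associated with the largest eigenvalues span the directions of maximum sensitivity, and their component magnitudes can be read as an importance ranking of the input variables.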

A Comprehensive Survey of Deep Transfer Learning for Anomaly Detection in Industrial Time Series: Methods, Applications, and Directions

  • paper_url: http://arxiv.org/abs/2307.05638
  • repo_url: None
  • paper_authors: Peng Yan, Ahmed Abdulkadir, Matthias Rosenthal, Gerrit A. Schatte, Benjamin F. Grewe, Thilo Stadelmann
  • for: This paper focuses on the application of deep transfer learning in industrial process monitoring and anomaly detection, with the goal of enhancing efficiency and optimizing quality.
  • methods: The paper reviews and examines the problem settings of transfer learning and classifies the prevailing deep transfer learning methods, with a focus on their applications in industrial contexts.
  • results: The paper discusses the challenges and limitations of deep transfer learning in industrial contexts, and provides practical directions for solution design and implementation, including specific, actionable suggestions.
    Abstract Automating the monitoring of industrial processes has the potential to enhance efficiency and optimize quality by promptly detecting abnormal events and thus facilitating timely interventions. Deep learning, with its capacity to discern non-trivial patterns within large datasets, plays a pivotal role in this process. Standard deep learning methods are suitable to solve a specific task given a specific type of data. During training, the algorithms demand large volumes of labeled training data. However, due to the dynamic nature of processes and the environment, it is impractical to acquire the needed data for standard deep learning training for every slightly different case anew. Deep transfer learning offers a solution to this problem. By leveraging knowledge from related tasks and accounting for variations in data distributions, this learning framework solves new tasks even with little or no additional labeled data. The approach bypasses the need to retrain a model from scratch for every new setup and dramatically reduces the labeled data requirement. This survey provides an in-depth review of deep transfer learning, examining the problem settings of transfer learning and classifying the prevailing deep transfer learning methods. Moreover, we delve into applying deep transfer learning in the context of a broad spectrum of time series anomaly detection tasks prevalent in primary industrial domains, e.g., manufacturing process monitoring, predictive maintenance, energy management, and infrastructure facility monitoring. We conclude this survey by underlining the challenges and limitations of deep transfer learning in industrial contexts. We also provide practical directions for solution design and implementation for these tasks, leading to specific, actionable suggestions.
    摘要 自动监测工业过程有可能提高效率和优化质量,通过及时检测异常事件,以便及时干预。深度学习,作为检测非平凡模式的能力,在这个过程中扮演着关键角色。标准的深度学习方法适用于特定任务和数据类型。在训练过程中,算法需要大量标注数据。然而,由于生产过程和环境的动态性,获得需要的数据是不可能的。深度传输学习提供了一个解决方案,通过利用相关任务的知识和考虑数据分布的差异,解决新任务,甚至无需额外的标注数据。这种学习框架绕过了对新设置的模型重新训练的需要,减少了标注数据的需求,从而减少了训练时间。本文提供了深度传输学习的深入审查,包括传输学习问题的设定和深度传输学习方法的分类。此外,我们还探讨了在主要工业领域中广泛存在的时间序列异常检测任务中的深度传输学习应用,例如制造过程监测、预测维护、能源管理和基础设施监测。我们 conclude this survey by highlighting the challenges and limitations of deep transfer learning in industrial contexts, and provide practical directions for solution design and implementation, leading to specific, actionable suggestions.

Deep Probabilistic Movement Primitives with a Bayesian Aggregator

  • paper_url: http://arxiv.org/abs/2307.05141
  • repo_url: None
  • paper_authors: Michael Przystupa, Faezeh Haghverd, Martin Jagersand, Samuele Tosatto
  • for: 本研究的目的是提出一种深度运动原理模型,可以执行先前的运动操作,包括时间调整、混合、终点决定和上下文决定。
  • methods: 本研究使用了深度学习模型来实现运动原理模型,并使用bayesianContext aggregator来实现更好的上下文决定和混合。
  • results: 实验结果表明,我们的方法可以在更多的输入选择下实现复杂的运动复制,而且与基eline运动原理模型提供的操作相当。
    Abstract Movement primitives are trainable parametric models that reproduce robotic movements starting from a limited set of demonstrations. Previous works proposed simple linear models that exhibited high sample efficiency and generalization power by allowing temporal modulation of movements (reproducing movements faster or slower), blending (merging two movements into one), via-point conditioning (constraining a movement to meet some particular via-points) and context conditioning (generation of movements based on an observed variable, e.g., position of an object). Previous works have proposed neural network-based motor primitive models, having demonstrated their capacity to perform tasks with some forms of input conditioning or time-modulation representations. However, there has not been a single unified deep motor primitive's model proposed that is capable of all previous operations, limiting neural motor primitive's potential applications. This paper proposes a deep movement primitive architecture that encodes all the operations above and uses a Bayesian context aggregator that allows a more sound context conditioning and blending. Our results demonstrate our approach can scale to reproduce complex motions on a larger variety of input choices compared to baselines while maintaining operations of linear movement primitives provide.
    摘要 运动基元(movement primitives)是一类可训练的参数化模型,能够在少量示教的基础上再现机器人运动。以往工作提出的简单线性模型具有很高的样本效率和泛化能力,支持运动的时间调制(加快或放慢再现)、混合(将两段运动融合为一段)、途经点约束(使运动经过特定途经点)以及基于上下文的条件生成(依据观测变量,如物体位置,生成运动)。也有工作提出基于神经网络的运动基元模型,展示了其在部分输入条件或时间调制表示下完成任务的能力,但尚未出现能够统一支持上述全部操作的深度运动基元模型,这限制了神经运动基元的潜在应用。本文提出一种深度运动基元架构,涵盖上述全部操作,并采用贝叶斯上下文聚合器实现更合理的上下文条件与混合。实验结果表明,与基线相比,我们的方法能够在更多样的输入选择下再现复杂运动,同时保留线性运动基元所提供的各类操作。

Speech Diarization and ASR with GMM

  • paper_url: http://arxiv.org/abs/2307.05637
  • repo_url: None
  • paper_authors: Aayush Kumar Sharma, Vineet Bhavikatti, Amogh Nidawani, Dr. Siddappaji, Sanath P, Dr Geetishree Mishra
  • for: 这篇研究论文主要探讨说话人分离(speech diarization)与自动语音识别(ASR)问题。说话人分离旨在借助ASR转写,把音频流中不同说话人的语音区分开来。
  • methods: 在说话人分离中,我们使用高斯混合模型(Gaussian Mixture Model, GMM)表示语音段;基于GMM参数计算簇间距离,并以距离阈值作为停止条件。
  • results: 我们的主要目标是构建能够最小化词错误率(WER)的语音转写模型。
    Abstract In this research paper, we delve into the topics of Speech Diarization and Automatic Speech Recognition (ASR). Speech diarization involves the separation of individual speakers within an audio stream. By employing the ASR transcript, the diarization process aims to segregate each speaker's utterances, grouping them based on their unique audio characteristics. On the other hand, Automatic Speech Recognition refers to the capability of a machine or program to identify and convert spoken words and phrases into a machine-readable format. In our speech diarization approach, we utilize the Gaussian Mixer Model (GMM) to represent speech segments. The inter-cluster distance is computed based on the GMM parameters, and the distance threshold serves as the stopping criterion. ASR entails the conversion of an unknown speech waveform into a corresponding written transcription. The speech signal is analyzed using synchronized algorithms, taking into account the pitch frequency. Our primary objective typically revolves around developing a model that minimizes the Word Error Rate (WER) metric during speech transcription.
    摘要 在这篇研究论文中,我们探讨了说话人分离(speech diarization)与自动语音识别(ASR)。说话人分离旨在把音频流中不同说话人的语音区分开来:借助ASR转写文本,分离过程依据各说话人独特的音频特征,将其语句分组。自动语音识别则指机器或程序识别口头话语并将其转换为机器可读文本的能力。在我们的说话人分离方法中,采用高斯混合模型(GMM)表示语音段;基于GMM参数计算簇间距离,并以距离阈值作为停止条件。ASR的任务是把未知的语音波形转换为相应的文字转写;语音信号由同步算法进行分析,并考虑基频信息。我们的主要目标通常是构建在语音转写中尽量降低词错误率(WER)的模型。
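
A toy sketch of GMM-based agglomerative diarization with a distance threshold as the stopping criterion; the cross-likelihood distance, the synthetic features, and the threshold below are illustrative rather than the system described in the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_distance(feats_a, feats_b, n_comp=2):
    """Rough symmetric distance between two speech segments: fit a GMM to each and
    compare average log-likelihoods (a simple cross-likelihood ratio).
    Real systems use richer features (e.g. MFCCs) and criteria such as BIC."""
    ga = GaussianMixture(n_comp, covariance_type='diag', random_state=0).fit(feats_a)
    gb = GaussianMixture(n_comp, covariance_type='diag', random_state=0).fit(feats_b)
    return 0.5 * ((ga.score(feats_a) - gb.score(feats_a)) +
                  (gb.score(feats_b) - ga.score(feats_b)))

def agglomerate(segments, threshold):
    """Greedy bottom-up clustering of segments; stop when the closest pair of
    clusters is farther apart than `threshold` (the stopping criterion)."""
    clusters = [np.asarray(s) for s in segments]
    while len(clusters) > 1:
        pairs = [(gmm_distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        d, i, j = min(pairs)
        if d > threshold:
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]]); del clusters[j]
    return clusters   # one cluster per estimated speaker

rng = np.random.default_rng(0)
segs = [rng.normal(0, 1, (200, 13)), rng.normal(0, 1, (200, 13)), rng.normal(5, 1, (200, 13))]
print(len(agglomerate(segs, threshold=3.0)))   # ideally 2 estimated speakers
```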

TIAM – A Metric for Evaluating Alignment in Text-to-Image Generation

  • paper_url: http://arxiv.org/abs/2307.05134
  • repo_url: https://github.com/grimalpaul/tiam
  • paper_authors: Paul Grimal, Hervé Le Borgne, Olivier Ferret, Julien Tourille
  • for: 本研究旨在评估文本到图像(T2I)模型生成图像的质量,特别是考虑提示中的重要内容是否正确反映在生成图像中。
  • methods: 我们提出了一种基于提示模板的新评价指标,可以更好地衡量生成图像与提示中的内容的对应关系,包括提示中的对象类型、数量和颜色等方面。
  • results: 我们通过对多种最新的T2I模型进行研究发现,图像质量受到种子图像的随机变化的影响很大,同时提示中的概念数量、顺序和颜色属性也会影响图像质量。此外,我们还发现了一些特定的种子图像可以生成更高质量的图像,开启了新的研究方向。
    Abstract The progress in the generation of synthetic images has made it crucial to assess their quality. While several metrics have been proposed to assess the rendering of images, it is crucial for Text-to-Image (T2I) models, which generate images based on a prompt, to consider additional aspects such as to which extent the generated image matches the important content of the prompt. Moreover, although the generated images usually result from a random starting point, the influence of this one is generally not considered. In this article, we propose a new metric based on prompt templates to study the alignment between the content specified in the prompt and the corresponding generated images. It allows us to better characterize the alignment in terms of the type of the specified objects, their number, and their color. We conducted a study on several recent T2I models about various aspects. An additional interesting result we obtained with our approach is that image quality can vary drastically depending on the latent noise used as a seed for the images. We also quantify the influence of the number of concepts in the prompt, their order as well as their (color) attributes. Finally, our method allows us to identify some latent seeds that produce better images than others, opening novel directions of research on this understudied topic.

On the Use of Self-Supervised Speech Representations in Spontaneous Speech Synthesis

  • paper_url: http://arxiv.org/abs/2307.05132
  • repo_url: None
  • paper_authors: Siyang Wang, Gustav Eje Henter, Joakim Gustafson, Éva Székely
  • for: The paper explores the use of self-supervised learning (SSL) representations in spontaneous text-to-speech (TTS) and in predicting the mean opinion scores (MOS) of synthesized speech.
  • methods: The paper compares six different SSL models and three layers within each SSL model for spontaneous TTS, and extends an existing SSL-based MOS prediction framework to score synthesized spontaneous speech.
  • results: Comprehensive experiments, run on two spontaneous corpora, show that certain SSL models and layers perform better than others for spontaneous TTS and MOS prediction.
    Abstract Self-supervised learning (SSL) speech representations learned from large amounts of diverse, mixed-quality speech data without transcriptions are gaining ground in many speech technology applications. Prior work has shown that SSL is an effective intermediate representation in two-stage text-to-speech (TTS) for both read and spontaneous speech. However, it is still not clear which SSL and which layer from each SSL model is most suited for spontaneous TTS. We address this shortcoming by extending the scope of comparison for SSL in spontaneous TTS to 6 different SSLs and 3 layers within each SSL. Furthermore, SSL has also shown potential in predicting the mean opinion scores (MOS) of synthesized speech, but this has only been done in read-speech MOS prediction. We extend an SSL-based MOS prediction framework previously developed for scoring read speech synthesis and evaluate its performance on synthesized spontaneous speech. All experiments are conducted twice on two different spontaneous corpora in order to find generalizable trends. Overall, we present comprehensive experimental results on the use of SSL in spontaneous TTS and MOS prediction to further quantify and understand how SSL can be used in spontaneous TTS. Audios samples: https://www.speech.kth.se/tts-demos/sp_ssl_tts

Enhancing Continuous Time Series Modelling with a Latent ODE-LSTM Approach

  • paper_url: http://arxiv.org/abs/2307.05126
  • repo_url: None
  • paper_authors: C. Coelho, M. Fernanda P. Costa, L. L. Ferrás
  • for: The paper targets the modelling of continuous time series (CTS), especially under irregular and high-frequency sampling rates.
  • methods: It proposes a new model, Latent ODE-LSTM, built on the ODE-RNN and Latent ODE models, to address the vanishing- and exploding-gradient problems that arise when modelling CTS.
  • results: Numerical experiments show that the proposed Latent ODE-LSTM outperforms Latent ODE-RNNs when modelling CTS with regular and irregular sampling rates, and avoids vanishing and exploding gradients during training.
    Abstract Due to their dynamic properties such as irregular sampling rate and high-frequency sampling, Continuous Time Series (CTS) are found in many applications. Since CTS with irregular sampling rate are difficult to model with standard Recurrent Neural Networks (RNNs), RNNs have been generalised to have continuous-time hidden dynamics defined by a Neural Ordinary Differential Equation (Neural ODE), leading to the ODE-RNN model. Another approach that provides a better modelling is that of the Latent ODE model, which constructs a continuous-time model where a latent state is defined at all times. The Latent ODE model uses a standard RNN as the encoder and a Neural ODE as the decoder. However, since the RNN encoder leads to difficulties with missing data and ill-defined latent variables, a Latent ODE-RNN model has recently been proposed that uses a ODE-RNN model as the encoder instead. Both the Latent ODE and Latent ODE-RNN models are difficult to train due to the vanishing and exploding gradients problem. To overcome this problem, the main contribution of this paper is to propose and illustrate a new model based on a new Latent ODE using an ODE-LSTM (Long Short-Term Memory) network as an encoder -- the Latent ODE-LSTM model. To limit the growth of the gradients the Norm Gradient Clipping strategy was embedded on the Latent ODE-LSTM model. The performance evaluation of the new Latent ODE-LSTM (with and without Norm Gradient Clipping) for modelling CTS with regular and irregular sampling rates is then demonstrated. Numerical experiments show that the new Latent ODE-LSTM performs better than Latent ODE-RNNs and can avoid the vanishing and exploding gradients during training.
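The Norm Gradient Clipping strategy mentioned in the abstract is a standard technique; a minimal PyTorch sketch of how such a clipping step is embedded in a training loop is shown below. The LSTM stand-in model, the toy data, and the clipping value of 1.0 are placeholders, not the paper's Latent ODE-LSTM.

```python
import torch
import torch.nn as nn

# Placeholder sequence model standing in for the Latent ODE-LSTM encoder/decoder.
model = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
head = nn.Linear(32, 1)
params = list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.MSELoss()
max_grad_norm = 1.0  # assumed clipping value

x = torch.randn(16, 50, 1)   # toy series batch (batch, time, features)
y = torch.randn(16, 50, 1)

for step in range(100):
    optimizer.zero_grad()
    hidden, _ = model(x)
    pred = head(hidden)
    loss = loss_fn(pred, y)
    loss.backward()
    # Norm gradient clipping: rescale gradients so their global L2 norm
    # does not exceed max_grad_norm, limiting exploding gradients.
    torch.nn.utils.clip_grad_norm_(params, max_grad_norm)
    optimizer.step()
```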

Transaction Fraud Detection via Spatial-Temporal-Aware Graph Transformer

  • paper_url: http://arxiv.org/abs/2307.05121
  • repo_url: None
  • paper_authors: Yue Tian, Guanjun Liu
  • for: Detecting fraud in financial transactions.
  • methods: A Graph Neural Network (GNN) combined with a transformer module is used to capture temporal dependencies and to learn both local and global information.
  • results: The model outperforms general GNN models and GNN-based fraud detectors on two financial datasets, effectively detecting transaction fraud.
    Abstract How to obtain informative representations of transactions and then perform the identification of fraudulent transactions is a crucial part of ensuring financial security. Recent studies apply Graph Neural Networks (GNNs) to the transaction fraud detection problem. Nevertheless, they encounter challenges in effectively learning spatial-temporal information due to structural limitations. Moreover, few prior GNN-based detectors have recognized the significance of incorporating global information, which encompasses similar behavioral patterns and offers valuable insights for discriminative representation learning. Therefore, we propose a novel heterogeneous graph neural network called Spatial-Temporal-Aware Graph Transformer (STA-GT) for transaction fraud detection problems. Specifically, we design a temporal encoding strategy to capture temporal dependencies and incorporate it into the graph neural network framework, enhancing spatial-temporal information modeling and improving expressive ability. Furthermore, we introduce a transformer module to learn local and global information. Pairwise node-node interactions overcome the limitation of the GNN structure and build up the interactions with the target node and long-distance ones. Experimental results on two financial datasets compared to general GNN models and GNN-based fraud detectors demonstrate that our proposed method STA-GT is effective on the transaction fraud detection task.

$\ell_p$-Regression in the Arbitrary Partition Model of Communication

  • paper_url: http://arxiv.org/abs/2307.05117
  • repo_url: None
  • paper_authors: Yi Li, Honghao Lin, David P. Woodruff
  • for: This paper studies the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$.
  • methods: The authors analyze the randomized communication complexity of the problem and provide improved upper and lower bounds.
  • results: For $p = 2$ they give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits; for $p \in (1,2)$ they obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound, where $d$ is the data dimension.
    Abstract We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model, for $p\in (0,2]$. In this problem, there is a coordinator and $s$ servers. The $i$-th server receives $A^i\in\{-M, -M+1, \ldots, M\}^{n\times d}$ and $b^i\in\{-M, -M+1, \ldots, M\}^n$ and the coordinator would like to find a $(1+\epsilon)$-approximate solution to $\min_{x\in\mathbb{R}^n} \|(\sum_i A^i)x - (\sum_i b^i)\|_p$. Here $M \leq \mathrm{poly}(nd)$ for convenience. This model, where the data is additively shared across servers, is commonly referred to as the arbitrary partition model. We obtain significantly improved bounds for this problem. For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits. For $p \in (1,2)$,we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound. Notably, for $d$ sufficiently large, our leading order term only depends linearly on $1/\epsilon$ rather than quadratically. We also show communication lower bounds of $\Omega(sd^2 + sd/\epsilon^2)$ for $p\in (0,1]$ and $\Omega(sd^2 + sd/\epsilon)$ for $p\in (1,2]$. Our bounds considerably improve previous bounds due to (Woodruff et al. COLT, 2013) and (Vempala et al., SODA, 2020).

Conformalization of Sparse Generalized Linear Models

  • paper_url: http://arxiv.org/abs/2307.05109
  • repo_url: https://github.com/etashguha/sparse_conformal
  • paper_authors: Etash Kumar Guha, Eugene Ndiaye, Xiaoming Huo
  • for: This paper addresses how to compute reliable conformal prediction sets for sparse (generalized) linear models efficiently, without retraining the model for every candidate response value.
  • methods: The approach exploits that only a subset of variables is active for prediction and uses numerical continuation to approximate the solution path, refitting the model only at change points of the active set.
  • results: The resulting path-following algorithm accurately approximates conformal prediction sets, as demonstrated on synthetic and real data.
    Abstract Given a sequence of observable variables $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, the conformal prediction method estimates a confidence set for $y_{n+1}$ given $x_{n+1}$ that is valid for any finite sample size by merely assuming that the joint distribution of the data is permutation invariant. Although attractive, computing such a set is computationally infeasible in most regression problems. Indeed, in these cases, the unknown variable $y_{n+1}$ can take an infinite number of possible candidate values, and generating conformal sets requires retraining a predictive model for each candidate. In this paper, we focus on a sparse linear model with only a subset of variables for prediction and use numerical continuation techniques to approximate the solution path efficiently. The critical property we exploit is that the set of selected variables is invariant under a small perturbation of the input data. Therefore, it is sufficient to enumerate and refit the model only at the change points of the set of active features and smoothly interpolate the rest of the solution via a Predictor-Corrector mechanism. We show how our path-following algorithm accurately approximates conformal prediction sets and illustrate its performance using synthetic and real data examples.
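For orientation, the sketch below shows the simpler split-conformal construction with a sparse linear (Lasso) fit; it is not the paper's full-conformal path-following algorithm, which avoids refitting for every candidate response, but it illustrates the kind of prediction set being targeted. The Lasso penalty, coverage level, and synthetic data are assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

def split_conformal_interval(X_train, y_train, X_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction interval for a new point x_new,
    using absolute residuals on a held-out calibration set."""
    model = Lasso(alpha=0.1).fit(X_train, y_train)
    residuals = np.abs(y_cal - model.predict(X_cal))
    n = len(residuals)
    # conformal quantile with finite-sample correction
    q = np.quantile(residuals, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    center = model.predict(x_new.reshape(1, -1))[0]
    return center - q, center + q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 20))
    beta = np.zeros(20); beta[:3] = [2.0, -1.0, 0.5]  # sparse ground truth
    y = X @ beta + rng.normal(scale=0.5, size=400)
    lo, hi = split_conformal_interval(X[:200], y[:200], X[200:], y[200:],
                                      x_new=X[0], alpha=0.1)
    print(f"90% prediction interval: [{lo:.2f}, {hi:.2f}]")
```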

Fundamental limits of overparametrized shallow neural networks for supervised learning

  • paper_url: http://arxiv.org/abs/2307.05635
  • repo_url: None
  • paper_authors: Francesco Camilli, Daria Tieplova, Jean Barbier
  • for: This paper gives an information-theoretical analysis of a two-layer neural network trained on input-output pairs generated by a teacher network, in order to establish fundamental performance limits for learning from limited data.
  • methods: The analysis relates the mutual information between training data and network weights, and the Bayes-optimal generalization error, to the same quantities for a simpler generalized linear model, using rigorous tools from spin glasses guided by Gaussian equivalence principles.
  • results: The resulting bounds, expressed in terms of the number of training samples, input dimension, and number of hidden units, constitute fundamental performance limits for any learning procedure trained on data generated by the two-layer teacher model, and cover the setting where all network parameters are trained.
    Abstract We carry out an information-theoretical analysis of a two-layer neural network trained from input-output pairs generated by a teacher network with matching architecture, in overparametrized regimes. Our results come in the form of bounds relating i) the mutual information between training data and network weights, or ii) the Bayes-optimal generalization error, to the same quantities but for a simpler (generalized) linear model for which explicit expressions are rigorously known. Our bounds, which are expressed in terms of the number of training samples, input dimension and number of hidden units, thus yield fundamental performance limits for any neural network (and actually any learning procedure) trained from limited data generated according to our two-layer teacher neural network model. The proof relies on rigorous tools from spin glasses and is guided by ``Gaussian equivalence principles'' lying at the core of numerous recent analyses of neural networks. With respect to the existing literature, which is either non-rigorous or restricted to the case of the learning of the readout weights only, our results are information-theoretic (i.e. are not specific to any learning algorithm) and, importantly, cover a setting where all the network parameters are trained.

A Deep Dive into Perturbations as Evaluation Technique for Time Series XAI

  • paper_url: http://arxiv.org/abs/2307.05104
  • repo_url: https://github.com/visual-xai-for-time-series/time-series-xai-perturbation-analysis
  • paper_authors: Udo Schlegel, Daniel A. Keim
  • for: This work studies how to evaluate the quality of attributions produced by XAI techniques for time series data, with the goal of more reliable and interpretable machine learning models.
  • methods: A perturbation analysis is used: the input data is systematically modified and the impact on the attributions generated by each XAI method is measured.
  • results: The perturbation analysis effectively assesses attribution quality and reveals strengths and limitations of the evaluated XAI techniques, guiding the selection of XAI methods for time series data and the development of more reliable, interpretable models.
    Abstract Explainable Artificial Intelligence (XAI) has gained significant attention recently as the demand for transparency and interpretability of machine learning models has increased. In particular, XAI for time series data has become increasingly important in finance, healthcare, and climate science. However, evaluating the quality of explanations, such as attributions provided by XAI techniques, remains challenging. This paper provides an in-depth analysis of using perturbations to evaluate attributions extracted from time series models. A perturbation analysis involves systematically modifying the input data and evaluating the impact on the attributions generated by the XAI method. We apply this approach to several state-of-the-art XAI techniques and evaluate their performance on three time series classification datasets. Our results demonstrate that the perturbation analysis approach can effectively evaluate the quality of attributions and provide insights into the strengths and limitations of XAI techniques. Such an approach can guide the selection of XAI methods for time series data, e.g., focusing on return time rather than precision, and facilitate the development of more reliable and interpretable machine learning models for time series analysis.
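One common way to run such a perturbation analysis is sketched below, under stated assumptions: the most-attributed time steps are perturbed (here, set to zero) and the drop in the model's output is compared against perturbing randomly chosen steps. The toy model, the zero-substitution scheme, and the value of k are placeholders rather than the paper's exact protocol.

```python
import numpy as np

def perturb(series, indices, mode="zero", rng=None):
    """Return a copy of the series with the given time steps perturbed."""
    perturbed = series.copy()
    if mode == "zero":
        perturbed[indices] = 0.0
    else:  # random substitution as a control condition
        rng = rng or np.random.default_rng(0)
        perturbed[indices] = rng.normal(series.mean(), series.std(), size=len(indices))
    return perturbed

def perturbation_score(predict_fn, series, attribution, k=10):
    """Compare the prediction drop when perturbing the k most-attributed
    time steps versus k random time steps."""
    rng = np.random.default_rng(0)
    base = predict_fn(series)
    top_idx = np.argsort(-np.abs(attribution))[:k]
    rand_idx = rng.choice(len(series), size=k, replace=False)
    drop_attr = base - predict_fn(perturb(series, top_idx))
    drop_rand = base - predict_fn(perturb(series, rand_idx))
    return drop_attr, drop_rand  # attributions are informative if drop_attr >> drop_rand

if __name__ == "__main__":
    # toy model: the "class score" is the mean of the first 20 time steps
    predict_fn = lambda s: float(s[:20].mean())
    series = np.sin(np.linspace(0, 6.28, 100)) + 1.0
    attribution = np.zeros(100); attribution[:20] = 1.0  # a "correct" attribution
    print(perturbation_score(predict_fn, series, attribution, k=10))
```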

PePNet: A Periodicity-Perceived Workload Prediction Network Supporting Rare Occurrence of Heavy Workload

  • paper_url: http://arxiv.org/abs/2308.01917
  • repo_url: None
  • paper_authors: Feiyi Chen, Zhen Qin, Hailiang Zhao, Mengchu Zhou, Shuiguang Deng
  • for: Improving workload prediction for cloud servers, especially for rare heavy-workload bursts.
  • methods: The proposed PePNet combines a Periodicity-Perceived Mechanism, which automatically detects periodicity and fuses periodic information adaptively, with an Achilles' Heel loss function that iteratively optimizes the most under-fitted part of the predicted sequence.
  • results: Extensive experiments on Alibaba2018, the SMD dataset, and Dinda's dataset show a 20.0% average reduction in MAPE for overall workload compared with state-of-the-art methods, and a 23.9% average reduction in MAPE for heavy workload.
    Abstract Cloud providers can greatly benefit from accurate workload prediction. However, the workload of cloud servers is highly variable, with occasional heavy workload bursts. This makes workload prediction challenging. There are mainly two categories of workload prediction methods: statistical methods and neural-network-based ones. The former ones rely on strong mathematical assumptions and have reported low accuracy when predicting highly variable workload. The latter ones offer higher overall accuracy, yet they are vulnerable to data imbalance between heavy workload and common one. This impairs the prediction accuracy of neural network-based models on heavy workload. Either the overall inaccuracy of statistic methods or the heavy-workload inaccuracy of neural-network-based models can cause service level agreement violations. Thus, we propose PePNet to improve overall especially heavy workload prediction accuracy. It has two distinctive characteristics: (i) A Periodicity-Perceived Mechanism to detect the existence of periodicity and the length of one period automatically, without any priori knowledge. Furthermore, it fuses periodic information adaptively, which is suitable for periodic, lax periodic and aperiodic time series. (ii) An Achilles' Heel Loss Function iteratively optimizing the most under-fitting part in predicting sequence for each step, which significantly improves the prediction accuracy of heavy load. Extensive experiments conducted on Alibaba2018, SMD dataset and Dinda's dataset demonstrate that PePNet improves MAPE for overall workload by 20.0% on average, compared with state-of-the-art methods. Especially, PePNet improves MAPE for heavy workload by 23.9% on average.
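The Achilles' Heel loss is described as iteratively optimizing the most under-fitting part of the predicted sequence; one plausible, much-simplified reading is sketched below, where the step with the largest error receives extra weight. The exact formulation in the paper may differ, and the extra weight value is an assumption.

```python
import torch

def achilles_heel_style_loss(pred, target, extra_weight=2.0):
    """Toy loss that up-weights the most under-fitted prediction step.

    pred, target: tensors of shape (batch, horizon). The step with the
    largest mean absolute error receives extra weight, so optimization
    focuses on the currently worst-fit part of the sequence.
    """
    per_step_err = (pred - target).abs().mean(dim=0)      # (horizon,)
    weights = torch.ones_like(per_step_err)
    weights[per_step_err.argmax()] = extra_weight          # emphasize the weakest step
    return (weights * per_step_err).sum() / weights.sum()

if __name__ == "__main__":
    torch.manual_seed(0)
    pred = torch.randn(8, 12, requires_grad=True)
    target = torch.randn(8, 12)
    loss = achilles_heel_style_loss(pred, target)
    loss.backward()
    print(float(loss))
```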

Transaction Fraud Detection via an Adaptive Graph Neural Network

  • paper_url: http://arxiv.org/abs/2307.05633
  • repo_url: None
  • paper_authors: Yue Tian, Guanjun Liu, Jiacun Wang, Mengchu Zhou
  • for: Improving the accuracy of transaction fraud detection to protect the financial security of individuals and banks.
  • methods: An Adaptive Sampling and Aggregation-based Graph Neural Network (ASA-GNN) is proposed that learns discriminative representations of transaction data, using cosine similarity and edge weights to adaptively select neighbors and a neighbor diversity metric to counter fraudsters' camouflage.
  • results: Extensive experiments on three real financial datasets show that ASA-GNN outperforms state-of-the-art methods on transaction fraud detection.
    Abstract Many machine learning methods have been proposed to achieve accurate transaction fraud detection, which is essential to the financial security of individuals and banks. However, most existing methods leverage original features only or require manual feature engineering. They lack the ability to learn discriminative representations from transaction data. Moreover, criminals often commit fraud by imitating cardholders' behaviors, which causes the poor performance of existing detection models. In this paper, we propose an Adaptive Sampling and Aggregation-based Graph Neural Network (ASA-GNN) that learns discriminative representations to improve the performance of transaction fraud detection. A neighbor sampling strategy is performed to filter noisy nodes and supplement information for fraudulent nodes. Specifically, we leverage cosine similarity and edge weights to adaptively select neighbors with similar behavior patterns for target nodes and then find multi-hop neighbors for fraudulent nodes. A neighbor diversity metric is designed by calculating the entropy among neighbors to tackle the camouflage issue of fraudsters and explicitly alleviate the over-smoothing phenomena. Extensive experiments on three real financial datasets demonstrate that the proposed method ASA-GNN outperforms state-of-the-art ones.
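The neighbor sampling idea, selecting neighbors with similar behavior patterns via cosine similarity, can be sketched as follows. This is a simplified, hypothetical rendering: it ignores edge weights, the multi-hop expansion for fraudulent nodes, and the neighbor diversity metric, and the similarity threshold is an assumption.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def sample_similar_neighbors(node_feats, adjacency, target, sim_threshold=0.5):
    """Keep only the neighbors of `target` whose feature cosine similarity
    to the target exceeds `sim_threshold` (noisy neighbors are filtered)."""
    kept = []
    for neighbor in adjacency[target]:
        sim = cosine_similarity(node_feats[target], node_feats[neighbor])
        if sim >= sim_threshold:
            kept.append((neighbor, sim))
    # fall back to the single most similar neighbor if nothing passes the threshold
    if not kept and adjacency[target]:
        best = max(adjacency[target],
                   key=lambda n: cosine_similarity(node_feats[target], node_feats[n]))
        kept = [(best, cosine_similarity(node_feats[target], node_feats[best]))]
    return kept

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(5, 8))          # toy transaction-node features
    adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
    print(sample_similar_neighbors(feats, adj, target=0, sim_threshold=0.2))
```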

Estimating label quality and errors in semantic segmentation data via any model

  • paper_url: http://arxiv.org/abs/2307.05080
  • repo_url: None
  • paper_authors: Vedang Lad, Jonas Mueller
  • for: Improving the annotation quality of semantic segmentation datasets by automatically flagging mislabeled images.
  • methods: Label quality scores are computed from the probabilistic predictions of a trained segmentation model; any model architecture and training procedure can be used.
  • results: Evaluating seven label quality scoring methods shows that the soft-minimum of the model-estimated likelihoods of each pixel's annotated class is particularly effective at identifying mislabeled images, across multiple types of annotation error.
    Abstract The labor-intensive annotation process of semantic segmentation datasets is often prone to errors, since humans struggle to label every pixel correctly. We study algorithms to automatically detect such annotation errors, in particular methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled. This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset, which is critical in sensitive applications such as medical imaging and autonomous vehicles. Widely applicable, our label quality scores rely on probabilistic predictions from a trained segmentation model -- any model architecture and training procedure can be utilized. Here we study 7 different label quality scoring methods used in conjunction with a DeepLabV3+ or a FPN segmentation model to detect annotation errors in a version of the SYNTHIA dataset. Precision-recall evaluations reveal a score -- the soft-minimum of the model-estimated likelihoods of each pixel's annotated class -- that is particularly effective to identify images that are mislabeled, across multiple types of annotation error.
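The best-performing score, the soft-minimum of the model-estimated likelihoods of each pixel's annotated class, can be sketched as below. The particular soft-minimum formulation and temperature are assumptions, and the random probabilities stand in for the output of a trained segmentation model such as DeepLabV3+.

```python
import numpy as np

def label_quality_score(pred_probs, annotated_mask, temperature=0.1):
    """Soft-minimum label quality score for one image.

    pred_probs: (H, W, C) model-estimated class probabilities.
    annotated_mask: (H, W) integer class labels from the annotator.
    Lower scores flag images that are more likely to be mislabeled.
    """
    h, w, _ = pred_probs.shape
    rows, cols = np.indices((h, w))
    # per-pixel likelihood of the class the annotator assigned
    likelihoods = pred_probs[rows, cols, annotated_mask]
    # soft-minimum: approaches the hard minimum as temperature -> 0
    return float(-temperature * np.log(np.mean(np.exp(-likelihoods / temperature))))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = rng.dirichlet(alpha=[1.0] * 3, size=(64, 64))   # (H, W, 3) toy softmax output
    good_mask = probs.argmax(axis=-1)                        # labels agreeing with the model
    bad_mask = (good_mask + 1) % 3                           # systematically wrong labels
    print("agreeing labels:", label_quality_score(probs, good_mask))
    print("wrong labels:   ", label_quality_score(probs, bad_mask))
```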

A Theory of Bounded Inductive Rationality

  • paper_url: http://arxiv.org/abs/2307.05068
  • repo_url: None
  • paper_authors: Caspar Oesterheld, Abram Demski, Vincent Conitzer
  • for: This paper develops a theory of rational decision making that does not assume logical omniscience.
  • methods: Rationality is defined through inductive reasoning: a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows the hypotheses that keep their promises of high rewards.
  • results: Agents that are rational in this sense have desirable properties, such as learning to value random and pseudo-random lotteries at their expected reward; a folk theorem characterizes the strategies that bounded rational inductive agents can converge to in strategic interactions.
    Abstract The dominant theories of rational choice assume logical omniscience. That is, they assume that when facing a decision problem, an agent can perform all relevant computations and determine the truth value of all relevant logical/mathematical claims. This assumption is unrealistic when, for example, we offer bets on remote digits of pi or when an agent faces a computationally intractable planning problem. Furthermore, the assumption of logical omniscience creates contradictions in cases where the environment can contain descriptions of the agent itself. Importantly, strategic interactions as studied in game theory are decision problems in which a rational agent is predicted by its environment (the other players). In this paper, we develop a theory of rational decision making that does not assume logical omniscience. We consider agents who repeatedly face decision problems (including ones like betting on digits of pi or games against other agents). The main contribution of this paper is to provide a sensible theory of rationality for such agents. Roughly, we require that a boundedly rational inductive agent tests each efficiently computable hypothesis infinitely often and follows those hypotheses that keep their promises of high rewards. We then prove that agents that are rational in this sense have other desirable properties. For example, they learn to value random and pseudo-random lotteries at their expected reward. Finally, we consider strategic interactions between different agents and prove a folk theorem for what strategies bounded rational inductive agents can converge to.

Portfolio Optimization: A Comparative Study

  • paper_url: http://arxiv.org/abs/2307.05048
  • repo_url: https://github.com/riddhi927/Portfolio-Optimization
  • paper_authors: Jaydip Sen, Subhasis Dasgupta
  • for: This chapter compares three portfolio design approaches: the mean-variance portfolio (MVP), a hierarchical risk parity (HRP)-based portfolio, and an autoencoder-based portfolio.
  • methods: The portfolios are built from historical prices of stocks chosen from ten thematic sectors listed on the National Stock Exchange (NSE) of India, using price data from January 1, 2018 to December 31, 2021, and tested out-of-sample on data from January 1, 2022 to December 31, 2022.
  • results: The MVP portfolio performs best on the out-of-sample data in terms of risk-adjusted returns, while the autoencoder portfolios outperform their counterparts on annual returns.
    Abstract Portfolio optimization has been an area that has attracted considerable attention from the financial research community. Designing a profitable portfolio is a challenging task involving precise forecasting of future stock returns and risks. This chapter presents a comparative study of three portfolio design approaches, the mean-variance portfolio (MVP), hierarchical risk parity (HRP)-based portfolio, and autoencoder-based portfolio. These three approaches to portfolio design are applied to the historical prices of stocks chosen from ten thematic sectors listed on the National Stock Exchange (NSE) of India. The portfolios are designed using the stock price data from January 1, 2018, to December 31, 2021, and their performances are tested on the out-of-sample data from January 1, 2022, to December 31, 2022. Extensive results are analyzed on the performance of the portfolios. It is observed that the performance of the MVP portfolio is the best on the out-of-sample data for the risk-adjusted returns. However, the autoencoder portfolios outperformed their counterparts on annual returns.
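As a hedged illustration of the mean-variance family of portfolios compared in the chapter, the sketch below computes closed-form minimum-variance weights from a sample covariance of daily returns. A full MVP study would add expected-return targets and constraints (for example, no short selling), and the HRP and autoencoder portfolios are not shown; the synthetic return data is a placeholder.

```python
import numpy as np

def minimum_variance_weights(returns):
    """Closed-form minimum-variance portfolio: w = Sigma^{-1} 1 / (1' Sigma^{-1} 1).

    returns: (n_days, n_assets) matrix of daily returns.
    Allows short positions; constrained variants need a numerical optimizer.
    """
    cov = np.cov(returns, rowvar=False)
    inv = np.linalg.pinv(cov)              # pseudo-inverse for numerical safety
    ones = np.ones(cov.shape[0])
    w = inv @ ones
    return w / (ones @ w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    daily_returns = rng.normal(0.0005, 0.01, size=(750, 5))   # ~3 years, 5 toy assets
    w = minimum_variance_weights(daily_returns)
    print("weights:", np.round(w, 3), "sum:", round(float(w.sum()), 3))
    print("portfolio daily volatility:",
          float(np.sqrt(w @ np.cov(daily_returns, rowvar=False) @ w)))
```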

DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

  • paper_url: http://arxiv.org/abs/2307.05628
  • repo_url: None
  • paper_authors: Daoan Zhang, Weitong Zhang, Bing He, Yu Zhao, Jianguo Zhang, Chenchen Qin, Jianhua Yao
  • for: DNAGPT is proposed to handle various DNA analysis tasks, including genomic signals and regions recognition, pseudo genome generation, and mRNA abundance regression.
  • methods: DNAGPT is a pre-trained model with a modified architecture that adds a binary classification task and a numerical regression task during pre-training, together with a comprehensive token language that encodes sequence, numerical, and task-related information.
  • results: DNAGPT demonstrates superior performance on various DNA analysis tasks, especially when compared to existing models specialized for specific downstream tasks.
    Abstract GPT has been proven to be capable of extracting general information from language sequences, thereby benefiting all downstream tasks. This motivates us to use pre-trained models to explore the hidden inherent information in DNA sequences. However, data and task requirements in DNA sequence analyses are tasked in different formats such as generation, prediction and regression, and are complexity and involve different modalities, such as nucleotides sequences and, expression levels, etc. Existing BERT-based models are mostly for generation tasks and use sequence data as input and output, thus cannot easily handle various DNA analysis tasks in one single model. Herein, we propose a generalized DNA pre-training DNA model, DNAGPT, that was trained on over 200 billion base pairs from all the mammals. We enhance the classic GPT model by adding binary classification task (DNA sequence order) and numerical regression task (guanine-cytosine content prediction) in the pre-training period and enhancing the architecture with corresponding embedding layers and encoding heads. We also design a comprehensive token language to encode sequence, number and task related information in the same token space. Therefore, DNAGPT can handle versatile DNA analysis tasks and simultaneously process handle both sequence and numerical data. We have evaluated our model on genomic signals and regions recognition, pseudo genomes generation and mRNA abudance regression tasks. We demonstrate that benefiting from pre-training, DNAGPT can shows superior performance than the existing models specially designed for various downstreams tasks.

Number Systems for Deep Neural Network Architectures: A Survey

  • paper_url: http://arxiv.org/abs/2307.05035
  • repo_url: None
  • paper_authors: Ghada Alsuhli, Vasileios Sakellariou, Hani Saleh, Mahmoud Al-Qutayri, Baker Mohammad, Thanos Stouraitis
  • for: This survey examines data representations for deep neural networks (DNNs), with the goal of improving their computational and energy efficiency.
  • methods: The paper reviews conventional and unconventional number systems that have been exploited for DNNs and discusses their impact on DNN performance and hardware design.
  • results: Alternative number systems can improve the efficiency of DNN implementations, but each system comes with its own challenges; the survey highlights these trade-offs, the solutions proposed for them, and related research opportunities.
    Abstract Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion about alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNN, learn about the widely used number systems for DNN, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, the recent trends and related research opportunities will be highlighted

FairLay-ML: Intuitive Remedies for Unfairness in Data-Driven Social-Critical Algorithms

  • paper_url: http://arxiv.org/abs/2307.05029
  • repo_url: None
  • paper_authors: Normen Yu, Gang Tan, Saeid Tizpaz-Niari
  • for: This paper aims to provide a user-friendly interface for laypeople to understand and remedy unfairness in machine learning models.
  • methods: The paper uses open-sourced machine learning model explanation tools, such as Local Interpretable Model-Agnostic Explanations (LIME), and integrates them with existing machine learning-focused graphical user interfaces (GUIs) like Python Streamlit.
  • results: The paper tests the effectiveness of FairLay-ML, a proof-of-concept GUI, using models of various accuracy and fairness generated by the unfairness detector tool Parfait-ML, and validates the results using Themis. The study finds that the technology stack is easy to install, provides real-time black-box explanations of pre-trained models, and that the explanations translate to actionable remedies.
    Abstract This thesis explores open-sourced machine learning (ML) model explanation tools to understand whether these tools can allow a layman to visualize, understand, and suggest intuitive remedies to unfairness in ML-based decision-support systems. Machine learning models trained on datasets biased against minority groups are increasingly used to guide life-altering social decisions, prompting the urgent need to study their logic for unfairness. Due to this problem's impact on vast populations of the general public, it is critical for the layperson -- not just subject matter experts in social justice or machine learning experts -- to understand the nature of unfairness within these algorithms and the potential trade-offs. Existing research on fairness in machine learning focuses mostly on the mathematical definitions and tools to understand and remedy unfair models, with some directly citing user-interactive tools as necessary for future work. This thesis presents FairLay-ML, a proof-of-concept GUI integrating some of the most promising tools to provide intuitive explanations for unfair logic in ML models by integrating existing research tools (e.g. Local Interpretable Model-Agnostic Explanations) with existing ML-focused GUI (e.g. Python Streamlit). We test FairLay-ML using models of various accuracy and fairness generated by an unfairness detector tool, Parfait-ML, and validate our results using Themis. Our study finds that the technology stack used for FairLay-ML makes it easy to install and provides real-time black-box explanations of pre-trained models to users. Furthermore, the explanations provided translate to actionable remedies.

Unleashing the Potential of Regularization Strategies in Learning with Noisy Labels

  • paper_url: http://arxiv.org/abs/2307.05025
  • repo_url: None
  • paper_authors: Hui Kang, Sheng Liu, Huaxi Huang, Jun Yu, Bo Han, Dadong Wang, Tongliang Liu
  • for: This study examines robustness and generalization when learning with noisy labels, and asks whether simple baselines can solve the problem.
  • methods: A simple baseline using cross-entropy loss is combined with widely used regularization strategies such as learning rate decay, model weight averaging, and data augmentation.
  • results: The simple baseline outperforms state-of-the-art methods, suggesting that a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels.
    Abstract In recent years, research on learning with noisy labels has focused on devising novel algorithms that can achieve robustness to noisy training labels while generalizing to clean data. These algorithms often incorporate sophisticated techniques, such as noise modeling, label correction, and co-training. In this study, we demonstrate that a simple baseline using cross-entropy loss, combined with widely used regularization strategies like learning rate decay, model weights average, and data augmentations, can outperform state-of-the-art methods. Our findings suggest that employing a combination of regularization strategies can be more effective than intricate algorithms in tackling the challenges of learning with noisy labels. While some of these regularization strategies have been utilized in previous noisy label learning research, their full potential has not been thoroughly explored. Our results encourage a reevaluation of benchmarks for learning with noisy labels and prompt reconsideration of the role of specialized learning algorithms designed for training with noisy labels.
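A minimal sketch of such a regularized baseline is shown below: cross-entropy loss with learning-rate decay (cosine annealing) and an exponential moving average (EMA) of the model weights. The network, the random batch standing in for an augmented data loader, and the EMA decay value are placeholders, not the paper's exact setup.

```python
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 256), nn.ReLU(), nn.Linear(256, 10))
ema_model = copy.deepcopy(model)           # averaged weights used for evaluation
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)  # learning-rate decay
criterion = nn.CrossEntropyLoss()
ema_decay = 0.999

def update_ema(ema, current, decay):
    """Exponential moving average of model weights."""
    with torch.no_grad():
        for p_ema, p in zip(ema.parameters(), current.parameters()):
            p_ema.mul_(decay).add_(p, alpha=1 - decay)

for epoch in range(100):
    # a data loader would yield (augmented) images with possibly noisy labels;
    # a single random batch stands in for it here
    images, noisy_labels = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))
    optimizer.zero_grad()
    loss = criterion(model(images), noisy_labels)
    loss.backward()
    optimizer.step()
    update_ema(ema_model, model, ema_decay)
    scheduler.step()
```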

Feature Activation Map: Visual Explanation of Deep Learning Models for Image Classification

  • paper_url: http://arxiv.org/abs/2307.05017
  • repo_url: None
  • paper_authors: Yi Liao, Yongsheng Gao, Weichuan Zhang
  • for: The paper aims to explain the decisions of deep learning classifiers that do not use fully-connected layers.
  • methods: A post-hoc interpretation tool named feature activation map (FAM) is proposed: channel-wise contribution weights are derived from the similarity scores between two image embeddings, and the activation maps are linearly combined with the corresponding normalized contribution weights to form the explanation map.
  • results: Quantitative and qualitative experiments on ten deep learning models for few-shot image classification, contrastive learning image classification, and image retrieval demonstrate the effectiveness of the FAM algorithm.
    Abstract Decisions made by convolutional neural networks(CNN) can be understood and explained by visualizing discriminative regions on images. To this end, Class Activation Map (CAM) based methods were proposed as powerful interpretation tools, making the prediction of deep learning models more explainable, transparent, and trustworthy. However, all the CAM-based methods (e.g., CAM, Grad-CAM, and Relevance-CAM) can only be used for interpreting CNN models with fully-connected (FC) layers as a classifier. It is worth noting that many deep learning models classify images without FC layers, e.g., few-shot learning image classification, contrastive learning image classification, and image retrieval tasks. In this work, a post-hoc interpretation tool named feature activation map (FAM) is proposed, which can interpret deep learning models without FC layers as a classifier. In the proposed FAM algorithm, the channel-wise contribution weights are derived from the similarity scores between two image embeddings. The activation maps are linearly combined with the corresponding normalized contribution weights, forming the explanation map for visualization. The quantitative and qualitative experiments conducted on ten deep learning models for few-shot image classification, contrastive learning image classification and image retrieval tasks demonstrate the effectiveness of the proposed FAM algorithm.
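One plausible reading of the FAM procedure is sketched below: channel-wise weights are taken from each channel's contribution to the similarity between two embeddings, normalized, and used to linearly combine the activation maps. The pooling choice, the clipping of negative contributions, and the normalization are assumptions; the paper's exact weighting may differ.

```python
import numpy as np

def feature_activation_map(query_maps, reference_embedding):
    """Sketch of a FAM-style explanation map.

    query_maps: (C, H, W) activation maps of the query image.
    reference_embedding: (C,) embedding of the image it is compared against.
    Channel weights are taken as each channel's contribution to the
    dot-product similarity between the two embeddings, then normalized
    and used to linearly combine the activation maps.
    """
    query_embedding = query_maps.mean(axis=(1, 2))            # global average pooling -> (C,)
    contributions = query_embedding * reference_embedding      # per-channel similarity terms
    contributions = np.clip(contributions, 0, None)            # keep positively contributing channels
    weights = contributions / (contributions.sum() + 1e-12)    # normalize
    fam = np.tensordot(weights, query_maps, axes=(0, 0))       # weighted sum -> (H, W)
    fam -= fam.min()
    return fam / (fam.max() + 1e-12)                           # rescale to [0, 1] for visualization

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    maps = rng.random((64, 7, 7))        # toy conv feature maps of a query image
    ref = rng.random(64)                 # toy embedding of a retrieved / support image
    heatmap = feature_activation_map(maps, ref)
    print(heatmap.shape, float(heatmap.min()), float(heatmap.max()))
```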

CILF:Causality Inspired Learning Framework for Out-of-Distribution Vehicle Trajectory Prediction

  • paper_url: http://arxiv.org/abs/2307.05624
  • repo_url: None
  • paper_authors: Shengyi Li, Qifan Xue, Yezhuo Zhang, Xuanpeng Li
  • for: Improving the accuracy of vehicle trajectory prediction for autonomous driving, especially on out-of-distribution data.
  • methods: An Out-of-Distribution Causal Graph (OOD-CG) is proposed that makes the underlying causal structure of the data explicit, together with a Causal Inspired Learning Framework (CILF) built on this causal graph.
  • results: CILF achieves promising improvements in domain generalization on the mainstream NGSIM and INTERACTION datasets.
    Abstract Trajectory prediction is critical for autonomous driving vehicles. Most existing methods tend to model the correlation between history trajectory (input) and future trajectory (output). Since correlation is just a superficial description of reality, these methods rely heavily on the i.i.d. assumption and evince a heightened susceptibility to out-of-distribution data. To address this problem, we propose an Out-of- Distribution Causal Graph (OOD-CG), which explicitly defines the underlying causal structure of the data with three entangled latent features: 1) domain-invariant causal feature (IC), 2) domain-variant causal feature (VC), and 3) domain-variant non-causal feature (VN ). While these features are confounded by confounder (C) and domain selector (D). To leverage causal features for prediction, we propose a Causal Inspired Learning Framework (CILF), which includes three steps: 1) extracting domain-invariant causal feature by means of an invariance loss, 2) extracting domain variant feature by domain contrastive learning, and 3) separating domain-variant causal and non-causal feature by encouraging causal sufficiency. We evaluate the performance of CILF in different vehicle trajectory prediction models on the mainstream datasets NGSIM and INTERACTION. Experiments show promising improvements in CILF on domain generalization.

Test-Time Training on Video Streams

  • paper_url: http://arxiv.org/abs/2307.05014
  • repo_url: https://github.com/molyswu/hand_detection
  • paper_authors: Renhao Wang, Yu Sun, Yossi Gandelsman, Xinlei Chen, Alexei A. Efros, Xiaolong Wang
  • for: This paper aims to further improve an already-trained model at test time.
  • methods: At test time, the model is trained on each incoming frame with a self-supervised task, such as image reconstruction with masked autoencoders, with the current model initialized from the previous one (online test-time training).
  • results: Experiments on four tasks and three real-world datasets show relative improvements of 45% and 66% for instance and panoptic segmentation.
    Abstract Prior work has established test-time training (TTT) as a general framework to further improve a trained model at test time. Before making a prediction on each test instance, the model is trained on the same instance using a self-supervised task, such as image reconstruction with masked autoencoders. We extend TTT to the streaming setting, where multiple test instances - video frames in our case - arrive in temporal order. Our extension is online TTT: The current model is initialized from the previous model, then trained on the current frame and a small window of frames immediately before. Online TTT significantly outperforms the fixed-model baseline for four tasks, on three real-world datasets. The relative improvement is 45% and 66% for instance and panoptic segmentation. Surprisingly, online TTT also outperforms its offline variant that accesses more information, training on all frames from the entire test video regardless of temporal order. This differs from previous findings using synthetic videos. We conceptualize locality as the advantage of online over offline TTT. We analyze the role of locality with ablations and a theory based on bias-variance trade-off.

A DeepLearning Framework for Dynamic Estimation of Origin-Destination Sequence

  • paper_url: http://arxiv.org/abs/2307.05623
  • repo_url: None
  • paper_authors: Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng
  • for: This study addresses the core problem of OD matrix estimation in transportation: estimating the OD matrix, which represents travel demand, from traffic sensor measurements such as traffic counts.
  • methods: An integrated method is proposed that uses deep learning to infer the structure of the OD sequence and uses this structural information as constraints to guide traditional numerical optimization.
  • results: Experiments show that the neural network effectively infers the structure of the OD sequence and provides practical constraints that help numerical optimization obtain better results; the inferred structure captures both spatial constraints on the OD matrices and temporal constraints on the OD sequence, which effectively mitigates the lag problem.
    Abstract OD matrix estimation is a critical problem in the transportation domain. The principle method uses the traffic sensor measured information such as traffic counts to estimate the traffic demand represented by the OD matrix. The problem is divided into two categories: static OD matrix estimation and dynamic OD matrices sequence(OD sequence for short) estimation. The above two face the underdetermination problem caused by abundant estimated parameters and insufficient constraint information. In addition, OD sequence estimation also faces the lag challenge: due to different traffic conditions such as congestion, identical vehicle will appear on different road sections during the same observation period, resulting in identical OD demands correspond to different trips. To this end, this paper proposes an integrated method, which uses deep learning methods to infer the structure of OD sequence and uses structural constraints to guide traditional numerical optimization. Our experiments show that the neural network(NN) can effectively infer the structure of the OD sequence and provide practical constraints for numerical optimization to obtain better results. Moreover, the experiments show that provided structural information contains not only constraints on the spatial structure of OD matrices but also provides constraints on the temporal structure of OD sequence, which solve the effect of the lagging problem well.

Improving RNN-Transducers with Acoustic LookAhead

  • paper_url: http://arxiv.org/abs/2307.05006
  • repo_url: None
  • paper_authors: Vinit S. Unni, Ashish Mittal, Preethi Jyothi, Sunita Sarawagi
  • for: This paper aims to improve the accuracy of streaming speech-to-text conversion.
  • methods: Building on RNN-T models, a technique called LookAhead is proposed that makes text representations more acoustically grounded by looking ahead into the future within the audio input.
  • results: Experiments show a 5%-20% relative reduction in word error rate on both in-domain and out-of-domain evaluation sets.
    Abstract RNN-Transducers (RNN-Ts) have gained widespread acceptance as an end-to-end model for speech to text conversion because of their high accuracy and streaming capabilities. A typical RNN-T independently encodes the input audio and the text context, and combines the two encodings by a thin joint network. While this architecture provides SOTA streaming accuracy, it also makes the model vulnerable to strong LM biasing which manifests as multi-step hallucination of text without acoustic evidence. In this paper we propose LookAhead that makes text representations more acoustically grounded by looking ahead into the future within the audio input. This technique yields a significant 5%-20% relative reduction in word error rate on both in-domain and out-of-domain evaluation sets.

Latent Space Perspicacity and Interpretation Enhancement (LS-PIE) Framework

  • paper_url: http://arxiv.org/abs/2307.05620
  • repo_url: None
  • paper_authors: Jesse Stevens, Daniel N. Wilke, Itumeleng Setshedi
  • for: The goal is to enhance the latent-space representations of linear latent variable models in order to improve their interpretability.
  • methods: A general framework is proposed that automatically clusters, scales, and ranks latent vectors to increase the information content per latent vector; it supports single-channel and multi-channel data sources, data preprocessing strategies, and metrics that automatically determine the number of latent clusters.
  • results: The effectiveness of the framework's latent ranking (LR), latent scaling (LS), and latent condensing (LCON) functionality is demonstrated on two foundational problems using PCA and ICA.
    Abstract Linear latent variable models such as principal component analysis (PCA), independent component analysis (ICA), canonical correlation analysis (CCA), and factor analysis (FA) identify latent directions (or loadings) either ordered or unordered. The data is then projected onto the latent directions to obtain their projected representations (or scores). For example, PCA solvers usually rank the principal directions by explaining the most to least variance, while ICA solvers usually return independent directions unordered and often with single sources spread across multiple directions as multiple sub-sources, which is of severe detriment to their usability and interpretability. This paper proposes a general framework to enhance latent space representations for improving the interpretability of linear latent spaces. Although the concepts in this paper are language agnostic, the framework is written in Python. This framework automates the clustering and ranking of latent vectors to enhance the latent information per latent vector, as well as, the interpretation of latent vectors. Several innovative enhancements are incorporated including latent ranking (LR), latent scaling (LS), latent clustering (LC), and latent condensing (LCON). For a specified linear latent variable model, LR ranks latent directions according to a specified metric, LS scales latent directions according to a specified metric, LC automatically clusters latent directions into a specified number of clusters, while, LCON automatically determines an appropriate number of clusters into which to condense the latent directions for a given metric. Additional functionality of the framework includes single-channel and multi-channel data sources, data preprocessing strategies such as Hankelisation to seamlessly expand the applicability of linear latent variable models (LLVMs) to a wider variety of data. The effectiveness of LR, LS, and LCON are showcased on two crafted foundational problems with two applied latent variable models, namely, PCA and ICA.
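A hedged sketch of the latent ranking (LR) and latent clustering (LC) ideas is shown below: ICA components, which are returned unordered, are ranked by the variance each one explains and then grouped by k-means on the mixing directions. The ranking metric, the cluster count, and the toy sources are assumptions; LS-PIE itself supports further metrics, latent scaling, and automatic cluster-count selection (LCON).

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.cluster import KMeans

def rank_latent_directions(X, n_components=3):
    """Fit ICA and rank the (unordered) components by explained variance (LR)."""
    ica = FastICA(n_components=n_components, random_state=0)
    scores = ica.fit_transform(X)                      # (n_samples, n_components)
    mixing = ica.mixing_                               # (n_features, n_components)
    explained = np.array([np.outer(scores[:, k], mixing[:, k]).var()
                          for k in range(n_components)])
    order = np.argsort(explained)[::-1]                # latent ranking
    return scores[:, order], mixing[:, order], explained[order]

def cluster_latent_directions(mixing, n_clusters=2):
    """Group similar latent directions (columns of the mixing matrix), i.e. latent clustering (LC)."""
    directions = mixing.T / (np.linalg.norm(mixing.T, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(directions)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0, 10, 2000)
    sources = np.c_[np.sin(3 * t), np.sign(np.sin(5 * t)), rng.normal(size=t.size)]
    X = sources @ rng.normal(size=(3, 8))              # mix 3 sources into 8 channels
    scores, mixing, explained = rank_latent_directions(X, n_components=3)
    print("explained variance per ranked component:", np.round(explained, 4))
    print("direction clusters:", cluster_latent_directions(mixing, n_clusters=2))
```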

Control as Probabilistic Inference as an Emergent Communication Mechanism in Multi-Agent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.05004
  • repo_url: None
  • paper_authors: Tomoaki Nakamura, Akira Taniguchi, Tadahiro Taniguchi
  • for: This paper proposes a generative probabilistic model that integrates emergent communication and multi-agent reinforcement learning.
  • methods: Agents plan their actions by probabilistic inference (control as inference) and communicate via messages that are latent variables estimated from the planned actions; the inference of messages can be formulated as a Metropolis-Hastings naming game.
  • results: Experiments in a grid-world environment show that the proposed probabilistic generative model can infer meaningful messages to achieve a cooperative task.
    Abstract This paper proposes a generative probabilistic model integrating emergent communication and multi-agent reinforcement learning. The agents plan their actions by probabilistic inference, called control as inference, and communicate using messages that are latent variables and estimated based on the planned actions. Through these messages, each agent can send information about its actions and know information about the actions of another agent. Therefore, the agents change their actions according to the estimated messages to achieve cooperative tasks. This inference of messages can be considered as communication, and this procedure can be formulated by the Metropolis-Hasting naming game. Through experiments in the grid world environment, we show that the proposed PGM can infer meaningful messages to achieve the cooperative task.

Selective Sampling and Imitation Learning via Online Regression

  • paper_url: http://arxiv.org/abs/2307.04998
  • repo_url: None
  • paper_authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu
  • for: This paper proposes an interactive algorithm for imitation learning (IL) that succeeds even when the expert feedback it queries is noisy.
  • methods: A selective sampling algorithm actively queries the noisy expert for feedback, using an online regression oracle over the given model class to predict actions and to decide when to query.
  • results: The new selective sampling algorithm handles general function classes and multiple actions and achieves the best-known bounds for regret and number of queries; its extension to IL with noisy expert feedback yields an algorithm whose regret and query complexity depend only on how often the optimal policy visits states with a small margin.
    Abstract We consider the problem of Imitation Learning (IL) by actively querying noisy expert for feedback. While imitation learning has been empirically successful, much of prior work assumes access to noiseless expert feedback which is not practical in many applications. In fact, when one only has access to noisy expert feedback, algorithms that rely on purely offline data (non-interactive IL) can be shown to need a prohibitively large number of samples to be successful. In contrast, in this work, we provide an interactive algorithm for IL that uses selective sampling to actively query the noisy expert for feedback. Our contributions are twofold: First, we provide a new selective sampling algorithm that works with general function classes and multiple actions, and obtains the best-known bounds for the regret and the number of queries. Next, we extend this analysis to the problem of IL with noisy expert feedback and provide a new IL algorithm that makes limited queries. Our algorithm for selective sampling leverages function approximation, and relies on an online regression oracle w.r.t.~the given model class to predict actions, and to decide whether to query the expert for its label. On the theoretical side, the regret bound of our algorithm is upper bounded by the regret of the online regression oracle, while the query complexity additionally depends on the eluder dimension of the model class. We complement this with a lower bound that demonstrates that our results are tight. We extend our selective sampling algorithm for IL with general function approximation and provide bounds on both the regret and the number of queries made to the noisy expert. A key novelty here is that our regret and query complexity bounds only depend on the number of times the optimal policy (and not the noisy expert, or the learner) go to states that have a small margin.

Empowering recommender systems using automatically generated Knowledge Graphs and Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04996
  • repo_url: https://github.com/GhanshyamVerma/Explainable-Recommender-System
  • paper_authors: Ghanshyam Verma, Shovon Sengupta, Simon Simanta, Huan Chen, Janos A. Perge, Devishree Pillai, John P. McCrae, Paul Buitelaar
  • For: This paper focuses on developing interpretable knowledge graph-based recommender systems for personalized article recommendations in financial services.
  • Methods: The authors propose two approaches, one using Reinforcement Learning and the other using XGBoost, both of which leverage a knowledge graph generated from structured and unstructured data. The Reinforcement Learning-based approach uses graph traversal paths to provide interpretations, while the XGBoost-based approach uses post-hoc methods such as SHAP and ELI5 to provide explainable results.
  • Results: The approach offers explainable results that promote better decision-making, and demonstrates the potential of combining advanced machine learning techniques with KG-driven insights to enhance customer experience in relationship management.
    Abstract Personalized recommendations have a growing importance in direct marketing, which motivates research to enhance customer experiences by knowledge graph (KG) applications. For example, in financial services, companies may benefit from providing relevant financial articles to their customers to cultivate relationships, foster client engagement and promote informed financial decisions. While several approaches center on KG-based recommender systems for improved content, in this study we focus on interpretable KG-based recommender systems for decision making.To this end, we present two knowledge graph-based approaches for personalized article recommendations for a set of customers of a large multinational financial services company. The first approach employs Reinforcement Learning and the second approach uses the XGBoost algorithm for recommending articles to the customers. Both approaches make use of a KG generated from both structured (tabular data) and unstructured data (a large body of text data).Using the Reinforcement Learning-based recommender system we could leverage the graph traversal path leading to the recommendation as a way to generate interpretations (Path Directed Reasoning (PDR)). In the XGBoost-based approach, one can also provide explainable results using post-hoc methods such as SHAP (SHapley Additive exPlanations) and ELI5 (Explain Like I am Five).Importantly, our approach offers explainable results, promoting better decision-making. This study underscores the potential of combining advanced machine learning techniques with KG-driven insights to bolster experience in customer relationship management.
    摘要 personalized recommendations 的重要性在直接市场营销中增长,这些研究旨在提高客户体验 durch 知识图(KG)应用。例如,在金融服务中,公司可能会从提供相关的金融文章来培养关系,促进客户参与度和提高客户做出的Financial 决策。虽然许多方法集中在 KG 基于的推荐系统中,但在这项研究中,我们关注可解释 KG 基于的推荐系统。为此,我们提出了两种基于知识图的方法,用于个性化文章推荐。首先,我们使用强化学习来实现 Reinforcement Learning 基于的推荐系统。这种方法可以利用知识图的搜索路径来生成可解释的结果(Path Directed Reasoning (PDR))。其次,我们使用 XGBoost 算法来推荐文章。这种方法可以通过后处方法,如 SHAP 和 ELI5,提供可解释的结果。我们的方法可以提供可解释的结果,这会促进更好的决策。这项研究证明了将高级机器学习技术与知识图驱动的 Insights 结合使用,可以提高客户关系管理的经验。
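
The XGBoost-plus-SHAP half of the pipeline is easy to sketch. The snippet below is only an illustration on synthetic data (the feature names and relevance labels are assumptions, not the paper's KG-derived features); it shows how a top-ranked recommendation can be explained post hoc with SHAP values.

```python
# Hedged sketch: rank candidate articles with XGBoost and explain the top pick with SHAP.
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
feature_names = ["customer_segment", "article_topic_match", "recency", "past_clicks"]
X = rng.random((500, len(feature_names)))
# Hypothetical relevance label: clicked (1) vs ignored (0) article
y = (0.7 * X[:, 1] + 0.3 * X[:, 3] + 0.1 * rng.normal(size=500) > 0.5).astype(int)

model = xgb.XGBClassifier(n_estimators=100, max_depth=4, eval_metric="logloss")
model.fit(X, y)

# Rank candidate articles for one customer by predicted click probability
candidates = rng.random((10, len(feature_names)))
scores = model.predict_proba(candidates)[:, 1]
top = int(np.argmax(scores))

# Post-hoc explanation of the top recommendation with SHAP
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(candidates[top : top + 1])
for name, value in zip(feature_names, shap_values[0]):
    print(f"{name:>22}: {value:+.3f}")
```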

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

  • paper_url: http://arxiv.org/abs/2307.04995
  • repo_url: None
  • paper_authors: Zixuan Ma, Haojie Wang, Jingze Xing, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Shizhi Tang, Penghan Wang, Jidong Zhai
  • for: Improving the computational efficiency of deep neural networks (DNNs) and generating code for accelerators across different domains.
  • methods: The paper proposes IntelliGen, a tensor compiler that generates high-performance code for memory-intensive operators by considering both computation and data-movement optimizations.
  • results: On NVIDIA GPUs, AMD GPUs, and the Cambricon MLU, IntelliGen achieves speedups of up to 1.97x, 2.93x, and 16.91x respectively (1.28x, 1.23x, and 2.31x on average) over the most performant existing frameworks.
    Abstract Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus on optimizing computation efficiency. However, memory access is becoming a key performance bottleneck because the computational performance of accelerators is increasing much faster than memory performance. The lack of direct description of memory access and data dependence in current tensor compilers' intermediate representation (IR) brings significant challenges to generate memory-efficient code. In this paper, we propose IntelliGen, a tensor compiler that can generate high-performance code for memory-intensive operators by considering both computation and data movement optimizations. IntelliGen represent a DNN program using GIR, which includes primitives indicating its computation, data movement, and parallel strategies. This information will be further composed as an instruction-level dataflow graph to perform holistic optimizations by searching different memory access patterns and computation operations, and generating memory-efficient code on different hardware. We evaluate IntelliGen on NVIDIA GPU, AMD GPU, and Cambricon MLU, showing speedup up to 1.97x, 2.93x, and 16.91x(1.28x, 1.23x, and 2.31x on average), respectively, compared to current most performant frameworks.
    摘要 深度神经网络(DNN)在不同领域都有critical使用。为加速DNN计算,tensor编译器被提议,以生成在不同领域特定加速器上的高效代码。现有的tensor编译器主要关注计算效率。然而,内存访问已成为计算加速器的性能瓶颈,因为计算性能的提升速度比内存性能提升的速度要快得多。现有的tensor编译器中间表示(IR)缺乏直接描述内存访问和数据依赖,这使得生成内存高效代码带来重大挑战。为解决这个问题,我们提出了IntelliGen,一种tensor编译器,可以通过考虑计算和数据移动优化来生成高性能的内存高效代码。IntelliGen使用GIR表示DNN程序,GIR包括计算、数据移动和并行策略的元素。这些信息将被组合成为数字水平的数据流图,以实现整体优化。我们对IntelliGen进行了NVIDIA GPU、AMD GPU和Cambricon MLU的测试,并显示了与当前最高性能框架的比较,获得了速度提升为1.97倍、2.93倍和16.91倍(1.28倍、1.23倍和2.31倍的平均提升)。

Uncertainty Quantification of the Virial Black Hole Mass with Conformal Prediction

  • paper_url: http://arxiv.org/abs/2307.04993
  • repo_url: https://github.com/yongsukyee/uncertain_blackholemass
  • paper_authors: Suk Yee Yong, Cheng Soon Ong
  • for: Quantifying the uncertainty of black hole mass measurements in order to understand the co-evolution of black holes and their host galaxies.
  • methods: The study applies conformalized quantile regression (CQR) to quantify the uncertainty of machine-learning black hole mass predictions.
  • results: CQR provides a more useful prediction interval indicator than baseline approaches, with intervals that adapt to the black hole mass and its related properties.
    Abstract Precise measurements of the black hole mass are essential to gain insight on the black hole and host galaxy co-evolution. A direct measure of the black hole mass is often restricted to nearest galaxies and instead, an indirect method using the single-epoch virial black hole mass estimation is used for objects at high redshifts. However, this method is subjected to biases and uncertainties as it is reliant on the scaling relation from a small sample of local active galactic nuclei. In this study, we propose the application of conformalised quantile regression (CQR) to quantify the uncertainties of the black hole predictions in a machine learning setting. We compare CQR with various prediction interval techniques and demonstrated that CQR can provide a more useful prediction interval indicator. In contrast to baseline approaches for prediction interval estimation, we show that the CQR method provides prediction intervals that adjust to the black hole mass and its related properties. That is it yields a tighter constraint on the prediction interval (hence more certain) for a larger black hole mass, and accordingly, bright and broad spectral line width source. Using a combination of neural network model and CQR framework, the recovered virial black hole mass predictions and uncertainties are comparable to those measured from the Sloan Digital Sky Survey. The code is publicly available at https://github.com/yongsukyee/uncertain_blackholemass.
    摘要 精确测量黑洞质量是研究黑洞和宿主 галактики共EVOLUTION的关键。直接测量黑洞质量通常只能在最近的 галактиках中进行,而高红shift объек图使用单个epoch virial黑洞质量估计法则受到偏见和不确定性的限制。在这种情况下,我们提议使用尺度化量表 regression(CQR)来评估黑洞预测的不确定性。我们比较了CQR与various prediction interval技术,并证明了CQR可以提供更有用的预测间隔指标。与基eline方法相比,CQR方法提供的预测间隔变化随着黑洞质量和相关的属性而变化,即在更大的黑洞质量和明亮宽 spectral line width sources 时提供更紧张的预测间隔(更确定)。使用一种带有神经网络模型的CQR框架,我们 retrieved virial黑洞质量预测和不确定性结果与SDSS中的测量结果相符。代码可以在https://github.com/yongsukyee/uncertain_blackholemass上获取。
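
Conformalized quantile regression itself is a generic recipe, sketched below under simplifying assumptions: generic tabular features stand in for the spectral properties, gradient-boosted quantile regressors replace the paper's neural network, and the target coverage level of 90% is chosen arbitrarily.

```python
# Minimal CQR sketch: fit lower/upper quantile regressors, then calibrate the
# interval width with conformity scores from a held-out calibration split.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 5))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.3 + 0.3 * np.abs(X[:, 0]), size=3000)

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_cal, X_test, y_cal, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

alpha = 0.1  # 90% prediction intervals
lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)

# Conformity scores: how far the truth falls outside the raw quantile band
scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
q = np.quantile(scores, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))

# Conformalized intervals: widen (or tighten) the band by the calibrated margin q
lower, upper = lo.predict(X_test) - q, hi.predict(X_test) + q
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"empirical coverage: {coverage:.3f}, mean width: {np.mean(upper - lower):.3f}")
```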

Monotone deep Boltzmann machines

  • paper_url: http://arxiv.org/abs/2307.04990
  • repo_url: None
  • paper_authors: Zhili Feng, Ezra Winston, J. Zico Kolter
  • for: This paper explores the possibility of efficient approximate inference in deep Boltzmann machines (DBMs) by developing a new class of restricted models called monotone DBMs.
  • methods: The paper uses tools from the recently-proposed monotone Deep Equilibrium model to develop a fixed-point iteration that gives a variational mean-field solution for monotone DBMs.
  • results: The paper demonstrates that the proposed approach allows for tasks such as joint completion and classification of images within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.
    Abstract Deep Boltzmann machines (DBMs), one of the first ``deep'' learning methods ever studied, are multi-layered probabilistic models governed by a pairwise energy function that describes the likelihood of all variables/nodes in the network. In practice, DBMs are often constrained, i.e., via the \emph{restricted} Boltzmann machine (RBM) architecture (which does not permit intra-layer connections), in order to allow for more efficient inference. In this work, we revisit the generic DBM approach, and ask the question: are there other possible restrictions to their design that would enable efficient (approximate) inference? In particular, we develop a new class of restricted model, the monotone DBM, which allows for arbitrary self-connection in each layer, but restricts the \emph{weights} in a manner that guarantees the existence and global uniqueness of a mean-field fixed point. To do this, we leverage tools from the recently-proposed monotone Deep Equilibrium model and show that a particular choice of activation results in a fixed-point iteration that gives a variational mean-field solution. While this approach is still largely conceptual, it is the first architecture that allows for efficient approximate inference in fully-general weight structures for DBMs. We apply this approach to simple deep convolutional Boltzmann architectures and demonstrate that it allows for tasks such as the joint completion and classification of images, within a single deep probabilistic setting, while avoiding the pitfalls of mean-field inference in traditional RBMs.
    摘要 深度博尔茨曼机(DBM)是一种多层概率模型,其中每个层都有一个对应的概率分布,用于描述网络中所有变量/节点的可能性。在实践中,DBM通常会被限制,例如通过使用Restricted Boltzmann Machine(RBM)架构,以便更加有效地进行推理。在这个工作中,我们回到了基本的DBM方法,并问:是否有其他可能的限制,以实现更加有效的推理?特别是,我们开发了一种新的受限模型,即偏好DBM,该模型允许每层任意自连接,但是限制权重的方式,以保证存在和全局唯一的均衡点。为此,我们利用了最近提出的偏好深度平衡模型的工具,并证明在某种特定的激活函数下,这种模型会导致一个均衡点逻辑的解。尽管这种方法仍然是概念上的,但它是DBM中第一种允许高效近似推理的建筑。我们应用这种方法于深度卷积博尔茨曼架构,并示出它可以在单个深度概率设定下完成图像的联合完成和分类任务,而不需要传统RBM的含义场推理。
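
A rough intuition for the mean-field fixed point can be conveyed with a toy iteration. The sketch below is not the paper's monotone parameterization; it simply rescales a random weight matrix so the update is a contraction, which by Banach's fixed-point theorem guarantees a unique fixed point.

```python
# Toy damped fixed-point iteration in the spirit of monotone DEQ-style mean-field inference.
import numpy as np

rng = np.random.default_rng(0)
n = 64
W = rng.normal(size=(n, n)) / np.sqrt(n)
W = 0.9 * W / np.linalg.norm(W, 2)        # spectral norm < 1 -> contraction (assumed stand-in)
b = rng.normal(size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = np.zeros(n)
for it in range(200):
    h_new = sigmoid(W @ h + b)            # mean-field update for the latent units
    if np.max(np.abs(h_new - h)) < 1e-8:  # converged to the unique fixed point
        break
    h = 0.5 * h + 0.5 * h_new             # damping for extra stability
print(f"converged after {it} iterations")
```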

Benchmarking Bayesian Causal Discovery Methods for Downstream Treatment Effect Estimation

  • paper_url: http://arxiv.org/abs/2307.04988
  • repo_url: None
  • paper_authors: Chris Chinenye Emezue, Alexandre Drouin, Tristan Deleu, Stefan Bauer, Yoshua Bengio
  • for: Evaluating causal discovery methods on the downstream task of treatment effect estimation.
  • methods: Seven baseline causal discovery methods, including a newly proposed approach based on GFlowNets, are benchmarked on downstream treatment effect estimation.
  • results: Some of the algorithms effectively capture a wide range of useful and diverse ATE modes, while others tend to learn many low-probability modes, which hurts the (unrelaxed) recall and precision.
    Abstract The practical utility of causality in decision-making is widespread and brought about by the intertwining of causal discovery and causal inference. Nevertheless, a notable gap exists in the evaluation of causal discovery methods, where insufficient emphasis is placed on downstream inference. To address this gap, we evaluate seven established baseline causal discovery methods including a newly proposed method based on GFlowNets, on the downstream task of treatment effect estimation. Through the implementation of a distribution-level evaluation, we offer valuable and unique insights into the efficacy of these causal discovery methods for treatment effect estimation, considering both synthetic and real-world scenarios, as well as low-data scenarios. The results of our study demonstrate that some of the algorithms studied are able to effectively capture a wide range of useful and diverse ATE modes, while some tend to learn many low-probability modes which impacts the (unrelaxed) recall and precision.
    摘要 “ causality 在决策中的实际用途广泛,这与 causal discovery 和 causal inference 的相互关联有关。然而,评估 causal discovery 方法的一个显著的差距是在下游推理方面,现有的评估方法强调上游推理。为了解决这个差距,我们评估了七种已有的基准 causal discovery 方法,包括一种基于 GFlowNets 的新方法,在对医疗效果估计任务上。通过实施分布水平评估,我们提供了有价值和独特的洞察,探讨这些 causal discovery 方法在医疗效果估计任务中的效果,包括合成和实际场景,以及低数据场景。结果显示,一些算法可以有效地捕捉各种有用和多样的 ATE 模式,而其他些则往往学习低概率模式,影响(不压缩)准确率和准确率。”

Secrets of RLHF in Large Language Models Part I: PPO

  • paper_url: http://arxiv.org/abs/2307.04964
  • repo_url: https://github.com/openlmlab/moss-rlhf
  • paper_authors: Rui Zheng, Shihan Dou, Songyang Gao, Yuan Hua, Wei Shen, Binghai Wang, Yan Liu, Senjie Jin, Qin Liu, Yuhao Zhou, Limao Xiong, Lu Chen, Zhiheng Xi, Nuo Xu, Wenbin Lai, Minghao Zhu, Cheng Chang, Zhangyue Yin, Rongxiang Weng, Wensen Cheng, Haoran Huang, Tianxiang Sun, Hang Yan, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang
  • for: The goal is to advance the technical alignment of large language models (LLMs) as human-centric assistants through reinforcement learning with human feedback (RLHF).
  • methods: The paper uses reward models to measure human preferences, Proximal Policy Optimization (PPO) to optimize policy model outputs, and process supervision to improve step-by-step reasoning.
  • results: The study finds that policy constraints are the key factor in effective PPO training and proposes PPO-max, an advanced version of PPO, to improve the training stability of the policy model. Building on these results, it presents a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT.
    Abstract Large language models (LLMs) have formulated a blueprint for the advancement of artificial general intelligence. Its primary objective is to function as a human-centric (helpful, honest, and harmless) assistant. Alignment with humans assumes paramount significance, and reinforcement learning with human feedback (RLHF) emerges as the pivotal technological paradigm underpinning this pursuit. Current technical routes usually include \textbf{reward models} to measure human preferences, \textbf{Proximal Policy Optimization} (PPO) to optimize policy model outputs, and \textbf{process supervision} to improve step-by-step reasoning capabilities. However, due to the challenges of reward design, environment interaction, and agent training, coupled with huge trial and error cost of large language models, there is a significant barrier for AI researchers to motivate the development of technical alignment and safe landing of LLMs. The stable training of RLHF has still been a puzzle. In the first report, we dissect the framework of RLHF, re-evaluate the inner workings of PPO, and explore how the parts comprising PPO algorithms impact policy agent training. We identify policy constraints being the key factor for the effective implementation of the PPO algorithm. Therefore, we explore the PPO-max, an advanced version of PPO algorithm, to efficiently improve the training stability of the policy model. Based on our main results, we perform a comprehensive analysis of RLHF abilities compared with SFT models and ChatGPT. The absence of open-source implementations has posed significant challenges to the investigation of LLMs alignment. Therefore, we are eager to release technical reports, reward models and PPO codes, aiming to make modest contributions to the advancement of LLMs.
    摘要 大型语言模型(LLM)已经制定了人工通用智能的发展蓝图。其主要目标是作为人acentric(帮助、诚实、无害)助手。与人类Alignment相当重要,而使用人类反馈学习(RLHF)成为了这一努力的核心技术。现有的技术路径通常包括了优先项目模型来度量人类喜好,使用Proximal Policy Optimization(PPO)来优化政策模型的输出,以及过程监控来提高步骤逻辑能力。但由于优先项目设计、环境互动和机器人训练等因素,加上大型语言模型的实验成本巨大,使得AI研究人员对LLM的技术Alignment和安全落地带出了很大的挑战。RLHF的稳定训练仍然是一个谜。在本报告中,我们分析RLHF的框架,重新评估PPO内部运作,并探索PPO算法中的不同部分对政策代理训练的影响。我们发现政策限制是PPO算法的关键因素。因此,我们探索了PPO-max,一种PPO算法的进阶版本,以提高政策模型训练的稳定性。根据我们的主要结果,我们进行了RLHF能力的全面分析,与SFT模型和ChatGPT进行比较。由于LLMs的开源实现缺乏,因此我们对LLMs的对齐做出了很大的挑战。因此,我们将释出技术报告、优先项目模型和PPO代码,以做出一定的贡献 LLMs的进一步发展。
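
The policy-constraint ingredient that the report highlights can be sketched as a clipped PPO loss with a KL penalty toward a frozen reference (SFT) policy. The tensor shapes, the simple KL estimator, and the kl_coef value below are illustrative assumptions, not the PPO-max recipe.

```python
# Hedged sketch of a constrained PPO policy loss for RLHF-style fine-tuning.
import torch

def rlhf_ppo_policy_loss(logprobs, old_logprobs, ref_logprobs, advantages,
                         clip_ratio=0.2, kl_coef=0.05):
    """All inputs: per-token tensors of shape (batch, seq_len)."""
    ratio = torch.exp(logprobs - old_logprobs)                 # importance ratio vs. behavior policy
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_ratio, 1 + clip_ratio) * advantages
    policy_loss = -torch.mean(torch.minimum(unclipped, clipped))
    # Penalize drift from the frozen SFT/reference policy (simple KL estimate)
    kl_penalty = kl_coef * torch.mean(logprobs - ref_logprobs)
    return policy_loss + kl_penalty

# Dummy usage with random tensors
B, T = 4, 16
lp = torch.randn(B, T)
old = lp.detach() + 0.01 * torch.randn(B, T)
ref = lp.detach() + 0.05 * torch.randn(B, T)
adv = torch.randn(B, T)
print(rlhf_ppo_policy_loss(lp, old, ref, adv))
```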

DyCL: Dynamic Neural Network Compilation Via Program Rewriting and Graph Optimization

  • paper_url: http://arxiv.org/abs/2307.04963
  • repo_url: None
  • paper_authors: Simin Chen, Shiyi Wei, Cong Liu, Wei Yang
  • for: Improving the compilation efficiency and performance of dynamic neural networks (DyNNs).
  • methods: A general approach that enables existing deep learning (DL) compilers to successfully compile DyNNs. It uses program analysis and program transformation to split a dynamic neural network into multiple sub-networks, each compiled independently, and synthesizes a host module that models the control flow.
  • results: The approach compiles all evaluated dynamic neural networks with a 100% success rate, and the generated executables run 1.12x-20.21x faster than the original DyNNs in some scenarios.
    Abstract DL compiler's primary function is to translate DNN programs written in high-level DL frameworks such as PyTorch and TensorFlow into portable executables. These executables can then be flexibly executed by the deployed host programs. However, existing DL compilers rely on a tracing mechanism, which involves feeding a runtime input to a neural network program and tracing the program execution paths to generate the computational graph necessary for compilation. Unfortunately, this mechanism falls short when dealing with modern dynamic neural networks (DyNNs) that possess varying computational graphs depending on the inputs. Consequently, conventional DL compilers struggle to accurately compile DyNNs into executable code. To address this limitation, we propose \tool, a general approach that enables any existing DL compiler to successfully compile DyNNs. \tool tackles the dynamic nature of DyNNs by introducing a compilation mechanism that redistributes the control and data flow of the original DNN programs during the compilation process. Specifically, \tool develops program analysis and program transformation techniques to convert a dynamic neural network into multiple sub-neural networks. Each sub-neural network is devoid of conditional statements and is compiled independently. Furthermore, \tool synthesizes a host module that models the control flow of the DyNNs and facilitates the invocation of the sub-neural networks. Our evaluation demonstrates the effectiveness of \tool, achieving a 100\% success rate in compiling all dynamic neural networks. Moreover, the compiled executables generated by \tool exhibit significantly improved performance, running between $1.12\times$ and $20.21\times$ faster than the original DyNNs executed on general-purpose DL frameworks.
    摘要 DL编译器的主要功能是将深度学习(DL)程序从高级框架such as PyTorch和TensorFlow转换为可移植的执行程序。这些执行程序可以在部署的主机程序上灵活执行。然而,现有的DL编译器都 rely on一种跟踪机制,即通过 feeding runtime输入到深度学习程序并跟踪程序执行路径来生成必要的计算图来进行编译。然而,这种机制在处理现代动态神经网络(DyNNs)时会遇到问题,因为DyNNs具有因输入而变化的计算图。因此,传统的DL编译器无法准确地编译DyNNs。为解决这个限制,我们提出了\tool,一种通用的方法,可以使任何现有的DL编译器成功编译DyNNs。\tool 处理动态神经网络的方式是通过在编译过程中重新分配控制和数据流的方式来转换动态神经网络。具体来说,\tool 开发了程序分析和程序转换技术,将动态神经网络转换为多个子神经网络。每个子神经网络都是无条件语句的,可以独立地编译。此外,\tool synthesizes主机模块,模拟动态神经网络的控制流,并且促进了子神经网络的邀请。我们的评估表明,\tool 的效果非常出色,所有的动态神经网络都成功编译。此外,由\tool 生成的执行程序表现出色,在特定的情况下,与原始 DyNNs 执行在通用DL框架上的性能相比,具有1.12倍至20.21倍的提升。

Intrinsically motivated graph exploration using network theories of human curiosity

  • paper_url: http://arxiv.org/abs/2307.04962
  • repo_url: https://github.com/spatank/GraphRL
  • paper_authors: Shubhankar P. Patankar, Mathieu Ouellet, Juan Cervino, Alejandro Ribeiro, Kieran A. Murphy, Dani S. Bassett
  • for: Guiding exploration over graph-structured data without relying on external rewards.
  • methods: A graph neural-network-based reinforcement learning approach whose intrinsic rewards are derived from two theories of human curiosity: information gap theory and compression progress theory.
  • results: On several classes of synthetically generated graphs, trained agents generalize to larger environments and longer exploratory walks than seen during training, and the method computes the relevant topological features more efficiently than greedy evaluation. Curiosity-based recommendations also predict human behavior better than PageRank centrality on real-world graph datasets including MovieLens, Amazon Books, and Wikispeedia.
    Abstract Intrinsically motivated exploration has proven useful for reinforcement learning, even without additional extrinsic rewards. When the environment is naturally represented as a graph, how to guide exploration best remains an open question. In this work, we propose a novel approach for exploring graph-structured data motivated by two theories of human curiosity: the information gap theory and the compression progress theory. The theories view curiosity as an intrinsic motivation to optimize for topological features of subgraphs induced by the visited nodes in the environment. We use these proposed features as rewards for graph neural-network-based reinforcement learning. On multiple classes of synthetically generated graphs, we find that trained agents generalize to larger environments and to longer exploratory walks than are seen during training. Our method computes more efficiently than the greedy evaluation of the relevant topological properties. The proposed intrinsic motivations bear particular relevance for recommender systems. We demonstrate that curiosity-based recommendations are more predictive of human behavior than PageRank centrality for several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
    摘要 天生有探索的恩恵,即使没有外部奖励,也可以有效地促进学习。当环境自然表示为图时,如何最好引导探索仍然是一个开放问题。在这项工作中,我们提议一种新的方法,通过利用访问节点所induced的子图特征来驱动graph neural network基于的奖励学习。我们使用这些提议的特征作为奖励,以便促进探索。在多种 sintetically生成的图上,我们发现训练的代理人可以在更大的环境和更长的探索步骤上generalize。我们的方法更加高效,而不是仅仅是评估相关的topological特征。我们的内在动机具有特别 relevance для推荐系统。我们示示了curiosity-based推荐的更高predictive power than PageRank中心性for several real-world graph datasets, including MovieLens, Amazon Books, and Wikispeedia.
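
One way to picture the intrinsic reward is a sketch in which the agent's reward is the change in a topological feature of the subgraph induced by the visited nodes. The specific feature used below (missing edges among visited nodes, a crude "information gap" proxy) and the random-walk policy are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of curiosity-style intrinsic reward from the induced subgraph of visited nodes.
import networkx as nx
import random

random.seed(0)
G = nx.watts_strogatz_graph(n=100, k=4, p=0.1)

def intrinsic_feature(G, visited):
    sub = G.subgraph(visited)
    n = sub.number_of_nodes()
    possible = n * (n - 1) / 2
    return possible - sub.number_of_edges()   # absent edges ("gaps") in the induced subgraph

node = 0
visited = {node}
prev = intrinsic_feature(G, visited)
total_reward = 0.0
for step in range(50):
    node = random.choice(list(G.neighbors(node)))   # a learned policy would pick this greedily
    visited.add(node)
    feat = intrinsic_feature(G, visited)
    total_reward += feat - prev                     # reward = change in the topological feature
    prev = feat
print(f"visited {len(visited)} nodes, cumulative intrinsic reward {total_reward:.1f}")
```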

Reinforcement Learning with Non-Cumulative Objective

  • paper_url: http://arxiv.org/abs/2307.04957
  • repo_url: https://github.com/willtop/Reinforcement_Learning_With_Non-Cumulative_Objective
  • paper_authors: Wei Cui, Wei Yu
  • for: solving optimal control and reinforcement learning problems with non-cumulative objectives
  • methods: modifying existing algorithms using a generalized Bellman update rule and providing sufficient conditions for globally optimal convergence
  • results: demonstrating the idea experimentally on classical tasks and network routing problems, and achieving better performance compared to traditional methods
    Abstract In reinforcement learning, the objective is almost always defined as a \emph{cumulative} function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems, and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block for many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation as well as assumptions on the Markov decision process under which the globally optimal convergence of the generalized Bellman updates can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., the objectives determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems on maximizing the flow rates.
    摘要 在再增强学习中,目标通常是一个累积函数,表示过程中的奖励的总和。然而,在各种应用领域中,特别是在通信和网络领域,有许多优化控制和再增强学习问题,其目标不是自然地表示为奖励的总和。在这篇论文中,我们认为这种非累积目标在各种问题中很普遍,并提出修改现有算法来优化这类目标的方法。specifically,我们探究了许多优化控制和再增强学习算法的基本构建块:贝尔曼优化Equation。为了优化非累积目标,我们在贝尔曼更新规则中replace原始的总和操作,使用一种通用化的操作,与目标相对应。此外,我们还提供了 garantía global optimal convergence of the generalized Bellman updates可以 garantía的条件和假设,即Markov decision process的形式和assumptions。我们在经典优化控制和再增强学习任务上,以及两个网络流量最大化问题上,通过实验证明了这个想法。
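
For the bottleneck objective, the generalized Bellman backup replaces the usual sum with a min: Q(s, a) = min(r(s, a), max_a' Q(s', a')). The tabular sketch below illustrates this on an assumed toy chain MDP, not one of the paper's benchmarks.

```python
# Tabular Q-learning with the generalized (min) Bellman update for the bottleneck objective.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 6, 2

# Toy chain: action 1 moves right, action 0 stays; per-step reward drawn per (s, a)
R = rng.uniform(0.2, 1.0, size=(n_states, n_actions))

def step(s, a):
    s_next = min(s + 1, n_states - 1) if a == 1 else s
    return s_next, R[s, a], s_next == n_states - 1

Q = np.ones((n_states, n_actions))             # optimistic init (the min-return can only decrease)
alpha, eps = 0.2, 0.2
for episode in range(2000):
    s, done = 0, False
    while not done:
        a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
        s_next, r, done = step(s, a)
        target = r if done else min(r, np.max(Q[s_next]))   # generalized (min) Bellman backup
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next
print("greedy policy:", np.argmax(Q, axis=1))
```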

Hybrid hidden Markov LSTM for short-term traffic flow prediction

  • paper_url: http://arxiv.org/abs/2307.04954
  • repo_url: None
  • paper_authors: Agnimitra Sengupta, Adway Das, S. Ilgin Guler
  • for: Short-term traffic flow prediction.
  • methods: Deep learning sequence models (RNNs and their variants) combined with regime identification in a hybrid hidden Markov-LSTM model.
  • results: The hybrid architecture yields significant performance gains over conventional methods such as Markov-switching ARIMA and LSTM.
    Abstract Deep learning (DL) methods have outperformed parametric models such as historical average, ARIMA and variants in predicting traffic variables into short and near-short future, that are critical for traffic management. Specifically, recurrent neural network (RNN) and its variants (e.g. long short-term memory) are designed to retain long-term temporal correlations and therefore are suitable for modeling sequences. However, multi-regime models assume the traffic system to evolve through multiple states (say, free-flow, congestion in traffic) with distinct characteristics, and hence, separate models are trained to characterize the traffic dynamics within each regime. For instance, Markov-switching models with a hidden Markov model (HMM) for regime identification is capable of capturing complex dynamic patterns and non-stationarity. Interestingly, both HMM and LSTM can be used for modeling an observation sequence from a set of latent or, hidden state variables. In LSTM, the latent variable is computed in a deterministic manner from the current observation and the previous latent variable, while, in HMM, the set of latent variables is a Markov chain. Inspired by research in natural language processing, a hybrid hidden Markov-LSTM model that is capable of learning complementary features in traffic data is proposed for traffic flow prediction. Results indicate significant performance gains in using hybrid architecture compared to conventional methods such as Markov switching ARIMA and LSTM.
    摘要 深度学习(DL)方法已经超过参数模型,如历史平均值、ARIMA和其变种,在预测交通变量的短期和近期未来方面表现出色。特别是,循环神经网络(RNN)和其变种(例如长短期记忆)能够保留长期时间的相关性,因此适用于序列模elling。然而,多态模型假设交通系统会逐渐发展到多个状态(例如自由流、堵塞),每个状态都具有独特的特征,因此需要分别训练特定的模型来描述交通动态。例如,Markov switching模型可以使用隐藏Markov模型(HMM)来确定状态转换,可以捕捉复杂的动态模式和非站ARY。在LSTM中,隐藏变量是基于当前观察和上一个隐藏变量的决定方式计算的,而在HMM中,隐藏变量是一个Markov链。受自然语言处理研究的启发,一种hybrid隐藏Markov-LSTM模型被提出,可以学习交通数据中的补充特征。结果表明,使用hybrid体系可以与传统方法,如Markov switching ARIMA和LSTM,相比较显著提高预测性能。
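
One plausible reading of the hybrid is sketched below: a Gaussian HMM identifies traffic regimes and its state posteriors are fed to an LSTM as extra input features. The synthetic flow data and this particular coupling are assumptions, not the paper's exact architecture.

```python
# Hedged sketch: HMM regime posteriors concatenated with flow values as LSTM inputs.
import numpy as np
import torch
import torch.nn as nn
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
flow = np.concatenate([rng.normal(60, 5, 300), rng.normal(20, 8, 300)])  # free-flow vs congestion
flow = flow.reshape(-1, 1).astype(np.float64)

hmm = GaussianHMM(n_components=2, covariance_type="diag", n_iter=50).fit(flow)
posteriors = hmm.predict_proba(flow)            # soft regime membership per time step

# Build (flow, regime posterior) windows for one-step-ahead prediction
feats = np.hstack([flow / flow.max(), posteriors]).astype(np.float32)
win = 12
X = np.stack([feats[i : i + win] for i in range(len(feats) - win)])
y = feats[win:, :1]

class HybridLSTM(nn.Module):
    def __init__(self, in_dim=3, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])            # predict the next normalized flow value

model = HybridLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
Xb, yb = torch.from_numpy(X), torch.from_numpy(y)
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(Xb), yb)
    loss.backward()
    opt.step()
print(f"final training MSE: {loss.item():.4f}")
```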

Compact Twice Fusion Network for Edge Detection

  • paper_url: http://arxiv.org/abs/2307.04952
  • repo_url: https://github.com/li-yachuan/ctfn-pytorch-master
  • paper_authors: Yachuan Li, Zongmin Li, Xavier Soria P., Chaozhi Yang, Qian Xiao, Yun Bai, Hua Li, Xiangdong Wang
  • for: Proposing a compact multi-scale feature fusion network for accurate edge detection.
  • methods: The network uses two lightweight multi-scale fusion modules: a Semantic Enhancement Module (SEM) that uses the semantic information in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module that aggregates the complementary merits of multi-scale features by assigning weights to all features.
  • results: Evaluated on BSDS500, NYUDv2, and BIPEDv2, CTFN achieves accuracy competitive with state-of-the-art methods at a lower parameter count and computational cost: beyond the backbone it needs only 0.1M additional parameters, reducing its computation cost to about 60% of other state-of-the-art methods.
    Abstract The significance of multi-scale features has been gradually recognized by the edge detection community. However, the fusion of multi-scale features increases the complexity of the model, which is not friendly to practical application. In this work, we propose a Compact Twice Fusion Network (CTFN) to fully integrate multi-scale features while maintaining the compactness of the model. CTFN includes two lightweight multi-scale feature fusion modules: a Semantic Enhancement Module (SEM) that can utilize the semantic information contained in coarse-scale features to guide the learning of fine-scale features, and a Pseudo Pixel-level Weighting (PPW) module that aggregate the complementary merits of multi-scale features by assigning weights to all features. Notwithstanding all this, the interference of texture noise makes the correct classification of some pixels still a challenge. For these hard samples, we propose a novel loss function, coined Dynamic Focal Loss, which reshapes the standard cross-entropy loss and dynamically adjusts the weights to correct the distribution of hard samples. We evaluate our method on three datasets, i.e., BSDS500, NYUDv2, and BIPEDv2. Compared with state-of-the-art methods, CTFN achieves competitive accuracy with less parameters and computational cost. Apart from the backbone, CTFN requires only 0.1M additional parameters, which reduces its computation cost to just 60% of other state-of-the-art methods. The codes are available at https://github.com/Li-yachuan/CTFN-pytorch-master.
    摘要 “多尺度特征的重要性逐渐被edge detection社区所认可。然而,多尺度特征的融合增加模型的复杂度,不易应用。在这个工作中,我们提出了一个Compact Twice Fusion Network(CTFN),可以充分融合多尺度特征,同时保持模型的简洁性。CTFN包括两个轻量级多尺度特征融合模组:一个Semantic Enhancement Module(SEM)可以利用粗细度特征中的 semantics信息来导引细细度特征的学习,以及一个Pseudo Pixel-level Weighting(PPW)模组可以将多尺度特征中的 complementary advantages聚集到所有特征上。尽管如此,隐藏在文本腐败中的杂质项目仍然是一个挑战。为了解决这个问题,我们提出了一个新的损失函数,即Dynamic Focal Loss,它可以重新定义标准十进法损失函数,并动态调整权重,以正确处理困难样本。我们在BSDS500、NYUDv2和BIPEDv2三个 dataset上评估了我们的方法,与现有的方法相比,CTFN实现了竞争的精度,仅需0.1M的额外参数,对应computational cost的减少为60%。代码可以在https://github.com/Li-yachuan/CTFN-pytorch-master上获取。”
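
The loss-reshaping idea can be sketched as a focal-style reweighting of binary cross-entropy whose focusing exponent is annealed during training. The annealing schedule below is only a guess at the "dynamic" part and is not the paper's exact Dynamic Focal Loss.

```python
# Hedged PyTorch sketch of a dynamically annealed focal-style loss for binary edge maps.
import torch
import torch.nn.functional as F

def dynamic_focal_loss(logits, targets, step, total_steps, gamma_max=2.0):
    """logits/targets: (B, 1, H, W); hard pixels get up-weighted by (1 - p_t)^gamma."""
    gamma = gamma_max * min(step / total_steps, 1.0)          # assumed annealing schedule
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)               # probability assigned to the true class
    return ((1 - p_t) ** gamma * bce).mean()

# Dummy usage
logits = torch.randn(2, 1, 32, 32, requires_grad=True)
targets = (torch.rand(2, 1, 32, 32) > 0.9).float()           # sparse edge map
loss = dynamic_focal_loss(logits, targets, step=500, total_steps=1000)
loss.backward()
print(loss.item())
```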

DDGM: Solving inverse problems by Diffusive Denoising of Gradient-based Minimization

  • paper_url: http://arxiv.org/abs/2307.04946
  • repo_url: None
  • paper_authors: Kyle Luther, H. Sebastian Seung
  • for: This paper is written for solving the inverse problem of tomographic reconstruction from electron micrographs acquired at multiple tilt angles.
  • methods: The paper proposes a simpler approach that combines traditional gradient-based minimization of reconstruction error with denoising, using a convolutional neural network (CNN) as a prior. The method adds noise at each step and uses an iterative dynamics resembling a Langevin or diffusion process, with the level of added noise and the size of the denoising step decaying exponentially with time.
  • results: The paper shows that high accuracy can be achieved with as few as 50 denoising steps, and compares the proposed method with more complex diffusion methods such as DDRM and DPS. The results demonstrate that the proposed method is more accurate (as measured by MSE and SSIM) for the tomography problem, and can be applied to reconstruction of arbitrary-sized images.
    Abstract Inverse problems generally require a regularizer or prior for a good solution. A recent trend is to train a convolutional net to denoise images, and use this net as a prior when solving the inverse problem. Several proposals depend on a singular value decomposition of the forward operator, and several others backpropagate through the denoising net at runtime. Here we propose a simpler approach that combines the traditional gradient-based minimization of reconstruction error with denoising. Noise is also added at each step, so the iterative dynamics resembles a Langevin or diffusion process. Both the level of added noise and the size of the denoising step decay exponentially with time. We apply our method to the problem of tomographic reconstruction from electron micrographs acquired at multiple tilt angles. With empirical studies using simulated tilt views, we find parameter settings for our method that produce good results. We show that high accuracy can be achieved with as few as 50 denoising steps. We also compare with DDRM and DPS, more complex diffusion methods of the kinds mentioned above. These methods are less accurate (as measured by MSE and SSIM) for our tomography problem, even after the generation hyperparameters are optimized. Finally we extend our method to reconstruction of arbitrary-sized images and show results on 128 $\times$ 1568 pixel images
    摘要 “倒Problems通常需要一个正规化或先验的方法以获得好的解决方案。现在的趋势是使用卷积网来去噪图像,并将这个网络用作解决倒Problem的先验。一些提案靠摄Singular value decomposition of the forward operator,另一些透过在Runtime backpropagating through the denoising net。我们提出了一种更简单的方法,它结合了传统的梯度基于的最小化重建错误和去噪。噪音也会在每步加入,因此迭代运算类似于兰格温或漫游过程。噪音水平和去噪步骤的减少阶段落逐渐呈指数衰减。我们将方法应用到电子显微镜中的 Tomographic Reconstruction问题上。通过实验使用模拟的倾斜角度,我们获得了适当的参数设定,并证明高精度可以在50个去噪步骤中获得。我们还与DDRM和DPS等更复杂的演化方法进行比较,这些方法在我们的Tomography问题上较低的Mean Squared Error和Structural Similarity Index Measure。最后,我们将方法扩展到任意大小的图像重建问题上,并在128 $\times$ 1568像素图像上显示结果。”
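
The recipe is simple enough to sketch end to end on a toy deblurring problem: alternate a gradient step on the reconstruction error with a denoising step and injected noise, with both decaying exponentially over the iterations. A Gaussian filter stands in for the CNN denoiser purely to keep the example self-contained; that substitution is an assumption.

```python
# Toy sketch of gradient-based minimization interleaved with denoising and decaying noise.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
x_true = np.zeros((64, 64)); x_true[20:44, 20:44] = 1.0
forward = lambda x: gaussian_filter(x, sigma=2.0)          # known measurement operator (blur)
y = forward(x_true) + 0.01 * rng.normal(size=x_true.shape)

x = np.zeros_like(y)
step, n_steps = 1.5, 50
for t in range(n_steps):
    decay = np.exp(-4.0 * t / n_steps)                     # exponential schedule for noise/denoising
    grad = forward(forward(x) - y)                         # gradient of 0.5*||A x - y||^2 (blur is self-adjoint)
    x = x - step * grad
    x = (1 - 0.3 * decay) * x + 0.3 * decay * gaussian_filter(x, 1.0)  # denoising step (stand-in for a CNN prior)
    x = x + 0.05 * decay * rng.normal(size=x.shape)        # injected noise, Langevin-style
print("reconstruction MSE:", float(np.mean((x - x_true) ** 2)))
```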

Benchmarking Algorithms for Federated Domain Generalization

  • paper_url: http://arxiv.org/abs/2307.04942
  • repo_url: https://github.com/inouye-lab/feddg_benchmark
  • paper_authors: Ruqi Bai, Saurabh Bagchi, David I. Inouye
  • for: This paper is written for evaluating the performance of Federated Domain Generalization (FedDG) methods, which is a new challenge in Federated Learning (FL) that involves dealing with diverse client datasets.
  • methods: The paper proposes a benchmark methodology for FedDG, which includes controlling the number and heterogeneity of clients and providing metrics for dataset difficulty. The authors also evaluate 13 FedDG methods, including centralized DG methods adapted to the FL context, FL methods that handle client heterogeneity, and methods designed specifically for FedDG.
  • results: The paper shows that despite some progress, there remain significant performance gaps in Federated DG, particularly when evaluating with a large number of clients, high client heterogeneity, or more realistic datasets.
    Abstract While prior domain generalization (DG) benchmarks consider train-test dataset heterogeneity, we evaluate Federated DG which introduces federated learning (FL) specific challenges. Additionally, we explore domain-based heterogeneity in clients' local datasets - a realistic Federated DG scenario. Prior Federated DG evaluations are limited in terms of the number or heterogeneity of clients and dataset diversity. To address this gap, we propose an Federated DG benchmark methodology that enables control of the number and heterogeneity of clients and provides metrics for dataset difficulty. We then apply our methodology to evaluate 13 Federated DG methods, which include centralized DG methods adapted to the FL context, FL methods that handle client heterogeneity, and methods designed specifically for Federated DG. Our results suggest that despite some progress, there remain significant performance gaps in Federated DG particularly when evaluating with a large number of clients, high client heterogeneity, or more realistic datasets. Please check our extendable benchmark code here: https://github.com/inouye-lab/FedDG_Benchmark.
    摘要 “对于联边学习(Federated Learning,FL)中的网络统一化(Domain Generalization,DG),我们评估了联边网络统一化(Federated DG),并将联边学习特有的挑战纳入考虑。另外,我们还探索了客户端本地数据中的领域差异,这是现实中联边学习中的常见情况。过去的联边DG评估仅仅具有一些客户和数据的限制,无法反映现实中联边学习的多样性和问题。为了缓解这个问题,我们提出了一个联边DG评估方法,可以控制客户和数据的数量和多样性,并提供了评估数据的困难度的指标。我们运用这个方法评估了13种联边DG方法,包括中央化DG方法在FL上的适应,FL方法可以处理客户端的多样性,以及特地设计 для联边DG的方法。我们的结果显示,虽然有一些进步,但在处理大量客户、高客户多样性或更真实的数据时仍然存在较大的性能差距。您可以在以下链接中获取我们的可extendable benchmark代码:https://github.com/inouye-lab/FedDG_Benchmark。”

Impact of Feature Encoding on Malware Classification Explainability

  • paper_url: http://arxiv.org/abs/2307.05614
  • repo_url: None
  • paper_authors: Elyes Manai, Mohamed Mejri, Jaouhar Fattahi
  • for: Studying how feature encoding techniques affect the explainability of XAI (Explainable Artificial Intelligence) algorithms.
  • methods: Using a malware classification dataset, the authors train an XGBoost model and compare two feature encoding methods: Label Encoding (LE) and One-Hot Encoding (OHE).
  • results: OHE incurs only a marginal performance loss compared with LE but yields more detailed explanations, enabling deeper exploration in both global and local contexts. OHE also produces smaller explanation files and reduces analysis time for human analysts, underscoring the importance of feature encoding choices in XAI research.
    Abstract This paper investigates the impact of feature encoding techniques on the explainability of XAI (Explainable Artificial Intelligence) algorithms. Using a malware classification dataset, we trained an XGBoost model and compared the performance of two feature encoding methods: Label Encoding (LE) and One Hot Encoding (OHE). Our findings reveal a marginal performance loss when using OHE instead of LE. However, the more detailed explanations provided by OHE compensated for this loss. We observed that OHE enables deeper exploration of details in both global and local contexts, facilitating more comprehensive answers. Additionally, we observed that using OHE resulted in smaller explanation files and reduced analysis time for human analysts. These findings emphasize the significance of considering feature encoding techniques in XAI research and suggest potential for further exploration by incorporating additional encoding methods and innovative visualization approaches.
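
The LE-versus-OHE comparison is straightforward to reproduce in miniature. The snippet below uses an assumed synthetic "API call" feature rather than real malware data; it shows how the two encodings change the granularity of the SHAP attributions (one aggregate value for LE, one value per category for OHE).

```python
# Hedged sketch: label encoding vs one-hot encoding before XGBoost + SHAP explanations.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap
from sklearn.preprocessing import LabelEncoder

rng = np.random.default_rng(0)
api_calls = rng.choice(["CreateFile", "RegSetValue", "VirtualAlloc", "Sleep"], size=2000)
y = ((api_calls == "VirtualAlloc") ^ (rng.random(2000) < 0.1)).astype(int)  # noisy label

# Label encoding: one integer column
X_le = pd.DataFrame({"api_call": LabelEncoder().fit_transform(api_calls)})
# One-hot encoding: one binary column per category
X_ohe = pd.get_dummies(pd.DataFrame({"api_call": api_calls})).astype(int)

for name, X in [("label-encoded", X_le), ("one-hot", X_ohe)]:
    model = xgb.XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss").fit(X, y)
    sv = shap.TreeExplainer(model).shap_values(X)
    importance = np.abs(sv).mean(axis=0)
    print(name, dict(zip(X.columns, np.round(importance, 3))))
```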

Towards Fair Graph Neural Networks via Graph Counterfactual

  • paper_url: http://arxiv.org/abs/2307.04937
  • repo_url: https://github.com/timelovercc/caf-gnn
  • paper_authors: Zhimeng Guo, Jialiang Li, Teng Xiao, Yao Ma, Suhang Wang
  • for: Addressing bias in Graph Neural Networks (GNNs), which tend to inherit and amplify bias from their training data.
  • methods: A new framework, CAF, selects realistic counterfactuals from the training data (avoiding the unrealistic counterfactuals produced by perturbation or generation) and uses them to learn fair node representations for node classification.
  • results: Extensive experiments on synthetic and real-world datasets show that CAF is effective at improving the fairness of GNNs.
    Abstract Graph neural networks have shown great ability in representation (GNNs) learning on graphs, facilitating various tasks. Despite their great performance in modeling graphs, recent works show that GNNs tend to inherit and amplify the bias from training data, causing concerns of the adoption of GNNs in high-stake scenarios. Hence, many efforts have been taken for fairness-aware GNNs. However, most existing fair GNNs learn fair node representations by adopting statistical fairness notions, which may fail to alleviate bias in the presence of statistical anomalies. Motivated by causal theory, there are several attempts utilizing graph counterfactual fairness to mitigate root causes of unfairness. However, these methods suffer from non-realistic counterfactuals obtained by perturbation or generation. In this paper, we take a causal view on fair graph learning problem. Guided by the casual analysis, we propose a novel framework CAF, which can select counterfactuals from training data to avoid non-realistic counterfactuals and adopt selected counterfactuals to learn fair node representations for node classification task. Extensive experiments on synthetic and real-world datasets show the effectiveness of CAF. Our code is available at https://github.com/TimeLovercc/CAF-GNN.

Substance or Style: What Does Your Image Embedding Know?

  • paper_url: http://arxiv.org/abs/2307.05610
  • repo_url: None
  • paper_authors: Cyrus Rashtchian, Charles Herrmann, Chun-Sung Ferng, Ayan Chakrabarti, Dilip Krishnan, Deqing Sun, Da-Cheng Juan, Andrew Tomkins
  • for: Studying the non-semantic information contained in image foundation model embeddings and how these models perform on different downstream tasks.
  • methods: A systematic transformation-prediction task probes the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations.
  • results: Six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations, and image-text models (CLIP and ALIGN) are better than masking-based models (CAN and MAE) at recognizing new examples of style transfer.
    Abstract Probes are small networks that predict properties of underlying data from embeddings, and they provide a targeted, effective way to illuminate the information contained in embeddings. While analysis through the use of probes has become standard in NLP, there has been much less exploration in vision. Image foundation models have primarily been evaluated for semantic content. Better understanding the non-semantic information in popular embeddings (e.g., MAE, SimCLR, or CLIP) will shed new light both on the training algorithms and on the uses for these foundation models. We design a systematic transformation prediction task and measure the visual content of embeddings along many axes, including image style, quality, and a range of natural and artificial transformations. Surprisingly, six embeddings (including SimCLR) encode enough non-semantic information to identify dozens of transformations. We also consider a generalization task, where we group similar transformations and hold out several for testing. We find that image-text models (CLIP and ALIGN) are better at recognizing new examples of style transfer than masking-based models (CAN and MAE). Overall, our results suggest that the choice of pre-training algorithm impacts the types of information in the embedding, and certain models are better than others for non-semantic downstream tasks.
    摘要 probes 是小型网络,可以预测嵌入数据中的性质,并提供一种targeted、有效的方式来照明嵌入数据中的信息。在 NLP 领域中, probes 已经成为标准的分析方法,而在视觉领域中, however, 只有很少的探索。 popular 的嵌入模型(如 MAE、SimCLR 和 CLIP)主要被评估为semantic content,但是更深入的理解这些嵌入模型中的非 semantic information(例如图像风格、质量等)会对训练算法和这些基础模型的应用有新的灯光。我们设计了一个系统性的变换预测任务,并测量嵌入中的视觉内容 along 多个轴,包括图像风格、质量和一些自然和人工变换。结果显示, six 个嵌入(包括 SimCLR)encode enough non-semantic information,可以识别多达 dozen 种变换。我们还考虑一个总结任务,将类似的变换分组,并将一些用作测试集。我们发现, image-text 模型(CLIP 和 ALIGN)在新的样式转移例子中表现更好,而 masking-based 模型(CAN 和 MAE)则更适合掩码转换。总之,我们的结果表明,选择预训练算法的选择会影响嵌入中的信息类型,而certain 模型在非 semantic 下游任务中表现更好。
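
A probe in this sense is just a small supervised model trained on frozen embeddings. The sketch below uses random vectors with a planted signal as stand-ins for MAE/SimCLR/CLIP features, so the accuracy number is meaningless beyond illustrating the mechanics of transformation prediction.

```python
# Minimal linear-probe sketch: predict which transformation produced an image from its embedding.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, dim, n_transforms = 3000, 256, 3
labels = rng.integers(n_transforms, size=n)                 # transformation applied to each image
emb = rng.normal(size=(n, dim))
emb[:, :8] += labels[:, None]                               # planted non-semantic signal in the embedding

X_tr, X_te, y_tr, y_te = train_test_split(emb, labels, test_size=0.3, random_state=0)
probe = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)   # small linear probe on frozen features
acc = probe.score(X_te, y_te)
print(f"probe accuracy at identifying the transformation: {acc:.3f}")
```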

Probabilistic Counterexample Guidance for Safer Reinforcement Learning (Extended Version)

  • paper_url: http://arxiv.org/abs/2307.04927
  • repo_url: None
  • paper_authors: Xiaotong Ji, Antonio Filieri
  • for: Addressing the limitations of RL in safety-critical scenarios, where failures during trial-and-error learning can be costly.
  • methods: Counterexample-guided safe exploration: the agent's safety-relevant knowledge is abstracted into compact models, and probabilistic counterexample generation builds minimal simulation submodels eliciting safety violations on which the agent can train offline.
  • results: In preliminary experiments the method reduces safety violations during online exploration by an average of 40.3% compared with standard QL and DQN and by 29.1% compared with previous related work, while achieving comparable cumulative rewards.
    Abstract Safe exploration aims at addressing the limitations of Reinforcement Learning (RL) in safety-critical scenarios, where failures during trial-and-error learning may incur high costs. Several methods exist to incorporate external knowledge or to use proximal sensor data to limit the exploration of unsafe states. However, reducing exploration risks in unknown environments, where an agent must discover safety threats during exploration, remains challenging. In this paper, we target the problem of safe exploration by guiding the training with counterexamples of the safety requirement. Our method abstracts both continuous and discrete state-space systems into compact abstract models representing the safety-relevant knowledge acquired by the agent during exploration. We then exploit probabilistic counterexample generation to construct minimal simulation submodels eliciting safety requirement violations, where the agent can efficiently train offline to refine its policy towards minimising the risk of safety violations during the subsequent online exploration. We demonstrate our method's effectiveness in reducing safety violations during online exploration in preliminary experiments by an average of 40.3% compared with QL and DQN standard algorithms and 29.1% compared with previous related work, while achieving comparable cumulative rewards with respect to unrestricted exploration and alternative approaches.

SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation

  • paper_url: http://arxiv.org/abs/2307.04907
  • repo_url: None
  • paper_authors: Bhathiya Hemanthage, Christian Dondrup, Phil Bartie, Oliver Lemon
  • for: A simple language model for multimodal task-oriented dialogue.
  • methods: The model builds on a large-scale transformer-based auto-regressive architecture and leverages transfer learning from pre-trained GPT-2. To capture the semantics of visual scenes, it introduces both local and de-localized tokens for objects.
  • results: SimpleMTOD achieves a state-of-the-art BLEU score (0.327) on the Response Generation sub-task of the SIMMC 2.0 test-std dataset and performs on par on the other multimodal sub-tasks (Disambiguation, Coreference Resolution, and Dialog State Tracking), despite a minimalist approach to extracting visual (and non-visual) information.
    Abstract SimpleMTOD is a simple language model which recasts several sub-tasks in multimodal task-oriented dialogues as sequence prediction tasks. SimpleMTOD is built on a large-scale transformer-based auto-regressive architecture, which has already proven to be successful in uni-modal task-oriented dialogues, and effectively leverages transfer learning from pre-trained GPT-2. In-order to capture the semantics of visual scenes, we introduce both local and de-localized tokens for objects within a scene. De-localized tokens represent the type of an object rather than the specific object itself and so possess a consistent meaning across the dataset. SimpleMTOD achieves a state-of-the-art BLEU score (0.327) in the Response Generation sub-task of the SIMMC 2.0 test-std dataset while performing on par in other multimodal sub-tasks: Disambiguation, Coreference Resolution, and Dialog State Tracking. This is despite taking a minimalist approach for extracting visual (and non-visual) information. In addition the model does not rely on task-specific architectural changes such as classification heads.
    摘要 SimpleMTOD 是一个简单的语言模型,它将多modal task-oriented对话中的多个子任务转化为序列预测任务。 SimpleMTOD 基于大规模的 transformer 自动生成架构,这种架构在单modal task-oriented对话中已经证明成功,并且有效地利用了预训练的 GPT-2 的转移学习。为了捕捉视觉场景的 semantics,我们引入了场景中对象的本地和非本地符号。非本地符号表示对象的类型而不是特定的对象,因此具有 dataset 中的一致性。 SimpleMTOD 在 SIMMC 2.0 测试标准数据集中的 Response Generation 子任务中 achieved state-of-the-art BLEU 分数 (0.327),并在其他多modal 子任务中(歧义解决、核心引用解决和对话状态跟踪)表现良好,尽管使用了 minimalist 的方法来提取视觉(和非视觉)信息。此外,模型不需要任务特定的建筑性Changes,如分类头。

FedYolo: Augmenting Federated Learning with Pretrained Transformers

  • paper_url: http://arxiv.org/abs/2307.04905
  • repo_url: None
  • paper_authors: Xuechen Zhang, Mingchen Li, Xiangyu Chang, Jiasi Chen, Amit K. Roy-Chowdhury, Ananda Theertha Suresh, Samet Oymak
  • for: Exploring how pretrained transformers (PTFs) enable on-device learning with federated learning, addressing diverse client goals and scarce heterogeneous data on mobile and edge devices.
  • methods: Federated learning with pretrained transformers, investigating the roles of model scale and modularity (adaptation through modules such as prompts or adapters).
  • results: Larger scale shrinks accuracy gaps, improves robustness to client heterogeneity, and reduces communication rounds; modularity cuts communication by more than 100x, boosts generalization and robustness, and lets clients solve multiple unrelated tasks with a single PTF without catastrophic forgetting. These findings motivate the proposed "You Only Load Once" (FedYolo) approach, in which clients load a full PTF once and all later updates go through communication-efficient modules.
    Abstract The growth and diversity of machine learning applications motivate a rethinking of learning with mobile and edge devices. How can we address diverse client goals and learn with scarce heterogeneous data? While federated learning aims to address these issues, it has challenges hindering a unified solution. Large transformer models have been shown to work across a variety of tasks achieving remarkable few-shot adaptation. This raises the question: Can clients use a single general-purpose model, rather than custom models for each task, while obeying device and network constraints? In this work, we investigate pretrained transformers (PTF) to achieve these on-device learning goals and thoroughly explore the roles of model size and modularity, where the latter refers to adaptation through modules such as prompts or adapters. Focusing on federated learning, we demonstrate that: (1) Larger scale shrinks the accuracy gaps between alternative approaches and improves heterogeneity robustness. Scale allows clients to run more local SGD epochs which can significantly reduce the number of communication rounds. At the extreme, clients can achieve respectable accuracy locally highlighting the potential of fully-local learning. (2) Modularity, by design, enables $>$100$\times$ less communication in bits. Surprisingly, it also boosts the generalization capability of local adaptation methods and the robustness of smaller PTFs. Finally, it enables clients to solve multiple unrelated tasks simultaneously using a single PTF, whereas full updates are prone to catastrophic forgetting. These insights on scale and modularity motivate a new federated learning approach we call "You Only Load Once" (FedYolo): The clients load a full PTF model once and all future updates are accomplished through communication-efficient modules with limited catastrophic-forgetting, where each task is assigned to its own module.
    摘要 machine learning应用的增长和多样性,需要我们重新思考手持设备和边缘设备上的学习。如何处理多样化客户端目标和缺乏多样化数据的学习呢?联邦学习旨在解决这些问题,但它存在一些阻碍统一解决方案的挑战。大型转换器模型已经在多种任务上显示出惊人的几次适应性,这引起了问题:客户可以使用单一通用模型来满足每个任务,而不是为每个任务创建特定的模型吗?在这种情况下,我们 investigate pretrained transformers (PTF) 以实现这些在设备上学习的目标,并且全面探索模型大小和模块化的角色。我们将注重联邦学习,并证明:1. 规模的扩大可以降低客户端和网络约束下的准确性差距,同时提高多样性的鲁棒性。通过在本地进行更多的SGD迭代,客户可以减少通信轮次数。在极限情况下,客户可以在本地达到尽可能高的准确性,这 highlights 了本地学习的潜在可能性。2. 模块化设计可以减少通信量,并且意外地提高了本地适应方法的总体化能力和小型PTF的鲁棒性。此外,它还允许客户同时解决多个无关的任务,而不是通过全部更新而导致彻底忘记。这些策略对规模和模块化的影响,驱动我们提出一种新的联邦学习方法,我们称之为“只上载一次”(FedYolo)。在这种方法中,客户在第一次上载一个完整的PTF模型后,所有的未来更新都可以通过通信效率低的模块来完成,而不会导致彻底忘记。
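
The "load the backbone once, communicate only small modules" idea can be sketched with a toy federated loop in which clients train and upload only an adapter. The model sizes, dummy client data, and plain averaging rule below are assumptions for illustration, not the paper's setup.

```python
# Toy federated-averaging sketch where only adapter parameters are trained and communicated.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

class AdapterModel(nn.Module):
    def __init__(self, dim=64, n_classes=5):
        super().__init__()
        self.backbone = nn.Linear(32, dim)          # stands in for a large pretrained transformer
        for p in self.backbone.parameters():
            p.requires_grad = False                 # never updated, never communicated
        self.adapter = nn.Sequential(nn.Linear(dim, 16), nn.ReLU(), nn.Linear(16, n_classes))
    def forward(self, x):
        return self.adapter(self.backbone(x))

global_model = AdapterModel()
clients = 4
for rnd in range(3):                                # federated rounds
    adapter_states = []
    for c in range(clients):
        local = copy.deepcopy(global_model)
        opt = torch.optim.SGD(local.adapter.parameters(), lr=0.1)
        x, y = torch.randn(64, 32), torch.randint(0, 5, (64,))   # dummy client-local data
        for _ in range(5):
            opt.zero_grad()
            F.cross_entropy(local(x), y).backward()
            opt.step()
        adapter_states.append(local.adapter.state_dict())        # only the small module is uploaded
    avg = {k: torch.stack([s[k] for s in adapter_states]).mean(0) for k in adapter_states[0]}
    global_model.adapter.load_state_dict(avg)

print("communicated parameters per round:",
      sum(v.numel() for v in global_model.adapter.state_dict().values()))
```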

Fast dynamic time warping and clustering in C++

  • paper_url: http://arxiv.org/abs/2307.04904
  • repo_url: None
  • paper_authors: Volkan Kumtepeli, Rebecca Perriment, David A. Howey
  • for: Computationally efficient dynamic time warping (DTW) and clustering of time-series data.
  • methods: Dynamic programming and mixed-integer programming (MIP) for DTW and clustering, with task-level parallelization for efficiency.
  • results: 33% faster than the next fastest option on average, with a 64% speedup for larger datasets (over 1000 time series). The MIP clustering is most effective for small numbers of longer time series.
    Abstract We present an approach for computationally efficient dynamic time warping (DTW) and clustering of time-series data. The method frames the dynamic warping of time series datasets as an optimisation problem solved using dynamic programming, and then clusters time series data by solving a second optimisation problem using mixed-integer programming (MIP). There is also an option to use k-medoids clustering for increased speed, when a certificate for global optimality is not essential. The improved efficiency of our approach is due to task-level parallelisation of the clustering alongside DTW. Our approach was tested using the UCR Time Series Archive, and was found to be, on average, 33% faster than the next fastest option when using the same clustering method. This increases to 64% faster when considering only larger datasets (with more than 1000 time series). The MIP clustering is most effective on small numbers of longer time series, because the DTW computation is faster than other approaches, but the clustering problem becomes increasingly computationally expensive as the number of time series to be clustered increases.
    摘要 我们提出了一种 computationally efficient 的动态时间扭曲(DTW)和时间序列数据 clustering 方法。该方法将动态扭曲时间序列数据集 frames 为一个优化问题,使用动态Programming 解决,然后使用杂Integer Programming(MIP)解决第二个优化问题,并且可以选择使用 k-medoids clustering 以提高速度。我们的方法在UCR Time Series Archive 上进行测试,与其他相同 clustering 方法相比,平均提高了33%的速度,对于更大的数据集(包括1000个时间序列),则提高到64%。MIP clustering 对于少量 longer 时间序列表示最高效,因为 DTW 计算 faster than other approaches,但是 clustering 问题的计算成本随着时间序列数据集的数量增加。
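
The core DTW recurrence that the library accelerates in C++ is the standard dynamic program shown below; this plain Python/NumPy sketch is far slower than the library but has the same semantics, and the two short series are arbitrary examples.

```python
# Textbook dynamic-programming DTW distance between two 1-D series.
import numpy as np

def dtw_distance(a, b):
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: match, insertion, deletion
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    return D[n, m]

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0])
y = np.array([0.0, 0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
print(f"DTW(x, y) = {dtw_distance(x, y):.2f}")
```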

  • paper_url: http://arxiv.org/abs/2307.05603
  • repo_url: https://github.com/fatemehab/polis
  • paper_authors: Fatemeh Abdollahi, Saqib Ameen, Matthew E. Taylor, Levi H. S. Lelis
  • for: Improving existing programs with respect to a measurable objective by exploiting program structure and existing brute-force synthesis algorithms.
  • methods: A local search method (POLIS) that repeatedly improves a single program line while keeping the remaining lines fixed, iterating until no further improvement is found.
  • results: In a 27-participant user study on two single-agent games (Lunar Lander and Highway), POLIS substantially improved participants' program scores, and a proof-of-concept on existing Stack Overflow code suggests applicability to real-world problems.
    Abstract This paper introduces a local search method for improving an existing program with respect to a measurable objective. Program Optimization with Locally Improving Search (POLIS) exploits the structure of a program, defined by its lines. POLIS improves a single line of the program while keeping the remaining lines fixed, using existing brute-force synthesis algorithms, and continues iterating until it is unable to improve the program's performance. POLIS was evaluated with a 27-person user study, where participants wrote programs attempting to maximize the score of two single-agent games: Lunar Lander and Highway. POLIS was able to substantially improve the participants' programs with respect to the game scores. A proof-of-concept demonstration on existing Stack Overflow code measures applicability in real-world problems. These results suggest that POLIS could be used as a helpful programming assistant for programming problems with measurable objectives.
    摘要 这篇论文介绍了一种基于可测量目标的地方搜索方法,用于改进现有的程序。该方法称为程序优化与本地改进搜索(POLIS),利用程序的线程结构,在保持其余线程固定的情况下,使用现有的毫干搜索算法,对程序中的单个线程进行改进,并继续迭代直到无法提高程序的性能。为证明POLIS的有用性,该论文进行了27名用户参与的实验,参与者需要通过设计两个单机游戏的分数来尝试提高他们的程序:月球降落和高速公路。结果表明,POLIS能够有效地改进参与者们的程序,增加分数。此外,对现有的Stack Overflow代码进行了一个证明性示例,以证明POLIS在实际问题中的应用可行性。这些结果表明,POLIS可能成为程序问题中的有用编程助手。

Learning to Solve Constraint Satisfaction Problems with Recurrent Transformer

  • paper_url: http://arxiv.org/abs/2307.04895
  • repo_url: https://github.com/azreasoners/recurrent_transformer
  • paper_authors: Zhun Yang, Adam Ishay, Joohyung Lee
  • for: 解决约束满足问题(Constraint Satisfaction Problems, CSPs)
  • methods: 为Transformer加入循环结构,结合视觉输入与逻辑知识进行端到端学习
  • results: 提出了一种可端到端学习求解CSPs的新方法,并可利用离散约束的演绎知识实现半监督学习和样本高效学习
    Abstract Constraint satisfaction problems (CSPs) are about finding values of variables that satisfy the given constraints. We show that Transformer extended with recurrence is a viable approach to learning to solve CSPs in an end-to-end manner, having clear advantages over state-of-the-art methods such as Graph Neural Networks, SATNet, and some neuro-symbolic models. With the ability of Transformer to handle visual input, the proposed Recurrent Transformer can straightforwardly be applied to visual constraint reasoning problems while successfully addressing the symbol grounding problem. We also show how to leverage deductive knowledge of discrete constraints in the Transformer's inductive learning to achieve sample-efficient learning and semi-supervised learning for CSPs.
    摘要 约束满足问题(CSP)的目标是找到满足给定约束的变量取值。我们证明,为Transformer加入循环结构后,可以端到端地学习求解CSP,并且相对于图神经网络、SATNet以及一些神经符号模型等最新方法具有明显优势。借助Transformer处理视觉输入的能力,所提出的循环Transformer可以直接应用于视觉约束推理问题,并成功解决符号接地问题。我们还展示了如何在Transformer的归纳学习中利用离散约束的演绎知识,从而实现面向CSP的样本高效学习和半监督学习。
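An illustrative sketch of the recurrence idea, assuming a Sudoku-like CSP flattened to 81 cell tokens: one shared Transformer encoder layer is applied repeatedly so the same weights perform many reasoning steps. The layer sizes, embedding scheme, and number of steps are assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class RecurrentTransformerCSP(nn.Module):
    """One shared Transformer encoder layer applied `steps` times over the cell
    embeddings of a CSP (here a 9x9 Sudoku flattened to 81 tokens). Illustrative only."""
    def __init__(self, n_cells=81, n_values=9, d_model=128, steps=16):
        super().__init__()
        self.steps = steps
        self.embed = nn.Linear(n_values + 1, d_model)            # +1 channel for "empty"
        self.pos = nn.Parameter(0.02 * torch.randn(1, n_cells, d_model))
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.head = nn.Linear(d_model, n_values)                 # per-cell value logits

    def forward(self, x):                 # x: (batch, n_cells, n_values + 1) one-hot
        h = self.embed(x) + self.pos
        for _ in range(self.steps):       # recurrence: same weights, many reasoning steps
            h = self.layer(h)
        return self.head(h)               # (batch, n_cells, n_values)

model = RecurrentTransformerCSP()
print(model(torch.zeros(2, 81, 10)).shape)   # torch.Size([2, 81, 9])
```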

Choosing Well Your Opponents: How to Guide the Synthesis of Programmatic Strategies

  • paper_url: http://arxiv.org/abs/2307.04893
  • repo_url: https://github.com/rubensolv/locallearnerijcai
  • paper_authors: Rubens O. Moraes, David S. Aleixo, Lucas N. Ferreira, Levi H. S. Lelis
  • for: 这篇论文旨在提供一组用于指导搜索算法的参考策略,以提高双人零和博弈中程序化策略合成的效果。
  • methods: 论文提出了一种名为局部学习器(2L)的算法,它能主动选择一组参考策略,以增强搜索信号。
  • results: 实验表明,2L学习到的参考策略比IBR、FP和DO提供更强的搜索信号,从而提升策略合成的效果。此外,在模拟的MicroRTS锦标赛中,使用2L的合成器击败了最近两届MicroRTS比赛的冠军,而这些冠军都是由人类程序员编写的程序化策略。
    Abstract This paper introduces Local Learner (2L), an algorithm for providing a set of reference strategies to guide the search for programmatic strategies in two-player zero-sum games. Previous learning algorithms, such as Iterated Best Response (IBR), Fictitious Play (FP), and Double-Oracle (DO), can be computationally expensive or miss important information for guiding search algorithms. 2L actively selects a set of reference strategies to improve the search signal. We empirically demonstrate the advantages of our approach while guiding a local search algorithm for synthesizing strategies in three games, including MicroRTS, a challenging real-time strategy game. Results show that 2L learns reference strategies that provide a stronger search signal than IBR, FP, and DO. We also simulate a tournament of MicroRTS, where a synthesizer using 2L outperformed the winners of the two latest MicroRTS competitions, which were programmatic strategies written by human programmers.
    摘要 本文介绍了局部学习器(2L),一种为双人零和博弈中的程序化策略搜索提供参考策略集的算法。先前的学习算法,如迭代最佳响应(IBR)、虚拟对弈(FP)和双重Oracle(DO),要么计算代价高昂,要么会遗漏对指导搜索算法重要的信息。2L主动选择一组参考策略,以增强搜索信号。我们在三个游戏(包括具有挑战性的实时战略游戏MicroRTS)上,用2L指导局部搜索算法合成策略,实验结果表明2L学到的参考策略比IBR、FP和DO提供更强的搜索信号。此外,在模拟的MicroRTS锦标赛中,使用2L的合成器击败了最近两届MicroRTS比赛的冠军,而这些冠军均为人类程序员编写的程序化策略。

Unsupervised Domain Adaptation with Deep Neural-Network

  • paper_url: http://arxiv.org/abs/2307.05601
  • repo_url: https://github.com/jetwev/domain-adaptation
  • paper_authors: Artem Bituitskii
  • for: 这篇报告针对无监督领域自适应问题,分析现有方法并提出新方法,以改进跨领域的视觉识别任务。
  • methods: 报告对现有方法进行了分析,并引入了一种新的领域自适应方法,用于跨领域的视觉识别任务。
  • results: 结果展示了改进跨领域视觉识别性能的潜力,并为领域自适应方向的后续研究提供了基础。
    Abstract This report contributes to the field of unsupervised domain adaptation by providing an analysis of existing methods, introducing a new approach, and demonstrating the potential for improving visual recognition tasks across different domains. The results of this study open up opportunities for further study and development of advanced methods in the field of domain adaptation.
    摘要 这份报告对无监督领域自适应的现有方法进行了分析,提出了一种新方法,并展示了在不同领域间改进视觉识别任务的潜力。这些研究结果为领域自适应方向更先进方法的进一步研究与发展开辟了新的可能。

Accelerated Discovery of Machine-Learned Symmetries: Deriving the Exceptional Lie Groups G2, F4 and E6

  • paper_url: http://arxiv.org/abs/2307.04891
  • repo_url: None
  • paper_authors: Roy T. Forestano, Konstantin T. Matchev, Katia Matcheva, Alexander Roman, Eyup B. Unlu, Sarunas Verner
  • for: 这项研究使用监督式深度学习来寻找保持数据标签不变的连续对称变换,并得到相应对称生成元所构成的代数。
  • methods: 这封快报提出了两种改进算法来加速对称变换的发现,并用一种后处理算法将发现的生成元表示为稀疏形式。
  • results: 新算法与标准方法进行了性能对比,并推导出了酉群 U(n) 以及例外李群 $G_2$、$F_4$、$E_6$ 的完整生成元集合。
    Abstract Recent work has applied supervised deep learning to derive continuous symmetry transformations that preserve the data labels and to obtain the corresponding algebras of symmetry generators. This letter introduces two improved algorithms that significantly speed up the discovery of these symmetry transformations. The new methods are demonstrated by deriving the complete set of generators for the unitary groups U(n) and the exceptional Lie groups $G_2$, $F_4$, and $E_6$. A third post-processing algorithm renders the found generators in sparse form. We benchmark the performance improvement of the new algorithms relative to the standard approach. Given the significant complexity of the exceptional Lie groups, our results demonstrate that this machine-learning method for discovering symmetries is completely general and can be applied to a wide variety of labeled datasets.
    摘要 最近的工作使用监督深度学习来寻找保持数据标签不变的连续对称变换,并获得相应对称生成元的代数。这封快报介绍了两种改进算法,可以显著加速这些对称变换的发现。新方法推导出了酉群 U(n) 以及例外李群 $G_2$、$F_4$、$E_6$ 的完整生成元集合。此外,我们还提出了一种后处理算法,将找到的生成元表示为稀疏形式。我们对新算法相对于标准方法的性能提升进行了基准测试。鉴于例外李群的高度复杂性,这些结果表明该机器学习对称发现方法是完全通用的,可应用于各类带标签的数据集。

Measuring and Mitigating Interference in Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04887
  • repo_url: None
  • paper_authors: Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White
  • for: 本研究旨在为基于价值的强化学习方法定义并度量干扰,并研究这种干扰度量与控制性能之间的关系。
  • methods: 本研究针对 fitted Q-iteration 和 DQN 等基于价值的强化学习方法,提出了一种新的干扰度量方法。
  • results: 研究发现,该测试方法与控制性能的变化高度相关,并且可以用来评估不同网络架构和学习算法对干扰量的影响。此外,研究还提出了一种名为 “online-aware” 的算法,可以减少干扰量,并且在一些经典控制环境中提高了稳定性和性能。
    Abstract Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before overcoming interference we must understand it better. In this work, we provide a definition and novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference, showing that it correlates with instability in control performance, across a variety of network architectures. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures and study learning algorithms which mitigate interference. Lastly, we outline a class of algorithms which we call online-aware that are designed to mitigate interference, and show they do reduce interference according to our measure and that they improve stability and performance in several classic control environments.
    摘要 灾难性干扰在许多基于网络的学习系统中普遍存在,也已有许多缓解方案被提出。但要克服干扰,首先必须更好地理解它。本文为拟合Q迭代(Fitted Q-Iteration)和DQN等基于价值的强化学习方法给出了干扰的定义和一种新的度量。我们在多种网络结构上系统地评估了该度量,结果表明它与控制性能的不稳定性高度相关。这一新的干扰度量使我们能够针对常用的深度学习结构提出新的科学问题,并研究能缓解干扰的学习算法。最后,我们提出了一类称为"online-aware"的算法,其设计目标是减轻干扰;实验表明,按我们的度量衡量,这类算法确实降低了干扰,并在若干经典控制环境中提升了稳定性和性能。

AI For Global Climate Cooperation 2023 Competition Proceedings

  • paper_url: http://arxiv.org/abs/2307.06951
  • repo_url: None
  • paper_authors: Yoshua Bengio, Prateek Gupta, Lu Li, Soham Phade, Sunil Srinivasa, Andrew Williams, Tianyu Zhang, Yang Zhang, Stephan Zheng
  • for: The paper aims to design international frameworks for mitigating climate change and promoting sustainable economic growth, using AI-driven integrated assessment models (IAM) and simulations.
  • methods: The paper uses RICE-N, an AI-driven IAM that supports modeling regional decision-making using AI agents, to model the climate-economic impact of decisions into the future. The proposals were evaluated both quantitatively and qualitatively, with a combination of performance metrics and human expert evaluation.
  • results: The paper seeks to provide a promising solution to the challenges of collaboration in mitigating climate change and promoting sustainable economic growth, by combining AI with climate-economic simulations and involving human experts from multiple disciplines. The results of the competition and the improvements to RICE-N are expected to contribute to the development of effective and sustainable international frameworks for climate cooperation.
    Abstract The international community must collaborate to mitigate climate change and sustain economic growth. However, collaboration is hard to achieve, partly because no global authority can ensure compliance with international climate agreements. Combining AI with climate-economic simulations offers a promising solution to design international frameworks, including negotiation protocols and climate agreements, that promote and incentivize collaboration. In addition, these frameworks should also have policy goals fulfillment, and sustained commitment, taking into account climate-economic dynamics and strategic behaviors. These challenges require an interdisciplinary approach across machine learning, economics, climate science, law, policy, ethics, and other fields. Towards this objective, we organized AI for Global Climate Cooperation, a Mila competition in which teams submitted proposals and analyses of international frameworks, based on (modifications of) RICE-N, an AI-driven integrated assessment model (IAM). In particular, RICE-N supports modeling regional decision-making using AI agents. Furthermore, the IAM then models the climate-economic impact of those decisions into the future. Whereas the first track focused only on performance metrics, the proposals submitted to the second track were evaluated both quantitatively and qualitatively. The quantitative evaluation focused on a combination of (i) the degree of mitigation of global temperature rise and (ii) the increase in economic productivity. On the other hand, an interdisciplinary panel of human experts in law, policy, sociology, economics and environmental science, evaluated the solutions qualitatively. In particular, the panel considered the effectiveness, simplicity, feasibility, ethics, and notions of climate justice of the protocols. In the third track, the participants were asked to critique and improve RICE-N.
    摘要 国际社会必须通力合作,以缓解气候变化并维持经济增长。然而,合作难以达成,部分原因在于没有任何全球性权威机构能确保各方遵守国际气候协议。将人工智能与气候-经济模拟相结合,为设计能够促进并激励合作的国际框架(包括谈判协议和气候协定)提供了有前景的方案。此外,这些框架还应兼顾政策目标的达成和承诺的持续性,并考虑气候-经济动态及各方的策略性行为。这些挑战需要横跨机器学习、经济学、气候科学、法律、政策、伦理等领域的跨学科方法。为此,我们组织了 AI for Global Climate Cooperation(Mila 竞赛),参赛团队基于(修改后的)RICE-N——一个由 AI 驱动的综合评估模型(IAM)——提交国际框架的提案与分析。RICE-N 支持用 AI 代理建模区域决策,IAM 则进一步模拟这些决策对未来气候与经济的影响。第一赛道只关注性能指标,而第二赛道提交的提案则同时接受定量与定性评估:定量评估综合考虑(i)对全球升温的缓解程度和(ii)经济生产力的提升;定性评估由法律、政策、社会学、经济学和环境科学等领域的人类专家组成的跨学科评审团完成,重点考察协议的有效性、简洁性、可行性、伦理以及气候正义等方面。在第三赛道中,参赛者被要求对 RICE-N 提出批评并加以改进。

Onion Universe Algorithm: Applications in Weakly Supervised Learning

  • paper_url: http://arxiv.org/abs/2307.04870
  • repo_url: None
  • paper_authors: Woojoo Na
  • for: 本研究旨在提出一种新的分类方法,即 Onion Universe Algorithm (OUA),用于弱监督学习。
  • methods: OUA 基于弱信号空间的几何解释,不需要任何假设,可以快速实现并且简单易用。
  • results: 实验结果表明,OUA 在常见的标准数据集上表现出色,比既有的标签模型更好。
    Abstract We introduce Onion Universe Algorithm (OUA), a novel classification method in ensemble learning. In particular, we show its applicability as a label model for weakly supervised learning. OUA offers simplicity in implementation, computational efficiency, and does not rely on any assumptions regarding the data or weak signals. The model is well suited for scenarios where fully labeled data is not available. Our method is built upon geometrical interpretation of the space spanned by weak signals. Empirical results support our analysis of the hidden geometric structure underlying general set of weak signals and also illustrates that OUA works well in practice. We show empirical evidence that OUA performs favorably on common benchmark datasets compared to existing label models for weakly supervised learning.
    摘要 我们介绍了洋葱宇宙算法(OUA),一种新的集成学习分类方法,并展示了它作为弱监督学习标签模型的适用性。OUA实现简单、计算高效,且不依赖任何关于数据或弱信号的假设,特别适用于缺乏完全标注数据的场景。该方法建立在对弱信号张成空间的几何解释之上。实验结果支持了我们对一般弱信号集合背后隐藏几何结构的分析,也说明OUA在实践中表现良好:在常用的基准数据集上,OUA的表现优于现有的弱监督学习标签模型。

Fed-CPrompt: Contrastive Prompt for Rehearsal-Free Federated Continual Learning

  • paper_url: http://arxiv.org/abs/2307.04869
  • repo_url: None
  • paper_authors: Gaurav Bagwe, Xiaoyong Yuan, Miao Pan, Lan Zhang
  • for: 这篇论文是针对 Federated Continual Learning (FCL) 的研究,尤其是在不需要练习的情况下学习多个任务。
  • methods: 本论文使用提示学习(Prompt Learning)技术,以通信高效的方式学习任务特定的提示,从而缓解FCL中的遗忘问题。方法引入两个关键组件:异步提示学习和对比持续损失,分别用于应对FCL中任务异步到达和数据分布异构的问题。
  • results: 实验结果显示 Fed-CPrompt 可以实现 SOTA 的 rehearsal-free FCL 性能。
    Abstract Federated continual learning (FCL) learns incremental tasks over time from confidential datasets distributed across clients. This paper focuses on rehearsal-free FCL, which has severe forgetting issues when learning new tasks due to the lack of access to historical task data. To address this issue, we propose Fed-CPrompt based on prompt learning techniques to obtain task-specific prompts in a communication-efficient way. Fed-CPrompt introduces two key components, asynchronous prompt learning, and contrastive continual loss, to handle asynchronous task arrival and heterogeneous data distributions in FCL, respectively. Extensive experiments demonstrate the effectiveness of Fed-CPrompt in achieving SOTA rehearsal-free FCL performance.
    摘要 联邦持续学习(FCL)随着时间推移,从分布在各客户端上的机密数据集中不断学习新任务。本文关注无重演(rehearsal-free)FCL:由于无法访问历史任务数据,它在学习新任务时存在严重的遗忘问题。为解决这一问题,我们提出了Fed-CPrompt,基于提示学习技术,以通信高效的方式获得任务特定的提示。Fed-CPrompt包含异步提示学习和对比持续损失两个关键组件,分别用于处理FCL中任务异步到达和数据分布异构的问题。大量实验表明,Fed-CPrompt实现了最先进的无重演FCL性能。

Leveraging an Alignment Set in Tackling Instance-Dependent Label Noise

  • paper_url: http://arxiv.org/abs/2307.04868
  • repo_url: https://github.com/MLD3/Instance_Dependent_Label_Noise
  • paper_authors: Donna Tjandra, Jenna Wiens
  • for: 这篇论文是为了解决受标签错误影响的模型性能问题。
  • methods: 这篇论文提出了一个two-stage方法来在标签错误中学习。这个方法使用了“anchor points”,一小部分数据,其标签已知。
  • results: 该方法在多个任务上相比现有最佳方法持续提升了判别性能(AUROC),同时缓解了偏差(AUEOC)。例如,在MIMIC-III数据集上预测急性呼吸衰竭发作时,该方法的调和平均值(AUROC与AUEOC)达到0.84(标准差0.01),而次优基线为0.81(标准差0.01)。总体而言,在实例相关标签噪声存在的情况下,该方法在提升准确率的同时减轻了潜在偏差。
    Abstract Noisy training labels can hurt model performance. Most approaches that aim to address label noise assume label noise is independent from the input features. In practice, however, label noise is often feature or \textit{instance-dependent}, and therefore biased (i.e., some instances are more likely to be mislabeled than others). E.g., in clinical care, female patients are more likely to be under-diagnosed for cardiovascular disease compared to male patients. Approaches that ignore this dependence can produce models with poor discriminative performance, and in many healthcare settings, can exacerbate issues around health disparities. In light of these limitations, we propose a two-stage approach to learn in the presence instance-dependent label noise. Our approach utilizes \textit{\anchor points}, a small subset of data for which we know the observed and ground truth labels. On several tasks, our approach leads to consistent improvements over the state-of-the-art in discriminative performance (AUROC) while mitigating bias (area under the equalized odds curve, AUEOC). For example, when predicting acute respiratory failure onset on the MIMIC-III dataset, our approach achieves a harmonic mean (AUROC and AUEOC) of 0.84 (SD [standard deviation] 0.01) while that of the next best baseline is 0.81 (SD 0.01). Overall, our approach improves accuracy while mitigating potential bias compared to existing approaches in the presence of instance-dependent label noise.
    摘要 带噪声的训练标签会损害模型性能。多数处理标签噪声的方法都假设噪声与输入特征无关;但在实践中,标签噪声往往是与特征或实例相关的,因而带有偏差(即某些实例比其他实例更容易被误标)。例如在临床场景中,女性患者相比男性患者更容易被漏诊心血管疾病。忽视这种相关性的方法会得到判别性能较差的模型,并且在许多医疗场景中可能加剧健康不平等问题。鉴于这些局限,我们提出了一种两阶段方法,用于在实例相关标签噪声下进行学习。该方法利用"锚点"(anchor points),即一小部分同时已知观测标签和真实标签的数据。在多个任务上,相比现有最佳方法,我们的方法在提升判别性能(AUROC)的同时缓解了偏差(均等化机会曲线下面积,AUEOC)。例如,在MIMIC-III数据集上预测急性呼吸衰竭发作时,我们的方法取得0.84(标准差0.01)的调和平均值(AUROC与AUEOC),而次优基线为0.81(标准差0.01)。总体而言,在实例相关标签噪声存在的情况下,我们的方法在提升准确率的同时减轻了潜在偏差。

Compositional Generalization from First Principles

  • paper_url: http://arxiv.org/abs/2307.05596
  • repo_url: https://github.com/brendel-group/compositional-ood-generalization
  • paper_authors: Thaddäus Wiedemer, Prasanna Mayilvahanan, Matthias Bethge, Wieland Brendel
  • for: 本研究旨在探讨机器学习中的compositional generalization问题,即如何使模型能够通过学习数据的组成结构来泛化到新的数据集。
  • methods: 我们采用自底向上的思路,受可辨识表示学习启发,将组合性视为数据生成过程(而非数据本身)的性质。在此基础上,我们推导出一组温和的条件,这些条件只涉及训练分布的支撑集和模型结构,即可保证组合泛化。
  • results: 我们展示了该理论框架如何应用于现实场景,并通过实验验证了理论结论,为组合泛化的系统性理论研究奠定了基础。
    Abstract Leveraging the compositional nature of our world to expedite learning and facilitate generalization is a hallmark of human perception. In machine learning, on the other hand, achieving compositional generalization has proven to be an elusive goal, even for models with explicit compositional priors. To get a better handle on compositional generalization, we here approach it from the bottom up: Inspired by identifiable representation learning, we investigate compositionality as a property of the data-generating process rather than the data itself. This reformulation enables us to derive mild conditions on only the support of the training distribution and the model architecture, which are sufficient for compositional generalization. We further demonstrate how our theoretical framework applies to real-world scenarios and validate our findings empirically. Our results set the stage for a principled theoretical study of compositional generalization.
    摘要 利用世界的组合性来加速学习并促进泛化,是人类感知的一大特征。然而在机器学习中,即便模型带有显式的组合先验,实现组合泛化仍然是一个难以企及的目标。为更好地理解组合泛化,我们在此自底向上地切入:受可辨识表示学习的启发,我们将组合性视为数据生成过程的性质,而非数据本身的性质。这种重新表述使我们能够推导出仅涉及训练分布支撑集和模型结构的温和条件,而这些条件足以保证组合泛化。我们进一步展示了该理论框架在现实场景中的适用性,并通过实验验证了我们的结论。这些结果为组合泛化的系统性理论研究奠定了基础。

Automated Detection of Gait Events and Travel Distance Using Waist-worn Accelerometers Across a Typical Range of Walking and Running Speeds

  • paper_url: http://arxiv.org/abs/2307.04866
  • repo_url: None
  • paper_authors: Albara Ah Ramli, Xin Liu, Kelly Berndt, Chen-Nee Chuah, Erica Goude, Lynea B. Kaethler, Amanda Lopez, Alina Nicorici, Corey Owens, David Rodriguez, Jane Wang, Daniel Aranki, Craig M. McDonald, Erik K. Henricson
  • for: 这项研究旨在评估利用商用智能手机的加速度计数据,结合机器学习方法,测量杜氏肌营养不良症(DMD)儿童和正常发育对照(TD)儿童步态临床特征(CFs)的准确性。
  • methods: 研究采用多步骤的机器学习流程,从加速度计数据中提取CFs;数据来自15名DMD儿童和15名TD儿童,在有监督的临床测试中覆盖多种步速,包括10米或25米跑/走(10MRW、25MRW)、100米跑/走(100MRW)、6分钟步行(6MWT)和自由行走(FW)等评估。
  • results: 研究发现,由加速度计数据得到的CFs估计值与真实观测数据高度相关;步数、行进距离和步长的平均(标准差)百分比误差分别为1.49%(7.04%)、1.18%(9.91%)和0.37%(7.52%)。
    Abstract Background: Estimation of temporospatial clinical features of gait (CFs), such as step count and length, step duration, step frequency, gait speed and distance traveled is an important component of community-based mobility evaluation using wearable accelerometers. However, challenges arising from device complexity and availability, cost and analytical methodology have limited widespread application of such tools. Research Question: Can accelerometer data from commercially-available smartphones be used to extract gait CFs across a broad range of attainable gait velocities in children with Duchenne muscular dystrophy (DMD) and typically developing controls (TDs) using machine learning (ML)-based methods Methods: Fifteen children with DMD and 15 TDs underwent supervised clinical testing across a range of gait speeds using 10 or 25m run/walk (10MRW, 25MRW), 100m run/walk (100MRW), 6-minute walk (6MWT) and free-walk (FW) evaluations while wearing a mobile phone-based accelerometer at the waist near the body's center of mass. Gait CFs were extracted from the accelerometer data using a multi-step machine learning-based process and results were compared to ground-truth observation data. Results: Model predictions vs. observed values for step counts, distance traveled, and step length showed a strong correlation (Pearson's r = -0.9929 to 0.9986, p<0.0001). The estimates demonstrated a mean (SD) percentage error of 1.49% (7.04%) for step counts, 1.18% (9.91%) for distance traveled, and 0.37% (7.52%) for step length compared to ground truth observations for the combined 6MWT, 100MRW, and FW tasks. Significance: The study findings indicate that a single accelerometer placed near the body's center of mass can accurately measure CFs across different gait speeds in both TD and DMD peers, suggesting that there is potential for accurately measuring CFs in the community with consumer-level smartphones.
    摘要 背景:利用可穿戴加速度计估计步态的时空临床特征(CFs),如步数与步长、步时、步频、步速和行进距离,是社区环境下活动能力评估的重要组成部分。然而,设备的复杂性与可获得性、成本以及分析方法等方面的挑战限制了此类工具的广泛应用。研究问题:能否利用商用智能手机的加速度计数据,结合机器学习(ML)方法,在杜氏肌营养不良症(DMD)儿童和正常发育对照(TD)儿童中,在较宽的步态速度范围内提取步态CFs?方法:15名DMD儿童和15名TD儿童在腰部(接近身体质心)佩戴基于手机的加速度计,并在有监督的临床测试中完成10米或25米跑/走(10MRW、25MRW)、100米跑/走(100MRW)、6分钟步行(6MWT)和自由行走(FW)等不同步速的评估。通过多步骤的机器学习流程从加速度计数据中提取步态CFs,并与真实观测数据进行比较。结果:模型预测值与观测值在步数、行进距离和步长方面高度相关(Pearson r = -0.9929 至 0.9986,p<0.0001)。在6MWT、100MRW和FW任务合计上,步数、行进距离和步长相对真实观测的平均(标准差)百分比误差分别为1.49%(7.04%)、1.18%(9.91%)和0.37%(7.52%)。意义:研究结果表明,置于身体质心附近的单个加速度计即可在TD和DMD儿童的不同步速下准确测量步态CFs,提示使用消费级智能手机在社区环境中准确测量CFs具有可行性。
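A simplified heuristic stand-in (not the paper's multi-step ML pipeline) showing how step events might be counted from a waist-worn accelerometer trace: treat each prominent peak of the gravity-removed acceleration magnitude as one step. The sampling rate, thresholds, and synthetic signal are assumptions for illustration.

```python
import numpy as np
from scipy.signal import find_peaks

def count_steps(acc_xyz, fs=100.0, min_step_interval=0.25, prominence=0.5):
    """Count steps as prominent peaks in the gravity-removed acceleration magnitude.
    `acc_xyz` is an (N, 3) array from a waist-worn accelerometer sampled at `fs` Hz."""
    mag = np.linalg.norm(acc_xyz, axis=1)        # acceleration magnitude
    mag = mag - mag.mean()                       # crude gravity/bias removal
    min_dist = int(min_step_interval * fs)       # refractory period between steps
    peaks, _ = find_peaks(mag, distance=min_dist, prominence=prominence)
    return len(peaks)

# Toy usage: a synthetic 10 s walk at roughly 2 steps per second
t = np.arange(0, 10, 1 / 100.0)
acc = np.stack([np.zeros_like(t),
                np.zeros_like(t),
                9.8 + 2.0 * np.sin(2 * np.pi * 2.0 * t)], axis=1)
print(count_steps(acc))   # ~20
```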

Articulated 3D Head Avatar Generation using Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.04859
  • repo_url: None
  • paper_authors: Alexander W. Bergman, Wang Yifan, Gordon Wetzstein
  • for: 本研究旨在提供一种基于文本描述的3D人物头部生成方法,以满足增强现实、电影制作和教育等领域的需求。
  • methods: 本研究使用了已经训练过的2D文本到图像扩散模型,直接生成3D-多视图一致的辐射场,以生成3D人物头部。新的优化方法可以保持2D和3D的表情特征相对应。
  • results: 研究表明,使用 diffusion-based 方法可以生成高质量的3D人物头部,并且可以在特定的领域内操作,例如人类头部。与之前的CLIP方法相比,我们的方法可以提供更高的多样性和准确性。
    Abstract The ability to generate diverse 3D articulated head avatars is vital to a plethora of applications, including augmented reality, cinematography, and education. Recent work on text-guided 3D object generation has shown great promise in addressing these needs. These methods directly leverage pre-trained 2D text-to-image diffusion models to generate 3D-multi-view-consistent radiance fields of generic objects. However, due to the lack of geometry and texture priors, these methods have limited control over the generated 3D objects, making it difficult to operate inside a specific domain, e.g., human heads. In this work, we develop a new approach to text-guided 3D head avatar generation to address this limitation. Our framework directly operates on the geometry and texture of an articulable 3D morphable model (3DMM) of a head, and introduces novel optimization procedures to update the geometry and texture while keeping the 2D and 3D facial features aligned. The result is a 3D head avatar that is consistent with the text description and can be readily articulated using the deformation model of the 3DMM. We show that our diffusion-based articulated head avatars outperform state-of-the-art approaches for this task. The latter are typically based on CLIP, which is known to provide limited diversity of generation and accuracy for 3D object generation.
    摘要 生成多样化、可驱动的3D头部形象的能力,对增强现实、电影制作和教育等众多应用至关重要。近期的文本引导3D物体生成工作在满足这些需求方面展现出巨大潜力。这些方法直接利用预训练的2D文本到图像扩散模型,生成通用物体的3D多视角一致的辐射场。然而,由于缺乏几何与纹理先验,这些方法对生成的3D物体控制有限,难以在特定领域(例如人类头部)内工作。在本工作中,我们提出了一种新的文本引导3D头部形象生成方法来弥补这一不足。我们的框架直接作用于可驱动的3D可变形头部模型(3DMM)的几何与纹理,并引入新的优化流程,在更新几何与纹理的同时保持2D与3D面部特征的对齐。其结果是与文本描述一致、且可直接利用3DMM形变模型驱动的3D头部形象。实验表明,我们基于扩散模型的可驱动头部形象优于该任务上的现有最佳方法;后者通常基于CLIP,而CLIP在3D物体生成中的多样性与准确性均较为有限。

SHAP@k:Efficient and Probably Approximately Correct (PAC) Identification of Top-k Features

  • paper_url: http://arxiv.org/abs/2307.04850
  • repo_url: None
  • paper_authors: Sanjay Kariyappa, Leonidas Tsepenekas, Freddy Lécué, Daniele Magazzeni
  • for: 本研究的目的是解释模型预测结果的原因,通过计算特征重要性。
  • methods: 本研究基于SHAP框架,提出了Top-k识别问题(TkIP),即找出SHAP值最高的k个特征,并将其视为多臂老虎机中的Explore-m问题。
  • results: 通过引入更好的停止条件(满足PAC保证即停止采样)和贪心的采样分配策略,研究将现有方法的样本效率和运行时间平均提高了约5倍。
    Abstract The SHAP framework provides a principled method to explain the predictions of a model by computing feature importance. Motivated by applications in finance, we introduce the Top-k Identification Problem (TkIP), where the objective is to identify the k features with the highest SHAP values. While any method to compute SHAP values with uncertainty estimates (such as KernelSHAP and SamplingSHAP) can be trivially adapted to solve TkIP, doing so is highly sample inefficient. The goal of our work is to improve the sample efficiency of existing methods in the context of solving TkIP. Our key insight is that TkIP can be framed as an Explore-m problem--a well-studied problem related to multi-armed bandits (MAB). This connection enables us to improve sample efficiency by leveraging two techniques from the MAB literature: (1) a better stopping-condition (to stop sampling) that identifies when PAC (Probably Approximately Correct) guarantees have been met and (2) a greedy sampling scheme that judiciously allocates samples between different features. By adopting these methods we develop KernelSHAP@k and SamplingSHAP@k to efficiently solve TkIP, offering an average improvement of $5\times$ in sample-efficiency and runtime across most common credit related datasets.
    摘要 SHAP框架通过计算特征重要性,为解释模型预测提供了一种有原则的方法。受金融应用的启发,我们提出了Top-k识别问题(TkIP),其目标是找出SHAP值最高的k个特征。虽然任何能给出不确定性估计的SHAP计算方法(如KernelSHAP和SamplingSHAP)都可以直接改造来求解TkIP,但这样做的样本效率很低。我们的目标是在求解TkIP的场景下提升现有方法的样本效率。核心洞察是:TkIP可以表述为Explore-m问题——一个与多臂老虎机(MAB)密切相关、已被充分研究的问题。借助这一联系,我们利用MAB文献中的两项技术来提升样本效率:(1)更好的停止条件,用于判断何时已满足PAC(Probably Approximately Correct)保证从而停止采样;(2)贪心的采样方案,在不同特征之间审慎地分配样本。基于这些方法,我们提出了KernelSHAP@k和SamplingSHAP@k来高效求解TkIP,在多数常见的信贷相关数据集上,样本效率和运行时间平均提升约5倍。
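A sketch in the spirit of the Explore-m framing: adaptively allocate one-sample SHAP estimates to the features nearest the top-k boundary and stop once a PAC-style separation holds. The Hoeffding-style confidence interval, the specific stopping and allocation rules, and the `estimate_shap_once` estimator are illustrative assumptions rather than the exact KernelSHAP@k / SamplingSHAP@k procedures.

```python
import numpy as np

def topk_features(estimate_shap_once, n_features, k, eps=0.05, delta=0.05, max_rounds=10000):
    """Adaptive top-k identification sketch (assumes k < n_features).
    `estimate_shap_once(i)` is a hypothetical noisy one-sample estimator of
    feature i's SHAP value; sampling stops once the k-th and (k+1)-th features
    are separated up to `eps` with Hoeffding-style confidence intervals."""
    sums = np.zeros(n_features)
    counts = np.ones(n_features)
    for i in range(n_features):                              # one initial sample each
        sums[i] = estimate_shap_once(i)
    for _ in range(max_rounds):
        means = sums / counts
        width = np.sqrt(np.log(2 * n_features / delta) / (2 * counts))
        order = np.argsort(-means)
        top, rest = order[:k], order[k:]
        # stop when the weakest top-k lower bound clears the best rest upper bound
        if (means[top] - width[top]).min() >= (means[rest] + width[rest]).max() - eps:
            return top
        # greedily refine the two features closest to the top-k boundary
        for i in (top[np.argmin(means[top] - width[top])],
                  rest[np.argmax(means[rest] + width[rest])]):
            sums[i] += estimate_shap_once(i)
            counts[i] += 1
    return np.argsort(-sums / counts)[:k]                     # budget exhausted
```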

SigOpt Mulch: An Intelligent System for AutoML of Gradient Boosted Trees

  • paper_url: http://arxiv.org/abs/2307.04849
  • repo_url: None
  • paper_authors: Aleksei Sorokin, Xinran Zhu, Eric Hans Lee, Bolong Cheng
  • for: 提高Gradient Boosted Trees(GBTs)模型的hyperparameter优化
  • methods: 使用元学习与多保真度优化技术进行模型感知的超参数优化,并自动学习高性能的超参数搜索空间
  • results: 比现有系统更高效地优化GBTs hyperparameter,减少用户域知识的需求,提供更易用的用户体验
    Abstract Gradient boosted trees (GBTs) are ubiquitous models used by researchers, machine learning (ML) practitioners, and data scientists because of their robust performance, interpretable behavior, and ease-of-use. One critical challenge in training GBTs is the tuning of their hyperparameters. In practice, selecting these hyperparameters is often done manually. Recently, the ML community has advocated for tuning hyperparameters through black-box optimization and developed state-of-the-art systems to do so. However, applying such systems to tune GBTs suffers from two drawbacks. First, these systems are not \textit{model-aware}, rather they are designed to apply to a \textit{generic} model; this leaves significant optimization performance on the table. Second, using these systems requires \textit{domain knowledge} such as the choice of hyperparameter search space, which is an antithesis to the automatic experimentation that black-box optimization aims to provide. In this paper, we present SigOpt Mulch, a model-aware hyperparameter tuning system specifically designed for automated tuning of GBTs that provides two improvements over existing systems. First, Mulch leverages powerful techniques in metalearning and multifidelity optimization to perform model-aware hyperparameter optimization. Second, it automates the process of learning performant hyperparameters by making intelligent decisions about the optimization search space, thus reducing the need for user domain knowledge. These innovations allow Mulch to identify good GBT hyperparameters far more efficiently -- and in a more seamless and user-friendly way -- than existing black-box hyperparameter tuning systems.
    摘要 梯度提升树(GBTs)因其稳健的性能、可解释的行为和易用性,被研究人员、机器学习从业者和数据科学家广泛使用。训练GBTs的一个关键挑战在于超参数调优;在实践中,这些超参数往往靠手工选择。近年来,机器学习社区提倡通过黑盒优化来调超参数,并开发了相应的最先进系统。然而,用这类系统调优GBTs存在两个缺点:第一,这些系统不具备模型感知能力,而是面向通用模型设计,因而损失了大量可得的优化性能;第二,使用这些系统需要领域知识(例如超参数搜索空间的选择),这与黑盒优化所追求的自动化实验背道而驰。本文提出SigOpt Mulch,一个专为GBTs自动调优设计的模型感知超参数调优系统,相比现有系统有两点改进:其一,Mulch利用元学习和多保真度优化等强大技术实现模型感知的超参数优化;其二,它通过对优化搜索空间做出智能决策,自动化了高性能超参数的学习过程,从而降低了对用户领域知识的依赖。这些创新使Mulch能够比现有的黑盒超参数调优系统更高效、更顺畅、更易用地找到优良的GBTs超参数。

Dynamics of Temporal Difference Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04841
  • repo_url: https://github.com/pehlevan-group/td-rl-dynamics
  • paper_authors: Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan
  • for: 这篇论文旨在研究反射学习中参数和状态表示方法如何控制学习动态。
  • methods: 这篇论文使用统计物理概念来研究价值函数学习的时间差分学习 Curves。
  • results: 研究发现,由于对可能的回合空间进行子采样,随机半梯度噪声会使价值误差出现显著的平台期,这与传统梯度下降动力学不同。研究还表明,学习率退火和奖励整形等策略可以改善学习动态并缓解平台期。
    Abstract Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics, to study the typical case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages and we validate our assumptions on small scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools to open a new direction towards developing a theory of learning dynamics in reinforcement learning.
    摘要 强化学习已在多种需要智能体在稀疏反馈环境中学习行动的应用中取得成功。然而,尽管有这些经验上的成功,我们在理论上仍不清楚强化学习模型的参数与用于表示状态的特征如何相互作用并控制学习动态。在这项工作中,我们借助统计物理的概念,研究使用线性函数近似的时序差分学习价值函数时的典型学习曲线。我们的理论基于高斯等价假设,将对随机轨迹的平均替换为具有时间相关性的高斯特征平均,并在小规模马尔可夫决策过程上验证了这些假设。我们发现,由于对可能回合空间的子采样,随机半梯度噪声会使价值误差出现显著的平台期,这与传统梯度下降动力学不同。我们研究了学习动态与平台期如何依赖于特征结构、学习率、折扣因子和奖励函数,并进一步分析了学习率退火和奖励整形等策略如何有利地改变学习动态与平台期。总之,这项工作引入了新的工具,为建立强化学习学习动态理论开辟了新的方向。
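A minimal sketch of the object of study, TD(0) with a linear value function, whose stochastic semi-gradient updates give rise to the noise analysed above. The `env_step` and `phi` callables, the integer start state, and the hyperparameters are placeholders, not the paper's experimental setup.

```python
import numpy as np

def td0_linear(env_step, phi, n_features, episodes=200, alpha=0.05, gamma=0.9):
    """TD(0) with a linear value function v(s) = w . phi(s).
    `env_step(s)` returns (next_state, reward, done) and `phi(s)` a feature
    vector; both are placeholders, as is the integer start state 0."""
    w = np.zeros(n_features)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            s_next, r, done = env_step(s)
            target = r + (0.0 if done else gamma * w @ phi(s_next))
            td_error = target - w @ phi(s)
            w += alpha * td_error * phi(s)   # stochastic semi-gradient update
            s = s_next
    return w
```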

CREPE: Learnable Prompting With CLIP Improves Visual Relationship Prediction

  • paper_url: http://arxiv.org/abs/2307.04838
  • repo_url: https://github.com/llnl/crepe
  • paper_authors: Rakshith Subramanyam, T. S. Jayram, Rushil Anirudh, Jayaraman J. Thiagarajan
  • for: 这篇论文探讨了使用视觉语言模型(VLM),尤其是CLIP,预测视觉物体关系的潜力,即把图像中的视觉特征解读为基于语言的关系。
  • methods: 我们采用了UVTransE关系预测框架,该框架学习关系为图像中的翻译嵌入。我们系统地探索CLIP中的subject、object和union-box表示方法,并提出了CREPE(CLIP表示增强预测)。CREPE使用了文本基于表示,并引入了一种新的对比训练策略来自动推理union-box的文本提示。
  • results: 我们的方法在Visual Genome benchmark上实现了 predicate estimation的状态aru-the-art性能,mR@5 27.79,mR@20 31.95,与最近的状态aru-the-art在mR@20上提高15.3%。这个工作证明了CLIP在对象关系预测中的效果,并鼓励了更多的研究在这个挑战性的领域。
    Abstract In this paper, we explore the potential of Vision-Language Models (VLMs), specifically CLIP, in predicting visual object relationships, which involves interpreting visual features from images into language-based relations. Current state-of-the-art methods use complex graphical models that utilize language cues and visual features to address this challenge. We hypothesize that the strong language priors in CLIP embeddings can simplify these graphical models paving for a simpler approach. We adopt the UVTransE relation prediction framework, which learns the relation as a translational embedding with subject, object, and union box embeddings from a scene. We systematically explore the design of CLIP-based subject, object, and union-box representations within the UVTransE framework and propose CREPE (CLIP Representation Enhanced Predicate Estimation). CREPE utilizes text-based representations for all three bounding boxes and introduces a novel contrastive training strategy to automatically infer the text prompt for union-box. Our approach achieves state-of-the-art performance in predicate estimation, mR@5 27.79, and mR@20 31.95 on the Visual Genome benchmark, achieving a 15.3\% gain in performance over recent state-of-the-art at mR@20. This work demonstrates CLIP's effectiveness in object relation prediction and encourages further research on VLMs in this challenging domain.
    摘要 在这篇论文中,我们探索了视觉语言模型(VLM),尤其是CLIP,在预测视觉物体关系方面的潜力,即把图像中的视觉特征解读为基于语言的关系。当前的领先方法使用复杂的图模型,结合语言线索与视觉特征来应对这一挑战。我们假设CLIP嵌入中强大的语言先验可以简化这些图模型,从而支持更简单的方案。我们采用UVTransE关系预测框架,它将关系学习为由主体、客体和并集框嵌入构成的平移式嵌入。我们系统地探索了在UVTransE框架中基于CLIP的主体、客体和并集框表示的设计,并提出了CREPE(CLIP Representation Enhanced Predicate Estimation)。CREPE对三种边界框均采用基于文本的表示,并引入一种新的对比训练策略,用于自动推断并集框的文本提示。我们的方法在Visual Genome基准上取得了谓词估计的最先进性能(mR@5 27.79,mR@20 31.95),在mR@20上比最近的最先进方法提升15.3%。这项工作证明了CLIP在物体关系预测中的有效性,并鼓励在这一具有挑战性的领域进一步研究VLM。
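A sketch of the UVTransE-style translational predicate head that CREPE builds on: the relation is represented as union-box features minus subject and object features, then classified. Feature dimensions, the single linear classifier, and the random inputs are illustrative; CREPE's CLIP-based representations and contrastive prompt training are not shown.

```python
import torch
import torch.nn as nn

class UVTransEHead(nn.Module):
    """Translational predicate head: relation = union - subject - object,
    followed by a predicate classifier (dimensions are illustrative)."""
    def __init__(self, d_feat=512, n_predicates=50):
        super().__init__()
        self.classifier = nn.Linear(d_feat, n_predicates)

    def forward(self, subj_feat, obj_feat, union_feat):
        relation = union_feat - subj_feat - obj_feat   # translational embedding
        return self.classifier(relation)               # predicate logits

head = UVTransEHead()
print(head(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512)).shape)  # (4, 50)
```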

Graph Representation of the Magnetic Field Topology in High-Fidelity Plasma Simulations for Machine Learning Applications

  • paper_url: http://arxiv.org/abs/2307.09469
  • repo_url: None
  • paper_authors: Ioanna Bouri, Fanni Franssila, Markku Alho, Giulia Cozzani, Ivan Zaitsev, Minna Palmroth, Teemu Roos
  • for: study of magnetic reconnection in three-dimensional magnetic vector fields
  • methods: scalable pipeline for topological data analysis and spatiotemporal graph representation
  • results: demonstration on simulations of the Earth’s magnetosphere produced by Vlasiator
    Abstract Topological analysis of the magnetic field in simulated plasmas allows the study of various physical phenomena in a wide range of settings. One such application is magnetic reconnection, a phenomenon related to the dynamics of the magnetic field topology, which is difficult to detect and characterize in three dimensions. We propose a scalable pipeline for topological data analysis and spatiotemporal graph representation of three-dimensional magnetic vector fields. We demonstrate our methods on simulations of the Earth's magnetosphere produced by Vlasiator, a supercomputer-scale Vlasov theory-based simulation for near-Earth space. The purpose of this work is to challenge the machine learning community to explore graph-based machine learning approaches to address a largely open scientific problem with wide-ranging potential impact.
    摘要 对模拟等离子体中磁场进行拓扑分析,可以在广泛的场景下研究各类物理现象。其中一个应用是磁重联——一种与磁场拓扑动态相关的现象,在三维情形下难以探测和刻画。我们提出了一条可扩展的流程,用于三维磁矢量场的拓扑数据分析与时空图表示。我们在由Vlasiator(一个基于弗拉索夫理论、面向近地空间的超算级模拟程序)生成的地球磁层模拟数据上演示了这些方法。这项工作的目的是向机器学习社区发起挑战,探索基于图的机器学习方法,以解决这一在很大程度上仍然开放、且具有广泛潜在影响的科学问题。

Functional PCA and Deep Neural Networks-based Bayesian Inverse Uncertainty Quantification with Transient Experimental Data

  • paper_url: http://arxiv.org/abs/2307.05592
  • repo_url: None
  • paper_authors: Ziyu Xie, Mahmoud Yaseen, Xu Wu
  • for: 本文旨在开发一种逆向不确定性量化(inverse UQ)流程,基于实验数据量化模型输入的不确定性。
  • methods: 论文使用函数型主成分分析(functional PCA)和深度神经网络(DNN)构建快速替代模型,以降低计算成本。
  • results: 研究比较了采用不同降维方法和替代模型的逆向UQ流程,结果表明所提方法能更好地压缩TRACE瞬态模拟的维度,且逆向UQ结果的正向传播与实验数据吻合更好。
    Abstract Inverse UQ is the process to inversely quantify the model input uncertainties based on experimental data. This work focuses on developing an inverse UQ process for time-dependent responses, using dimensionality reduction by functional principal component analysis (PCA) and deep neural network (DNN)-based surrogate models. The demonstration is based on the inverse UQ of TRACE physical model parameters using the FEBA transient experimental data. The measurement data is time-dependent peak cladding temperature (PCT). Since the quantity-of-interest (QoI) is time-dependent that corresponds to infinite-dimensional responses, PCA is used to reduce the QoI dimension while preserving the transient profile of the PCT, in order to make the inverse UQ process more efficient. However, conventional PCA applied directly to the PCT time series profiles can hardly represent the data precisely due to the sudden temperature drop at the time of quenching. As a result, a functional alignment method is used to separate the phase and amplitude information of the transient PCT profiles before dimensionality reduction. DNNs are then trained using PC scores from functional PCA to build surrogate models of TRACE in order to reduce the computational cost in Markov Chain Monte Carlo sampling. Bayesian neural networks are used to estimate the uncertainties of DNN surrogate model predictions. In this study, we compared four different inverse UQ processes with different dimensionality reduction methods and surrogate models. The proposed approach shows an improvement in reducing the dimension of the TRACE transient simulations, and the forward propagation of inverse UQ results has a better agreement with the experimental data.
    摘要 逆向不确定性量化(inverse UQ)是基于实验数据反向量化模型输入不确定性的过程。本工作针对随时间变化的响应,开发了一种逆向UQ流程,采用函数型主成分分析(functional PCA)进行降维,并使用基于深度神经网络(DNN)的替代模型。演示案例为利用FEBA瞬态实验数据对TRACE物理模型参数进行逆向UQ,测量数据为随时间变化的峰值包壳温度(PCT)。由于关注量(QoI)随时间变化、对应于无限维的响应,我们用PCA在保留PCT瞬态曲线形态的前提下压缩QoI的维度,使逆向UQ过程更加高效。然而,由于骤冷时刻温度的急剧下降,直接对PCT时间序列曲线应用常规PCA难以精确表示数据。因此,我们在降维前使用函数对齐方法,将瞬态PCT曲线的相位信息与幅值信息分离。随后,以函数型PCA得到的主成分得分训练DNN,构建TRACE的替代模型,以降低马尔可夫链蒙特卡罗采样的计算成本,并使用贝叶斯神经网络估计DNN替代模型预测的不确定性。本研究比较了采用不同降维方法和替代模型的四种逆向UQ流程。结果表明,所提方法能更好地压缩TRACE瞬态模拟的维度,且逆向UQ结果的正向传播与实验数据吻合更好。
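A toy sketch of the PCA-plus-surrogate idea, assuming synthetic transients in place of TRACE runs: compress time-dependent outputs to a few principal-component scores, then fit a cheap regressor from input parameters to those scores. The data, network sizes, and omission of the functional phase/amplitude alignment are simplifying assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
params = rng.uniform(-1, 1, size=(200, 5))                  # 200 "runs", 5 input parameters
t = np.linspace(0, 1, 400)
curves = np.array([p[0] * np.sin(6 * t) + p[1] * t + 0.05 * rng.normal(size=t.size)
                   for p in params])                        # synthetic transient outputs

pca = PCA(n_components=3).fit(curves)                       # dimensionality reduction
scores = pca.transform(curves)                              # (200, 3) PC scores

# Cheap surrogate: input parameters -> PC scores (stand-in for the paper's DNNs)
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000).fit(params, scores)

# Predict a full transient for a parameter set by inverting the PCA
new_curve = pca.inverse_transform(surrogate.predict(params[:1]))
print(new_curve.shape)                                      # (1, 400)
```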

SITTA: A Semantic Image-Text Alignment for Image Captioning

  • paper_url: http://arxiv.org/abs/2307.05591
  • repo_url: https://github.com/ml-jku/semantic-image-text-alignment
  • paper_authors: Fabian Paischer, Thomas Adler, Markus Hofmarcher, Sepp Hochreiter
  • for: 本文旨在将语义信息从视觉模型迁移到生成式语言模型中,使其具备为图像生成描述所需的丰富语言能力。
  • methods: 本文提出了两种构建线性映射的新方法,将语义信息从视觉模型的嵌入空间迁移到生成式语言模型的嵌入空间:一种通过词元(token)对应关系进行对齐,另一种利用额外的图文配对数据直接构建从视觉空间到语言空间的映射。
  • results: 借助这些语义映射,本文在无需梯度信息的情况下实现了图像描述,在 MS-COCO 和 Flickr30k 数据集上取得了强劲的描述性能;即使数据有限,也部分超越了零样本乃至微调的竞争方法。消融实验表明,仅有250M参数规模的语言模型也能借助这些语义映射生成不错的描述,这使得计算资源受限的机构也能更容易地开展图像描述。
    Abstract Textual and semantic comprehension of images is essential for generating proper captions. The comprehension requires detection of objects, modeling of relations between them, an assessment of the semantics of the scene and, finally, representing the extracted knowledge in a language space. To achieve rich language capabilities while ensuring good image-language mappings, pretrained language models (LMs) were conditioned on pretrained multi-modal (image-text) models that allow for image inputs. This requires an alignment of the image representation of the multi-modal model with the language representations of a generative LM. However, it is not clear how to best transfer semantics detected by the vision encoder of the multi-modal model to the LM. We introduce two novel ways of constructing a linear mapping that successfully transfers semantics between the embedding spaces of the two pretrained models. The first aligns the embedding space of the multi-modal language encoder with the embedding space of the pretrained LM via token correspondences. The latter leverages additional data that consists of image-text pairs to construct the mapping directly from vision to language space. Using our semantic mappings, we unlock image captioning for LMs without access to gradient information. By using different sources of data we achieve strong captioning performance on MS-COCO and Flickr30k datasets. Even in the face of limited data, our method partly exceeds the performance of other zero-shot and even finetuned competitors. Our ablation studies show that even LMs at a scale of merely 250M parameters can generate decent captions employing our semantic mappings. Our approach makes image captioning more accessible for institutions with restricted computational resources.
    摘要 对图像进行文本与语义层面的理解,是生成恰当描述的前提:需要检测物体、建模物体之间的关系、评估场景语义,并最终把提取到的知识表示到语言空间中。为了在保证良好图文映射的同时获得丰富的语言能力,已有工作将预训练语言模型(LM)与支持图像输入的预训练多模态(图文)模型相结合,这要求把多模态模型的图像表示与生成式LM的语言表示对齐。然而,如何把多模态模型视觉编码器检测到的语义最优地迁移到LM中尚不明确。我们提出了两种构建线性映射的新方法,能够在两个预训练模型的嵌入空间之间成功迁移语义:第一种通过词元(token)对应关系,将多模态语言编码器的嵌入空间与预训练LM的嵌入空间对齐;第二种利用额外的图文配对数据,直接构建从视觉空间到语言空间的映射。借助这些语义映射,我们在无法获取梯度信息的情况下为LM解锁了图像描述能力。通过使用不同来源的数据,我们在MS-COCO和Flickr30k数据集上取得了强劲的描述性能;即使数据有限,我们的方法也部分超越了零样本乃至微调的竞争方法。消融实验表明,仅有250M参数规模的LM也能借助我们的语义映射生成不错的描述。我们的方法使计算资源受限的机构也能更容易地开展图像描述。
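A minimal sketch of the linear-mapping idea, assuming paired vision and language embeddings are available (e.g., from token correspondences or image-text data): fit a matrix by least squares that sends one embedding space into the other. The embedding dimensions and the synthetic data are placeholders; regularisation and the actual CLIP/LM encoders are omitted.

```python
import numpy as np

def fit_linear_mapping(vision_embs, text_embs):
    """Least-squares map W minimising ||vision_embs @ W - text_embs||_F^2."""
    W, *_ = np.linalg.lstsq(vision_embs, text_embs, rcond=None)
    return W

# Toy usage with random stand-ins for paired 512-d vision and 768-d LM embeddings
rng = np.random.default_rng(0)
V = rng.normal(size=(1000, 512))
T = V @ rng.normal(size=(512, 768)) + 0.01 * rng.normal(size=(1000, 768))
W = fit_linear_mapping(V, T)
print(W.shape)   # (512, 768)
```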

Information decomposition to identify relevant variation in complex systems with machine learning

  • paper_url: http://arxiv.org/abs/2307.04755
  • repo_url: None
  • paper_authors: Kieran A. Murphy, Dani S. Bassett
  • for: 本研究旨在提供一种实用、高效且通用的方法,用于分解一组测量所包含的信息,以便更好地理解复杂系统的行为。
  • methods: 该方法以分布式信息瓶颈为学习目标,利用机器学习对每个测量进行有损压缩,按照与指定宏观行为的相关性对测量中的变化进行排序,并找出在不同预测信息量下最重要的测量子集。
  • results: 研究表明,该方法能够有效分解复杂系统的信息,并在不同预测信息量下提供更细的粒度。在两个典型的复杂系统(布尔电路和发生塑性变形的非晶材料)中,通过检视学到的压缩方案,可以识别出与宏观行为最相关的关键变化,从而更好地理解系统的行为。
    Abstract One of the fundamental steps toward understanding a complex system is identifying variation at the scale of the system's components that is most relevant to behavior on a macroscopic scale. Mutual information is a natural means of linking variation across scales of a system due to its independence of the particular functional relationship between variables. However, estimating mutual information given high-dimensional, continuous-valued data is notoriously difficult, and the desideratum -- to reveal important variation in a comprehensible manner -- is only readily achieved through exhaustive search. Here we propose a practical, efficient, and broadly applicable methodology to decompose the information contained in a set of measurements by lossily compressing each measurement with machine learning. Guided by the distributed information bottleneck as a learning objective, the information decomposition sorts variation in the measurements of the system state by relevance to specified macroscale behavior, revealing the most important subsets of measurements for different amounts of predictive information. Additional granularity is achieved by inspection of the learned compression schemes: the variation transmitted during compression is composed of distinctions among measurement values that are most relevant to the macroscale behavior. We focus our analysis on two paradigmatic complex systems: a Boolean circuit and an amorphous material undergoing plastic deformation. In both examples, specific bits of entropy are identified out of the high entropy of the system state as most related to macroscale behavior for insight about the connection between micro- and macro- in the complex system. The identification of meaningful variation in data, with the full generality brought by information theory, is made practical for the study of complex systems.
    摘要 理解复杂系统的一个基本步骤,是在系统组件的尺度上识别出与宏观行为最相关的变化。互信息由于不依赖变量之间具体的函数关系,是联系系统不同尺度变化的自然手段;然而,在高维、连续取值的数据上估计互信息是出了名的困难,而"以可理解的方式揭示重要变化"这一目标通常只能依靠穷举搜索实现。我们提出了一种实用、高效且普适的方法,借助机器学习对每个测量进行有损压缩,从而分解一组测量所包含的信息。该方法以分布式信息瓶颈为学习目标,按照与指定宏观行为的相关性对系统状态测量中的变化进行排序,揭示在不同预测信息量下最重要的测量子集。通过检视学到的压缩方案,还可以获得更细的粒度:压缩过程中传递的变化由与宏观行为最相关的测量取值之间的区分构成。我们将分析聚焦于两个典型的复杂系统:一个布尔电路和一种发生塑性变形的非晶材料。在这两个例子中,我们从系统状态的高熵中识别出与宏观行为最相关的特定熵比特,从而洞察复杂系统中微观与宏观之间的联系。借助信息论所带来的完全普适性,对数据中有意义变化的识别因此变得切实可行,便于复杂系统的研究。

Shelving, Stacking, Hanging: Relational Pose Diffusion for Multi-modal Rearrangement

  • paper_url: http://arxiv.org/abs/2307.04751
  • repo_url: None
  • paper_authors: Anthony Simeonov, Ankit Goyal, Lucas Manuelli, Lin Yen-Chen, Alina Sarmiento, Alberto Rodriguez, Pulkit Agrawal, Dieter Fox
  • for: 提供了一个系统来重新排序场景中的物品,以达到想要的物品-场景排序关系,如一本书插入开放架上的槽中。
  • methods: 使用一条直接作用于三维点云的流程,由演示数据训练而来;系统能够泛化到场景与物体的全新几何形状、姿态和布局,并能处理多种不同的重排任务。
  • results: 系统可以实现多种不同的重新排序任务,包括处理多模式和物体形状和位置的变化。实验和真实世界中的评估结果显示,系统可以精确地完成重新排序任务,并且可以快速地处理多种不同的任务。
    Abstract We propose a system for rearranging objects in a scene to achieve a desired object-scene placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline generalizes to novel geometries, poses, and layouts of both scenes and objects, and is trained from demonstrations to operate directly on 3D point clouds. Our system overcomes challenges associated with the existence of many geometrically-similar rearrangement solutions for a given scene. By leveraging an iterative pose de-noising training procedure, we can fit multi-modal demonstration data and produce multi-modal outputs while remaining precise and accurate. We also show the advantages of conditioning on relevant local geometric features while ignoring irrelevant global structure that harms both generalization and precision. We demonstrate our approach on three distinct rearrangement tasks that require handling multi-modality and generalization over object shape and pose in both simulation and the real world. Project website, code, and videos: https://anthonysimeonov.github.io/rpdiff-multi-modal/
    摘要 我们提出了一个在场景中重新摆放物体的系统,用于实现期望的物体-场景摆放关系,例如把一本书插入书架上的空槽中。该流程可以泛化到场景与物体的全新几何形状、姿态和布局,直接作用于三维点云,并由演示数据训练而成。针对给定场景往往存在许多几何上相似的重排解这一难题,我们的系统通过迭代式位姿去噪训练过程,既能拟合多模态的演示数据并产生多模态输出,又能保持精确与准确。我们还展示了以相关的局部几何特征为条件、同时忽略会损害泛化与精度的无关全局结构所带来的好处。我们在三个需要处理多模态、并对物体形状与位姿进行泛化的重排任务上(包括仿真与真实世界)演示了该方法。项目网站、代码与视频:https://anthonysimeonov.github.io/rpdiff-multi-modal/

Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback

  • paper_url: http://arxiv.org/abs/2307.04749
  • repo_url: None
  • paper_authors: Jaskirat Singh, Liang Zheng
  • for: 这项研究旨在提高潜在扩散模型中文本与图像对齐的准确性;在文本条件图像生成领域,潜在扩散模型已取得前所未有的进展。
  • methods: 所提方法采用分解式思路来评估和改进文本-图像对齐:引入分解对齐得分(Decompositional-Alignment-Score),把复杂提示分解为一组互不相交的断言,再用VQA模型衡量每条断言与生成图像的对齐程度。
  • results: 与传统的CLIP、BLIP得分相比,所提方法与人工评分的相关性显著更高,同时还以断言级对齐得分的形式提供有用的反馈。人类用户研究表明,该方法在整体文本-图像对齐准确率上比此前最佳方法高出8.7%。
    Abstract The field of text-conditioned image generation has made unparalleled progress with the recent advent of latent diffusion models. While remarkable, as the complexity of given text input increases, the state-of-the-art diffusion models may still fail in generating images which accurately convey the semantics of the given prompt. Furthermore, it has been observed that such misalignments are often left undetected by pretrained multi-modal models such as CLIP. To address these problems, in this paper we explore a simple yet effective decompositional approach towards both evaluation and improvement of text-to-image alignment. In particular, we first introduce a Decompositional-Alignment-Score which given a complex prompt decomposes it into a set of disjoint assertions. The alignment of each assertion with generated images is then measured using a VQA model. Finally, alignment scores for different assertions are combined aposteriori to give the final text-to-image alignment score. Experimental analysis reveals that the proposed alignment metric shows significantly higher correlation with human ratings as opposed to traditional CLIP, BLIP scores. Furthermore, we also find that the assertion level alignment scores provide a useful feedback which can then be used in a simple iterative procedure to gradually increase the expression of different assertions in the final image outputs. Human user studies indicate that the proposed approach surpasses previous state-of-the-art by 8.7% in overall text-to-image alignment accuracy. Project page for our paper is available at https://1jsingh.github.io/divide-evaluate-and-refine
    摘要 随着潜在扩散模型的出现,文本条件图像生成领域取得了前所未有的进展。然而,当给定文本输入的复杂度提高时,即便是最先进的扩散模型,也可能无法生成准确传达提示语义的图像;而且这类不对齐往往无法被CLIP等预训练多模态模型检测出来。为了解决这些问题,本文探索了一种简单而有效的分解式方法,用于评估并改进文本-图像对齐。具体而言,我们首先提出分解对齐得分(Decompositional-Alignment-Score):将复杂提示分解为一组互不相交的断言,用VQA模型衡量每条断言与生成图像的对齐程度,最后将各断言的对齐得分事后组合为整体的文本-图像对齐得分。实验分析表明,与传统的CLIP、BLIP得分相比,该对齐指标与人工评分的相关性显著更高。此外,断言级对齐得分还能提供有用的反馈,可用于一个简单的迭代流程,逐步增强最终生成图像中各断言的表达。人类用户研究表明,所提方法在整体文本-图像对齐准确率上比此前最佳方法高出8.7%。项目主页:https://1jsingh.github.io/divide-evaluate-and-refine
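A hedged sketch of how per-assertion VQA scores could be combined into an overall alignment score. The `vqa_yes_prob` callable, the question template, and the simple mean combination are hypothetical stand-ins; the paper's exact scoring and iterative refinement procedure are not reproduced.

```python
def decompositional_alignment_score(image, assertions, vqa_yes_prob):
    """Combine per-assertion VQA scores into one alignment score.
    `vqa_yes_prob(image, question)` is a hypothetical wrapper around any VQA
    model returning the probability of answering "yes"; the question template
    and the mean combination are illustrative choices."""
    per_assertion = {a: vqa_yes_prob(image, f"Does this image show {a}?") for a in assertions}
    overall = sum(per_assertion.values()) / len(per_assertion)
    return overall, per_assertion   # per-assertion scores double as feedback for refinement
```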

RoCo: Dialectic Multi-Robot Collaboration with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.04738
  • repo_url: https://github.com/MandiZhao/robot-collab
  • paper_authors: Zhao Mandi, Shreeya Jain, Shuran Song
  • for: 这篇论文旨在探讨多机器人协作的新方法,利用预训练大型语言模型(LLM)同时实现高层沟通与低层路径规划。
  • methods: 在该方法中,每个机器人都配备LLM,用于讨论并共同推理任务策略;它们生成子任务计划与任务空间路点路径,供多臂运动规划器用来加速轨迹规划。此外,环境反馈(例如碰撞检测)会提示LLM代理在上下文中改进各自的计划与路点。
  • results: 在 RoCoBench benchmark 中,这个方法得到了高成功率,并能够适应任务 semantics 的变化。对话设置具有高可读性和灵活性,在实际世界实验中,RoCo 可以与人合作完成任务。
    Abstract We propose a novel approach to multi-robot collaboration that harnesses the power of pre-trained large language models (LLMs) for both high-level communication and low-level path planning. Robots are equipped with LLMs to discuss and collectively reason task strategies. They then generate sub-task plans and task space waypoint paths, which are used by a multi-arm motion planner to accelerate trajectory planning. We also provide feedback from the environment, such as collision checking, and prompt the LLM agents to improve their plan and waypoints in-context. For evaluation, we introduce RoCoBench, a 6-task benchmark covering a wide range of multi-robot collaboration scenarios, accompanied by a text-only dataset for agent representation and reasoning. We experimentally demonstrate the effectiveness of our approach -- it achieves high success rates across all tasks in RoCoBench and adapts to variations in task semantics. Our dialog setup offers high interpretability and flexibility -- in real world experiments, we show RoCo easily incorporates human-in-the-loop, where a user can communicate and collaborate with a robot agent to complete tasks together. See project website https://project-roco.github.io for videos and code.
    摘要 我们提出了一种新的多机器人协作方法,利用预训练大型语言模型(LLM)同时实现高层沟通与低层路径规划。机器人通过LLM讨论并共同推理任务策略,随后生成子任务计划与任务空间路点路径,供多臂运动规划器用来加速轨迹规划。我们还提供环境反馈(例如碰撞检测),并提示LLM代理在上下文中改进其计划与路点。为进行评估,我们提出了RoCoBench——一个涵盖多种多机器人协作场景的6任务基准,并附带一个用于代理表示与推理的纯文本数据集。实验表明我们的方法行之有效:它在RoCoBench的所有任务上取得了较高的成功率,并能适应任务语义的变化。我们的对话式设置具有很强的可解释性与灵活性;在真实世界实验中,RoCo可以轻松地引入人在回路,让用户与机器人代理沟通协作、共同完成任务。更多视频与代码请见项目网站 https://project-roco.github.io。

A unifying framework for differentially private quantum algorithms

  • paper_url: http://arxiv.org/abs/2307.04733
  • repo_url: None
  • paper_authors: Armando Angrisani, Mina Doosti, Elham Kashefi
  • for: 本研究旨在提供一种通用的量子隐私定义,以保护敏感信息的处理。
  • methods: 本文提出了一种新的、通用的量子态邻近性定义,并给出了一种结合经典噪声与量子噪声的机制,以便为量子测量提供更紧的隐私保障。
  • results: 研究结果表明,新的邻近性定义与噪声添加机制可以为量子测量提供指数级更紧的隐私保证;在提供多份输入态副本的情形下,结合测度集中与噪声机制,可以在几乎不损失精度的情况下保证差分隐私。本文还证明了量子曲棍球棒散度的高级联合凸性,并将其应用于量子差分隐私。
    Abstract Differential privacy is a widely used notion of security that enables the processing of sensitive information. In short, differentially private algorithms map "neighbouring" inputs to close output distributions. Prior work proposed several quantum extensions of differential privacy, each of them built on substantially different notions of neighbouring quantum states. In this paper, we propose a novel and general definition of neighbouring quantum states. We demonstrate that this definition captures the underlying structure of quantum encodings and can be used to provide exponentially tighter privacy guarantees for quantum measurements. Our approach combines the addition of classical and quantum noise and is motivated by the noisy nature of near-term quantum devices. Moreover, we also investigate an alternative setting where we are provided with multiple copies of the input state. In this case, differential privacy can be ensured with little loss in accuracy combining concentration of measure and noise-adding mechanisms. En route, we prove the advanced joint convexity of the quantum hockey-stick divergence and we demonstrate how this result can be applied to quantum differential privacy. Finally, we complement our theoretical findings with an empirical estimation of the certified adversarial robustness ensured by differentially private measurements.
    摘要 差分隐私是一种广泛使用的安全概念,允许对敏感信息进行处理。简而言之,差分隐私算法将"邻近"输入映射到彼此接近的输出分布。先前的工作提出了多种差分隐私的量子扩展,每一种都建立在相当不同的邻近量子态概念之上。在本文中,我们提出了一个新颖且通用的邻近量子态定义。我们证明该定义刻画了量子编码的底层结构,可为量子测量提供指数级更紧的隐私保证。我们的方法将经典噪声与量子噪声的添加相结合,其动机来自近期量子设备固有的噪声特性。此外,我们还研究了另一种情形:可以获得输入态的多个副本。在这种情形下,结合测度集中与加噪机制,只需损失少量精度即可保证差分隐私。在此过程中,我们证明了量子曲棍球棒散度(hockey-stick divergence)的高级联合凸性,并展示了如何将这一结果应用于量子差分隐私。最后,我们通过实验估计差分隐私测量所保证的认证对抗鲁棒性,以补充理论结果。
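
For reference, the standard $(\epsilon, \delta)$-differential-privacy guarantee and its measurement analogue — the notion whose quantum "neighbouring" relation this paper generalizes — can be written as follows. This is a textbook formulation, not the paper's specific definition of neighbouring states.

```latex
% Classical (\epsilon, \delta)-differential privacy: for all neighbouring datasets x ~ x'
% and all measurable output sets S,
%   Pr[A(x) \in S] <= e^{\epsilon} Pr[A(x') \in S] + \delta .
% The quantum analogue replaces datasets with states and algorithms with measurements:
\[
  \Pr\bigl[\mathcal{M}(\rho) \in S\bigr]
  \;\le\;
  e^{\epsilon}\,\Pr\bigl[\mathcal{M}(\sigma) \in S\bigr] + \delta
  \qquad \text{for all neighbouring states } \rho \sim \sigma,
\]
% so the strength and meaning of the guarantee hinge on which pairs (\rho, \sigma)
% are declared "neighbouring" -- precisely the notion this paper generalizes.
```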

Diffusion Policies for Out-of-Distribution Generalization in Offline Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.04726
  • repo_url: None
  • paper_authors: Suzan Ece Ada, Erhan Oztop, Emre Ugur
  • for: 本研究旨在提升离线强化学习(Offline RL)方法应对分布外状态的泛化能力,从而在不同环境中学习更好的策略。
  • methods: 本研究在扩散策略(Diffusion Policies)中引入状态重建特征学习(State Reconstruction Feature Learning),以缓解分布外泛化问题。
  • results: 本研究在自行设计的 2D Multimodal Contextual Bandit 环境以及多个 D4RL 基准任务上均取得了最先进(state-of-the-art)的结果。
    Abstract Offline Reinforcement Learning (RL) methods leverage previous experiences to learn better policies than the behavior policy used for experience collection. In contrast to behavior cloning, which assumes the data is collected from expert demonstrations, offline RL can work with non-expert data and multimodal behavior policies. However, offline RL algorithms face challenges in handling distribution shifts and effectively representing policies due to the lack of online interaction during training. Prior work on offline RL uses conditional diffusion models to obtain expressive policies to represent multimodal behavior in the dataset. Nevertheless, they are not tailored toward alleviating the out-of-distribution state generalization. We introduce a novel method incorporating state reconstruction feature learning in the recent class of diffusion policies to address the out-of-distribution generalization problem. State reconstruction loss promotes more descriptive representation learning of states to alleviate the distribution shift incurred by the out-of-distribution states. We design a 2D Multimodal Contextual Bandit environment to demonstrate and evaluate our proposed model. We assess the performance of our model not only in this new environment but also on several D4RL benchmark tasks, achieving state-of-the-art results.
    摘要 离线强化学习(Offline RL)方法利用既有经验,学习比用于收集经验的行为策略更好的策略。与假设数据来自专家演示的行为克隆不同,离线 RL 可以使用非专家数据和多模态行为策略。然而,由于训练期间缺乏在线交互,离线 RL 算法在处理分布偏移和有效表示策略方面面临挑战。先前的离线 RL 工作使用条件扩散模型获得表达能力强的策略,以表示数据集中的多模态行为,但并未针对分布外状态的泛化问题。我们提出了一种新方法,在近期的扩散策略中引入状态重建特征学习,以解决分布外泛化问题。状态重建损失促使模型学习更具描述性的状态表示,从而缓解分布外状态带来的分布偏移。我们设计了一个二维多模态上下文赌博机(2D Multimodal Contextual Bandit)环境来展示并评估所提模型。我们不仅在这一新环境中评估了模型性能,还在多个 D4RL 基准任务上取得了最先进的结果。
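
A minimal PyTorch-style sketch of the general idea — adding a state-reconstruction auxiliary loss to a diffusion-policy denoising objective — is given below. The network sizes, noise schedule, and the weight `lambda_rec` are illustrative assumptions and do not reproduce the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical modules: a state encoder/decoder pair and a noise-prediction network
# conditioned on the encoded state -- not the paper's exact architecture.
state_dim, action_dim, latent_dim, T = 16, 4, 32, 100
encoder = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))
eps_net = nn.Sequential(nn.Linear(latent_dim + action_dim + 1, 128), nn.ReLU(),
                        nn.Linear(128, action_dim))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters())
                       + list(eps_net.parameters()), lr=3e-4)
betas = torch.linspace(1e-4, 2e-2, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)
lambda_rec = 1.0  # assumed weight of the auxiliary reconstruction loss

def training_step(state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
    """One step: DDPM-style denoising loss on actions plus a state-reconstruction loss."""
    z = encoder(state)                                   # descriptive state representation
    rec_loss = F.mse_loss(decoder(z), state)             # auxiliary state-reconstruction term
    t = torch.randint(0, T, (state.shape[0],))
    noise = torch.randn_like(action)
    a_bar = alphas_bar[t].unsqueeze(1)
    noisy_action = a_bar.sqrt() * action + (1 - a_bar).sqrt() * noise
    t_emb = (t.float() / T).unsqueeze(1)
    pred_noise = eps_net(torch.cat([z, noisy_action, t_emb], dim=1))
    diff_loss = F.mse_loss(pred_noise, noise)            # denoising (diffusion policy) term
    loss = diff_loss + lambda_rec * rec_loss
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.detach()

# Example: training_step(torch.randn(8, state_dim), torch.randn(8, action_dim))
```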

AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning

  • paper_url: http://arxiv.org/abs/2307.04725
  • repo_url: https://github.com/guoyww/animatediff
  • paper_authors: Yuwei Guo, Ceyuan Yang, Anyi Rao, Yaohui Wang, Yu Qiao, Dahua Lin, Bo Dai
  • for: 提出了一个实用框架,使大多数现有的个性化文本到图像模型无需针对具体模型进行调整,即可一次性获得生成动画的能力。
  • methods: 核心思想是向冻结的文本到图像模型中插入一个新初始化的运动建模模块,并在视频片段上训练该模块,以蒸馏出合理的运动先验。
  • results: 在多个公开的代表性个性化文本到图像模型上进行了评估,结果表明该框架能帮助这些模型生成时间上平滑的动画片段,同时保持其输出的领域特性与多样性。
    Abstract With the advance of text-to-image models (e.g., Stable Diffusion) and corresponding personalization techniques such as DreamBooth and LoRA, everyone can manifest their imagination into high-quality images at an affordable cost. Subsequently, there is a great demand for image animation techniques to further combine generated static images with motion dynamics. In this report, we propose a practical framework to animate most of the existing personalized text-to-image models once and for all, saving efforts in model-specific tuning. At the core of the proposed framework is to insert a newly initialized motion modeling module into the frozen text-to-image model and train it on video clips to distill reasonable motion priors. Once trained, by simply injecting this motion modeling module, all personalized versions derived from the same base T2I readily become text-driven models that produce diverse and personalized animated images. We conduct our evaluation on several public representative personalized text-to-image models across anime pictures and realistic photographs, and demonstrate that our proposed framework helps these models generate temporally smooth animation clips while preserving the domain and diversity of their outputs. Code and pre-trained weights will be publicly available at https://animatediff.github.io/ .
    摘要 随着文本到图像模型(如 Stable Diffusion)及相应个性化技术(如 DreamBooth 和 LoRA)的发展,每个人都能以可负担的成本将想象力转化为高质量图像。随之而来的是将生成的静态图像与运动动态相结合的图像动画技术需求。在本报告中,我们提出一个实用框架,可以一次性让大多数现有的个性化文本到图像模型具备动画能力,免去针对具体模型的调整。该框架的核心是向冻结的文本到图像模型中插入一个新初始化的运动建模模块,并在视频片段上训练它,以蒸馏出合理的运动先验。训练完成后,只需注入该运动建模模块,源自同一基础 T2I 模型的所有个性化版本即可成为文本驱动的模型,生成多样且个性化的动画图像。我们在涵盖动漫图片和真实照片的多个公开的代表性个性化文本到图像模型上进行评估,结果表明所提框架能帮助这些模型生成时间上平滑的动画片段,同时保持其输出的领域特性与多样性。代码和预训练权重将在 https://animatediff.github.io/ 公开发布。
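
The core mechanism — inserting a newly initialized, trainable motion module after each frozen spatial block of the text-to-image U-Net — can be sketched as follows. Module names, tensor shapes, and the zero-initialization trick are assumptions for illustration, not AnimateDiff's actual code.

```python
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    """Newly initialized motion module: self-attention across the frame axis only."""
    def __init__(self, channels: int, num_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        # Zero-init the output projection so the residual starts as an identity mapping
        # (an assumed initialization trick that preserves the frozen T2I behaviour at step 0).
        nn.init.zeros_(self.attn.out_proj.weight)
        nn.init.zeros_(self.attn.out_proj.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, f, c, h, w = x.shape
        tokens = x.permute(0, 3, 4, 1, 2).reshape(b * h * w, f, c)   # attend over frames
        normed = self.norm(tokens)
        out, _ = self.attn(normed, normed, normed)
        out = out.reshape(b, h, w, f, c).permute(0, 3, 4, 1, 2)
        return x + out                                               # residual connection

class BlockWithMotion(nn.Module):
    """Wrap a frozen spatial block of the T2I model and insert a trainable motion module."""
    def __init__(self, spatial_block: nn.Module, channels: int):
        super().__init__()
        self.spatial = spatial_block
        for p in self.spatial.parameters():
            p.requires_grad_(False)                # the base text-to-image weights stay frozen
        self.motion = TemporalAttention(channels)  # only this module is trained on video clips

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, f, c, h, w = x.shape
        x = self.spatial(x.reshape(b * f, c, h, w)).reshape(b, f, c, h, w)  # per-frame, as in T2I
        return self.motion(x)

# Usage sketch: wrap a (channel-preserving) spatial block and run a 16-frame latent batch.
block = BlockWithMotion(nn.Conv2d(64, 64, 3, padding=1), channels=64)
video_latents = torch.randn(2, 16, 64, 32, 32)
out = block(video_latents)   # shape (2, 16, 64, 32, 32)
```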

Advances and Challenges in Meta-Learning: A Technical Review

  • paper_url: http://arxiv.org/abs/2307.04722
  • repo_url: None
  • paper_authors: Anna Vettoruzzo, Mohamed-Rafik Bouguelia, Joaquin Vanschoren, Thorsteinn Rögnvaldsson, KC Santosh
  • for: 这篇评论旨在提供一个全面的技术概述,探讨meta-learning在实际应用中的重要性,以及它如何帮助学习系统从多个任务中获得知识,以更快地适应和泛化到新任务。
  • methods: 评论涵盖了当前 meta-learning 领域的最新(state-of-the-art)方法,并探讨了 meta-learning 与多任务学习、迁移学习、领域适应与泛化、自监督学习、个性化联邦学习以及持续学习之间的关系。
  • results: 评论总结了该领域的最新研究进展,并指出了尚未解决的开放问题与挑战,为未来研究指明方向。
    Abstract Meta-learning empowers learning systems with the ability to acquire knowledge from multiple tasks, enabling faster adaptation and generalization to new tasks. This review provides a comprehensive technical overview of meta-learning, emphasizing its importance in real-world applications where data may be scarce or expensive to obtain. The paper covers the state-of-the-art meta-learning approaches and explores the relationship between meta-learning and multi-task learning, transfer learning, domain adaptation and generalization, self-supervised learning, personalized federated learning, and continual learning. By highlighting the synergies between these topics and the field of meta-learning, the paper demonstrates how advancements in one area can benefit the field as a whole, while avoiding unnecessary duplication of efforts. Additionally, the paper delves into advanced meta-learning topics such as learning from complex multi-modal task distributions, unsupervised meta-learning, learning to efficiently adapt to data distribution shifts, and continual meta-learning. Lastly, the paper highlights open problems and challenges for future research in the field. By synthesizing the latest research developments, this paper provides a thorough understanding of meta-learning and its potential impact on various machine learning applications. We believe that this technical overview will contribute to the advancement of meta-learning and its practical implications in addressing real-world problems.
    摘要 元学习(meta-learning)赋予学习系统从多个任务中获取知识的能力,使其能够更快地适应并泛化到新任务。本综述对元学习进行了全面的技术性概述,强调其在数据稀缺或获取成本高的实际应用中的重要性。文章涵盖了当前最先进的元学习方法,并探讨了元学习与多任务学习、迁移学习、领域适应与泛化、自监督学习、个性化联邦学习以及持续学习之间的关系。通过强调这些主题与元学习领域之间的协同效应,文章展示了一个领域的进展如何惠及整个领域,同时避免不必要的重复工作。此外,文章还深入探讨了元学习的高级主题,例如从复杂的多模态任务分布中学习、无监督元学习、学习高效适应数据分布变化,以及持续元学习。最后,文章指出了该领域未来研究中的开放问题与挑战。通过综合最新的研究进展,本文提供了对元学习及其在各类机器学习应用中潜在影响的全面理解。我们相信这一技术性概述将推动元学习的发展及其在解决实际问题中的应用。

On the curvature of the loss landscape

  • paper_url: http://arxiv.org/abs/2307.04719
  • repo_url: https://github.com/Enosh-P/Study-on-Loss-Landscape-Geometry-for-Improving-Generalization-in-Adaptive-Optimization-Methods
  • paper_authors: Alison Pouplin, Hrittik Roy, Sidak Pal Singh, Georgios Arvanitidis
  • for: Understanding the generalization abilities of over-parameterized deep learning models.
  • methods: Analyzing the loss landscape as an embedded Riemannian manifold, focusing on the scalar curvature.
  • results: Connections between the scalar curvature and generalization in deep learning models.
    Abstract One of the main challenges in modern deep learning is to understand why such over-parameterized models perform so well when trained on finite data. A way to analyze this generalization concept is through the properties of the associated loss landscape. In this work, we consider the loss landscape as an embedded Riemannian manifold and show that the differential geometric properties of the manifold can be used when analyzing the generalization abilities of a deep net. In particular, we focus on the scalar curvature, which can be computed analytically for our manifold, and show connections to several settings that potentially imply generalization.
    摘要 现代深度学习的主要挑战之一,是理解为何这类过参数化模型在有限数据上训练后仍有如此出色的表现。分析这种泛化能力的一种方式是研究相应损失景观的性质。在本工作中,我们将损失景观视为一个嵌入的黎曼流形,并证明该流形的微分几何性质可用于分析深度网络的泛化能力。特别地,我们关注标量曲率——它在我们的流形上可以解析计算——并展示了其与若干可能蕴含泛化的设定之间的联系。
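
For readers less familiar with the geometry involved, the following standard definitions (not specific to this paper) spell out what it means to treat the loss landscape as an embedded Riemannian manifold and to compute its scalar curvature; here $L$ is the training loss over parameters $\theta \in \mathbb{R}^d$.

```latex
% The loss landscape viewed as the graph of L, embedded in R^{d+1}:
%   M = { (\theta, L(\theta)) : \theta \in R^d },
% carries the induced (pull-back) metric
\[
  g_{ij}(\theta) \;=\; \delta_{ij} + \partial_i L(\theta)\,\partial_j L(\theta).
\]
% The scalar curvature is the full trace of the Riemann tensor,
\[
  R(\theta) \;=\; g^{ij}(\theta)\,\mathrm{Ric}_{ij}(\theta) \;=\; g^{ik} g^{jl} R_{ijkl},
\]
% a single number per point that summarizes how the landscape curves at (\theta, L(\theta)).
```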

Cobalt: Optimizing Mining Rewards in Proof-of-Work Network Games

  • paper_url: http://arxiv.org/abs/2307.04695
  • repo_url: None
  • paper_authors: Arti Vedula, Abhishek Gupta, Shaileshh Bojja Venkatakrishnan
  • for: 研究工作量证明(proof-of-work)区块链中矿工如何优化其网络连接,以最大化挖矿奖励
  • methods: 使用 combinatorial bandit 算法,利用网络坐标来学习网络结构
  • results: 对多种网络设置进行实验,提出的方法能够超过或匹配基线方法的性能
    Abstract Mining in proof-of-work blockchains has become an expensive affair requiring specialized hardware capable of executing several megahashes per second at huge electricity costs. Miners earn a reward each time they mine a block within the longest chain, which helps offset their mining costs. It is therefore of interest to miners to maximize the number of mined blocks in the blockchain and increase revenue. A key factor affecting mining rewards earned is the connectivity between miners in the peer-to-peer network. To maximize rewards a miner must choose its network connections carefully, ensuring existence of paths to other miners that are on average of a lower latency compared to paths between other miners. We formulate the problem of deciding whom to connect to for miners as a combinatorial bandit problem. Each node picks its neighbors strategically to minimize the latency to reach 90\% of the hash power of the network relative to the 90-th percentile latency from other nodes. A key contribution of our work is the use of a network coordinates based model for learning the network structure within the bandit algorithm. Experimentally we show our proposed algorithm outperforming or matching baselines on diverse network settings.
    摘要 在工作量证明(proof-of-work)区块链中,挖矿已成为一项开销巨大的活动,需要能够每秒执行数百万次哈希运算的专用硬件,并消耗巨额电力。矿工每在最长链上挖出一个区块即可获得奖励,以抵消其挖矿成本。因此,矿工希望最大化自己在区块链中挖出的区块数量,以提高收益。影响挖矿奖励的一个关键因素是矿工在点对点网络中的连通性。为了最大化奖励,矿工必须谨慎选择其网络连接,确保其到其他矿工的路径平均延迟低于其他矿工之间的路径延迟。我们将矿工选择连接对象的问题建模为一个组合赌博机(combinatorial bandit)问题:每个节点有策略地选择邻居,使其到达网络中 90% 算力的延迟相对于其他节点的 90 分位延迟最小化。我们工作的一个关键贡献是在赌博机算法中使用基于网络坐标的模型来学习网络结构。实验表明,所提算法在多种网络设置下均能超过或匹配基线方法。
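
A toy version of the bandit-style neighbour selection described above might look like the following epsilon-greedy sketch. The latency model, reward definition, and exploration scheme are illustrative assumptions; they stand in for, and do not reproduce, the paper's network-coordinates-based combinatorial bandit.

```python
import random

# Toy epsilon-greedy combinatorial bandit for choosing peer connections.
# The "reward" is low estimated latency to peers, learned from noisy observations;
# this is an illustrative stand-in for the paper's approach, not the Cobalt algorithm.
random.seed(0)
NUM_PEERS, DEGREE, ROUNDS, EPS = 20, 4, 500, 0.1
true_latency = [random.uniform(10, 200) for _ in range(NUM_PEERS)]   # hidden ground truth (ms)
est_latency = [100.0] * NUM_PEERS                                    # running estimates
counts = [0] * NUM_PEERS

def observe(peer: int) -> float:
    """Noisy latency observation for a chosen peer (simulated network feedback)."""
    return max(1.0, random.gauss(true_latency[peer], 10.0))

for _ in range(ROUNDS):
    if random.random() < EPS:
        chosen = random.sample(range(NUM_PEERS), DEGREE)                           # explore
    else:
        chosen = sorted(range(NUM_PEERS), key=lambda p: est_latency[p])[:DEGREE]   # exploit
    for p in chosen:                                                  # incremental mean update
        counts[p] += 1
        est_latency[p] += (observe(p) - est_latency[p]) / counts[p]

best = sorted(range(NUM_PEERS), key=lambda p: est_latency[p])[:DEGREE]
print("selected neighbours:", best,
      "estimated latencies:", [round(est_latency[p], 1) for p in best])
```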

FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing

  • paper_url: http://arxiv.org/abs/2307.04684
  • repo_url: https://github.com/lpengyang/freedrag
  • paper_authors: Pengyang Ling, Lin Chen, Pan Zhang, Huaian Chen, Yi Jin
  • for: 提高基于点的图像编辑的精度与灵活性,解决 DragGAN 在点追踪方面存在的缺陷与困难。
  • methods: 提出了一种面向特征的方法 FreeDrag,通过自适应模板特征、线搜索和模糊定位技术来缓解点追踪的困难。
  • results: 与 DragGAN 相比,FreeDrag 能够在具有相似结构、精细细节或多个目标点的挑战性场景下实现稳定、高效的基于点的图像编辑。
    Abstract To serve the intricate and varied demands of image editing, precise and flexible manipulation of image content is indispensable. Recently, DragGAN has achieved impressive editing results through point-based manipulation. However, we have observed that DragGAN struggles with miss tracking, where DragGAN encounters difficulty in effectively tracking the desired handle points, and ambiguous tracking, where the tracked points are situated within other regions that bear resemblance to the handle points. To deal with the above issues, we propose FreeDrag, which adopts a feature-oriented approach to free the burden on point tracking within the point-oriented methodology of DragGAN. The FreeDrag incorporates adaptive template features, line search, and fuzzy localization techniques to perform stable and efficient point-based image editing. Extensive experiments demonstrate that our method is superior to the DragGAN and enables stable point-based editing in challenging scenarios with similar structures, fine details, or under multi-point targets.
    摘要 为了满足图像编辑的复杂和多样化需求,对图像内容进行精确且灵活的操作不可或缺。近期,DragGAN 通过基于点的操作取得了令人印象深刻的编辑效果。然而,我们观察到 DragGAN 存在跟踪丢失问题,即难以有效跟踪所需的控制点;以及跟踪歧义问题,即被跟踪的点落入与控制点相似的其他区域。为解决上述问题,我们提出了 FreeDrag,它采用面向特征的方法,在 DragGAN 的面向点的方法论中减轻点跟踪的负担。FreeDrag 结合了自适应模板特征、线搜索和模糊定位技术,以实现稳定且高效的基于点的图像编辑。大量实验表明,我们的方法优于 DragGAN,能够在具有相似结构、精细细节或多点目标的挑战性场景下实现稳定的基于点的编辑。
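
The combination of adaptive template features, line search, and fuzzy localization can be illustrated with the simplified 2-D sketch below. It conveys the general idea of moving a handle point toward a target by matching features along the search line; it is not FreeDrag's implementation, and the feature map, step size, and momentum value are assumptions.

```python
import numpy as np

def move_handle(feat: np.ndarray, handle: np.ndarray, target: np.ndarray,
                template: np.ndarray, step: float = 2.0, momentum: float = 0.8):
    """One editing iteration on a (H, W, C) feature map (illustrative only).

    Searches along the line from `handle` towards `target` for the position whose
    feature best matches an adaptively updated `template` feature.
    """
    h, w, _ = feat.shape
    direction = target - handle
    dist = np.linalg.norm(direction)
    if dist < 1e-6:
        return handle, template
    direction = direction / dist
    # Line search: candidate positions between the current handle and one step ahead.
    candidates = [handle + direction * s for s in np.linspace(0.0, min(step, dist), 8)]
    best_pos, best_score = handle, -np.inf
    for pos in candidates:
        y, x = int(round(pos[0])), int(round(pos[1]))
        if 0 <= y < h and 0 <= x < w:
            score = -np.linalg.norm(feat[y, x] - template)   # fuzzy localization by feature match
            if score > best_score:
                best_pos, best_score = pos, score
    # Adaptive template: blend in the feature at the newly matched position.
    y, x = int(round(best_pos[0])), int(round(best_pos[1]))
    template = momentum * template + (1 - momentum) * feat[y, x]
    return best_pos, template

# Usage sketch on a random feature map (purely illustrative):
feat = np.random.rand(64, 64, 8).astype(np.float32)
handle, target = np.array([10.0, 10.0]), np.array([40.0, 50.0])
template = feat[10, 10].copy()
for _ in range(30):
    handle, template = move_handle(feat, handle, target, template)
```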

Generalization Error of First-Order Methods for Statistical Learning with Generic Oracles

  • paper_url: http://arxiv.org/abs/2307.04679
  • repo_url: None
  • paper_authors: Kevin Scaman, Mathieu Even, Laurent Massoulié
  • for: 本文提出了一种新的框架,用于分析统计学习中一阶优化算法的泛化误差,适用于梯度只能通过 oracle 给出的部分观察来获取的情形。该分析可为多种学习问题推导出近乎匹配的泛化误差上下界,包括监督学习、迁移学习、鲁棒学习、分布式学习以及使用梯度量化的通信高效学习;结果适用于光滑且强凸的优化问题,也适用于满足 Polyak-Lojasiewicz 假设的光滑非凸优化问题。
  • methods: 我们的分析基于梯度关于数据样本的正则性,并引入一个扩展条件标准差概念的新量,用以刻画在可访问 oracle 的条件下梯度能被近似到何种程度,从而为"优化统计学习目标与估计其梯度同样困难"这一直觉给出精确含义。
  • results: 我们的结果表明,在标准监督学习问题中,采用逐步增大批量(batch size)并结合热启动(warm start)的小批量梯度下降,可以达到在乘法因子意义下最优的泛化误差,这也支持在实际应用中使用这一优化方案。
    Abstract In this paper, we provide a novel framework for the analysis of generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle. Our analysis relies on the regularity of the gradient w.r.t. the data samples, and allows to derive near matching upper and lower bounds for the generalization error of multiple learning problems, including supervised learning, transfer learning, robust learning, distributed learning and communication efficient learning using gradient quantization. These results hold for smooth and strongly-convex optimization problems, as well as smooth non-convex optimization problems verifying a Polyak-Lojasiewicz assumption. In particular, our upper and lower bounds depend on a novel quantity that extends the notion of conditional standard deviation, and is a measure of the extent to which the gradient can be approximated by having access to the oracle. As a consequence, our analysis provides a precise meaning to the intuition that optimization of the statistical learning objective is as hard as the estimation of its gradient. Finally, we show that, in the case of standard supervised learning, mini-batch gradient descent with increasing batch sizes and a warm start can reach a generalization error that is optimal up to a multiplicative factor, thus motivating the use of this optimization scheme in practical applications.
    摘要 在本文中,我们提出了一个新的框架,用于分析统计学习中一阶优化算法的泛化误差,适用于梯度只能通过 oracle 给出的部分观察来获取的情形。我们的分析依赖于梯度关于数据样本的正则性,可为多种学习问题推导出近乎匹配的泛化误差上下界,包括监督学习、迁移学习、鲁棒学习、分布式学习以及使用梯度量化的通信高效学习。这些结果适用于光滑且强凸的优化问题,也适用于满足 Polyak-Lojasiewicz 假设的光滑非凸优化问题。特别地,我们的上下界依赖于一个新的量,它扩展了条件标准差的概念,度量了在可访问 oracle 的条件下梯度能被近似到何种程度。因此,我们的分析为"优化统计学习目标与估计其梯度同样困难"这一直觉给出了精确含义。最后,我们证明在标准监督学习情形下,采用逐步增大批量并结合热启动的小批量梯度下降,可以达到在乘法因子意义下最优的泛化误差,从而支持在实际应用中使用这一优化方案。
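
The optimization scheme highlighted in the results — mini-batch gradient descent with increasing batch sizes and a warm start across stages — can be sketched on a toy least-squares problem as follows. The problem, batch schedule, and step size are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch: mini-batch gradient descent with geometrically increasing batch sizes
# and a warm start (each stage starts from the previous stage's iterate), on synthetic
# least-squares regression. This is a toy, not the paper's analysed setting in detail.
rng = np.random.default_rng(0)
n, d = 4096, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

def grad(w: np.ndarray, idx: np.ndarray) -> np.ndarray:
    """Mini-batch gradient of the squared loss on the samples in `idx`."""
    r = X[idx] @ w - y[idx]
    return X[idx].T @ r / len(idx)

w = np.zeros(d)                      # warm start: the iterate is reused across stages
batch, lr = 8, 0.05
while batch <= n:
    for _ in range(200):             # a fixed budget of steps per stage
        idx = rng.choice(n, size=batch, replace=False)
        w -= lr * grad(w, idx)
    batch *= 2                       # increasing batch size
print("parameter error:", np.linalg.norm(w - w_true))
```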

LINFA: a Python library for variational inference with normalizing flow and annealing

  • paper_url: http://arxiv.org/abs/2307.04675
  • repo_url: https://github.com/desreslab/linfa
  • paper_authors: Yu Wang, Emma R. Cobian, Jubilee Lee, Fang Liu, Jonathan D. Hauenstein, Daniele E. Schiavazzi
  • for: 本文提供了一个用于变分推断的 Python 库,以处理计算代价高的模型和难以采样的分布。
  • methods: 本文结合变分推断、标准化流(normalizing flow)与退火(annealing)技术,以应对计算代价高与难以采样的问题。
  • results: 本文在多个基准上讨论并展示了 LINFA 的能力与性能,能够处理计算代价高的模型以及参数相互依赖、难以采样的分布。
    Abstract Variational inference is an increasingly popular method in statistics and machine learning for approximating probability distributions. We developed LINFA (Library for Inference with Normalizing Flow and Annealing), a Python library for variational inference to accommodate computationally expensive models and difficult-to-sample distributions with dependent parameters. We discuss the theoretical background, capabilities, and performance of LINFA in various benchmarks. LINFA is publicly available on GitHub at https://github.com/desResLab/LINFA.
    摘要 变分推断是统计学和机器学习中一种日益流行的近似概率分布的方法。我们开发了 LINFA(Library for Inference with Normalizing Flow and Annealing),一个用于变分推断的 Python 库,以支持计算代价高的模型以及参数相互依赖、难以采样的分布。我们讨论了 LINFA 的理论背景、功能及其在多个基准上的性能。LINFA 已在 GitHub 上公开:https://github.com/desResLab/LINFA 。
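
LINFA's own API is not reproduced here; instead, the following self-contained toy shows the two ingredients the library combines — variational inference with a normalizing flow and an annealing (tempering) schedule — using a single planar flow on an assumed 2-D target density. All hyper-parameters are illustrative.

```python
import torch

torch.manual_seed(0)

def log_target(z: torch.Tensor) -> torch.Tensor:
    """Un-normalized log-density of a toy 2-D 'banana' target (illustrative only)."""
    z1, z2 = z[:, 0], z[:, 1]
    return -0.5 * (z1 ** 2 / 4.0 + (z2 - 0.25 * z1 ** 2) ** 2)

# A single planar flow f(z) = z + u * tanh(w.z + b) on top of a standard normal base.
u = torch.zeros(2, requires_grad=True)
w = torch.randn(2, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([u, w, b], lr=1e-2)

for step in range(2000):
    beta = min(1.0, 0.05 + step / 1000.0)           # annealing: temper the target early on
    z0 = torch.randn(256, 2)                        # base samples from q0 = N(0, I)
    a = torch.tanh(z0 @ w + b)
    z1 = z0 + a.unsqueeze(1) * u                    # flow-transformed samples
    psi = (1.0 - a ** 2).unsqueeze(1) * w
    log_det = torch.log(torch.abs(1.0 + psi @ u) + 1e-8)
    log_q = -0.5 * (z0 ** 2).sum(dim=1) - log_det   # log q(z1), up to a constant
    loss = (log_q - beta * log_target(z1)).mean()   # negative (annealed) ELBO, up to constants
    opt.zero_grad(); loss.backward(); opt.step()
```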

Quantifying the Echo Chamber Effect: An Embedding Distance-based Approach

  • paper_url: http://arxiv.org/abs/2307.04668
  • repo_url: https://github.com/faalatawi/echo-chamber-score
  • paper_authors: Faisal Alatawi, Paras Sheth, Huan Liu
  • for: 本文旨在开发一种量化回声室效应(echo chamber)的方法,以便更好地理解社交媒体平台上的信息传播与社会极化。
  • methods: 本文提出了一种新的度量方法 Echo Chamber Score (ECS),它不需要用户意识形态标签,也不对交互图的结构做任何假设。具体而言,ECS 借助 EchoGAE——一种基于自监督图自编码器的用户嵌入模型——在嵌入空间中度量用户之间的距离。
  • results: 根据 Twitter 数据集的四个话题(两个极化话题和两个非极化话题),我们的结果表明 ECS 是一种有效的量化 echo chamber 的工具,可以帮助我们更好地理解在线讨论的动态。
    Abstract The rise of social media platforms has facilitated the formation of echo chambers, which are online spaces where users predominantly encounter viewpoints that reinforce their existing beliefs while excluding dissenting perspectives. This phenomenon significantly hinders information dissemination across communities and fuels societal polarization. Therefore, it is crucial to develop methods for quantifying echo chambers. In this paper, we present the Echo Chamber Score (ECS), a novel metric that assesses the cohesion and separation of user communities by measuring distances between users in the embedding space. In contrast to existing approaches, ECS is able to function without labels for user ideologies and makes no assumptions about the structure of the interaction graph. To facilitate measuring distances between users, we propose EchoGAE, a self-supervised graph autoencoder-based user embedding model that leverages users' posts and the interaction graph to embed them in a manner that reflects their ideological similarity. To assess the effectiveness of ECS, we use a Twitter dataset consisting of four topics - two polarizing and two non-polarizing. Our results showcase ECS's effectiveness as a tool for quantifying echo chambers and shedding light on the dynamics of online discourse.
    摘要 社交媒体平台的兴起促成了回声室(echo chamber)的形成:在这些在线空间中,用户主要接触强化其既有信念的观点,而异见则被排除在外。这一现象严重阻碍了信息在社区之间的传播,并加剧了社会极化。因此,开发量化回声室的方法十分重要。在本文中,我们提出了回声室分数(Echo Chamber Score, ECS),这是一种通过度量用户在嵌入空间中的距离来评估用户社区凝聚度与分离度的新指标。与现有方法不同,ECS 不需要用户意识形态标签,也不对交互图的结构做任何假设。为了便于度量用户之间的距离,我们提出了 EchoGAE,一种基于自监督图自编码器的用户嵌入模型,利用用户的帖子和交互图将用户嵌入到能反映其意识形态相似性的空间中。为评估 ECS 的有效性,我们使用了一个包含四个话题(两个极化话题和两个非极化话题)的 Twitter 数据集。结果展示了 ECS 作为量化回声室、揭示在线讨论动态的工具的有效性。
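
The paper's exact ECS formula is not restated here; the sketch below computes a simple cohesion-versus-separation proxy (a silhouette-style score) over user embeddings and community assignments, to illustrate the kind of embedding-distance measurement the abstract describes. The random vectors stand in for EchoGAE embeddings.

```python
import numpy as np

def cohesion_separation_score(emb: np.ndarray, community: np.ndarray) -> float:
    """Silhouette-style proxy in [-1, 1]: higher = tighter communities that sit further
    apart in the embedding space (a stronger echo-chamber-like structure).
    Not the paper's exact ECS formula."""
    labels = np.unique(community)
    scores = []
    for i in range(len(emb)):
        d = np.linalg.norm(emb - emb[i], axis=1)
        same = community == community[i]
        same[i] = False
        if not same.any():
            continue
        a = d[same].mean()                                        # cohesion: within-community distance
        b = min(d[community == c].mean() for c in labels if c != community[i])  # separation
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Usage sketch with stand-in embeddings (in practice these would come from EchoGAE).
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(loc=-2.0, size=(50, 16)), rng.normal(loc=2.0, size=(50, 16))])
community = np.array([0] * 50 + [1] * 50)
print("echo-chamber-like score:", round(cohesion_separation_score(emb, community), 3))
```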