cs.AI - 2023-07-12

DSSE: a drone swarm search environment

  • paper_url: http://arxiv.org/abs/2307.06240
  • repo_url: https://github.com/pfe-embraer/drone-swarm-search
  • paper_authors: Manuel Castanares, Luis F. S. Carrete, Enrico F. Damiani, Leonardo D. M. de Abreu, José Fernando B. Brancalion, Fabrício J. Barth
  • for: This project provides an environment for studying reinforcement learning algorithms that take dynamic probabilities as input, applied to searching for shipwrecked people.
  • methods: The project uses multi-agent (or single-agent) reinforcement learning algorithms in which the agents (drones) do not know the position of the targets (shipwrecked people) and receive no reward based on their distance to them; instead, each agent receives the probability of the target being in each cell of the map.
  • results: The aim of the project is to study the performance of reinforcement learning algorithms that require dynamic probabilities as inputs.
    Abstract The Drone Swarm Search project is an environment, based on PettingZoo, that is to be used in conjunction with multi-agent (or single-agent) reinforcement learning algorithms. It is an environment in which the agents (drones) have to find the targets (shipwrecked people). The agents do not know the position of the target and do not receive rewards related to their own distance to the target(s). However, the agents receive the probabilities of the target(s) being in a certain cell of the map. The aim of this project is to aid in the study of reinforcement learning algorithms that require dynamic probabilities as inputs.
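As a rough illustration of how such an environment is typically consumed, the sketch below drives a generic PettingZoo parallel environment with a random policy. Only the standard ParallelEnv interface is assumed (where `reset` returns observations and infos); the DSSE-specific constructor and observation layout should be taken from the repo.

```python
# Minimal sketch: driving a PettingZoo-style parallel environment with a
# random policy. Only the standard ParallelEnv API is assumed here; the
# DSSE-specific constructor is omitted (see the repo for the entry point).
def run_episode(env, seed=0):
    observations, infos = env.reset(seed=seed)
    totals = {agent: 0.0 for agent in env.agents}
    while env.agents:  # PettingZoo removes agents once they terminate
        # Each observation is expected to carry the per-cell probability
        # matrix of the target's location, not a distance-to-target signal.
        actions = {a: env.action_space(a).sample() for a in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
        for a, r in rewards.items():
            totals[a] += r
    return totals
```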

Testing different Log Bases For Vector Model Weighting Technique

  • paper_url: http://arxiv.org/abs/2307.06213
  • repo_url: None
  • paper_authors: Kamel Assaf
  • for: This study tests the TF-IDF weighting technique and examines how different logarithm bases affect the performance of the vector model.
  • methods: The study uses the MED, CRAN, NPL, LISA, and CISI test collections, which were assembled by scientists explicitly for experiments on information retrieval systems. The authors apply the TF-IDF weighting technique, computing the IDF with logarithm bases ranging from 0.1 to 100.0 to test system performance at different weighting values.
  • results: The logarithm base has a substantial effect on the vector model's performance: precision grows gradually for bases between 0.1 and 10, then declines for bases above 10, and results also vary across the test collections.
    Abstract Information retrieval systems retrieve relevant documents based on a query submitted by the user. The documents are initially indexed and the words in the documents are assigned weights using a weighting technique called TFIDF, which is the product of Term Frequency (TF) and Inverse Document Frequency (IDF). TF represents the number of occurrences of a term in a document. IDF measures whether the term is common or rare across all documents. It is computed by dividing the total number of documents in the system by the number of documents containing the term and then computing the logarithm of the quotient. By default, base 10 is used to calculate the logarithm. In this paper, we test this weighting technique using a range of log bases from 0.1 to 100.0 to calculate the IDF. Testing different log bases for the vector model weighting technique highlights the importance of understanding the performance of the system at different weighting values. We use the documents of the MED, CRAN, NPL, LISA, and CISI test collections, which scientists assembled explicitly for experiments in information retrieval systems.
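As a worked illustration of the quantity under study: since log_b(x) = ln(x)/ln(b), changing the IDF's logarithm base rescales every IDF value by the same constant factor (and bases below 1 flip its sign). A minimal sketch with a toy corpus:

```python
import math
from collections import Counter

def tfidf(docs, base=10.0):
    """TF-IDF where IDF(t) = log_base(N / df(t))."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: tf[t] * math.log(n_docs / df[t], base) for t in tf})
    return weighted

docs = [["cat", "sat"], ["cat", "cat", "dog"], ["dog", "ran"]]
for b in (0.1, 2.0, 10.0, 100.0):
    print(b, round(tfidf(docs, base=b)[1]["cat"], 3))
```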

Self-Adaptive Large Language Model (LLM)-Based Multiagent Systems

  • paper_url: http://arxiv.org/abs/2307.06187
  • repo_url: None
  • paper_authors: Nathalia Nascimento, Paulo Alencar, Donald Cowan
  • for: This work aims to improve the self-adaptation capabilities of multiagent systems (MASs) so that they can cope with complex environments and requirements.
  • methods: The work proposes integrating large language models (LLMs) such as GPT-based technologies into MASs to enable more expressive interaction and communication. The approach is anchored on the MAPE-K model, which supports monitoring, analyzing, planning, and executing system adaptations.
  • results: The work applies LLM technology to MAS self-adaptation and, through a basic MAS-based application, shows that it increases the capability and effectiveness of self-adaptive systems.
    Abstract In autonomic computing, self-adaptation has been proposed as a fundamental paradigm to manage the complexity of multiagent systems (MASs). This is achieved by extending a system with support to monitor and adapt itself to achieve specific concerns of interest. Communication in these systems is key given that in scenarios involving agent interaction, it enhances cooperation and reduces coordination challenges by enabling direct, clear information exchange. However, improving the expressiveness of interaction communication within MASs is not without challenges. In this sense, the interplay between self-adaptive systems and effective communication is crucial for future MAS advancements. In this paper, we propose the integration of large language models (LLMs) such as GPT-based technologies into multiagent systems. We anchor our methodology on the MAPE-K model, which is renowned for its robust support in monitoring, analyzing, planning, and executing system adaptations in response to dynamic environments. We also present a practical illustration of the proposed approach, in which we implement and assess a basic MAS-based application. The approach significantly advances the state-of-the-art of self-adaptive systems by proposing a new paradigm for MAS self-adaptation of autonomous systems based on LLM capabilities.
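A minimal sketch of the idea, assuming a MAPE-K loop whose Analyze and Plan phases delegate to an LLM; `query_llm` is a placeholder, not the paper's implementation or any real API:

```python
# Hedged sketch: a MAPE-K control loop where Analyze and Plan delegate to
# an LLM. `query_llm` is a placeholder to be replaced with a real client.
def query_llm(prompt: str) -> str:
    raise NotImplementedError("substitute an actual LLM client call here")

class MapeKAgent:
    def __init__(self):
        self.knowledge = {}  # the shared "K" (knowledge base) in MAPE-K

    def monitor(self, sensors):
        self.knowledge["latest"] = sensors  # record observed symptoms
        return sensors

    def analyze(self, symptoms):
        return query_llm(f"Given symptoms {symptoms}, is an adaptation needed?")

    def plan(self, analysis):
        return query_llm(f"Propose an adaptation plan for: {analysis}")

    def step(self, sensors, execute):
        symptoms = self.monitor(sensors)
        analysis = self.analyze(symptoms)
        if "yes" in analysis.lower():      # naive trigger, for illustration only
            execute(self.plan(analysis))   # Execute phase via a callback
```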

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

  • paper_url: http://arxiv.org/abs/2307.06166
  • repo_url: None
  • paper_authors: Gengyuan Zhang, Yurui Zhang, Kerui Zhang, Volker Tresp
  • for: This paper aims to investigate the ability of Vision-Language Models (VLMs) to reason with commonsense knowledge, specifically the ability to recognize times and locations based on visual cues.
  • methods: The authors propose a two-stage recognition and reasoning probing task to evaluate the ability of VLMs to recognize times and location-relevant features and reason about them. They use a well-curated image dataset called WikiTiLo, which contains images with rich socio-cultural cues.
  • results: The authors find that although VLMs can effectively retain relevant features in visual encoders, they still fail to make perfect reasoning. They also release their dataset and codes to facilitate future studies.
    Abstract Vision-Language Models (VLMs) are expected to be capable of reasoning with commonsense knowledge as human beings do. One example is that humans can reason about where and when an image was taken based on their knowledge. This makes us wonder if, based on visual cues, Vision-Language Models that are pre-trained with large-scale image-text resources can achieve and even outperform humans' capability in reasoning about times and locations. To address this question, we propose a two-stage recognition and reasoning probing task, applied to discriminative and generative VLMs to uncover whether VLMs can recognize times and location-relevant features and further reason about them. To facilitate the investigation, we introduce WikiTiLo, a well-curated image dataset comprising images with rich socio-cultural cues. In the extensive experimental studies, we find that although VLMs can effectively retain relevant features in visual encoders, they still fail to make perfect reasoning. We will release our dataset and codes to facilitate future studies.

Deep Generative Models for Physiological Signals: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2307.06162
  • repo_url: None
  • paper_authors: Nour Neifar, Afef Mdhaffar, Achraf Ben-Hamadou, Mohamed Jmaiel
  • for: This paper systematically reviews deep generative models applied to physiological signals, in particular the electrocardiogram, electroencephalogram, photoplethysmogram, and electromyogram.
  • methods: The paper analyzes the state of the art in deep generative models, together with their main applications and challenges.
  • results: The paper provides a systematic review of deep generative models for physiological signals and highlights the employed evaluation protocols and the most used physiological databases, facilitating assessment and benchmarking.
    Abstract In this paper, we present a systematic literature review on deep generative models for physiological signals, particularly electrocardiogram, electroencephalogram, photoplethysmogram and electromyogram. Compared to the existing review papers, we present the first review that summarizes the recent state-of-the-art deep generative models. By analysing the state-of-the-art research related to deep generative models along with their main applications and challenges, this review contributes to the overall understanding of these models applied to physiological signals. Additionally, by highlighting the employed evaluation protocol and the most used physiological databases, this review facilitates the assessment and benchmarking of deep generative models.

Reflective Hybrid Intelligence for Meaningful Human Control in Decision-Support Systems

  • paper_url: http://arxiv.org/abs/2307.06159
  • repo_url: None
  • paper_authors: Catholijn M. Jonker, Luciano Cavalcante Siebert, Pradeep K. Murukannaiah
  • for: This paper explores the idea of self-reflective AI systems and their potential to increase meaningful human control over AI systems.
  • methods: The paper proposes a framework that integrates knowledge from psychology and philosophy with formal reasoning methods and machine learning approaches to create AI systems responsive to human values and social norms.
  • results: The paper argues that self-reflective AI systems can lead to self-reflective hybrid systems (human + AI), which can increase meaningful human control and empower human moral reasoning by providing comprehensible information and insights on possible human moral blind spots.
    Abstract With the growing capabilities and pervasiveness of AI systems, societies must collectively choose between reduced human autonomy, endangered democracies and limited human rights, and AI that is aligned to human and social values, nurturing collaboration, resilience, knowledge and ethical behaviour. In this chapter, we introduce the notion of self-reflective AI systems for meaningful human control over AI systems. Focusing on decision support systems, we propose a framework that integrates knowledge from psychology and philosophy with formal reasoning methods and machine learning approaches to create AI systems responsive to human values and social norms. We also propose a possible research approach to design and develop self-reflective capability in AI systems. Finally, we argue that self-reflective AI systems can lead to self-reflective hybrid systems (human + AI), thus increasing meaningful human control and empowering human moral reasoning by providing comprehensible information and insights on possible human moral blind spots.

Maneuver Decision-Making Through Automatic Curriculum Reinforcement Learning Without Handcrafted Reward functions

  • paper_url: http://arxiv.org/abs/2307.06152
  • repo_url: None
  • paper_authors: Zhang Hong-Peng
  • for: This work proposes an automatic curriculum learning method that helps unmanned combat aerial vehicles make effective maneuver decisions in autonomous air combat.
  • methods: Maneuver decision-making is decomposed into a series of sub-tasks of increasing difficulty, distinguished by the range of initial states, and test results are used to adjust the curriculum. Agents thus progress from easy to difficult sub-tasks and learn to make effective decisions without handcrafted reward functions.
  • results: Simulation results show that, after training, agents make effective decisions in different states, including tracking, attacking, and escaping, and that these decisions are rational and interpretable.
    Abstract Maneuver decision-making is the core of unmanned combat aerial vehicles for autonomous air combat. To solve this problem, we propose an automatic curriculum reinforcement learning method, which enables agents to learn effective decisions in air combat from scratch. The range of initial states is used for distinguishing curricula of different difficulty levels, so that maneuver decision-making is divided into a series of sub-tasks from easy to difficult, and test results are used to change sub-tasks. As sub-tasks change, agents gradually learn to complete a series of sub-tasks from easy to difficult, enabling them to make effective maneuvering decisions to cope with various states without the need to spend effort designing reward functions. The ablation studies show that the automatic curriculum learning proposed in this article is an essential component for training through reinforcement learning; namely, agents cannot make effective decisions without curriculum learning. Simulation experiments show that, after training, agents are able to make effective decisions given different states, including tracking, attacking and escaping, which are both rational and interpretable.
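The curriculum mechanism reduces to a compact loop; in this sketch, `train_fn` and `evaluate_fn` stand in for the paper's RL training and test routines, and the advancement threshold is an illustrative assumption:

```python
# Illustrative automatic-curriculum loop: the range of initial states sets
# the sub-task difficulty, and test results decide when to move on.
def automatic_curriculum(train_fn, evaluate_fn, state_ranges, threshold=0.8):
    """`state_ranges` orders initial-state ranges from easy to hard."""
    for level, state_range in enumerate(state_ranges):
        success = evaluate_fn(state_range)
        while success < threshold:       # stay on this sub-task until passed
            train_fn(state_range)        # RL updates sampled from this range
            success = evaluate_fn(state_range)
        print(f"sub-task {level} passed (success rate {success:.2f})")
```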

SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Task Planning

  • paper_url: http://arxiv.org/abs/2307.06135
  • repo_url: None
  • paper_authors: Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, Niko Suenderhauf
  • for: This paper develops a scalable large language model (LLM)-based task planner that can ground plans in expansive multi-floor, multi-room environments.
  • methods: The authors propose a scalable approach that represents the environment as a 3D scene graph (3DSG) and lets the LLM search and plan over this representation. A classical path planner is used to shorten the LLM's planning horizon, and an iterative replanning pipeline refines the initial plan, correcting infeasible actions and recovering from planning failures.
  • results: Evaluated on two large-scale environments spanning up to 3 floors, 36 rooms, and 140 objects, the approach grounds large-scale, long-horizon task plans into actions a mobile manipulator robot can execute.
    Abstract Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a semantic search for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner and (3) introduce an iterative replanning pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors, 36 rooms and 140 objects, and show that our approach is capable of grounding large-scale, long-horizon task plans from abstract, and natural language instruction for a mobile manipulator robot to execute.
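To make the collapse/expand idea concrete, here is a toy dictionary-based scene graph; the structure and the expand call are illustrative assumptions, not SayPlan's actual data model:

```python
# Toy illustration of semantic search over a collapsed 3D scene graph:
# the planner first sees floors and rooms only, then "expands" a room to
# reveal its objects, mimicking how an LLM can request node expansions.
scene_graph = {
    "floor1": {"kitchen": ["fridge", "mug"], "office": ["desk", "stapler"]},
}

def collapsed_view(graph):
    return {floor: sorted(rooms) for floor, rooms in graph.items()}

def expand(graph, floor, room):
    return graph[floor][room]

# Given "put the mug in the fridge", an LLM would expand the kitchen:
print(collapsed_view(scene_graph))               # {'floor1': ['kitchen', 'office']}
print(expand(scene_graph, "floor1", "kitchen"))  # ['fridge', 'mug']
```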

Guided Bottom-Up Interactive Constraint Acquisition

  • paper_url: http://arxiv.org/abs/2307.06126
  • repo_url: https://github.com/dimosts/activeconlearn
  • paper_authors: Dimos Tsouros, Senne Berden, Tias Guns
  • for: To improve the efficiency of constraint acquisition (CA) systems so that constraint satisfaction problems can be modeled more quickly.
  • methods: Two novel methods are proposed to improve CA efficiency. The first is a bottom-up approach named GrowAcq that reduces the user's waiting time and handles much larger sets of candidate constraints. The second guides query generation using probabilities, reducing the number of queries required, and a new technique allows openly accessible CP solvers to be used for query generation.
  • results: The proposed methods outperform state-of-the-art CA methods, reducing the number of queries by up to 60%, and work well even when the candidate constraint set is 50 times larger than those commonly used in the literature.
    Abstract Constraint Acquisition (CA) systems can be used to assist in the modeling of constraint satisfaction problems. In (inter)active CA, the system is given a set of candidate constraints and posts queries to the user with the goal of finding the right constraints among the candidates. Current interactive CA algorithms suffer from at least two major bottlenecks. First, in order to converge, they require a large number of queries to be asked to the user. Second, they cannot handle large sets of candidate constraints, since these lead to large waiting times for the user. For this reason, the user must have fairly precise knowledge about what constraints the system should consider. In this paper, we alleviate these bottlenecks by presenting two novel methods that improve the efficiency of CA. First, we introduce a bottom-up approach named GrowAcq that reduces the maximum waiting time for the user and allows the system to handle much larger sets of candidate constraints. It also reduces the total number of queries for problems in which the target constraint network is not sparse. Second, we propose a probability-based method to guide query generation and show that it can significantly reduce the number of queries required to converge. We also propose a new technique that allows the use of openly accessible CP solvers in query generation, removing the dependency of existing methods on less well-maintained custom solvers that are not publicly available. Experimental results show that our proposed methods outperform state-of-the-art CA methods, reducing the number of queries by up to 60%. Our methods work well even in cases where the set of candidate constraints is 50 times larger than the ones commonly used in the literature.
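The paper's exact probability-based criterion is not reproduced here; as a generic illustration of probability-guided querying, one can ask the user about the candidate whose estimated membership in the target network is most uncertain:

```python
# Generic illustration (not necessarily the paper's exact criterion):
# query the candidate constraint whose estimated probability of belonging
# to the target network is closest to 0.5, i.e. the most informative one.
def pick_query(candidates, prob):
    return min(candidates, key=lambda c: abs(prob[c] - 0.5))

print(pick_query(["c1", "c2", "c3"], {"c1": 0.9, "c2": 0.55, "c3": 0.1}))  # c2
```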

Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation

  • paper_url: http://arxiv.org/abs/2307.06125
  • repo_url: https://github.com/robot-learning-freiburg/HIMOS
  • paper_authors: Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada
  • for: This work develops a robotic system that can search for multiple objects in human-centered environments, a task that requires combining exploration, navigation, and manipulation skills.
  • methods: The work uses a hierarchical reinforcement learning approach (HIMOS) that learns to compose exploration, navigation, and manipulation skills to solve interactive multi-object search tasks in unexplored environments.
  • results: Experiments in simulation and the real world show that HIMOS transfers zero-shot to new environments and is robust to unseen subpolicies, failures in their execution, and different robot kinematics.
    Abstract Existing object-search approaches enable robots to search through free pathways, however, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real-world that demonstrate that HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.

TreeFormer: a Semi-Supervised Transformer-based Framework for Tree Counting from a Single High Resolution Image

  • paper_url: http://arxiv.org/abs/2307.06118
  • repo_url: https://github.com/haaclassic/treeformer
  • paper_authors: Hamed Amini Amirkolaee, Miaojing Shi, Mark Mulligan
  • for: This paper proposes a semi-supervised transformer-based framework for automatic tree density estimation and counting, reducing the cost of tree annotations in remote sensing imagery.
  • methods: The method first develops a pyramid tree representation module based on transformer blocks to extract multi-scale features during encoding. Contextual attention-based feature fusion and tree density regressor modules then exploit the encoder's features to estimate tree density maps. A pyramid learning strategy, including local tree density consistency and local tree count ranking losses, allows unlabeled images to be used during training.
  • results: Evaluated on two benchmark tree counting datasets (Jiangsu and Yosemite) and a newly created dataset (KCL-London), TreeFormer outperforms state-of-the-art semi-supervised methods under the same setting and exceeds fully supervised methods trained on the same number of labeled images. The code and datasets are available at https://github.com/HAAClassic/TreeFormer.
    Abstract Automatic tree density estimation and counting using single aerial and satellite images is a challenging task in photogrammetry and remote sensing, yet has an important role in forest management. In this paper, we propose the first semisupervised transformer-based framework for tree counting which reduces the expensive tree annotations for remote sensing images. Our method, termed as TreeFormer, first develops a pyramid tree representation module based on transformer blocks to extract multi-scale features during the encoding stage. Contextual attention-based feature fusion and tree density regressor modules are further designed to utilize the robust features from the encoder to estimate tree density maps in the decoder. Moreover, we propose a pyramid learning strategy that includes local tree density consistency and local tree count ranking losses to utilize unlabeled images into the training process. Finally, the tree counter token is introduced to regulate the network by computing the global tree counts for both labeled and unlabeled images. Our model was evaluated on two benchmark tree counting datasets, Jiangsu, and Yosemite, as well as a new dataset, KCL-London, created by ourselves. Our TreeFormer outperforms the state of the art semi-supervised methods under the same setting and exceeds the fully-supervised methods using the same number of labeled images. The codes and datasets are available at https://github.com/HAAClassic/TreeFormer.
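For readers unfamiliar with density-based counting, the convention the global counter token regulates is simply that the predicted density map integrates to the count; a two-line numpy illustration with a random stand-in map:

```python
import numpy as np

density_map = np.abs(np.random.randn(64, 64)) * 0.01  # stand-in for decoder output
estimated_count = float(density_map.sum())            # the map integrates to the count
print(round(estimated_count, 1))
```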

CLAIMED – the open source framework for building coarse-grained operators for accelerated discovery in science

  • paper_url: http://arxiv.org/abs/2307.06824
  • repo_url: https://github.com/claimed-framework/component-library
  • paper_authors: Romeo Kienzler, Rafflesia Khan, Jerome Nilmeier, Ivan Nesic, Ibrahim Haddad
  • for: This paper addresses reproducibility and reusability challenges in modern data-driven science, helping scientists rerun and verify experiments.
  • methods: The paper presents the CLAIMED framework, which lets scientists re-compose workflows from existing libraries of coarse-grained scientific operators, yielding reusable scientific data-processing code. CLAIMED is programming language, scientific library, and execution environment agnostic, so it can be used across platforms.
  • results: With CLAIMED, scientific data-processing code can be re-composed and executed as reusable workflows, improving reproducibility and reusability in modern data-driven science.
    Abstract In modern data-driven science, reproducibility and reusability are key challenges. Scientists are well skilled in the process from data to publication. Although some publication channels require source code and data to be made accessible, rerunning and verifying experiments is usually hard due to a lack of standards. Therefore, reusing existing scientific data processing code from state-of-the-art research is hard as well. This is why we introduce CLAIMED, which has a proven track record in scientific research for addressing the repeatability and reusability issues in modern data-driven science. CLAIMED is a framework to build reusable operators and scalable scientific workflows by supporting the scientist to draw from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators. Although various implementations exist, CLAIMED is programming language, scientific library, and execution environment agnostic.

Quantitative CLTs in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.06092
  • repo_url: None
  • paper_authors: Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati
  • for: This paper studies the distribution of fully connected neural networks with random Gaussian weights and biases, in which the hidden layer widths are proportional to a large constant $n$.
  • methods: Under mild assumptions on the non-linearity, the authors derive quantitative bounds on normal approximations that are valid at large but finite $n$ and any fixed network depth.
  • results: The distance between the random network (and its derivatives) and the corresponding infinite-width Gaussian process scales like $n^{-\gamma}$ for some $\gamma > 0$, with the exponent depending on the metric used to measure the discrepancy; in the one-dimensional case the bounds are shown to be optimal via matching lower bounds.
    Abstract We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
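Schematically (our notation, not the paper's), the bounds take the following form, with the network output at depth $L$ and widths proportional to $n$ compared against its infinite-width Gaussian process limit:

```latex
% Schematic form of the quantitative bound; d(.,.) is the chosen
% probability metric (e.g. a Wasserstein distance) and gamma depends on it.
\[
  d\bigl(z^{(L)}_{n},\, G^{(L)}\bigr) \;\le\; C \, n^{-\gamma},
  \qquad \gamma = \gamma(d) > 0 .
\]
```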

VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View

  • paper_url: http://arxiv.org/abs/2307.06082
  • repo_url: https://github.com/raphael-sch/velma
  • paper_authors: Raphael Schumann, Wanrong Zhu, Weixi Feng, Tsu-Jui Fu, Stefan Riezler, William Yang Wang
  • for: This paper addresses incremental decision making in real-world environments, one of the most challenging tasks in embodied AI, in the specific form of Vision and Language Navigation (VLN) in Street View.
  • methods: The paper introduces VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as the contextual prompt for the next action. Visual information is verbalized by a pipeline that extracts landmarks from the human-written navigation instructions and uses CLIP to determine their visibility in the current panorama view.
  • results: VELMA successfully follows navigation instructions in Street View with only two in-context examples. Further finetuning the LLM agent on a few thousand examples yields a 25%-30% relative improvement in task completion over the previous state of the art on two datasets.
    Abstract Incremental decision making in real-world environments is one of the most challenging tasks in embodied artificial intelligence. One particularly demanding scenario is Vision and Language Navigation (VLN), which requires visual and natural language understanding as well as spatial and temporal reasoning capabilities. The embodied agent needs to ground its understanding of navigation instructions in observations of a real-world environment like Street View. Despite the impressive results of LLMs in other research areas, it is an ongoing problem of how to best connect them with an interactive visual environment. In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action. Visual information is verbalized by a pipeline that extracts landmarks from the human written navigation instructions and uses CLIP to determine their visibility in the current panorama view. We show that VELMA is able to successfully follow navigation instructions in Street View with only two in-context examples. We further finetune the LLM agent on a few thousand examples and achieve 25%-30% relative improvement in task completion over the previous state-of-the-art for two datasets.
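A hedged sketch of the landmark-visibility step using the Hugging Face CLIP interface; the softmax-over-landmarks scoring and the threshold are illustrative assumptions rather than VELMA's exact procedure:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def visible_landmarks(panorama: Image.Image, landmarks, threshold=0.3):
    """Score landmark names against the current view; keep the ones CLIP
    considers likely to be visible (the threshold is an illustrative choice)."""
    inputs = processor(text=landmarks, images=panorama,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = model(**inputs).logits_per_image.softmax(dim=-1)[0]
    return [lm for lm, p in zip(landmarks, probs.tolist()) if p > threshold]
```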

Assessment of the suitability of degradation models for the planning of CCTV inspections of sewer pipes

  • paper_url: http://arxiv.org/abs/2307.06341
  • repo_url: https://github.com/fidaeic/sewer-pred
  • paper_authors: Fidae El Morer, Stefan Wittek, Andreas Rausch
  • for: This paper aims to develop a methodology for planning inspections of sewer pipes based on degradation models that consider statistical and machine learning methods.
  • methods: The paper proposes using accuracy metrics, long-term degradation curves, and explainability to evaluate the suitability of different degradation models for inspection planning. The authors use ensemble models, Logistic Regression, and other methods to assess the pipes’ degradation.
  • results: The results show that while ensemble models have high accuracy, they are unable to infer long-term degradation curves. In contrast, the Logistic Regression model provides slightly less accurate results but can produce consistent degradation curves with high explainability. The authors demonstrate the efficiency of their methodology using a real-world use case.
    Abstract The degradation of sewer pipes poses significant economical, environmental and health concerns. The maintenance of such assets requires structured plans to perform inspections, which are more efficient when structural and environmental features are considered along with the results of previous inspection reports. The development of such plans requires degradation models that can be based on statistical and machine learning methods. This work proposes a methodology to assess their suitability to plan inspections considering three dimensions: accuracy metrics, ability to produce long-term degradation curves and explainability. Results suggest that although ensemble models yield the highest accuracy, they are unable to infer the long-term degradation of the pipes, whereas the Logistic Regression offers a slightly less accurate model that is able to produce consistent degradation curves with a high explainability. A use case is presented to demonstrate this methodology and the efficiency of model-based planning compared to the current inspection plan.
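A minimal sketch of the property the paper values in Logistic Regression: fitting pipe condition against age yields a monotone long-term degradation curve that is easy to explain (the data below is synthetic, not from the paper):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
age = rng.uniform(0, 80, size=500).reshape(-1, 1)      # years in service
p_true = 1 / (1 + np.exp(-(age[:, 0] - 40) / 8))       # synthetic ground truth
degraded = rng.random(500) < p_true                    # observed condition labels

model = LogisticRegression().fit(age, degraded)
ages = np.arange(0, 81).reshape(-1, 1)
curve = model.predict_proba(ages)[:, 1]                # long-term degradation curve
print(curve[::20].round(2))                            # P(degraded) at 0,20,40,60,80
```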

Machine Learning for Autonomous Vehicle’s Trajectory Prediction: A comprehensive survey, Challenges, and Future Research Directions

  • paper_url: http://arxiv.org/abs/2307.07527
  • repo_url: None
  • paper_authors: Vibha Bharilya, Neetesh Kumar
  • For: The paper is written to provide a comprehensive review of trajectory prediction methods for autonomous vehicles (AVs), with a focus on machine learning techniques such as deep learning and reinforcement learning.
  • Methods: The paper evaluates several deep learning-based techniques and reinforcement learning-based methods for trajectory prediction in the context of AVs. It also discusses the various datasets and evaluation metrics commonly used in these tasks.
  • Results: The paper provides a detailed analysis of the strengths and weaknesses of each method, and identifies challenges in the existing literature and potential research directions for future study.
    Abstract Autonomous Vehicles (AVs) have emerged as a promising solution by replacing human drivers with advanced computer-aided decision-making systems. However, for AVs to effectively navigate the road, they must possess the capability to predict the future behavior of nearby traffic participants, similar to the predictive driving abilities of human drivers. Building upon existing literature is crucial to advance the field and develop a comprehensive understanding of trajectory prediction methods in the context of automated driving. To address this need, we have undertaken a comprehensive review that focuses on trajectory prediction methods for AVs, with a particular emphasis on machine learning techniques including deep learning and reinforcement learning-based approaches. We have extensively examined over two hundred studies related to trajectory prediction in the context of AVs. The paper begins with an introduction to the general problem of predicting vehicle trajectories and provides an overview of the key concepts and terminology used throughout. After providing a brief overview of conventional methods, this review conducts a comprehensive evaluation of several deep learning-based techniques. Each method is summarized briefly, accompanied by a detailed analysis of its strengths and weaknesses. The discussion further extends to reinforcement learning-based methods. This article also examines the various datasets and evaluation metrics that are commonly used in trajectory prediction tasks. Encouraging an unbiased and objective discussion, we compare two major learning processes, considering specific functional features. By identifying challenges in the existing literature and outlining potential research directions, this review significantly contributes to the advancement of knowledge in the domain of AV trajectory prediction.

Visualization for Multivariate Gaussian Anomaly Detection in Images

  • paper_url: http://arxiv.org/abs/2307.06052
  • repo_url: None
  • paper_authors: Joao P C Bertoldo, David Arrustico
  • for: This paper presents a simplified variation of the PaDiM method for anomaly detection in images.
  • methods: The method fits a single multivariate Gaussian (MVG) distribution to the feature vectors extracted from a backbone CNN and uses their Mahalanobis distance as the anomaly score. As an intermediate step, a whitening transformation is applied to the feature vectors, enabling heatmaps that visually explain the features learned by the MVG.
  • results: The method is evaluated on the MVTec-AD dataset; the results show the importance of visual model validation and reveal issues in this framework that were otherwise invisible.
    Abstract This paper introduces a simplified variation of the PaDiM (Pixel-Wise Anomaly Detection through Instance Modeling) method for anomaly detection in images, fitting a single multivariate Gaussian (MVG) distribution to the feature vectors extracted from a backbone convolutional neural network (CNN) and using their Mahalanobis distance as the anomaly score. We introduce an intermediate step in this framework by applying a whitening transformation to the feature vectors, which enables the generation of heatmaps capable of visually explaining the features learned by the MVG. The proposed technique is evaluated on the MVTec-AD dataset, and the results show the importance of visual model validation, providing insights into issues in this framework that were otherwise invisible. The visualizations generated for this paper are publicly available at https://doi.org/10.5281/zenodo.7937978.
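The whole pipeline fits in a few numpy lines; the sketch below fits the MVG, scores by Mahalanobis distance, and shows where the whitening transform enters (the per-dimension terms of the whitened residual are what the heatmaps visualize; features here are random stand-ins for CNN outputs):

```python
import numpy as np

features = np.random.randn(1000, 64)                      # stand-in for CNN features
mu = features.mean(axis=0)
cov = np.cov(features, rowvar=False) + 1e-6 * np.eye(64)  # regularized covariance

# Whitening matrix W with W @ W.T == inv(cov)
W = np.linalg.cholesky(np.linalg.inv(cov))

def anomaly_score(x):
    z = (x - mu) @ W                       # whitened residual
    # The per-dimension contributions z**2 are what a heatmap can visualize.
    return float(np.sqrt((z ** 2).sum()))  # Mahalanobis distance

print(anomaly_score(features[0]))
```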

  • paper_url: http://arxiv.org/abs/2307.06046
  • repo_url: None
  • paper_authors: Jincheng Zhou, Beatrice Bevilacqua, Bruno Ribeiro
  • for: This paper provides a method for predicting missing attributed links (relations) between nodes in new test multigraphs, without requiring additional information at test time.
  • methods: The work builds on double exchangeability (over both nodes and relation types), in contrast to the (single) node exchangeability ordinarily used to design graph neural networks (GNNs), and extends it to multi-task double exchangeability so that link prediction can handle distinct, potentially conflicting predictive patterns across different sets of relation types.
  • results: Experiments show that the method generalizes effectively to test multigraphs with entirely new relation types, without access to additional information or further training data.
    Abstract The task of inductive link prediction in (discrete) attributed multigraphs infers missing attributed links (relations) between nodes in new test multigraphs. Traditional relational learning methods face the challenge of limited generalization to OOD test multigraphs containing both novel nodes and novel relation types not seen in training. Recently, under the only assumption that all relation types share the same structural predictive patterns (single task), Gao et al. (2023) proposed an OOD link prediction method using the theoretical concept of double exchangeability (for nodes & relation types), in contrast to the (single) exchangeability (only for nodes) used to design Graph Neural Networks (GNNs). In this work we further extend the double exchangeability concept to multi-task double exchangeability, where we define link prediction in attributed multigraphs that can have distinct and potentially conflicting predictive patterns for different sets of relation types (multiple tasks). Our empirical results on real-world datasets demonstrate that our approach can effectively generalize to entirely new relation types in test, without access to additional information, yielding significant performance improvements over existing methods.

AI-Generated Imagery: A New Era for the 'Readymade'

  • paper_url: http://arxiv.org/abs/2307.06033
  • repo_url: None
  • paper_authors: Amy Smith, Michael Cook
  • for: This work examines how digital images produced by generative AI systems have come to be so regularly referred to as art.
  • methods: The paper employs existing philosophical frameworks and theories of language to suggest that some AI-generated imagery can be presented as 'readymades' for consideration as art.
  • results: The discussion shows that some AI-generated images, by virtue of their visual properties within these frameworks, can be regarded as works of art.
    Abstract While the term 'art' defies any concrete definition, this paper aims to examine how digital images produced by generative AI systems, such as Midjourney, have come to be so regularly referred to as such. The discourse around the classification of AI-generated imagery as art is currently somewhat homogeneous, lacking the more nuanced aspects that would apply to more traditional modes of artistic media production. This paper aims to bring important philosophical considerations to the surface of the discussion around AI-generated imagery in the context of art. We employ existing philosophical frameworks and theories of language to suggest that some AI-generated imagery, by virtue of its visual properties within these frameworks, can be presented as 'readymades' for consideration as art.

An Effective and Efficient Time-aware Entity Alignment Framework via Two-aspect Three-view Label Propagation

  • paper_url: http://arxiv.org/abs/2307.06013
  • repo_url: None
  • paper_authors: Li Cai, Xin Mao, Youshao Xiao, Changxu Wu, Man Lan
  • for: To promote knowledge fusion by finding high-quality equivalent entity pairs between different temporal knowledge graphs (TKGs).
  • methods: The paper proposes LightTEA, an effective and efficient non-neural entity alignment framework consisting of four essential components: two-aspect three-view label propagation, sparse similarity with temporal constraints, the Sinkhorn operator, and temporal iterative learning.
  • results: Extensive experiments on public datasets show that the proposed method significantly outperforms state-of-the-art methods for entity alignment between TKGs, while taking at most dozens of seconds, no more than 10% of the time consumed by the most efficient TEA method.
    Abstract Entity alignment (EA) aims to find the equivalent entity pairs between different knowledge graphs (KGs), which is crucial to promote knowledge fusion. With the wide use of temporal knowledge graphs (TKGs), time-aware EA (TEA) methods appear to enhance EA. Existing TEA models are based on Graph Neural Networks (GNN) and achieve state-of-the-art (SOTA) performance, but it is difficult to transfer them to large-scale TKGs due to the scalability issue of GNN. In this paper, we propose an effective and efficient non-neural EA framework between TKGs, namely LightTEA, which consists of four essential components: (1) Two-aspect Three-view Label Propagation, (2) Sparse Similarity with Temporal Constraints, (3) Sinkhorn Operator, and (4) Temporal Iterative Learning. All of these modules work together to improve the performance of EA while reducing the time consumption of the model. Extensive experiments on public datasets indicate that our proposed model significantly outperforms the SOTA methods for EA between TKGs, and the time consumed by LightTEA is only dozens of seconds at most, no more than 10% of the most efficient TEA method.
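Of the four components, the Sinkhorn operator is the most self-contained; its standard form iteratively normalizes the rows and columns of a temperature-scaled similarity matrix toward a doubly stochastic matching matrix, sketched below with an illustrative temperature:

```python
import numpy as np

def sinkhorn(sim, n_iters=20, tau=0.1):
    """Approximate a doubly stochastic matching matrix from similarities."""
    K = np.exp(sim / tau)                    # temperature-scaled kernel
    for _ in range(n_iters):
        K /= K.sum(axis=1, keepdims=True)    # normalize rows
        K /= K.sum(axis=0, keepdims=True)    # normalize columns
    return K

sim = np.random.rand(5, 5)
P = sinkhorn(sim)
print(P.sum(axis=0).round(3), P.sum(axis=1).round(3))  # both close to all-ones
```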

Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

  • paper_url: http://arxiv.org/abs/2307.13116
  • repo_url: None
  • paper_authors: Michal Bartoszkiewicz, Jan Chorowski, Adrian Kosowski, Jakub Kowalski, Sergey Kulik, Mateusz Lewandowski, Krzysztof Nowicki, Kamil Piechowiak, Olivier Ruas, Zuzanna Stamirowska, Przemyslaw Uznanski
  • for: This work addresses the challenges of analyzing and processing data from the physical economy, including data streams generated by IoT and enterprise systems.
  • methods: The work introduces Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. Pathway provides a Table API tailored for Python and Python/SQL workflows and is powered by a distributed incremental dataflow implemented in Rust.
  • results: Benchmarks show that Pathway surpasses state-of-the-art industry frameworks in both batch and streaming contexts, and that it handles streaming use cases, such as streaming iterative graph algorithms (PageRank, etc.), that those frameworks cannot easily resolve.
    Abstract We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machinelearning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both scenarios. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.).

Transformers in Reinforcement Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.05979
  • repo_url: None
  • paper_authors: Pranav Agarwal, Aamer Abdul Rahman, Pierre-Luc St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou
  • for: This paper explores the use of transformers in reinforcement learning (RL) to address challenges such as unstable training, credit assignment, lack of interpretability, and partial observability.
  • methods: The paper discusses the properties of transformers and their variants, and how they can be applied to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization.
  • results: The paper presents a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization, and discusses the limitations of using transformers in RL.
    Abstract Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks. This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability. We begin by providing a brief domain overview of RL, followed by a discussion on the challenges of classical RL algorithms. Next, we delve into the properties of the transformer and its variants and discuss the characteristics that make them well-suited to address the challenges inherent in RL. We examine the application of transformers to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization. We also discuss recent research that aims to enhance the interpretability and efficiency of transformers in RL, using visualization techniques and efficient training strategies. Often, the transformer architecture must be tailored to the specific needs of a given application. We present a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization. We conclude by discussing the limitations of using transformers in RL and assess their potential for catalyzing future breakthroughs in this field.

Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.05977
  • repo_url: https://github.com/nannullna/safe-diffusion
  • paper_authors: Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee
  • for: To prevent the generation of harmful or copyrighted content in text-to-image diffusion models.
  • methods: The paper proposes SDD, a self-distillation method in which the diffusion model guides its own noise estimate, conditioned on the target removal concept, to match the unconditional estimate, greatly reducing harmful or copyrighted content in the generated images.
  • results: Compared to previous methods, SDD eliminates a much greater proportion of harmful content without degrading overall image quality, and it allows multiple concepts to be removed at once rather than one at a time.
    Abstract Large-scale image generation models, with impressive quality made possible by the vast amount of data available on the Internet, raise social concerns that these models may generate harmful or copyrighted content. The biases and harmfulness arise throughout the entire training process and are hard to completely remove, which have become significant hurdles to the safe deployment of these models. In this paper, we propose a method called SDD to prevent problematic content generation in text-to-image diffusion models. We self-distill the diffusion model to guide the noise estimate conditioned on the target removal concept to match the unconditional one. Compared to the previous methods, our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality. Furthermore, our method allows the removal of multiple concepts at once, whereas previous works are limited to removing a single concept at a time.
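A hedged sketch of the self-distillation objective as described: the student's noise estimate conditioned on the concept to remove is pulled toward the teacher's unconditional estimate, so the concept stops steering generation. `student`, `teacher`, and the embeddings are placeholders, not the paper's exact interfaces:

```python
import torch
import torch.nn.functional as F

def sdd_style_loss(student, teacher, x_t, t, concept_emb, null_emb):
    """Match the concept-conditioned noise estimate to the unconditional one.

    All callables and embeddings here are placeholders standing in for a
    diffusion U-Net and its text conditioning, not the paper's exact API.
    """
    with torch.no_grad():
        target = teacher(x_t, t, null_emb)   # unconditional noise estimate
    pred = student(x_t, t, concept_emb)      # conditioned on the removal concept
    return F.mse_loss(pred, target)
```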

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

  • paper_url: http://arxiv.org/abs/2307.05973
  • repo_url: None
  • paper_authors: Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei
  • for: This paper aims to synthesize robot trajectories for a large variety of manipulation tasks, enabling physical interaction given an open set of instructions and an open set of objects.
  • methods: The approach leverages the reasoning and planning capabilities of LLMs together with the visual grounding of a VLM: by exploiting their code-writing abilities, LLMs interact with the VLM to compose 3D value maps that ground the knowledge in the agent's observation space, and a model-based planning framework then synthesizes closed-loop robot trajectories robust to dynamic perturbations.
  • results: The method is studied at scale in both simulated and real-robot environments and performs a large variety of everyday manipulation tasks specified in free-form natural language.
    Abstract Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to carry out the physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a visual-language model (VLM) to compose 3D value maps to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. Project website: https://voxposer.github.io
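A toy numpy composition of a 3D value map, of the kind an LLM could emit as code: an attractive well around a target location plus a repulsive bump around an obstacle, with the planner descending the composed map (geometry and weights are illustrative assumptions):

```python
import numpy as np

# Voxelize the workspace as a 50x50x50 grid of (x, y, z) coordinates.
axes = [np.linspace(0.0, 1.0, 50)] * 3
grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)

def sphere_cost(center, radius, sign):
    d = np.linalg.norm(grid - np.asarray(center), axis=-1)
    return sign * np.exp(-((d / radius) ** 2))

value_map = sphere_cost([0.8, 0.5, 0.2], 0.15, -1.0)   # pull toward the target
value_map += sphere_cost([0.5, 0.5, 0.3], 0.10, +1.0)  # push away from an obstacle

best_voxel = np.unravel_index(value_map.argmin(), value_map.shape)
print(best_voxel)  # voxel a planner would steer the end-effector toward
```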

Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

  • paper_url: http://arxiv.org/abs/2307.05959
  • repo_url: None
  • paper_authors: Moo Jin Kim, Jiajun Wu, Chelsea Finn
  • for: This paper uses human video demonstrations to improve the generalization of eye-in-hand robot visuomotor policies.
  • methods: The method augments narrow robotic imitation datasets with broad unlabeled human video demonstrations, relying on the partial observability of eye-in-hand cameras and a simple fixed image masking scheme rather than any explicit domain adaptation.
  • results: On a suite of eight real-world tasks, the method improves the success rates of eye-in-hand manipulation policies by 58% (absolute) on average, enabling robots to generalize to new environment configurations and new tasks unseen in the robot demonstration data.
    Abstract Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation. However, for robotic imitation, it is still expensive to have a human teleoperator collect large amounts of expert demonstrations with a real robot. Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation and can be quickly captured in a wide range of scenarios. Therefore, human video demonstrations are a promising data source for learning generalizable robotic manipulation policies at scale. In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies. Although a clear visual domain gap exists between human and robot data, our framework does not need to employ any explicit domain adaptation method, as we leverage the partial observability of eye-in-hand cameras as well as a simple fixed image masking scheme. On a suite of eight real-world tasks involving both 3-DoF and 6-DoF robot arm control, our method improves the success rates of eye-in-hand manipulation policies by 58% (absolute) on average, enabling robots to generalize to both new environment configurations and new tasks that are unseen in the robot demonstration data. See video results at https://giving-robots-a-hand.github.io/ .
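The fixed masking scheme admits a one-function sketch: black out the fixed image region where the embodiment (human hand vs. robot gripper) tends to appear in eye-in-hand views. The mask geometry below is an illustrative assumption, not the paper's exact crop:

```python
import numpy as np

def mask_embodiment(image: np.ndarray, margin: int = 24) -> np.ndarray:
    """Zero out a fixed strip of an HxWxC image (illustrative mask geometry).

    Masking the region where the hand/gripper appears narrows the visual
    gap between human and robot eye-in-hand frames without any explicit
    domain adaptation.
    """
    masked = image.copy()
    masked[-margin:, :] = 0   # bottom strip, where the arm enters the frame
    return masked
```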

Automatically Reconciling the Trade-off between Prediction Accuracy and Earliness in Prescriptive Business Process Monitoring

  • paper_url: http://arxiv.org/abs/2307.05939
  • repo_url: None
  • paper_authors: Andreas Metzger, Tristan Kley, Aristide Rothweiler, Klaus Pohl
  • for: This paper supports process managers in deciding when and how to adapt an ongoing business process to prevent or mitigate an undesired process outcome.
  • methods: The paper performs a comparative evaluation of the main alternative approaches for reconciling the trade-off between prediction accuracy and prediction earliness, using four public real-world event log datasets and two types of prediction models.
  • results: Different approaches are effective under different circumstances; the results indicate which criteria affect an approach's effectiveness and support initial recommendations for selecting a concrete approach in practice.
    Abstract Prescriptive business process monitoring provides decision support to process managers on when and how to adapt an ongoing business process to prevent or mitigate an undesired process outcome. We focus on the problem of automatically reconciling the trade-off between prediction accuracy and prediction earliness in determining when to adapt. Adaptations should happen sufficiently early to provide enough lead time for the adaptation to become effective. However, earlier predictions are typically less accurate than later predictions. This means that acting on less accurate predictions may lead to unnecessary adaptations or missed adaptations. Different approaches were presented in the literature to reconcile the trade-off between prediction accuracy and earliness. So far, these approaches were compared with different baselines, and evaluated using different data sets or even confidential data sets. This limits the comparability and replicability of the approaches and makes it difficult to choose a concrete approach in practice. We perform a comparative evaluation of the main alternative approaches for reconciling the trade-off between prediction accuracy and earliness. Using four public real-world event log data sets and two types of prediction models, we assess and compare the cost savings of these approaches. The experimental results indicate which criteria affect the effectiveness of an approach and help us state initial recommendations for the selection of a concrete approach in practice.
    摘要 现有的文献中提出了多种方法来协调预测精度和预测早期性的负面关系。这些方法与不同的基准点进行比较,并使用不同的数据集或者even confidential data set进行评估。这限制了比较和复制性的可能性,使其难以在做实践中选择一个具体的方法。我们进行了对主要 altenative approaches的比较性评估。使用四个公共的实际世界事件日志数据集和两种预测模型,我们评估和比较这些方法的成本节省。实验结果表明哪些因素影响了方法的效果,并帮助我们提出初步的实践中的选择建议。

BiRP: Learning Robot Generalized Bimanual Coordination using Relative Parameterization Method on Human Demonstration

  • paper_url: http://arxiv.org/abs/2307.05933
  • repo_url: https://github.com/skylark0924/rofunc
  • paper_authors: Junjia Liu, Hengyi Sim, Chenzui Li, Fei Chen
  • for: This work proposes an easy-to-use method for learning bimanual coordination from human demonstration (Learning from Demonstration, LfD), suitable for use in training large robot manipulation models.
  • methods: The work divides bimanual tasks into two types (leader-follower and synergistic coordination) and proposes a relative parameterization method to learn both types of coordination from human demonstration, representing coordination as probability distributions that describe how its importance changes throughout the motion.
  • results: Using synthetic motions and human demonstration data, a series of experiments on a humanoid robot shows that the method learned from human demonstration achieves consistent coordination under new task parameters.
    Abstract Human bimanual manipulation can perform more complex tasks than a simple combination of two single arms, which is credited to the spatio-temporal coordination between the arms. However, the description of bimanual coordination is still an open topic in robotics. This makes it difficult to give an explainable coordination paradigm, let alone applied to robotics. In this work, we divide the main bimanual tasks in human daily activities into two types: leader-follower and synergistic coordination. Then we propose a relative parameterization method to learn these types of coordination from human demonstration. It represents coordination as Gaussian mixture models from bimanual demonstration to describe the change in the importance of coordination throughout the motions by probability. The learned coordinated representation can be generalized to new task parameters while ensuring spatio-temporal coordination. We demonstrate the method using synthetic motions and human demonstration data and deploy it to a humanoid robot to perform a generalized bimanual coordination motion. We believe that this easy-to-use bimanual learning from demonstration (LfD) method has the potential to be used as a data augmentation plugin for robot large manipulation model training. The corresponding codes are open-sourced in https://github.com/Skylark0924/Rofunc.
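To make the relative-parameterization idea concrete, the sketch below fits a Gaussian mixture model over time-indexed relative displacements between the two hands of a synthetic leader-follower demonstration. The synthetic data, the component count, and the use of scikit-learn's GaussianMixture are illustrative assumptions; BiRP's actual formulation is richer:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Per time step, concatenate time with the pose of the right hand expressed
# relative to the left hand (the "relative parameterization").
T = 200
t = np.linspace(0.0, 1.0, T)[:, None]
left = np.cumsum(np.random.randn(T, 3) * 0.01, axis=0)       # leader path
right = left + np.array([0.3, 0.0, 0.0]) + 0.02 * np.random.randn(T, 3)
rel = right - left                  # leader-follower style relative motion
data = np.hstack([t, rel])          # (T, 4): time + relative displacement

gmm = GaussianMixture(n_components=5, covariance_type="full").fit(data)
# Component responsibilities over time indicate how strongly the coordination
# constraint applies in each phase of the motion.
coord_weight = gmm.predict_proba(data)
```

Conditioning such a model on new task parameters (for example, a new relative offset) is what lets the learned coordination generalize while staying spatio-temporally consistent.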

Emotion recognition based on multi-modal electrophysiology multi-head attention Contrastive Learning

  • paper_url: http://arxiv.org/abs/2308.01919
  • repo_url: None
  • paper_authors: Yunfei Guo, Tao Zhang, Wu Huang
  • for: The paper is written for researchers and practitioners in the field of emotion recognition and artificial intelligence, as well as those interested in multimodal electrophysiological signals and their applications.
  • methods: The paper proposes a self-supervised contrastive learning-based multimodal emotion recognition method called ME-MHACL, which uses unlabeled electrophysiological signals and multi-head attention mechanisms to learn meaningful feature representations and improve recognition performance.
  • results: The paper reports that the proposed ME-MHACL method outperformed existing benchmark methods in emotion recognition tasks and had good cross-individual generalization ability, as demonstrated through experiments on two public datasets (DEAP and MAHNOB-HCI).
    Abstract Emotion recognition is an important research direction in artificial intelligence, helping machines understand and adapt to human emotional states. Multimodal electrophysiological(ME) signals, such as EEG, GSR, respiration(Resp), and temperature(Temp), are effective biomarkers for reflecting changes in human emotions. However, using electrophysiological signals for emotion recognition faces challenges such as data scarcity, inconsistent labeling, and difficulty in cross-individual generalization. To address these issues, we propose ME-MHACL, a self-supervised contrastive learning-based multimodal emotion recognition method that can learn meaningful feature representations from unlabeled electrophysiological signals and use multi-head attention mechanisms for feature fusion to improve recognition performance. Our method includes two stages: first, we use the Meiosis method to group sample and augment unlabeled electrophysiological signals and design a self-supervised contrastive learning task; second, we apply the trained feature extractor to labeled electrophysiological signals and use multi-head attention mechanisms for feature fusion. We conducted experiments on two public datasets, DEAP and MAHNOB-HCI, and our method outperformed existing benchmark methods in emotion recognition tasks and had good cross-individual generalization ability.
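A minimal sketch of the multi-head-attention feature-fusion stage, assuming per-modality embeddings of equal width; the dimensions, pooling, and classifier head below are illustrative choices, not the ME-MHACL architecture:

```python
import torch
import torch.nn as nn

class MultiModalAttentionFusion(nn.Module):
    """Fuse per-modality embeddings (e.g. EEG, GSR, Resp, Temp) with
    multi-head self-attention, then classify the pooled representation."""
    def __init__(self, d_model=64, n_heads=4, n_classes=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, feats):           # feats: (batch, n_modalities, d_model)
        fused, _ = self.attn(feats, feats, feats)   # modalities attend to each other
        return self.head(fused.mean(dim=1))         # pool over modalities

x = torch.randn(8, 4, 64)               # 8 samples, 4 modality embeddings
logits = MultiModalAttentionFusion()(x)
```

In the two-stage pipeline the abstract describes, the feature extractor feeding this fusion layer would first be trained with the self-supervised contrastive task on unlabeled signals.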

A New Dataset and Comparative Study for Aphid Cluster Detection

  • paper_url: http://arxiv.org/abs/2307.05929
  • repo_url: None
  • paper_authors: Tianxiao Zhang, Kaidong Li, Xiangyu Chen, Cuncong Zhong, Bo Luo, Ivan Grijalva Teran, Brian McCornack, Daniel Flippo, Ajay Sharda, Guanghui Wang
  • for: This paper aims to estimate the aphid infestation level in sorghum fields by detecting aphid clusters using machine learning models.
  • methods: The authors use millions of images taken in sorghum fields, manually select images with aphids, and annotate each aphid cluster in the images. They then crop the images into patches and create a labeled dataset with over 151,000 image patches to train and compare the performance of four state-of-the-art object detection models.
  • results: The authors evaluate the performance of the four object detection models and compare their results, achieving an average precision of 84.1% and an average recall of 81.3%. They also demonstrate the effectiveness of their approach in estimating the aphid infestation level in sorghum fields.
    Abstract Aphids are one of the main threats to crops, rural families, and global food security. Chemical pest control is a necessary component of crop production for maximizing yields, however, it is unnecessary to apply the chemical approaches to the entire fields in consideration of the environmental pollution and the cost. Thus, accurately localizing the aphid and estimating the infestation level is crucial to the precise local application of pesticides. Aphid detection is very challenging as each individual aphid is really small and all aphids are crowded together as clusters. In this paper, we propose to estimate the infection level by detecting aphid clusters. We have taken millions of images in the sorghum fields, manually selected 5,447 images that contain aphids, and annotated each aphid cluster in the image. To use these images for machine learning models, we crop the images into patches and created a labeled dataset with over 151,000 image patches. Then, we implement and compare the performance of four state-of-the-art object detection models.
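Cropping field images into fixed-size patches, as described for dataset construction, takes only a few lines; the 400-pixel patch size and non-overlapping stride below are assumptions for illustration:

```python
import numpy as np

def crop_patches(img: np.ndarray, patch: int = 400, stride: int = 400):
    """Split an image into fixed-size patches, returned in row-major order."""
    h, w = img.shape[:2]
    return [img[y:y + patch, x:x + patch]
            for y in range(0, h - patch + 1, stride)
            for x in range(0, w - patch + 1, stride)]

# A 2000x3000 field image yields 5 * 7 = 35 non-overlapping 400px patches.
print(len(crop_patches(np.zeros((2000, 3000, 3), dtype=np.uint8))))  # 35
```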

Reading Radiology Imaging Like The Radiologist

  • paper_url: http://arxiv.org/abs/2307.05921
  • repo_url: None
  • paper_authors: Yuhao Wang
  • for: This paper aims to improve the quality of automated radiology report generation so that reports contain richer, more fine-grained disease descriptions.
  • methods: The paper proposes a disease-oriented retrieval framework that uses similar reports as knowledge references, together with a factual-consistency captioning generator that produces more accurate and factually consistent disease descriptions.
  • results: Experimental results show that the method generates more fine-grained and accurate radiology reports while reducing visual and textual data biases.
    Abstract Automated radiology report generation aims to generate radiology reports that contain rich, fine-grained descriptions of radiology imaging. Compared with image captioning in the natural image domain, medical images are very similar to each other, with only minor differences in the occurrence of diseases. Given the importance of these minor differences in the radiology report, it is crucial to encourage the model to focus more on the subtle regions of disease occurrence. Secondly, the problem of visual and textual data biases is serious. Not only do normal cases make up the majority of the dataset, but sentences describing areas with pathological changes also constitute only a small part of the paragraph. Lastly, generating medical image reports involves the challenge of long text generation, which requires more expertise and empirical training in medical knowledge. As a result, the difficulty of generating such reports is increased. To address these challenges, we propose a disease-oriented retrieval framework that utilizes similar reports as prior knowledge references. We design a factual consistency captioning generator to generate more accurate and factually consistent disease descriptions. Our framework can find most similar reports for a given disease from the CXR database by retrieving a disease-oriented mask consisting of the position and morphological characteristics. By referencing the disease-oriented similar report and the visual features, the factual consistency model can generate a more accurate radiology report.

Close-up View synthesis by Interpolating Optical Flow

  • paper_url: http://arxiv.org/abs/2307.05913
  • repo_url: None
  • paper_authors: Xinyi Bai, Ze Wang, Lu Yang, Hong Cheng
  • for: This paper proposes a method for achieving close-up virtual views without depth information or camera parameters.
  • methods: The method uses optical flow to build parallax effects for pseudo-3D projection, obtaining arbitrary virtual viewpoints by proportional interpolation of bidirectional optical flow.
  • results: The method achieves clear and visually faithful virtual view transitions and magnification in the Google Street View system, overcoming the visual distortion and image blur caused by viewpoint magnification and transition.
    Abstract The virtual viewpoint is perceived as a new technique in virtual navigation, as yet not supported due to the lack of depth information and obscure camera parameters. In this paper, a method for achieving close-up virtual view is proposed and it only uses optical flow to build parallax effects to realize pseudo 3D projection without using depth sensor. We develop a bidirectional optical flow method to obtain any virtual viewpoint by proportional interpolation of optical flow. Moreover, with the ingenious application of the optical-flow-value, we achieve clear and visual-fidelity magnified results through lens stretching in any corner, which overcomes the visual distortion and image blur through viewpoint magnification and transition in Google Street View system.
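The core operation, synthesizing an intermediate view by scaling optical flow, can be sketched with OpenCV. Farneback flow and single-direction backward warping below are stand-ins for the paper's bidirectional flow interpolation:

```python
import cv2
import numpy as np

def intermediate_view(img0, img1, alpha):
    """Warp img0 toward a virtual viewpoint a fraction `alpha` of the way to
    img1 by scaling the dense optical flow between the two frames."""
    g0 = cv2.cvtColor(img0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = g0.shape
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    # Sample img0 at positions displaced by a fraction of the flow
    # (a simple approximation of the in-between viewpoint).
    map_x = (xs - alpha * flow[..., 0]).astype(np.float32)
    map_y = (ys - alpha * flow[..., 1]).astype(np.float32)
    return cv2.remap(img0, map_x, map_y, cv2.INTER_LINEAR)
```

Sweeping `alpha` from 0 to 1 produces the parallax effect of a camera moving between the two captured viewpoints without any depth sensor.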

Stability Guarantees for Feature Attributions with Multiplicative Smoothing

  • paper_url: http://arxiv.org/abs/2307.05902
  • repo_url: None
  • paper_authors: Anton Xue, Rajeev Alur, Eric Wong
  • for: This paper proposes a feature attribution method with formal stability guarantees, so that explanations reliably reflect the model's decision process.
  • methods: The paper uses a smoothing method called Multiplicative Smoothing (MuS) to guarantee the model's stability under feature masking. It shows that MuS overcomes theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method.
  • results: Tests on vision and language models demonstrate that MuS provides non-trivial stability guarantees and can be combined with other feature attribution methods.
    Abstract Explanation methods for machine learning models tend to not provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. To achieve such a model, we develop a smoothing method called Multiplicative Smoothing (MuS). We show that MuS overcomes theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with a variety of feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
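The smoothing idea can be sketched as averaging a classifier over random multiplicative keep/drop masks on the input features. The Bernoulli mask distribution and sample count below are illustrative; the paper's MuS uses a specific construction to obtain its Lipschitz guarantees:

```python
import numpy as np

def mus_predict(f, x, keep_prob=0.8, n_samples=64, rng=None):
    """Smoothed classifier: average f over random multiplicative masks.

    f: callable mapping an input array to a vector of class scores.
    x: input features. A minimal sketch of the smoothing idea only.
    """
    rng = np.random.default_rng(rng)
    masks = rng.random((n_samples,) + x.shape) < keep_prob
    preds = np.stack([f(x * m) for m in masks])
    return preds.mean(axis=0)
```

The smoothed predictor changes slowly when individual features are masked, which is exactly the Lipschitz-with-respect-to-masking property the stability guarantees rest on.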

PID-Inspired Inductive Biases for Deep Reinforcement Learning in Partially Observable Control Tasks

  • paper_url: http://arxiv.org/abs/2307.05891
  • repo_url: https://github.com/ianchar/gpide
  • paper_authors: Ian Char, Jeff Schneider
  • for: This paper explores how deep reinforcement learning (RL) can learn to control systems from data alone.
  • methods: Drawing on ideas from the PID controller, the paper proposes two history-encoding architectures: one that directly uses PID features, and another that extends these core ideas so it can be applied to any control task.
  • results: Compared with prior methods, the resulting policies adapt better to environment changes and achieve higher performance on tracking tasks. On high-dimensional control tasks, their average performance is 1.7x that of previous state-of-the-art methods.
    Abstract Deep reinforcement learning (RL) has shown immense potential for learning to control systems through data alone. However, one challenge deep RL faces is that the full state of the system is often not observable. When this is the case, the policy needs to leverage the history of observations to infer the current state. At the same time, differences between the training and testing environments makes it critical for the policy not to overfit to the sequence of observations it sees at training time. As such, there is an important balancing act between having the history encoder be flexible enough to extract relevant information, yet be robust to changes in the environment. To strike this balance, we look to the PID controller for inspiration. We assert the PID controller's success shows that only summing and differencing are needed to accumulate information over time for many control tasks. Following this principle, we propose two architectures for encoding history: one that directly uses PID features and another that extends these core ideas and can be used in arbitrary control tasks. When compared with prior approaches, our encoders produce policies that are often more robust and achieve better performance on a variety of tracking tasks. Going beyond tracking tasks, our policies achieve 1.7x better performance on average over previous state-of-the-art methods on a suite of high dimensional control tasks.
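The "only summing and differencing" principle is easy to make concrete: encode a history of tracking errors with its current value (proportional), running sum (integral), and last difference (derivative). This toy encoder shows the inductive bias; the paper's architectures generalize it with learned components:

```python
import numpy as np

def pid_history_features(errors, dt=1.0):
    """Encode an error history with only summing and differencing:
    P = current error, I = accumulated error, D = most recent change."""
    e = np.asarray(errors, dtype=float)
    p = e[-1]
    i = e.sum() * dt
    d = (e[-1] - e[-2]) / dt if len(e) > 1 else 0.0
    return np.array([p, i, d])

print(pid_history_features([0.5, 0.3, 0.1]))  # [0.1, 0.9, -0.2]
```

Because sums and differences are invariant to many nuisance details of the observation sequence, such an encoder is flexible enough to summarize history yet robust to train/test environment shifts.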

Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes

  • paper_url: http://arxiv.org/abs/2307.05862
  • repo_url: None
  • paper_authors: Connor Toups, Rishi Bommasani, Kathleen A. Creel, Sarah H. Bana, Dan Jurafsky, Percy Liang
  • for: This study examines the societal impact of machine learning, in particular how deployments can lead to systemic failure across different contexts.
  • methods: Evaluating across three modalities (text, images, speech) and 11 datasets, the study finds that deployed machine learning frequently exhibits systemic failure, in which some users are misclassified by every available model.
  • results: The study finds that even when individual models improve at the population level, these improvements rarely reduce the prevalence of systemic failure. It also reveals new racial disparities in model predictions that are absent from human predictions. These examples show that ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
    Abstract Machine learning is traditionally studied at the model level: researchers measure and improve the accuracy, robustness, bias, efficiency, and other dimensions of specific models. In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments. To capture this, we introduce ecosystem-level analysis: rather than analyzing a single model, we consider the collection of models that are deployed in a given context. For example, ecosystem-level analysis in hiring recognizes that a job candidate's outcomes are not only determined by a single hiring algorithm or firm but instead by the collective decisions of all the firms they applied to. Across three modalities (text, images, speech) and 11 datasets, we establish a clear trend: deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available. Even when individual models improve at the population level over time, we find these improvements rarely reduce the prevalence of systemic failure. Instead, the benefits of these improvements predominantly accrue to individuals who are already correctly classified by other models. In light of these trends, we consider medical imaging for dermatology where the costs of systemic failure are especially high. While traditional analyses reveal racial performance disparities for both models and humans, ecosystem-level analysis reveals new forms of racial disparity in model predictions that do not present in human predictions. These examples demonstrate ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.
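Ecosystem-level failure reduces to a simple statistic once per-model correctness is tabulated: a user fails systemically when no deployed model classifies them correctly. A minimal sketch:

```python
import numpy as np

def systemic_failure_rate(correct):
    """`correct` is an (n_users, n_models) boolean matrix: entry (u, m) says
    whether model m classifies user u correctly. A user suffers systemic
    failure when every deployed model gets them wrong."""
    correct = np.asarray(correct, dtype=bool)
    failed_everywhere = ~correct.any(axis=1)
    return failed_everywhere.mean()

# 3 users x 2 models: the third user is misclassified by both models.
print(systemic_failure_rate([[True, False],
                             [True, True],
                             [False, False]]))  # 0.333...
```

Tracking this quantity over model generations, rather than per-model accuracy alone, is what reveals that population-level improvements often bypass the systemically failed users.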

FAIRO: Fairness-aware Adaptation in Sequential-Decision Making for Human-in-the-Loop Systems

  • paper_url: http://arxiv.org/abs/2307.05857
  • repo_url: None
  • paper_authors: Tianyu Zhao, Mojtaba Taherisadr, Salma Elmalaki
  • for: This work aims to improve fairness in sequential decision-making within Human-in-the-Loop (HITL) environments, especially when multiple humans with different behaviors and expectations are affected by the same adaptation decisions.
  • methods: The paper leverages the Options reinforcement learning framework to decompose the fairness problem into adaptive sub-tasks, accounting for human behavior variability and human preferences that change over time.
  • results: Extensive evaluations on three different HITL application setups demonstrate that FAIRO effectively promotes fairness while accounting for human variability and change over time, improving fairness over other methods by 35.36% on average across all three applications.
    Abstract Achieving fairness in sequential-decision making systems within Human-in-the-Loop (HITL) environments is a critical concern, especially when multiple humans with different behavior and expectations are affected by the same adaptation decisions in the system. This human variability factor adds more complexity since policies deemed fair at one point in time may become discriminatory over time due to variations in human preferences resulting from inter- and intra-human variability. This paper addresses the fairness problem from an equity lens, considering human behavior variability, and the changes in human preferences over time. We propose FAIRO, a novel algorithm for fairness-aware sequential-decision making in HITL adaptation, which incorporates these notions into the decision-making process. In particular, FAIRO decomposes this complex fairness task into adaptive sub-tasks based on individual human preferences through leveraging the Options reinforcement learning framework. We design FAIRO to generalize to three types of HITL application setups that have the shared adaptation decision problem. Furthermore, we recognize that fairness-aware policies can sometimes conflict with the application's utility. To address this challenge, we provide a fairness-utility tradeoff in FAIRO, allowing system designers to balance the objectives of fairness and utility based on specific application requirements. Extensive evaluations of FAIRO on the three HITL applications demonstrate its generalizability and effectiveness in promoting fairness while accounting for human variability. On average, FAIRO can improve fairness compared with other methods across all three applications by 35.36%.

Influential Simplices Mining via Simplicial Convolutional Network

  • paper_url: http://arxiv.org/abs/2307.05841
  • repo_url: None
  • paper_authors: Yujie Zeng, Yiming Huang, Qiang Wu, Linyuan Lü
  • for: This study aims to better understand higher-order structures and functions in simplicial complexes through the proposed influential simplices mining neural network (ISMnet).
  • methods: The study introduces a new higher-order graph learning model, ISMnet, which jointly exploits graph Laplacians and node features to capture the structure and function of higher-order neighborhoods.
  • results: Experimental results show that ISMnet accurately identifies influential 0-simplices and 2-simplices and, in some cases, greatly improves the understanding of simplicial complexes.
    Abstract Simplicial complexes have recently been in the limelight of higher-order network analysis, where a minority of simplices play crucial roles in structures and functions due to network heterogeneity. We find a significant inconsistency between identifying influential nodes and simplices. Therefore, it remains elusive how to characterize simplices' influence and identify influential simplices, despite the relative maturity of research on influential nodes (0-simplices) identification. Meanwhile, graph neural networks (GNNs) are potent tools that can exploit network topology and node features simultaneously, but they struggle to tackle higher-order tasks. In this paper, we propose a higher-order graph learning model, named influential simplices mining neural network (ISMnet), to identify vital h-simplices in simplicial complexes. It can tackle higher-order tasks by leveraging novel higher-order presentations: hierarchical bipartite graphs and higher-order hierarchical (HoH) Laplacians, where targeted simplices are grouped into a hub set and can interact with other simplices. Furthermore, ISMnet employs learnable graph convolutional operators in each HoH Laplacian domain to capture interactions among simplices, and it can identify influential simplices of arbitrary order by changing the hub set. Empirical results demonstrate that ISMnet significantly outperforms existing methods in ranking 0-simplices (nodes) and 2-simplices. In general, this novel framework excels in identifying influential simplices and promises to serve as a potent tool in higher-order network analysis.
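Higher-order models of this kind build on simplicial operators such as Hodge Laplacians. The sketch below assembles the standard 1-Laplacian of a tiny complex from incidence matrices; ISMnet's higher-order hierarchical (HoH) Laplacians are the paper's own construction, which this standard operator only hints at:

```python
import numpy as np

# Tiny complex: triangle {0,1,2} plus a dangling edge (2,3).
# Edges (oriented low -> high): e0=(0,1), e1=(0,2), e2=(1,2), e3=(2,3).
B1 = np.array([[-1, -1,  0,  0],   # node-edge incidence (nodes x edges)
               [ 1,  0, -1,  0],
               [ 0,  1,  1, -1],
               [ 0,  0,  0,  1]])
B2 = np.array([[1], [-1], [1], [0]])  # edge-triangle incidence (boundary of {0,1,2})

L1 = B1.T @ B1 + B2 @ B2.T            # Hodge 1-Laplacian acting on edge signals
print(L1)
```

Graph convolutions defined in such Laplacian domains let a network propagate information among simplices of a chosen order, which is the mechanism ISMnet uses to score influence.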

Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

  • paper_url: http://arxiv.org/abs/2307.05834
  • repo_url: None
  • paper_authors: Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang
  • for: This paper studies distributed multi-task reinforcement learning (RL) to help distributed lifelong learning agents adapt to new challenges.
  • methods: The problem is formalized with linearly parameterized contextual Markov decision processes (MDPs), where each task is represented by a context that specifies the transition dynamics and rewards. The proposed algorithm, DistMT-LSVI, has each agent first identify its task and then exchange information through a central server to derive $\epsilon$-optimal policies.
  • results: The analysis shows that with DistMT-LSVI, each agent only needs to run at most $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$ episodes to achieve $\epsilon$-optimal policies for all $M$ tasks, improving the sample complexity of the non-distributed setting by a factor of $1/N$. Numerical experiments on OpenAI Gym Atari environments validate the theoretical findings.
    Abstract Recently, DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents in adapting to new challenges. In this paper, we address this issue by conducting both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks without prior knowledge of their identities. We approach the problem by formulating it as linearly parameterized contextual Markov decision processes (MDPs), where each task is represented by a context that specifies the transition dynamics and rewards. To tackle this problem, we propose an algorithm called DistMT-LSVI. First, the agents identify the tasks, and then they exchange information through a central server to derive $\epsilon$-optimal policies for the tasks. Our research demonstrates that to achieve $\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}(d^3H^6(\epsilon^{-2}+c_{\rm sep}^{-2})\cdot M/N)$, where $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards. Notably, DistMT-LSVI improves the sample complexity of non-distributed settings by a factor of $1/N$, as each agent independently learns $\epsilon$-optimal policies for all $M$ tasks using $\tilde{\mathcal{O}}(d^3H^6M\epsilon^{-2})$ episodes. Additionally, we provide numerical experiments conducted on OpenAI Gym Atari environments that validate our theoretical findings.

Bag of Views: An Appearance-based Approach to Next-Best-View Planning for 3D Reconstruction

  • paper_url: http://arxiv.org/abs/2307.05832
  • repo_url: https://github.com/acis2021/viewplanningtoolbox
  • paper_authors: Sara Hatami Gazani, Matthew Tucsok, Iraj Mantegh, Homayoun Najjaran
  • for: This paper addresses UAV-based intelligent data acquisition for 3D reconstruction and infrastructure monitoring.
  • methods: Building on image processing and deep learning techniques, the paper proposes a fully appearance-based view planning model, Bag-of-Views (BoV), for assigning utility to captured views.
  • results: Experiments show that the model reduces the number of views required for data collection while improving the quality of the resulting 3D reconstructions.
    Abstract UAV-based intelligent data acquisition for 3D reconstruction and monitoring of infrastructure has been experiencing an increasing surge of interest due to the recent advancements in image processing and deep learning-based techniques. View planning is an essential part of this task that dictates the information capture strategy and heavily impacts the quality of the 3D model generated from the captured data. Recent methods have used prior knowledge or partial reconstruction of the target to accomplish view planning for active reconstruction; the former approach poses a challenge for complex or newly identified targets while the latter is computationally expensive. In this work, we present Bag-of-Views (BoV), a fully appearance-based model used to assign utility to the captured views for both offline dataset refinement and online next-best-view (NBV) planning applications targeting the task of 3D reconstruction. With this contribution, we also developed the View Planning Toolbox (VPT), a lightweight package for training and testing machine learning-based view planning frameworks, custom view dataset generation of arbitrary 3D scenes, and 3D reconstruction. Through experiments which pair a BoV-based reinforcement learning model with VPT, we demonstrate the efficacy of our model in reducing the number of required views for high-quality reconstructions in dataset refinement and NBV planning.

Memorization Through the Lens of Curvature of Loss Function Around Samples

  • paper_url: http://arxiv.org/abs/2307.05831
  • repo_url: None
  • paper_authors: Isha Garg, Kaushik Roy
  • for: This study investigates memorization and generalization of the training set in neural networks.
  • methods: The study uses the curvature of the loss function around each training sample, averaged over all training epochs, as a measure of the network's memorization and generalization.
  • results: Across several image datasets, the study finds that neural networks memorize parts of the training set and that analyzing loss curvature surfaces distinctive training samples. It also uncovers a novel failure mode in the CIFAR100 dataset: duplicated images with different labels. Furthermore, after randomly corrupting the labels of some samples, the study shows that sorting by loss curvature efficiently identifies the mislabeled samples.
    Abstract Neural networks are overparametrized and easily overfit the datasets they train on. In the extreme case, it is shown that they can memorize a training set with fully randomized labels. We propose using the curvature of loss function around the training sample as a measure of its memorization, averaged over all training epochs. We use this to study the generalization versus memorization properties of different samples in popular image datasets. We visualize samples with the highest curvature of loss around them, and show that these visually correspond to long-tailed, mislabeled or conflicting samples. This analysis helps us find a, to the best of our knowledge, novel failure model on the CIFAR100 dataset, that of duplicated images with different labels. We also synthetically mislabel a proportion of the dataset by randomly corrupting the labels of a few samples, and show that sorting by curvature yields high AUROC values for identifying the mislabeled samples.
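One way to realize a per-sample curvature score is a Hutchinson estimate of the trace of the Hessian of the loss, computed via double backpropagation. The sketch below measures curvature with respect to the input at a single training point; taking it with respect to inputs, and the probe count, are assumptions, and the paper's measure additionally averages over all training epochs:

```python
import torch

def input_loss_curvature(model, x, y, n_probes=8):
    """Hutchinson-style estimate of the trace of the Hessian of the loss
    around the sample x: E[v^T H v] over random Gaussian probes v."""
    x = x.clone().requires_grad_(True)
    loss = torch.nn.functional.cross_entropy(
        model(x.unsqueeze(0)), y.unsqueeze(0))
    (grad,) = torch.autograd.grad(loss, x, create_graph=True)
    est = 0.0
    for _ in range(n_probes):
        v = torch.randn_like(x)
        # Hessian-vector product: differentiate <grad, v> w.r.t. x again.
        (hv,) = torch.autograd.grad(grad, x, grad_outputs=v, retain_graph=True)
        est += (v * hv).sum().item()
    return est / n_probes
# Averaging this score over epochs during training gives a per-sample
# curvature profile; the highest-curvature samples are the candidates for
# mislabeled, long-tailed, or conflicting data.
```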

Relational Extraction on Wikipedia Tables using Convolutional and Memory Networks

  • paper_url: http://arxiv.org/abs/2307.05827
  • repo_url: https://github.com/simpleparadox/re_656
  • paper_authors: Arif Shahriar, Rohan Saha, Denilson Barbosa
  • for: This work studies relation extraction (RE) from tabular data.
  • methods: The authors propose a new model that combines a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory (BiLSTM) network to encode entities and learn the dependencies among them.
  • results: Evaluated on a large and recent dataset and compared with previous neural methods, the model consistently outperforms prior approaches for relation extraction on tabular data. The authors also perform comprehensive error analyses and an ablation study to show the contribution of each model component.
    Abstract Relation extraction (RE) is the task of extracting relations between entities in text. Most RE methods extract relations from free-form running text and leave out other rich data sources, such as tables. We explore RE from the perspective of applying neural methods on tabularly organized data. We introduce a new model consisting of Convolutional Neural Network (CNN) and Bidirectional-Long Short Term Memory (BiLSTM) network to encode entities and learn dependencies among them, respectively. We evaluate our model on a large and recent dataset and compare results with previous neural methods. Experimental results show that our model consistently outperforms the previous model for the task of relation extraction on tabular data. We perform comprehensive error analyses and ablation study to show the contribution of various components of our model. Finally, we discuss the usefulness and trade-offs of our approach, and provide suggestions for fostering further research.
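A minimal PyTorch sketch of the CNN + BiLSTM pairing the abstract describes: a CNN encodes each entity's token window, and a BiLSTM models the dependency between the two entity encodings. All dimensions and the pooling scheme are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class TableRE(nn.Module):
    """Relation classifier for an entity pair drawn from table cells."""
    def __init__(self, vocab=10000, emb=100, hid=128, n_rel=50):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.cnn = nn.Conv1d(emb, hid, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(hid, hid, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hid, n_rel)

    def encode(self, tok):               # tok: (batch, seq_len) token ids
        h = self.cnn(self.emb(tok).transpose(1, 2))   # (batch, hid, seq)
        return torch.relu(h).max(dim=2).values        # max-pool over tokens

    def forward(self, head_tok, tail_tok):
        # BiLSTM reads the two entity encodings as a length-2 sequence,
        # capturing the dependency between head and tail.
        pair = torch.stack([self.encode(head_tok), self.encode(tail_tok)], dim=1)
        h, _ = self.lstm(pair)           # (batch, 2, 2*hid)
        return self.out(h[:, -1])        # relation logits

logits = TableRE()(torch.randint(0, 10000, (4, 12)),
                   torch.randint(0, 10000, (4, 12)))
```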

Neuro-Inspired Efficient Map Building via Fragmentation and Recall

  • paper_url: http://arxiv.org/abs/2307.05793
  • repo_url: https://github.com/fietelab/farmap
  • paper_authors: Jaedong Hwang, Zhang-Wei Hong, Eric Chen, Akhilan Boopathy, Pulkit Agrawal, Ila Fiete
  • for: This paper proposes a neuroscience-inspired Fragmentation-and-Recall (FarMap) strategy to help animals and robots navigate and explore their environments.
  • methods: The method performs surprisal-based clustering that decomposes space into multiple local maps, which are then used to set subgoals for spatial exploration. When a surprising event occurs, the current local map is truncated and stored in long-term memory (LTM) rather than discarded.
  • results: Evaluated on complex procedurally generated spatial environments, FarMap explores environments faster and uses active memory more efficiently, without loss of performance.
    Abstract Animals and robots navigate through environments by building and refining maps of the space. These maps enable functions including navigating back to home, planning, search, and foraging. In large environments, exploration of the space is a hard problem: agents can become stuck in local regions. Here, we use insights from neuroscience to propose and apply the concept of Fragmentation-and-Recall (FarMap), with agents solving the mapping problem by building local maps via a surprisal-based clustering of space, which they use to set subgoals for spatial exploration. Agents build and use a local map to predict their observations; high surprisal leads to a ``fragmentation event'' that truncates the local map. At these events, the recent local map is placed into long-term memory (LTM), and a different local map is initialized. If observations at a fracture point match observations in one of the stored local maps, that map is recalled (and thus reused) from LTM. The fragmentation points induce a natural online clustering of the larger space, forming a set of intrinsic potential subgoals that are stored in LTM as a topological graph. Agents choose their next subgoal from the set of near and far potential subgoals from within the current local map or LTM, respectively. Thus, local maps guide exploration locally, while LTM promotes global exploration. We evaluate FarMap on complex procedurally-generated spatial environments to demonstrate that this mapping strategy much more rapidly covers the environment (number of agent steps and wall clock time) and is more efficient in active memory usage, without loss of performance.
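The fragmentation-and-recall bookkeeping can be sketched as a few lines of state management: fragment the local map into LTM when surprisal spikes, and recall a stored map that contains the observation at the fracture point. The surprisal threshold and the set-based map representation are simplifying assumptions; the actual FarMap builds predictive local maps and a topological graph of fragments in LTM:

```python
class FarMapAgent:
    """Minimal fragmentation-and-recall state machine."""
    def __init__(self, surprisal_threshold=4.0):
        self.threshold = surprisal_threshold
        self.local_map = set()   # observations covered by the current fragment
        self.ltm = []            # long-term memory of stored fragments

    def step(self, obs, surprisal):
        if surprisal > self.threshold:            # fragmentation event
            self.ltm.append(frozenset(self.local_map))
            # Reuse a stored fragment if it already covers this observation.
            recalled = next((m for m in self.ltm if obs in m), None)
            self.local_map = set(recalled) if recalled is not None else set()
        self.local_map.add(obs)
```

The local map guides exploration within a fragment, while the LTM of fragments supplies far-away subgoals, which is the local/global split the abstract describes.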

Merging multiple input descriptors and supervisors in a deep neural network for tractogram filtering

  • paper_url: http://arxiv.org/abs/2307.05786
  • repo_url: None
  • paper_authors: Daniel Jörgens, Pierre-Marc Jodoin, Maxime Descoteaux, Rodrigo Moreno
  • for: This study aims to improve the accuracy of tractography methods by training deep learning models to filter false-positive streamlines from tractography data.
  • methods: Four different tractogram filtering strategies serve as supervisors: TractQuerier, RecobundlesX, TractSeg, and an anatomy-inspired filter. Their outputs are combined to obtain classification labels for the streamlines.
  • results: The study finds that the streamline coordinates are the most relevant information for this particular classification task, followed by the diffusion data.
    Abstract One of the main issues of the current tractography methods is their high false-positive rate. Tractogram filtering is an option to remove false-positive streamlines from tractography data in a post-processing step. In this paper, we train a deep neural network for filtering tractography data in which every streamline of a tractogram is classified as {\em plausible, implausible}, or {\em inconclusive}. For this, we use four different tractogram filtering strategies as supervisors: TractQuerier, RecobundlesX, TractSeg, and an anatomy-inspired filter. Their outputs are combined to obtain the classification labels for the streamlines. We assessed the importance of different types of information along the streamlines for performing this classification task, including the coordinates of the streamlines, diffusion data, landmarks, T1-weighted information, and a brain parcellation. We found that the streamline coordinates are the most relevant followed by the diffusion data in this particular classification task.
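Combining the four supervisors' keep/reject votes into the three-way labels could look like the vote-counting rule below. The abstract does not spell out the exact combination rule, so this unanimity-based scheme is an assumption:

```python
def streamline_label(votes):
    """Map the binary keep (1) / reject (0) decisions of the supervising
    filters (TractQuerier, RecobundlesX, TractSeg, anatomy-inspired) to a
    training label for the streamline classifier."""
    keep = sum(votes)
    if keep == len(votes):
        return "plausible"      # all filters agree the streamline is valid
    if keep == 0:
        return "implausible"    # all filters reject it
    return "inconclusive"       # the filters disagree

print(streamline_label([1, 1, 1, 1]))  # plausible
print(streamline_label([1, 0, 1, 0]))  # inconclusive
```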

EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video

  • paper_url: http://arxiv.org/abs/2307.05784
  • repo_url: https://github.com/facebookresearch/egocentricuseradaptation
  • paper_authors: Matthias De Lange, Hamid Eghbalzadeh, Reuben Tan, Michael Iuzzolino, Franziska Meier, Karl Ridgeway
  • for: This work proposes an adaptive egocentric action recognition model that runs on a user's head-mounted device and adapts to the user's experience.
  • methods: The approach has two phases: a population model is first pretrained, then adapted on-device and online to the user's data stream.
  • results: The study shows that this adaptive model improves egocentric action recognition in real-world applications, performing well under practical conditions such as long-tailed action distributions and large-scale classification.
    Abstract In egocentric action recognition a single population model is typically trained and subsequently embodied on a head-mounted device, such as an augmented reality headset. While this model remains static for new users and environments, we introduce an adaptive paradigm of two phases, where after pretraining a population model, the model adapts on-device and online to the user's experience. This setting is highly challenging due to the change from population to user domain and the distribution shifts in the user's data stream. Coping with the latter in-stream distribution shifts is the focus of continual learning, where progress has been rooted in controlled benchmarks but challenges faced in real-world applications often remain unaddressed. We introduce EgoAdapt, a benchmark for real-world egocentric action recognition that facilitates our two-phased adaptive paradigm, and real-world challenges naturally occur in the egocentric video streams from Ego4d, such as long-tailed action distributions and large-scale classification over 2740 actions. We introduce an evaluation framework that directly exploits the user's data stream with new metrics to measure the adaptation gain over the population model, online generalization, and hindsight performance. In contrast to single-stream evaluation in existing works, our framework proposes a meta-evaluation that aggregates the results from 50 independent user streams. We provide an extensive empirical study for finetuning and experience replay.
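Phase two, on-device online adaptation to a user stream, is commonly implemented as streaming finetuning with experience replay, one of the strategies the paper studies empirically. A minimal sketch, with the replay-buffer policy and batch composition as assumptions:

```python
import torch

def online_adapt(model, optimizer, user_stream,
                 replay_capacity=1000, replay_k=8):
    """Finetune a pretrained population model online on a user's stream,
    replaying stored past samples to combat in-stream distribution shift."""
    replay = []
    for x, y in user_stream:                  # one (clip_features, label) at a time
        batch_x, batch_y = [x], [y]
        for i in torch.randperm(len(replay))[:replay_k].tolist():
            rx, ry = replay[i]
            batch_x.append(rx)
            batch_y.append(ry)
        loss = torch.nn.functional.cross_entropy(
            model(torch.stack(batch_x)), torch.stack(batch_y))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if len(replay) < replay_capacity:     # simple fill-then-freeze buffer
            replay.append((x.detach(), y))
```

The paper's evaluation framework would then score such an adapted model on adaptation gain over the population model, online generalization, and hindsight performance, aggregated over 50 user streams.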

Has China caught up to the US in AI research? An exploration of mimetic isomorphism as a model for late industrializers

  • paper_url: http://arxiv.org/abs/2307.10198
  • repo_url: None
  • paper_authors: Chao Min, Yi Zhao, Yi Bu, Ying Ding, Caroline S. Wagner
  • for: This paper examines China's development of artificial intelligence (AI) and compares it to the USA.
  • methods: The paper uses data on AI-related research papers to analyze the volume and quality of China's AI research, as well as a novel measure to gauge China's imitation of US research.
  • results: The paper finds that China has made remarkable progress in AI development, surpassing the USA in volume of research papers, but the USA still has a slight edge in terms of quality. Additionally, the paper shows that China has effectively bridged a significant knowledge gap and could potentially be setting out on an independent research trajectory.
    Abstract Artificial Intelligence (AI), a cornerstone of 21st-century technology, has seen remarkable growth in China. In this paper, we examine China's AI development process, demonstrating that it is characterized by rapid learning and differentiation, surpassing the export-oriented growth propelled by Foreign Direct Investment seen in earlier Asian industrializers. Our data indicates that China currently leads the USA in the volume of AI-related research papers. However, when we delve into the quality of these papers based on specific metrics, the USA retains a slight edge. Nevertheless, the pace and scale of China's AI development remain noteworthy. We attribute China's accelerated AI progress to several factors, including global trends favoring open access to algorithms and research papers, contributions from China's broad diaspora and returnees, and relatively lax data protection policies. In the vein of our research, we have developed a novel measure for gauging China's imitation of US research. Our analysis shows that by 2018, the time lag between China and the USA in addressing AI research topics had evaporated. This finding suggests that China has effectively bridged a significant knowledge gap and could potentially be setting out on an independent research trajectory. While this study compares China and the USA exclusively, it's important to note that research collaborations between these two nations have resulted in more highly cited work than those produced by either country independently. This underscores the power of international cooperation in driving scientific progress in AI.

Unsupervised Learning in Complex Systems

  • paper_url: http://arxiv.org/abs/2307.10993
  • repo_url: https://github.com/hugcis/evolving-structures-in-complex-systems
  • paper_authors: Hugo Cisneros
  • for: This thesis studies learning and adaptation in natural and artificial systems.
  • methods: The thesis studies learning and adaptation through the lens of complex systems, developing a general complexity metric and applying a coarse-graining method to study computations in large-scale complex systems.
  • results: The main contributions are a metric for learning efficiency and a benchmark dataset for evaluating the speed of learning algorithms in large-scale complex systems. These findings matter for understanding learning and adaptation in natural and artificial systems and may drive the development of future learning algorithms.
    Abstract In this thesis, we explore the use of complex systems to study learning and adaptation in natural and artificial systems. The goal is to develop autonomous systems that can learn without supervision, develop on their own, and become increasingly complex over time. Complex systems are identified as a suitable framework for understanding these phenomena due to their ability to exhibit growth of complexity. Being able to build learning algorithms that require limited to no supervision would enable greater flexibility and adaptability in various applications. By understanding the fundamental principles of learning in complex systems, we hope to advance our ability to design and implement practical learning algorithms in the future. This thesis makes the following key contributions: the development of a general complexity metric that we apply to search for complex systems that exhibit growth of complexity, the introduction of a coarse-graining method to study computations in large-scale complex systems, and the development of a metric for learning efficiency as well as a benchmark dataset for evaluating the speed of learning algorithms. Our findings add substantially to our understanding of learning and adaptation in natural and artificial systems. Moreover, our approach contributes to a promising new direction for research in this area. We hope these findings will inspire the development of more effective and efficient learning algorithms in the future.

Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting

  • paper_url: http://arxiv.org/abs/2307.05766
  • repo_url: https://github.com/chantalmp/rad-restruct
  • paper_authors: Chantal Pellegrini, Matthias Keicher, Ege Özsoy, Nassir Navab
  • for: This work aims to improve the efficiency and accuracy of radiology reporting through structured reporting, alleviating communication problems between radiologists and other medical professionals.
  • methods: The work uses a hierarchical visual question answering (VQA) model, named hi-VQA, that considers the hierarchy of previously asked questions and answers to populate a structured report with information from the X-ray image.
  • results: Experiments show that hi-VQA achieves competitive performance on the medical VQA benchmark VQARad, performs best among methods without domain-specific vision-language pretraining, and provides a strong baseline on Rad-ReStruct. This work is a significant step toward automated structured reporting and provides a valuable first benchmark for future research in this area.
    Abstract Radiology reporting is a crucial part of the communication between radiologists and other medical professionals, but it can be time-consuming and error-prone. One approach to alleviate this is structured reporting, which saves time and enables a more accurate evaluation than free-text reports. However, there is limited research on automating structured reporting, and no public benchmark is available for evaluating and comparing different methods. To close this gap, we introduce Rad-ReStruct, a new benchmark dataset that provides fine-grained, hierarchically ordered annotations in the form of structured reports for X-Ray images. We model the structured reporting task as hierarchical visual question answering (VQA) and propose hi-VQA, a novel method that considers prior context in the form of previously asked questions and answers for populating a structured radiology report. Our experiments show that hi-VQA achieves competitive performance to the state-of-the-art on the medical VQA benchmark VQARad while performing best among methods without domain-specific vision-language pretraining and provides a strong baseline on Rad-ReStruct. Our work represents a significant step towards the automated population of structured radiology reports and provides a valuable first benchmark for future research in this area. We will make all annotations and our code for annotation generation, model evaluation, and training publicly available upon acceptance. Our dataset and code is available at https://github.com/ChantalMP/Rad-ReStruct.

Towards A Scalable Solution for Improving Multi-Group Fairness in Compositional Classification

  • paper_url: http://arxiv.org/abs/2307.05728
  • repo_url: None
  • paper_authors: James Atwood, Tina Tian, Ben Packer, Meghana Deodhar, Jilin Chen, Alex Beutel, Flavien Prost, Ahmad Beirami
  • for: This paper addresses machine learning fairness in complex systems where the final prediction is the combination of multiple classifiers and multiple groups are present.
  • methods: The authors first show that natural baseline approaches for improving equal opportunity fairness scale poorly: their cost grows linearly with the product of the number of remediated groups and the number of remediated prediction labels. They then introduce two simple techniques, called "task-overconditioning" and "group-interleaving", to achieve constant scaling in this multi-group multi-label setting.
  • results: Experiments in academic and real-world environments demonstrate that the proposal effectively mitigates unfairness in this setting.
    Abstract Despite the rich literature on machine learning fairness, relatively little attention has been paid to remediating complex systems, where the final prediction is the combination of multiple classifiers and where multiple groups are present. In this paper, we first show that natural baseline approaches for improving equal opportunity fairness scale linearly with the product of the number of remediated groups and the number of remediated prediction labels, rendering them impractical. We then introduce two simple techniques, called {\em task-overconditioning} and {\em group-interleaving}, to achieve a constant scaling in this multi-group multi-label setup. Our experimental results in academic and real-world environments demonstrate the effectiveness of our proposal at mitigation within this environment.

An Open-Source Knowledge Graph Ecosystem for the Life Sciences

  • paper_url: http://arxiv.org/abs/2307.05727
  • repo_url: None
  • paper_authors: Tiffany J. Callahan, Ignacio J. Tripodi, Adrianne L. Stefanski, Luca Cappelletti, Sanya B. Taneja, Jordan M. Wyrwa, Elena Casiraghi, Nicolas A. Matentzoglu, Justin Reese, Jonathan C. Silverstein, Charles Tapley Hoyt, Richard D. Boyce, Scott A. Malec, Deepak R. Unni, Marcin P. Joachimiak, Peter N. Robinson, Christopher J. Mungall, Emanuele Cavalleri, Tommaso Fontana, Giorgio Valentini, Marco Mesiti, Lucas A. Gillenwater, Brook Santangelo, Nicole A. Vasilevsky, Robert Hoehndorf, Tellen D. Bennett, Patrick B. Ryan, George Hripcsak, Michael G. Kahn, Michael Bada, William A. Baumgartner Jr, Lawrence E. Hunter
  • for: This study aims to improve the integration of data across levels of biological organization to strengthen translational research.
  • methods: The study uses knowledge graphs (KGs) to model complex phenomena and provides methods for constructing KGs automatically.
  • results: The study shows that PheKnowLator enables fully customizable knowledge representation rather than a fixed knowledge representation model. Its computational performance when constructing 12 large-scale KGs demonstrates its usability and reliability.
    Abstract Translational research requires data at multiple scales of biological organization. Advancements in sequencing and multi-omics technologies have increased the availability of these data but researchers face significant integration challenges. Knowledge graphs (KGs) are used to model complex phenomena, and methods exist to automatically construct them. However, tackling complex biomedical integration problems requires flexibility in the way knowledge is modeled. Moreover, existing KG construction methods provide robust tooling at the cost of fixed or limited choices among knowledge representation models. PheKnowLator (Phenotype Knowledge Translator) is a semantic ecosystem for automating the FAIR (Findable, Accessible, Interoperable, and Reusable) construction of ontologically grounded KGs with fully customizable knowledge representation. The ecosystem includes KG construction resources (e.g., data preparation APIs), analysis tools (e.g., SPARQL endpoints and abstraction algorithms), and benchmarks (e.g., prebuilt KGs and embeddings). We evaluate the ecosystem by surveying open-source KG construction methods and analyzing its computational performance when constructing 12 large-scale KGs. With flexible knowledge representation, PheKnowLator enables fully customizable KGs without compromising performance or usability.

A Causal Ordering Prior for Unsupervised Representation Learning

  • paper_url: http://arxiv.org/abs/2307.05704
  • repo_url: None
  • paper_authors: Avinash Kori, Pedro Sanchez, Konstantinos Vilouras, Ben Glocker, Sotirios A. Tsaftaris
  • for: This study proposes a fully unsupervised representation learning method that helps uncover the latent causal structure underlying the data.
  • methods: The method is based on a latent additive noise model (ANM) and encourages the latent space to follow a causal ordering via a loss function based on the Hessian of the latent distribution.
  • results: Experiments show that the method can discover causal relationships and adapts across different datasets.
    Abstract Unsupervised representation learning with variational inference relies heavily on independence assumptions over latent variables. Causal representation learning (CRL), however, argues that factors of variation in a dataset are, in fact, causally related. Allowing latent variables to be correlated, as a consequence of causal relationships, is more realistic and generalisable. So far, provably identifiable methods rely on: auxiliary information, weak labels, and interventional or even counterfactual data. Inspired by causal discovery with functional causal models, we propose a fully unsupervised representation learning method that considers a data generation process with a latent additive noise model (ANM). We encourage the latent space to follow a causal ordering via loss function based on the Hessian of the latent distribution.

Objaverse-XL: A Universe of 10M+ 3D Objects

  • paper_url: http://arxiv.org/abs/2307.05663
  • repo_url: None
  • paper_authors: Matt Deitke, Ruoshi Liu, Matthew Wallingford, Huong Ngo, Oscar Michel, Aditya Kusupati, Alan Fan, Christian Laforte, Vikram Voleti, Samir Yitzhak Gadre, Eli VanderBilt, Aniruddha Kembhavi, Carl Vondrick, Georgia Gkioxari, Kiana Ehsani, Ludwig Schmidt, Ali Farhadi
  • for: This work provides a large-scale 3D dataset to drive progress on 3D vision tasks.
  • methods: The dataset draws 3D objects from diverse sources, including manually designed objects, photogrammetry scans of landmarks and everyday items, and professional scans of historic and antique artifacts.
  • results: The work shows that training Zero123 for novel view synthesis on over 100 million multi-view rendered images achieves strong zero-shot generalization.
    Abstract Natural language processing and 2D vision models have attained remarkable proficiency on many tasks primarily by escalating the scale of training data. However, 3D vision tasks have not seen the same progress, in part due to the challenges of acquiring high-quality 3D data. In this work, we present Objaverse-XL, a dataset of over 10 million 3D objects. Our dataset comprises deduplicated 3D objects from a diverse set of sources, including manually designed objects, photogrammetry scans of landmarks and everyday items, and professional scans of historic and antique artifacts. Representing the largest scale and diversity in the realm of 3D datasets, Objaverse-XL enables significant new possibilities for 3D vision. Our experiments demonstrate the improvements enabled with the scale provided by Objaverse-XL. We show that by training Zero123 on novel view synthesis, utilizing over 100 million multi-view rendered images, we achieve strong zero-shot generalization abilities. We hope that releasing Objaverse-XL will enable further innovations in the field of 3D vision at scale.
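
The abstract says the objects are deduplicated but not how; as a minimal, hedged illustration, the snippet below drops byte-identical files by content hash (the real pipeline may well use a geometry-aware criterion instead):

```python
import hashlib
from pathlib import Path

def dedup_by_content_hash(paths):
    """Hedged sketch: drop byte-identical 3D files via SHA-256 content
    hashes. This is an illustrative stand-in, not the Objaverse-XL
    pipeline's actual deduplication criterion."""
    seen, unique = set(), []
    for p in map(Path, paths):
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(p)
    return unique
```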

Self-consistency for open-ended generations

  • paper_url: http://arxiv.org/abs/2307.06857
  • repo_url: None
  • paper_authors: Siddhartha Jain, Xiaofei Ma, Anoop Deoras, Bing Xiang
  • for: This paper studies how to improve the quality of large language model (LLM) generations.
  • methods: It proposes a new way to rerank and select the best of an LLM's sampled generations without extra inference passes or training a specialized reranker, relying instead on cheap-to-compute pairwise statistics between the generations (a hedged sketch follows this entry).
  • results: Theoretical analysis and simulations frame the method as an extension of self-consistency; it yields strong gains when selecting the best $k$ generations for code generation, and robust gains for best-generation selection in autoformalization and summarization. Additional access to token probabilities improves performance further.
    Abstract Large Language Models (LLMs) can exhibit considerable variation in the quality of their sampled outputs. Reranking and selecting the best generation from the sampled set is a popular way of obtaining strong gains in generation quality. In this paper, we present a novel approach for reranking LLM generations. Unlike other techniques that might involve additional inferences or training a specialized reranker, our approach relies on easy-to-compute pairwise statistics between the generations that have minimal compute overhead. We show that our approach can be formalized as an extension of self-consistency and analyze its performance in that framework, theoretically as well as via simulations. We show strong improvements for selecting the best $k$ generations for code generation tasks as well as robust improvements for the best generation for the tasks of autoformalization and summarization. While our approach only assumes black-box access to LLMs, we show that additional access to token probabilities can improve performance even further.
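
As a hedged sketch of the generalized self-consistency idea, the snippet below scores each sampled generation by its mean pairwise similarity to the others and keeps the top-k; unigram F1 is an illustrative stand-in for the paper's pairwise statistic:

```python
from collections import Counter

def select_by_pairwise_consensus(generations, k=1):
    """Hedged sketch of consensus reranking in the spirit of the paper:
    score each sampled generation by its mean pairwise token overlap with
    the others and keep the top-k. The paper's exact pairwise statistic
    may differ; unigram F1 here is an illustrative choice."""
    def overlap(a, b):
        ca, cb = Counter(a.split()), Counter(b.split())
        common = sum((ca & cb).values())
        total = sum(ca.values()) + sum(cb.values())
        return 2 * common / total if total else 0.0

    scores = [
        sum(overlap(g, h) for j, h in enumerate(generations) if j != i)
        / max(len(generations) - 1, 1)
        for i, g in enumerate(generations)
    ]
    ranked = sorted(zip(scores, generations), reverse=True)
    return [g for _, g in ranked[:k]]

# Usage: the middle sample agrees most with the others and wins.
print(select_by_pairwise_consensus(["x = 1", "x = 1 # init", "y = 2"], k=1))
```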

Bio-Inspired Night Image Enhancement Based on Contrast Enhancement and Denoising

  • paper_url: http://arxiv.org/abs/2307.05447
  • repo_url: None
  • paper_authors: Xinyi Bai, Steffi Agino Priyanka, Hsiao-Jung Tung, Yuankai Wang
  • for: Improving nighttime image quality so that intelligent surveillance systems can detect and recognize objects more accurately.
  • methods: A bio-inspired image enhancement algorithm that converts low-illuminance images into brighter, clearer ones by raising brightness and contrast while suppressing noise (a hedged sketch of the two-stage structure follows this entry).
  • results: Real and simulated experiments show the advantages of the proposed algorithm over the contrast-pair, Meylan, and Retinex methods.
    Abstract Due to the low accuracy of object detection and recognition in many intelligent surveillance systems at nighttime, the quality of night images is crucial. Compared with the corresponding daytime image, a nighttime image is characterized by low brightness, low contrast, and high noise. In this paper, a bio-inspired image enhancement algorithm is proposed to convert a low-illuminance image into a brighter, clearer one. Unlike existing bio-inspired algorithms, the proposed method does not use any training sequences; instead, it relies on a novel chain of contrast enhancement and denoising algorithms without any form of recursive function. Our method largely improves the brightness and contrast of night images while suppressing noise. We then test the algorithm in both real and simulated experiments; both results show the advantages of the proposed algorithm over the contrast-pair, Meylan, and Retinex methods.
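
The bio-inspired operators themselves are not given in the abstract; the OpenCV snippet below is a hedged sketch of the overall structure only (boost contrast first, then denoise, since enhancement amplifies noise), using CLAHE and non-local means as stand-ins:

```python
import cv2

def enhance_night_image(bgr):
    """Hedged sketch of a contrast-enhancement + denoising chain.
    CLAHE and non-local means are illustrative stand-ins for the
    paper's bio-inspired operators; only the two-stage structure
    (enhance contrast, then suppress the amplified noise) is taken
    from the abstract."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=3.0, tileGridSize=(8, 8))
    lab = cv2.merge((clahe.apply(l), a, b))
    brightened = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
    # Denoise last, because boosting contrast amplifies sensor noise.
    return cv2.fastNlMeansDenoisingColored(brightened, None, 10, 10, 7, 21)

enhanced = enhance_night_image(cv2.imread("night.jpg"))  # hypothetical input path
```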

ISLTranslate: Dataset for Translating Indian Sign Language

  • paper_url: http://arxiv.org/abs/2307.05440
  • repo_url: https://github.com/exploration-lab/isltranslate
  • paper_authors: Abhinav Joshi, Susmit Agrawal, Ashutosh Modi
  • for: Bridging the communication gap between the hard-of-hearing community and the rest of the population
  • methods: Introducing ISLTranslate, a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs
  • results: Providing a detailed analysis of the dataset and benchmarking existing end-to-end sign-language-to-spoken-language translation systems with a transformer-based model for ISL translation (an evaluation sketch follows this entry).
    Abstract Sign languages are the primary means of communication for many hard-of-hearing people worldwide. Recently, to bridge the communication gap between the hard-of-hearing community and the rest of the population, several sign language translation datasets have been proposed to enable the development of statistical sign language translation systems. However, there is a dearth of sign language resources for the Indian sign language. This resource paper introduces ISLTranslate, a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs. To the best of our knowledge, it is the largest translation dataset for continuous Indian Sign Language. We provide a detailed analysis of the dataset. To validate the performance of existing end-to-end Sign language to spoken language translation systems, we benchmark the created dataset with a transformer-based model for ISL translation.
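
As a hedged sketch of how such a benchmark is typically scored, the snippet below computes corpus BLEU over predicted English translations with sacrebleu; the paper's exact metric suite is not listed in the abstract:

```python
import sacrebleu

def evaluate_isl_translation(predictions, references):
    """Hedged sketch: score predicted English translations against
    ISLTranslate references with corpus BLEU, the usual baseline metric
    for sign-to-spoken-language translation benchmarks. The paper's
    actual evaluation protocol may include further metrics."""
    bleu = sacrebleu.corpus_bleu(predictions, [references])
    return bleu.score

# Illustrative usage with made-up strings:
print(evaluate_isl_translation(["hello how are you"], ["hello how are you"]))
```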

Named entity recognition using GPT for identifying comparable companies

  • paper_url: http://arxiv.org/abs/2307.07420
  • repo_url: None
  • paper_authors: Eurico Covas
  • for: Improving the precision and success rate of the comparable companies method, in particular for valuing private companies.
  • methods: Using large language models (LLMs), such as OpenAI's GPT, together with natural language processing (NLP) to extract product entities from company descriptions (e.g., from Wikipedia pages) and run similarity analysis (a hedged sketch follows this entry).
  • results: LLM-based extraction achieves a higher precision and success rate than standard named entity recognition (NER) and can build appropriate comparable-company peer groups for equity valuation.
    Abstract For both public and private firms, comparable companies analysis is widely used as a method for company valuation. In particular, the method is of great value for the valuation of private equity companies. Approaches to the comparable companies method usually rely on a qualitative process for identifying similar peer companies, which tends to use established industry classification schemes and/or analyst intuition and knowledge. However, more quantitative methods have started being used in the literature and in the private equity industry, in particular machine learning clustering and natural language processing (NLP). For NLP methods, the process consists of extracting product entities from, e.g., the company's website or company descriptions from some financial database system, and then performing similarity analysis. Here, using company descriptions/summaries from publicly available companies' Wikipedia websites, we show that using large language models (LLMs), such as GPT from OpenAI, yields a much higher precision and success rate than standard named entity recognition (NER), which relies on manual annotation. We demonstrate quantitatively a higher precision rate, and show that, qualitatively, it can be used to create appropriate comparable-company peer groups which can then be used for equity valuation.
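
A hedged sketch of the NLP pipeline described above: prompt an LLM (here a hypothetical black-box `llm_call` text-in/text-out function, not a specific API) to extract product entities, then compare companies by set overlap; the paper's prompts and similarity measure may differ:

```python
def llm_extract_products(description, llm_call):
    """Hedged sketch: prompt an LLM to list product entities from a
    company description. `llm_call` is a hypothetical stand-in for a
    GPT-style API client; the paper's prompting setup is not public."""
    prompt = (
        "List the products and services mentioned in this company "
        "description, one per line:\n" + description
    )
    lines = llm_call(prompt).splitlines()
    return {line.strip().lower() for line in lines if line.strip()}

def peer_similarity(products_a, products_b):
    """Jaccard overlap between two extracted product sets; one simple
    choice of statistic for forming comparable-company peer groups."""
    if not products_a or not products_b:
        return 0.0
    return len(products_a & products_b) / len(products_a | products_b)
```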

3D detection of roof sections from a single satellite image and application to LOD2-building reconstruction

  • paper_url: http://arxiv.org/abs/2307.05409
  • repo_url: None
  • paper_authors: Johann Lussange, Mulin Yu, Yuliya Tarabalka, Florent Lafarge
  • for: Reconstructing urban areas in 3D from satellite raster images.
  • methods: Two novel features: (i) a fully deep-learning-based 3D detection of roof sections, and (ii) a single (non-orthogonal) satellite raster image as the only model input. In a first step, a Mask R-CNN model performs 2D segmentation of the buildings' roof sections; after blending the segmented pixels into the RGB satellite image, a second identical Mask R-CNN model infers the heights-to-ground of the roof sections' corners via panoptic segmentation, yielding a full 3D reconstruction of the buildings and city (a hedged pipeline sketch follows this entry).
  • results: Different urban areas are reconstructed in a few minutes, with Jaccard indices of 88.55% and 75.21% for the 2D segmentation of individual roof sections on the two datasets, and mean height errors of 1.60 m and 2.06 m for correctly segmented pixels in the 3D reconstruction, hence within the LOD2 precision range.
    Abstract Reconstructing urban areas in 3D out of satellite raster images has been a long-standing and challenging goal of both academic and industrial research. The rare methods today achieving this objective at Level of Detail 2 (LOD2) rely on procedural approaches based on geometry, and need stereo images and/or LIDAR data as input. We here propose a method for urban 3D reconstruction named KIBS (\textit{Keypoints Inference By Segmentation}), which comprises two novel features: i) a full deep learning approach for the 3D detection of the roof sections, and ii) only one single (non-orthogonal) satellite raster image as model input. This is achieved in two steps: i) by a Mask R-CNN model performing a 2D segmentation of the buildings' roof sections, and after blending these latter segmented pixels within the RGB satellite raster image, ii) by another identical Mask R-CNN model inferring the heights-to-ground of the roof sections' corners via panoptic segmentation, up to a full 3D reconstruction of the buildings and city. We demonstrate the potential of the KIBS method by reconstructing different urban areas in a few minutes, with a Jaccard index for the 2D segmentation of individual roof sections of $88.55\%$ and $75.21\%$ on our two data sets resp., and a height's mean error of such correctly segmented pixels for the 3D reconstruction of $1.60$ m and $2.06$ m on our two data sets resp., hence within the LOD2 precision range.
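
A hedged sketch of the two-step structure with stock torchvision models; the paper's trained weights, blending scheme, and height-inference head are not public details, so the second step below is only a placeholder comment:

```python
import torch
import torchvision

def kibs_style_pipeline(image):
    """Hedged sketch of the two-step structure in the abstract: one
    Mask R-CNN segments roof sections in 2D, the segmented pixels are
    blended back into the RGB image, and a second, separately trained
    model would infer corner heights. The model here is a stock
    torchvision Mask R-CNN, not the paper's trained network."""
    seg_model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
    seg_model.eval()
    with torch.no_grad():
        out = seg_model([image])[0]
    masks = out["masks"] > 0.5
    # Blend segmented pixels back into the image (simple brightening here;
    # the paper's blending scheme is not specified in the abstract).
    blended = image.clone()
    for m in masks:
        blended[:, m[0]] = 0.5 * blended[:, m[0]] + 0.5
    # Step two (placeholder): a second Mask R-CNN, trained on such blended
    # images, would infer heights-to-ground of the roof sections' corners.
    return blended, masks

image = torch.rand(3, 512, 512)  # placeholder RGB tensor in [0, 1]
blended, masks = kibs_style_pipeline(image)
```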

Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform

  • paper_url: http://arxiv.org/abs/2307.05399
  • repo_url: https://github.com/mateusz-wojcik-97/domain-agnostic-architecture
  • paper_authors: Mateusz Wójcik, Witold Kościukiewicz, Mateusz Baran, Tomasz Kajdanowicz, Adam Gonczarek
  • for: Making ML architectures in complex production systems efficient and reusable across multiple tasks, specifically for classification over streaming data where each class arrives separately.
  • methods: A fully differentiable architecture based on the Mixture of Experts model that trains high-performance classifiers even when examples from each class are presented separately (a minimal MoE sketch follows this entry).
  • results: Extensive experiments demonstrate applicability across domains and the ability to learn online in production environments without a memory buffer; the method achieves SOTA results and clearly outperforms the reference methods.
    Abstract Production deployments in complex systems require ML architectures to be highly efficient and usable against multiple tasks. Particularly demanding are classification problems in which data arrives in a streaming fashion and each class is presented separately. Recent methods with stochastic gradient learning have been shown to struggle in such setups, or have limitations such as memory buffers or restriction to specific domains that prevent their use in real-world scenarios. For this reason, we present a fully differentiable architecture based on the Mixture of Experts model that enables the training of high-performance classifiers when examples from each class are presented separately. We conducted exhaustive experiments that proved its applicability in various domains and its ability to learn online in production environments. The proposed technique achieves SOTA results without a memory buffer and clearly outperforms the reference methods.
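
A minimal, hedged PyTorch sketch of a differentiable Mixture-of-Experts classifier with soft gating, illustrating why such an architecture trains end-to-end with SGD; the paper's routing and class-incremental mechanics are richer than shown:

```python
import torch
import torch.nn as nn

class MoEClassifier(nn.Module):
    """Hedged sketch of a fully differentiable Mixture-of-Experts
    classifier. A gating network produces soft expert weights, so the
    whole model trains with plain gradient descent; the paper's expert
    management for class-incremental streams is not reproduced here."""

    def __init__(self, in_dim, n_classes, n_experts=4, hidden=128):
        super().__init__()
        self.gate = nn.Linear(in_dim, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_classes))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)            # [B, E]
        outputs = torch.stack([e(x) for e in self.experts], 1)   # [B, E, C]
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)      # [B, C]

model = MoEClassifier(in_dim=32, n_classes=10)
logits = model(torch.randn(4, 32))
```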