cs.AI - 2023-07-05

Scaling Laws Do Not Scale

  • paper_url: http://arxiv.org/abs/2307.03201
  • repo_url: https://github.com/MarkipTheMudkip/in-class-project-2
  • paper_authors: Fernando Diaz, Michael Madaio
  • for: This paper examines the relationship between the performance of artificial intelligence (AI) models and aspects of their design, known as "scaling laws."
  • methods: The paper analyzes how dataset and model size are used to characterize AI model performance, and examines the ways in which the metrics used may be precarious and unstable.
  • results: The analysis shows that while model performance increases with dataset size in the aggregate, different communities may judge the quality of model outputs by different standards, so performance may not improve for all communities.
    Abstract Recent work has proposed a power law relationship, referred to as "scaling laws," between the performance of artificial intelligence (AI) models and aspects of those models' design (e.g., dataset size). In other words, as the size of a dataset (or model parameters, etc.) increases, the performance of a given model trained on that dataset will correspondingly increase. However, while compelling in the aggregate, this scaling law relationship overlooks the ways that metrics used to measure performance may be precarious and contested, or may not correspond with how different groups of people may perceive the quality of models' output. In this paper, we argue that as the size of datasets used to train large AI models grows, the number of distinct communities (including demographic groups) whose data is included in a given dataset is likely to grow, each of whom may have different values. As a result, there is an increased risk that communities represented in a dataset may have values or preferences not captured by (or in the worst case, at odds with) the metrics used to evaluate model performance for scaling laws. We end the paper with implications for AI scaling laws -- that models may not, in fact, continue to improve as the datasets get larger -- at least not for all people or communities impacted by those models.
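The power-law form at the center of the argument is easy to state concretely. Below is a minimal sketch, with entirely made-up loss numbers, of fitting the aggregate relationship L(N) = aN^(-b) + c that scaling-law studies report; the paper's point is that a single aggregate curve like this can mask divergent per-community trends.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical aggregate loss measurements at increasing dataset sizes.
n = np.array([1e6, 1e7, 1e8, 1e9])
loss = np.array([3.2, 2.5, 2.0, 1.6])

def power_law(n, a, b, c):
    # L(N) = a * N^(-b) + c: loss falls as a power of dataset size N,
    # flattening toward an irreducible term c.
    return a * n ** (-b) + c

(a, b, c), _ = curve_fit(power_law, n, loss, p0=(10.0, 0.1, 1.0))
print(f"fitted exponent b = {b:.3f}")
```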

Decentralized Data Governance as Part of a Data Mesh Platform: Concepts and Approaches

  • paper_url: http://arxiv.org/abs/2307.02357
  • repo_url: None
  • paper_authors: Arif Wider, Sumedha Verma, Atif Akhtar
  • for: This paper concerns data mesh, a socio-technical approach to decentralized analytics data management.
  • methods: The paper describes a self-serve data infrastructure platform that automates decentralized data management and provides mechanisms for decentralized data governance.
  • results: The paper presents a conceptual model of key data mesh concepts and discusses how to drive governance through platform means, drawing on concrete experience implementing a fully-functional data mesh platform.
    Abstract Data mesh is a socio-technical approach to decentralized analytics data management. To manage this decentralization efficiently, data mesh relies on automation provided by a self-service data infrastructure platform. A key aspect of this platform is to enable decentralized data governance. Because data mesh is a young approach, there is a lack of coherence in how data mesh concepts are interpreted in the industry, and almost no work on how a data mesh platform facilitates governance. This paper presents a conceptual model of key data mesh concepts and discusses different approaches to drive governance through platform means. The insights presented are drawn from concrete experiences of implementing a fully-functional data mesh platform that can be used as a reference on how to approach data mesh platform development.

LLQL: Logistic Likelihood Q-Learning for Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.02345
  • repo_url: None
  • paper_authors: Outongyi Lv, Bingxin Zhou, Yu Guang Wang
  • for: This study analyzes the distributional properties of the Bellman error in both online and offline reinforcement learning settings.
  • methods: The Logistic distribution and a constrained Logistic distribution are used to characterize the Bellman error in the online and offline settings, respectively.
  • results: The Bellman error follows a Logistic distribution in the online setting and a constrained Logistic distribution in the offline setting, where the constraint depends on the prior policy in the offline dataset. Based on these findings, the authors propose $\rm LLoss$, an alternative loss function built from the Logistic maximum likelihood, and observe that rewards in offline datasets should follow a specific distribution to facilitate offline objectives. Controlled-variable experiments on two variants of Soft-Actor-Critic in online and offline settings confirm the hypotheses and show that LLoss has smaller variance than MSELoss.
    Abstract Currently, research on Reinforcement learning (RL) can be broadly classified into two categories: online RL and offline RL. Both in online and offline RL, the primary focus of research on the Bellman error lies in the optimization techniques and performance improvement, rather than exploring the inherent structural properties of the Bellman error, such as distribution characteristics. In this study, we analyze the distribution of the Bellman approximation error in both online and offline settings. We find that in the online environment, the Bellman error follows a Logistic distribution, while in the offline environment, the Bellman error follows a constrained Logistic distribution, where the constrained distribution is dependent on the prior policy in the offline data set. Based on this finding, we have improved the MSELoss which is based on the assumption that the Bellman errors follow a normal distribution, and we utilized the Logistic maximum likelihood function to construct $\rm LLoss$ as an alternative loss function. In addition, we observed that the rewards in the offline data set should follow a specific distribution, which would facilitate the achievement of offline objectives. In our numerical experiments, we performed controlled variable corrections on the loss functions of two variants of Soft-Actor-Critic in both online and offline environments. The results confirmed our hypothesis regarding the online and offline settings, we also found that the variance of LLoss is smaller than MSELoss. Our research provides valuable insights for further investigations based on the distribution of Bellman errors.
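The abstract names the logistic maximum likelihood as the basis of LLoss. A minimal sketch of such a loss follows, assuming a zero-mean Logistic with a fixed scale parameter; the paper's exact formulation may differ. For a Logistic(0, s) density f(x), -log f(x) = log s + x/s + 2 log(1 + e^(-x/s)), which the softplus form below computes stably.

```python
import torch
import torch.nn.functional as F

def logistic_nll_loss(bellman_error: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    # Negative log-likelihood of residuals under Logistic(0, scale):
    # -log f(x) = log s + x/s + 2*log(1 + exp(-x/s)), using softplus(-z) = log(1 + exp(-z)).
    z = bellman_error / scale
    return (torch.log(torch.tensor(scale)) + z + 2.0 * F.softplus(-z)).mean()

# Usage sketch: in a SAC-style critic update, replace
# F.mse_loss(q_pred, q_target) with logistic_nll_loss(q_pred - q_target).
```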

Exploring new ways: Enforcing representational dissimilarity to learn new features and reduce error consistency

  • paper_url: http://arxiv.org/abs/2307.02516
  • repo_url: None
  • paper_authors: Tassilo Wald, Constantin Ulrich, Fabian Isensee, David Zimmerer, Gregor Koehler, Michael Baumgartner, Klaus H. Maier-Hein
  • for: This work aims to improve the ensemble accuracy of independently trained machine learning models by using methods from the representational similarity field to promote dissimilar representations.
  • methods: Multiple models are trained while their intermediate representations are pushed to be dissimilar at different depths between architectures, with the goal of learning robust ensembles with disjoint failure modes.
  • results: Enforcing highly dissimilar intermediate representations yields less correlated output predictions and slightly lower error consistency, resulting in higher ensemble accuracy.
    Abstract Independently trained machine learning models tend to learn similar features. Given an ensemble of independently trained models, this results in correlated predictions and common failure modes. Previous attempts focusing on decorrelation of output predictions or logits yielded mixed results, particularly due to their reduction in model accuracy caused by conflicting optimization objectives. In this paper, we propose the novel idea of utilizing methods of the representational similarity field to promote dissimilarity during training instead of measuring similarity of trained models. To this end, we promote intermediate representations to be dissimilar at different depths between architectures, with the goal of learning robust ensembles with disjoint failure modes. We show that highly dissimilar intermediate representations result in less correlated output predictions and slightly lower error consistency, resulting in higher ensemble accuracy. With this, we shine first light on the connection between intermediate representations and their impact on the output predictions.
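The abstract does not name which representational-similarity measure is used; linear CKA is a common choice from that field, so the sketch below assumes it purely for illustration, penalizing similarity between matched intermediate layers of two jointly trained models.

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Linear CKA between two batches of activations (n, d1) and (n, d2);
    # 1.0 = identical up to linear transform, near 0.0 = dissimilar.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    return (x.T @ y).norm() ** 2 / ((x.T @ x).norm() * (y.T @ y).norm())

def ensemble_loss(ce_a, ce_b, feats_a, feats_b, lam=0.1):
    # Task losses of both models plus a penalty on representational similarity
    # at matched depths, pushing the ensemble toward disjoint features.
    dissim_penalty = sum(linear_cka(fa.flatten(1), fb.flatten(1))
                         for fa, fb in zip(feats_a, feats_b))
    return ce_a + ce_b + lam * dissim_penalty

fa = [torch.randn(8, 64), torch.randn(8, 128)]   # stand-in intermediate features
fb = [torch.randn(8, 64), torch.randn(8, 128)]
print(ensemble_loss(torch.tensor(1.0), torch.tensor(1.1), fa, fb))
```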

Deep Contract Design via Discontinuous Piecewise Affine Neural Networks

  • paper_url: http://arxiv.org/abs/2307.02318
  • repo_url: None
  • paper_authors: Tonghan Wang, Paul Dütting, Dmitry Ivanov, Inbal Talgam-Cohen, David C. Parkes
  • for: This paper studies deep learning for the automated design of optimal contracts.
  • methods: A deep network represents the principal's expected utility as a function of the contract design. The paper introduces a novel representation, the Discontinuous ReLU (DeLU) network, which models the principal's utility as a discontinuous piecewise affine function in which each piece corresponds to the agent taking a particular action. DeLU networks implicitly learn closed-form expressions for the agent's incentive compatibility constraints and the principal's utility maximization objective, and support parallel inference on each piece through linear programming or interior-point methods that solve for optimal contracts.
  • results: Empirical results demonstrate success in approximating the principal's utility function with a small number of training samples, and in scaling to find approximately optimal contracts on problems with a large number of actions and outcomes.
    Abstract Contract design involves a principal who establishes contractual agreements about payments for outcomes that arise from the actions of an agent. In this paper, we initiate the study of deep learning for the automated design of optimal contracts. We formulate this as an offline learning problem, where a deep network is used to represent the principal's expected utility as a function of the design of a contract. We introduce a novel representation: the Discontinuous ReLU (DeLU) network, which models the principal's utility as a discontinuous piecewise affine function where each piece corresponds to the agent taking a particular action. DeLU networks implicitly learn closed-form expressions for the incentive compatibility constraints of the agent and the utility maximization objective of the principal, and support parallel inference on each piece through linear programming or interior-point methods that solve for optimal contracts. We provide empirical results that demonstrate success in approximating the principal's utility function with a small number of training samples and scaling to find approximately optimal contracts on problems with a large number of actions and outcomes.
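The piecewise-affine structure DeLU is built to learn has a simple exact counterpart in the standard contract model: for any contract, the agent best-responds, so the principal's utility is affine within each region where one action stays optimal and jumps at region boundaries. A toy instance with hypothetical numbers makes this concrete:

```python
import numpy as np

# Toy contract-design instance (hypothetical numbers): 3 agent actions, 4 outcomes.
F = np.array([[0.7, 0.1, 0.1, 0.1],      # F[a, o] = P(outcome o | action a)
              [0.2, 0.5, 0.2, 0.1],
              [0.1, 0.1, 0.3, 0.5]])
cost = np.array([0.0, 0.2, 0.5])          # agent's cost of each action
reward = np.array([0.0, 1.0, 2.0, 4.0])   # principal's reward per outcome

def principal_utility(t):
    # t[o] = contractual payment for outcome o. The agent best-responds, so the
    # principal's utility is affine in t on each "piece" (the region where one
    # action stays optimal) and discontinuous at piece boundaries -- exactly the
    # structure a DeLU network is designed to represent.
    a = int(np.argmax(F @ t - cost))       # incentive-compatible action choice
    return F[a] @ (reward - t)             # expected reward minus expected pay

print(principal_utility(np.array([0.0, 0.1, 0.3, 1.0])))
```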

Meta-Learning Adversarial Bandit Algorithms

  • paper_url: http://arxiv.org/abs/2307.02295
  • repo_url: None
  • paper_authors: Mikhail Khodak, Ilya Osadchiy, Keegan Harris, Maria-Florina Balcan, Kfir Y. Levy, Ron Meir, Zhiwei Steven Wu
  • for: The paper aims to improve performance across multiple bandit tasks when the tasks are similar according to some natural similarity measure.
  • methods: An outer learning algorithm simultaneously tunes the initialization and other hyperparameters of an inner learner, for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO).
  • results: For MAB, the outer learner initializes and tunes the Tsallis-entropy generalization of Exp3, and the task-averaged regret improves when the entropy of the optima-in-hindsight is small. For BLO, the paper learns to initialize and tune online mirror descent (OMD), showing that the task-averaged regret varies directly with an action-space-dependent measure.
    Abstract We study online meta-learning with bandit feedback, with the goal of improving performance across multiple tasks if they are similar according to some natural similarity measure. As the first to target the adversarial online-within-online partial-information setting, we design meta-algorithms that combine outer learners to simultaneously tune the initialization and other hyperparameters of an inner learner for two important cases: multi-armed bandits (MAB) and bandit linear optimization (BLO). For MAB, the meta-learners initialize and set hyperparameters of the Tsallis-entropy generalization of Exp3, with the task-averaged regret improving if the entropy of the optima-in-hindsight is small. For BLO, we learn to initialize and tune online mirror descent (OMD) with self-concordant barrier regularizers, showing that task-averaged regret varies directly with an action space-dependent measure they induce. Our guarantees rely on proving that unregularized follow-the-leader combined with two levels of low-dimensional hyperparameter tuning is enough to learn a sequence of affine functions of non-Lipschitz and sometimes non-convex Bregman divergences bounding the regret of OMD.
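To see what the outer learner has to tune, here is a sketch of plain Exp3, whose initialization `init` and learning rate `eta` are exactly the kind of inner-learner hyperparameters the paper meta-learns across tasks (the paper uses the Tsallis-entropy generalization rather than this vanilla form):

```python
import numpy as np

def exp3(rewards, eta=0.1, gamma=0.1, init=None, rng=None):
    # Vanilla Exp3 on a (T, K) reward table. `init` is the weight initialization
    # an outer meta-learner could carry over from similar past tasks.
    rng = rng or np.random.default_rng(0)
    T, K = rewards.shape
    w = np.ones(K) if init is None else init.astype(float).copy()
    total = 0.0
    for t in range(T):
        p = (1 - gamma) * w / w.sum() + gamma / K    # mix in uniform exploration
        a = rng.choice(K, p=p)
        total += rewards[t, a]
        w[a] *= np.exp(eta * rewards[t, a] / p[a])   # importance-weighted update
    return total

rng = np.random.default_rng(1)
rewards = rng.uniform(size=(1000, 5)) * np.array([0.3, 0.3, 0.9, 0.3, 0.3])
print(exp3(rewards))
```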

First-Explore, then Exploit: Meta-Learning Intelligent Exploration

  • paper_url: http://arxiv.org/abs/2307.02276
  • repo_url: https://github.com/btnorman/First-Explore
  • paper_authors: Ben Norman, Jeff Clune
  • for: This paper addresses the problem that reinforcement learning agents lack human-level intelligent exploration.
  • methods: It proposes a novel meta-RL framework, First-Explore, with two policies: one that learns only to explore and one that learns only to exploit. Once trained, the explore policy can be run for as long as desired, after which the exploit policy acts on all the information gained during exploration, avoiding the conflict of exploring and exploiting at once.
  • results: First-Explore can learn intelligent exploration strategies such as exhaustive search, and it outperforms dominant standard RL and meta-RL approaches on domains where exploration requires sacrificing reward.
    Abstract Standard reinforcement learning (RL) agents never intelligently explore like a human (i.e. by taking into account complex domain priors and previous explorations). Even the most basic intelligent exploration strategies such as exhaustive search are only inefficiently or poorly approximated by approaches such as novelty search or intrinsic motivation, let alone more complicated strategies like learning new skills, climbing stairs, opening doors, or conducting experiments. This lack of intelligent exploration limits sample efficiency and prevents solving hard exploration domains. We argue a core barrier prohibiting many RL approaches from learning intelligent exploration is that the methods attempt to explore and exploit simultaneously, which harms both exploration and exploitation as the goals often conflict. We propose a novel meta-RL framework (First-Explore) with two policies: one policy learns to only explore and one policy learns to only exploit. Once trained, we can then explore with the explore policy, for as long as desired, and then exploit based on all the information gained during exploration. This approach avoids the conflict of trying to do both exploration and exploitation at once. We demonstrate that First-Explore can learn intelligent exploration strategies such as exhaustive search and more, and that it outperforms dominant standard RL and meta-RL approaches on domains where exploration requires sacrificing reward. First-Explore is a significant step towards creating meta-RL algorithms capable of learning human-level exploration which is essential to solve challenging unseen hard-exploration domains.
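The explore-then-exploit separation is easy to illustrate on a toy bandit. The sketch below hand-codes both phases (round-robin exploration, then greedy exploitation); First-Explore instead meta-learns both policies:

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = rng.normal(size=5)              # hypothetical 5-armed Gaussian bandit

def pull(arm):
    return true_means[arm] + rng.normal()

# Phase 1: the explore policy only gathers information (naive round-robin here;
# rewards earned in this phase are deliberately ignored).
history = {a: [] for a in range(5)}
for t in range(50):
    history[t % 5].append(pull(t % 5))

# Phase 2: the exploit policy conditions on everything exploration found.
best = max(history, key=lambda a: np.mean(history[a]))
print(f"exploiting arm {best}: "
      f"mean return {np.mean([pull(best) for _ in range(50)]):.2f}")
```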

SVDM: Single-View Diffusion Model for Pseudo-Stereo 3D Object Detection

  • paper_url: http://arxiv.org/abs/2307.02270
  • repo_url: None
  • paper_authors: Yuguang Shi
  • for: A 3D object detection method that narrows the accuracy gap between LiDAR-based and monocular-camera-based approaches
  • methods: Combines Pseudo-Stereo with a Single-View Diffusion Model (SVDM) to obtain an end-to-end, efficient 3D detection framework
  • results: Achieves new state-of-the-art performance on the KITTI dataset and is compatible with most stereo detectors
    Abstract One of the key problems in 3D object detection is to reduce the accuracy gap between methods based on LiDAR sensors and those based on monocular cameras. A recently proposed framework for monocular 3D detection based on Pseudo-Stereo has received considerable attention in the community. However, three problems have been identified in existing practice: (1) the monocular depth estimator and the Pseudo-Stereo detector must be trained separately, (2) the approach is difficult to make compatible with different stereo detectors, and (3) the overall computation is large, which affects inference speed. In this work, we propose an end-to-end, efficient pseudo-stereo 3D detection framework by introducing a Single-View Diffusion Model (SVDM) that uses a few iterations to gradually deliver right informative pixels to the left image. SVDM allows the entire pseudo-stereo 3D detection pipeline to be trained end-to-end and can benefit from the training of stereo detectors. Afterwards, we further explore the application of SVDM in depth-free stereo 3D detection, and the final framework is compatible with most stereo detectors. Among multiple benchmarks on the KITTI dataset, we achieve new state-of-the-art performance.

Analyzing Different Expert-Opined Strategies to Enhance the Effect on the Goal of a Multi-Attribute Decision-Making System Using a Concept of Effort Propagation and Application in Enhancement of High School Students’ Performance

  • paper_url: http://arxiv.org/abs/2307.02254
  • repo_url: None
  • paper_authors: Suvojit Dhara, Adrijit Goswami
  • for: This study proposes and analyzes two strategies for working on the factors of a multi-attribute decision-making problem: parallel and hierarchical effort assignment and propagation strategies.
  • methods: A concept called effort propagation is formally defined, and both strategies are divided into sub-strategies depending on whether effort is assigned uniformly or according to heuristics such as the relative significance and effort propagability of the factors.
  • results: In a real-life case study of Indian high school administrative factors, the proposed strategies propagate roughly 7%-15% of a total of 1 unit of effort to the goal. A comparative analysis identifies the optimal strategy for enhancing student performance, with the highest effort propagation achieved being approximately 14.4348%.
    Abstract In many real-world multi-attribute decision-making (MADM) problems, mining the inter-relationships and possible hierarchical structures among the factors are considered to be one of the primary tasks. But, besides that, one major task is to determine an optimal strategy to work on the factors to enhance the effect on the goal attribute. This paper proposes two such strategies, namely parallel and hierarchical effort assignment, and propagation strategies. The concept of effort propagation through a strategy is formally defined and described in the paper. Both the parallel and hierarchical strategies are divided into sub-strategies based on whether the assignment of efforts to the factors is uniform or depends upon some appropriate heuristics related to the factors in the system. The adapted and discussed heuristics are the relative significance and effort propagability of the factors. The strategies are analyzed for a real-life case study regarding Indian high school administrative factors that play an important role in enhancing students' performance. Total effort propagation of around 7%-15% to the goal is seen across the proposed strategies given a total of 1 unit of effort to the directly accessible factors of the system. A comparative analysis is adapted to determine the optimal strategy among the proposed ones to enhance student performance most effectively. The highest effort propagation achieved in the work is approximately 14.4348%. The analysis in the paper establishes the necessity of research towards the direction of effort propagation analysis in case of decision-making problems.
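A toy calculation with hypothetical weights illustrates effort propagation, and why heuristic assignment can beat the uniform sub-strategy:

```python
import numpy as np

# Hypothetical 3-factor system: each directly accessible factor propagates a
# fixed fraction of the effort spent on it through the hierarchy to the goal.
propagation = np.array([0.06, 0.05, 0.03])   # effort reaching the goal per unit

uniform = np.full(3, 1 / 3)                  # uniform assignment sub-strategy
heuristic = np.array([0.5, 0.3, 0.2])        # weighted by relative significance

print(f"uniform:   {propagation @ uniform:.4f}")    # ~0.0467 of 1 unit of effort
print(f"heuristic: {propagation @ heuristic:.4f}")  # ~0.0510 of 1 unit of effort
```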

Exploring Multimodal Approaches for Alzheimer’s Disease Detection Using Patient Speech Transcript and Audio Data

  • paper_url: http://arxiv.org/abs/2307.02514
  • repo_url: https://github.com/shui-dun/multimodal_ad
  • paper_authors: Hongmin Cai, Xiaoke Huang, Zhengliang Liu, Wenxiong Liao, Haixing Dai, Zihao Wu, Dajiang Zhu, Hui Ren, Quanzheng Li, Tianming Liu, Xiang Li
  • for: This study aims to detect Alzheimer's disease (AD) from patients' speech and transcript data, in order to improve diagnosis and assessment.
  • methods: The study uses pre-trained language models and a Graph Neural Network (GNN) that constructs a graph from the speech transcript and extracts features for AD detection. Data augmentation techniques, including synonym replacement and a GPT-based augmenter, address the small dataset size; WavLM is used to extract audio features, which are fused with the text features.
  • results: Through intensive experiments and analysis, the authors shed light on the challenges and potential solutions in AD detection using speech and audio data.
    Abstract Alzheimer's disease (AD) is a common form of dementia that severely impacts patient health. As AD impairs the patient's language understanding and expression ability, the speech of AD patients can serve as an indicator of this disease. This study investigates various methods for detecting AD using patients' speech and transcripts data from the DementiaBank Pitt database. The proposed approach involves pre-trained language models and Graph Neural Network (GNN) that constructs a graph from the speech transcript, and extracts features using GNN for AD detection. Data augmentation techniques, including synonym replacement, GPT-based augmenter, and so on, were used to address the small dataset size. Audio data was also introduced, and WavLM model was used to extract audio features. These features were then fused with text features using various methods. Finally, a contrastive learning approach was attempted by converting speech transcripts back to audio and using it for contrastive learning with the original audio. We conducted intensive experiments and analysis on the above methods. Our findings shed light on the challenges and potential solutions in AD detection using speech and audio data.
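A minimal sketch of the late-fusion step, with random stand-ins for the transcript and audio features (the actual pipeline extracts these with a language model/GNN and WavLM, respectively), fused here by simple concatenation, one of several fusion schemes the paper compares:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 100
text_feats = rng.normal(size=(n, 768))    # stand-in for LM/GNN transcript features
audio_feats = rng.normal(size=(n, 512))   # stand-in for WavLM audio features
labels = rng.integers(0, 2, size=n)       # synthetic: 1 = AD, 0 = control

# Late fusion by concatenation, followed by a simple linear classifier.
fused = np.concatenate([text_feats, audio_feats], axis=1)
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print(f"training accuracy: {clf.score(fused, labels):.2f}")
```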

Power-up! What Can Generative Models Do for Human Computation Workflows?

  • paper_url: http://arxiv.org/abs/2307.02243
  • repo_url: None
  • paper_authors: Garrett Allen, Gaole He, Ujwal Gadiraju
  • for: This paper aims to explore the potential benefits of using large language models (LLMs) in crowdsourcing workflows and to identify the best ways to integrate LLMs into existing crowdsourcing design patterns.
  • methods: The paper proposes a vision for incorporating LLMs into crowdsourcing workflows, including identifying key junctures in the workflow where LLMs can add value and proposing means to augment existing design patterns for crowd work.
  • results: The paper aims to provide a comprehensive understanding of how LLMs can improve the effectiveness of crowdsourcing workflows and how such workflows can be evaluated from the perspectives of various stakeholders involved in the crowdsourcing paradigm.
    Abstract We are amidst an explosion of artificial intelligence research, particularly around large language models (LLMs). These models have a range of applications across domains like medicine, finance, commonsense knowledge graphs, and crowdsourcing. Investigation into LLMs as part of crowdsourcing workflows remains an under-explored space. The crowdsourcing research community has produced a body of work investigating workflows and methods for managing complex tasks using hybrid human-AI methods. Within crowdsourcing, the role of LLMs can be envisioned as akin to a cog in a larger wheel of workflows. From an empirical standpoint, little is currently understood about how LLMs can improve the effectiveness of crowdsourcing workflows and how such workflows can be evaluated. In this work, we present a vision for exploring this gap from the perspectives of various stakeholders involved in the crowdsourcing paradigm -- the task requesters, crowd workers, platforms, and end-users. We identify junctures in typical crowdsourcing workflows at which the introduction of LLMs can play a beneficial role and propose means to augment existing design patterns for crowd work.

MAE-DFER: Efficient Masked Autoencoder for Self-supervised Dynamic Facial Expression Recognition

  • paper_url: http://arxiv.org/abs/2307.02227
  • repo_url: https://github.com/sunlicai/mae-dfer
  • paper_authors: Licai Sun, Zheng Lian, Bin Liu, Jianhua Tao
  • for: This work aims to advance dynamic facial expression recognition, which is essential to the development of intelligent and empathetic machines.
  • methods: The paper proposes MAE-DFER, a novel self-supervised method that leverages large-scale self-supervised pre-training on abundant unlabeled data, using an efficient local-global interaction Transformer (LGI-Former) as the encoder together with explicit temporal facial motion modeling.
  • results: Experiments show that MAE-DFER consistently outperforms state-of-the-art supervised methods on six datasets (e.g., +6.30% UAR on DFEW and +8.34% UAR on MAFW), while achieving comparable or better performance than VideoMAE at a much lower computational cost (about 38% of the FLOPs).
    Abstract Dynamic facial expression recognition (DFER) is essential to the development of intelligent and empathetic machines. Prior efforts in this field mainly fall into supervised learning paradigm, which is severely restricted by the limited labeled data in existing datasets. Inspired by recent unprecedented success of masked autoencoders (e.g., VideoMAE), this paper proposes MAE-DFER, a novel self-supervised method which leverages large-scale self-supervised pre-training on abundant unlabeled data to largely advance the development of DFER. Since the vanilla Vision Transformer (ViT) employed in VideoMAE requires substantial computation during fine-tuning, MAE-DFER develops an efficient local-global interaction Transformer (LGI-Former) as the encoder. Moreover, in addition to the standalone appearance content reconstruction in VideoMAE, MAE-DFER also introduces explicit temporal facial motion modeling to encourage LGI-Former to excavate both static appearance and dynamic motion information. Extensive experiments on six datasets show that MAE-DFER consistently outperforms state-of-the-art supervised methods by significant margins (e.g., +6.30\% UAR on DFEW and +8.34\% UAR on MAFW), verifying that it can learn powerful dynamic facial representations via large-scale self-supervised pre-training. Besides, it has comparable or even better performance than VideoMAE, while largely reducing the computational cost (about 38\% FLOPs). We believe MAE-DFER has paved a new way for the advancement of DFER and can inspire more relevant research in this field and even other related tasks. Codes and models are publicly available at https://github.com/sunlicai/MAE-DFER.
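The masked-autoencoding step MAE-DFER inherits from VideoMAE can be sketched as follows; the 90% ratio and plain random masking are illustrative assumptions, not necessarily MAE-DFER's exact recipe:

```python
import torch

def random_token_mask(tokens: torch.Tensor, mask_ratio: float = 0.9):
    # Keep a small random subset of spatiotemporal tokens for the encoder;
    # the decoder is trained to reconstruct the masked remainder.
    b, n, d = tokens.shape
    n_keep = max(1, int(n * (1 - mask_ratio)))
    idx = torch.rand(b, n).argsort(dim=1)[:, :n_keep]          # random subset
    visible = torch.gather(tokens, 1, idx.unsqueeze(-1).expand(-1, -1, d))
    return visible, idx

# A 16-frame clip tokenized to an 8x14x14 grid gives 1568 tokens per sample.
visible, idx = random_token_mask(torch.randn(2, 1568, 768))
print(visible.shape)                                           # (2, 156, 768)
```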

On the Adversarial Robustness of Generative Autoencoders in the Latent Space

  • paper_url: http://arxiv.org/abs/2307.02202
  • repo_url: None
  • paper_authors: Mingfei Lu, Badong Chen
  • for: This study investigates the adversarial robustness of generative autoencoders in the latent space.
  • methods: Several attacks in the latent space are used to evaluate the adversarial robustness of popular generative autoencoders, and variational autoencoders are compared with their deterministic variants; adversarial training is also examined as a mitigation.
  • results: Popular generative autoencoders are shown to be vulnerable to latent-space attacks, deterministic variants exhibit better latent robustness, and a potential trade-off is identified between adversarial robustness and the degree of disentanglement of the latent codes.
    Abstract The generative autoencoders, such as the variational autoencoders or the adversarial autoencoders, have achieved great success in lots of real-world applications, including image generation, and signal communication. However, little concern has been devoted to their robustness during practical deployment. Due to the probabilistic latent structure, variational autoencoders (VAEs) may confront problems such as a mismatch between the posterior distribution of the latent and real data manifold, or discontinuity in the posterior distribution of the latent. This leaves a back door for malicious attackers to collapse VAEs from the latent space, especially in scenarios where the encoder and decoder are used separately, such as communication and compressed sensing. In this work, we provide the first study on the adversarial robustness of generative autoencoders in the latent space. Specifically, we empirically demonstrate the latent vulnerability of popular generative autoencoders through attacks in the latent space. We also evaluate the difference between variational autoencoders and their deterministic variants and observe that the latter performs better in latent robustness. Meanwhile, we identify a potential trade-off between the adversarial robustness and the degree of the disentanglement of the latent codes. Additionally, we also verify the feasibility of improvement for the latent robustness of VAEs through adversarial training. In summary, we suggest concerning the adversarial latent robustness of the generative autoencoders, analyze several robustness-relative issues, and give some insights into a series of key challenges.
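One family of latent-space attacks can be sketched as bounded gradient ascent on a latent perturbation that maximizes output deviation; the paper's exact attack objectives may differ:

```python
import torch

def latent_attack(decoder, z, eps=0.5, steps=30, lr=0.1):
    # Find a bounded perturbation of z that maximally changes the decoded
    # output, i.e. "collapses" the autoencoder from the latent space.
    x_ref = decoder(z).detach()
    delta = torch.zeros_like(z, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        loss = -((decoder(z + delta) - x_ref) ** 2).mean()   # ascend deviation
        opt.zero_grad(); loss.backward(); opt.step()
        with torch.no_grad():
            delta.clamp_(-eps, eps)                          # keep it small
    return (z + delta).detach()

decoder = torch.nn.Linear(16, 784)                 # stand-in decoder network
z_adv = latent_attack(decoder, torch.randn(1, 16))
```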

The FormAI Dataset: Generative AI in Software Security Through the Lens of Formal Verification

  • paper_url: http://arxiv.org/abs/2307.02192
  • repo_url: None
  • paper_authors: Norbert Tihanyi, Tamas Bisztray, Ridhi Jain, Mohamed Amine Ferrag, Lucas C. Cordeiro, Vasileios Mavroeidis
  • for: The paper presents the FormAI dataset, a large collection of AI-generated C programs with vulnerability classification.
  • methods: The dataset is generated using GPT-3.5-turbo, and a dynamic zero-shot prompting technique is introduced to spawn a diverse set of programs. The dataset is labeled with vulnerabilities using a formal verification method based on ESBMC.
  • results: The dataset contains 112,000 programs with varying levels of complexity, and each program is associated with relevant CWE numbers. The dataset is suitable for evaluating the effectiveness of static and dynamic analysis tools, and can be used to train LLMs and machine learning algorithms.
    Abstract This paper presents the FormAI dataset, a large collection of 112,000 AI-generated compilable and independent C programs with vulnerability classification. We introduce a dynamic zero-shot prompting technique, constructed to spawn a diverse set of programs utilizing Large Language Models (LLMs). The dataset is generated by GPT-3.5-turbo and comprises programs with varying levels of complexity. Some programs handle complicated tasks such as network management, table games, or encryption, while others deal with simpler tasks like string manipulation. Every program is labeled with the vulnerabilities found within the source code, indicating the type, line number, and vulnerable function name. This is accomplished by employing a formal verification method using the Efficient SMT-based Bounded Model Checker (ESBMC), which performs model checking, abstract interpretation, constraint programming, and satisfiability modulo theories, to reason over safety/security properties in programs. This approach definitively detects vulnerabilities and offers a formal model known as a counterexample, thus eliminating the possibility of generating false positive reports. This property of the dataset makes it suitable for evaluating the effectiveness of various static and dynamic analysis tools. Furthermore, we have associated the identified vulnerabilities with relevant Common Weakness Enumeration (CWE) numbers. We make the source code available for the 112,000 programs, accompanied by a comprehensive list detailing the vulnerabilities detected in each individual program including location and function name, which makes the dataset ideal to train LLMs and machine learning algorithms.

Citation: A Key to Building Responsible and Accountable Large Language Models

  • paper_url: http://arxiv.org/abs/2307.02185
  • repo_url: None
  • paper_authors: Jie Huang, Kevin Chen-Chuan Chang
  • for: This position paper explores the intellectual property (IP) and ethical concerns raised by large language models (LLMs) and proposes citation as a novel angle to mitigate these risks.
  • methods: The paper identifies "citation" as a crucial yet missing component in LLMs and proposes a comprehensive citation mechanism, accounting for both non-parametric and parametric content, to enhance content transparency and verifiability.
  • results: The paper outlines several research problems in this area to guide future explorations towards building more responsible and accountable LLMs.
    Abstract Large Language Models (LLMs) bring transformative benefits alongside unique challenges, including intellectual property (IP) and ethical concerns. This position paper explores a novel angle to mitigate these risks, drawing parallels between LLMs and established web systems. We identify "citation" as a crucial yet missing component in LLMs, which could enhance content transparency and verifiability while addressing IP and ethical dilemmas. We further propose that a comprehensive citation mechanism for LLMs should account for both non-parametric and parametric content. Despite the complexity of implementing such a citation mechanism, along with the inherent potential pitfalls, we advocate for its development. Building on this foundation, we outline several research problems in this area, aiming to guide future explorations towards building more responsible and accountable LLMs.

Diffusion Models for Computational Design at the Example of Floor Plans

  • paper_url: http://arxiv.org/abs/2307.02511
  • repo_url: None
  • paper_authors: Joern Ploennigs, Markus Berger
  • for: The paper explores the capabilities of diffusion-based AI image generators for computational design in civil engineering, specifically for creating floor plans.
  • methods: The paper proposes diffusion models with improved semantic encoding to generate floor plans, and experiments evaluate the validity and query performance of the generated plans.
  • results: The proposed diffusion models improve the validity of generated floor plans from 6% to 90% and improve query performance for different examples; the paper also identifies shortcomings and future research challenges for these models.
    Abstract AI image generators based on diffusion models have been widely discussed recently for their capability to create images from simple text prompts. But for practical use in civil engineering, they need to be able to create specific construction plans for given constraints. Within this paper we explore the capabilities of those diffusion-based AI generators for computational design at the example of floor plans and identify their current limitations. We explain how the diffusion models work and propose new diffusion models with improved semantic encoding. In several experiments we show that we can improve the validity of generated floor plans from 6% to 90% and query performance for different examples. We identify shortcomings and derive future research challenges of those models and discuss the need to combine diffusion models with building information modelling. With this we provide key insights into the current state and future directions for diffusion models in civil engineering.

Safety Shielding under Delayed Observation

  • paper_url: http://arxiv.org/abs/2307.02164
  • repo_url: https://github.com/filipcano/safety-shields-delayed
  • paper_authors: Filip Cano Córdoba, Alexander Palmisano, Martin Fränzle, Roderick Bloem, Bettina Könighofer
  • for: This paper targets autonomous agents operating in physical environments, which must handle delays in input and output signals since neither data transmission nor sensing or actuating is instantaneous.
  • methods: It uses safety shields, correct-by-construction runtime enforcers that guarantee safe execution by correcting unsafe actions, designed to interfere minimally with the agent.
  • results: The paper proposes synthesis algorithms for computing delay-resilient shields that guarantee safety under worst-case assumptions on input-signal delays, introduces novel heuristics for choosing corrective actions that minimize future shield interference, and presents the first integration of shields in a realistic driving simulator (Carla).
    Abstract Agents operating in physical environments need to be able to handle delays in the input and output signals since neither data transmission nor sensing or actuating the environment are instantaneous. Shields are correct-by-construction runtime enforcers that guarantee safe execution by correcting any action that may cause a violation of a formal safety specification. Besides providing safety guarantees, shields should interfere minimally with the agent. Therefore, shields should pick the safe corrective actions in such a way that future interferences are most likely minimized. Current shielding approaches do not consider possible delays in the input signals in their safety analyses. In this paper, we address this issue. We propose synthesis algorithms to compute \emph{delay-resilient shields} that guarantee safety under worst-case assumptions on the delays of the input signals. We also introduce novel heuristics for deciding between multiple corrective actions, designed to minimize future shield interferences caused by delays. As a further contribution, we present the first integration of shields in a realistic driving simulator. We implemented our delayed shields in the driving simulator \textsc{Carla}. We shield potentially unsafe autonomous driving agents in different safety-critical scenarios and show the effect of delays on the safety analysis.
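The core idea of delay-resilient shielding, acting safely for every state the system could have reached during the unobserved delay window, can be illustrated with a toy braking example (all dynamics and numbers hypothetical):

```python
def delayed_shield(last_obs_pos, last_obs_vel, proposed_accel,
                   delay_steps=3, wall=100.0, dt=0.1, a_max=2.0, a_brake=4.0):
    # The observation is delay_steps old, so the shield forward-simulates the
    # worst case (maximum acceleration while unobserved) and overrides the
    # agent only if braking could become impossible from that state.
    pos, vel = last_obs_pos, last_obs_vel
    for _ in range(delay_steps):
        vel += a_max * dt
        pos += vel * dt
    stopping_dist = vel ** 2 / (2 * a_brake)
    if pos + vel * dt + stopping_dist < wall:
        return proposed_accel          # safe even under worst-case delay
    return -a_brake                    # corrective action: brake

print(delayed_shield(last_obs_pos=95.0, last_obs_vel=5.0, proposed_accel=2.0))
# -> -4.0: the shield overrides and brakes
```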

Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning

  • paper_url: http://arxiv.org/abs/2307.03692
  • repo_url: None
  • paper_authors: Waseem AlShikh, Manhal Daaboul, Kirk Goddard, Brock Imel, Kiran Kamble, Parikshith Kulkarni, Melisa Russak
  • for: This paper introduces the Instruction Following Score (IFS), a metric that detects language models' ability to follow instructions.
  • methods: Publicly available base and instruct models are benchmarked to show that the ratio of well-formatted responses to partial and full sentences can effectively distinguish the two model classes; IFS is also used as an early stopping criterion for instruct tuning.
  • results: Computing IFS during Supervised Fine-Tuning (SFT) of 7B and 13B LLaMA models shows that models learn to follow instructions relatively early in training, and that further fine-tuning can change the underlying base model's semantics. As an example of such semantic change, the objectivity of model predictions, measured by the auxiliary metric ObjecQA, shifts most steeply when IFS tends to plateau. Decomposing instruct tuning into IFS and semantic factors may enable better-controlled instruct tuning and minimal instruct interfaces for querying foundation models.
    Abstract In this paper, we introduce the Instruction Following Score (IFS), a metric that detects language models' ability to follow instructions. The metric has a dual purpose. First, IFS can be used to distinguish between base and instruct models. We benchmark publicly available base and instruct models, and show that the ratio of well formatted responses to partial and full sentences can be an effective measure between those two model classes. Secondly, the metric can be used as an early stopping criteria for instruct tuning. We compute IFS for Supervised Fine-Tuning (SFT) of 7B and 13B LLaMA models, showing that models learn to follow instructions relatively early in the training process, and the further finetuning can result in changes in the underlying base model semantics. As an example of semantics change we show the objectivity of model predictions, as defined by an auxiliary metric ObjecQA. We show that in this particular case, semantic changes are the steepest when the IFS tends to plateau. We hope that decomposing instruct tuning into IFS and semantic factors starts a new trend in better controllable instruct tuning and opens possibilities for designing minimal instruct interfaces querying foundation models.
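A sketch of an IFS-style ratio follows; the exact formatting criteria the paper uses to classify responses are not specified in the abstract, so the sentence-completeness test here is an assumption:

```python
import re

def instruction_following_score(responses):
    # Fraction of responses that read as well-formatted, complete sentences
    # (ending in terminal punctuation) rather than partial continuations.
    well_formed = sum(bool(re.search(r'[.!?]"?\s*$', r)) for r in responses)
    return well_formed / max(len(responses), 1)

print(instruction_following_score(["The answer is 42.", "it depends on"]))  # 0.5
```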

  • paper_url: http://arxiv.org/abs/2307.02140
  • repo_url: https://github.com/morningd/model-centric-fml
  • paper_authors: Moming Duan
  • for: To broaden the application scenarios of Federated Learning (FL) and increase data holders' enthusiasm to participate, this survey rethinks the design of current FL frameworks and extends the concept to more generalized Open Federated Learning Platforms.
  • methods: Two reciprocal cooperation frameworks are proposed to realize such open platforms: query-based FL and contract-based FL.
  • results: The survey reviews the feasibility of open FL platforms from both technical and legal perspectives, covering topics such as the availability of up-to-date model repositories for model querying, legal compliance analysis between model licenses, and copyright and intellectual property protection in model reusing. In particular, it introduces a novel taxonomy for analyzing model license compatibility in batch model reusing methods (combination, amalgamation, distillation, and generation), providing a systematic framework for identifying the relevant license clauses and the potential legal implications and restrictions of reusing models.
    Abstract Traditional Federated Learning (FL) follows a server-dominated cooperation paradigm which narrows the application scenarios of FL and decreases the enthusiasm of data holders to participate. To fully unleash the potential of FL, we advocate rethinking the design of current FL frameworks and extending it to a more generalized concept: Open Federated Learning Platforms. We propose two reciprocal cooperation frameworks for FL to achieve this: query-based FL and contract-based FL. In this survey, we conduct a comprehensive review of the feasibility of constructing an open FL platform from both technical and legal perspectives. We begin by reviewing the definition of FL and summarizing its inherent limitations, including server-client coupling, low model reusability, and lack of openness. In the query-based FL platform, which is an open model sharing and reusing platform empowered by the community for model mining, we explore a wide range of valuable topics, including the availability of up-to-date model repositories for model querying, legal compliance analysis between different model licenses, and copyright issues and intellectual property protection in model reusing. In particular, we introduce a novel taxonomy to streamline the analysis of model license compatibility in FL studies that involve batch model reusing methods, including combination, amalgamation, distillation, and generation. This taxonomy provides a systematic framework for identifying the corresponding clauses of licenses and facilitates the identification of potential legal implications and restrictions when reusing models. Through this survey, we uncover the current dilemmas faced by FL and advocate for the development of sustainable open FL platforms. We aim to provide guidance for establishing such platforms in the future, while identifying potential problems and challenges that need to be addressed.

Beyond Known Reality: Exploiting Counterfactual Explanations for Medical Research

  • paper_url: http://arxiv.org/abs/2307.02131
  • repo_url: https://github.com/toygarr/counterfactual-explanations-for-medical-research
  • paper_authors: Toygar Tanyel, Serkan Ayvaz, Bilgin Keserci
  • for: This study employs counterfactual explanations to explore "what if?" scenarios in medical research, using MRI features for diagnosing pediatric posterior fossa brain tumors as a case study.
  • methods: The approach incorporates counterfactual explanations as a novel way to interpret the outcomes of machine learning algorithms, offering personalized and context-specific insights that enable validation of predictions and clarification of variations under diverse circumstances.
  • results: The results demonstrate the promising potential of counterfactual explanations to enhance trust and acceptance of AI-driven methods in clinical settings. The approach maintains both statistical and clinical fidelity, allowing distinct tumor features to be examined through alternative realities, and the potential use of counterfactuals for data augmentation is also evaluated.
    Abstract This study employs counterfactual explanations to explore "what if?" scenarios in medical research, with the aim of expanding our understanding beyond existing boundaries. Specifically, we focus on utilizing MRI features for diagnosing pediatric posterior fossa brain tumors as a case study. The field of artificial intelligence and explainability has witnessed a growing number of studies and increasing scholarly interest. However, the lack of human-friendly interpretations in explaining the outcomes of machine learning algorithms has significantly hindered the acceptance of these methods by clinicians in their clinical practice. To address this, our approach incorporates counterfactual explanations, providing a novel way to examine alternative decision-making scenarios. These explanations offer personalized and context-specific insights, enabling the validation of predictions and clarification of variations under diverse circumstances. Importantly, our approach maintains both statistical and clinical fidelity, allowing for the examination of distinct tumor features through alternative realities. Additionally, we explore the potential use of counterfactuals for data augmentation and evaluate their feasibility as an alternative approach in medical research. The results demonstrate the promising potential of counterfactual explanations to enhance trust and acceptance of AI-driven methods in clinical settings.
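As a minimal illustration of the counterfactual idea (not the paper's generation method), one can search for the nearest real instance that the model assigns to the opposite class:

```python
import numpy as np

def nearest_counterfactual(x, candidates, predict, target_class):
    # Among real instances the model assigns to the desired class, return the
    # one closest to x: "what minimal change would flip the diagnosis?"
    mask = predict(candidates) == target_class
    if not mask.any():
        return None
    pool = candidates[mask]
    return pool[np.argmin(np.linalg.norm(pool - x, axis=1))]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                      # synthetic "MRI feature" rows
predict = lambda Z: (Z[:, 0] > 0).astype(int)      # hypothetical decision rule
flipped = 1 - predict(X[:1])[0]
print(nearest_counterfactual(X[0], X, predict, flipped)[:2])
```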

DARE: Towards Robust Text Explanations in Biomedical and Healthcare Applications

  • paper_url: http://arxiv.org/abs/2307.02094
  • repo_url: https://github.com/ibm/domain-adaptive-attribution-robustness
  • paper_authors: Adam Ivankay, Mattia Rigotti, Pascal Frossard
  • for: This work investigates the explainability of deep neural networks in the biomedical domain and how to improve the robustness of their explanations.
  • methods: Existing attribution robustness estimation methods are adapted to a given domain to account for domain-specific plausibility, yielding the DomainAdaptiveAREstimator (DARE); two mitigation methods, adversarial training and FAR training, are proposed to reduce the brittleness that DARE characterizes.
  • results: The methods are validated through extensive experiments on three established biomedical benchmarks.
    Abstract Along with the successful deployment of deep neural networks in several application domains, the need to unravel the black-box nature of these networks has seen a significant increase recently. Several methods have been introduced to provide insight into the inference process of deep neural networks. However, most of these explainability methods have been shown to be brittle in the face of adversarial perturbations of their inputs in the image and generic textual domain. In this work we show that this phenomenon extends to specific and important high stakes domains like biomedical datasets. In particular, we observe that the robustness of explanations should be characterized in terms of the accuracy of the explanation in linking a model's inputs and its decisions - faithfulness - and its relevance from the perspective of domain experts - plausibility. This is crucial to prevent explanations that are inaccurate but still look convincing in the context of the domain at hand. To this end, we show how to adapt current attribution robustness estimation methods to a given domain, so as to take into account domain-specific plausibility. This results in our DomainAdaptiveAREstimator (DARE) attribution robustness estimator, allowing us to properly characterize the domain-specific robustness of faithful explanations. Next, we provide two methods, adversarial training and FAR training, to mitigate the brittleness characterized by DARE, allowing us to train networks that display robust attributions. Finally, we empirically validate our methods with extensive experiments on three established biomedical benchmarks.
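A crude probe in the spirit of attribution robustness estimation compares attributions before and after an input perturbation; DARE additionally accounts for domain-specific plausibility, which this sketch omits:

```python
import torch
import torch.nn.functional as F

def attribution_similarity(model, x, x_adv):
    # Saliency-style attribution: gradient of the top-class score w.r.t. input;
    # low similarity between clean and perturbed attributions signals brittleness.
    def saliency(inp):
        inp = inp.clone().requires_grad_(True)
        model(inp).max(dim=-1).values.sum().backward()
        return inp.grad.detach()
    return F.cosine_similarity(saliency(x).flatten(),
                               saliency(x_adv).flatten(), dim=0)

model = torch.nn.Linear(32, 3)                     # stand-in classifier
x = torch.randn(4, 32)
print(attribution_similarity(model, x, x + 0.05 * torch.randn_like(x)))
```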

Combating Confirmation Bias: A Unified Pseudo-Labeling Framework for Entity Alignment

  • paper_url: http://arxiv.org/abs/2307.02075
  • repo_url: None
  • paper_authors: Qijie Ding, Jie Yin, Daokun Zhang, Junbin Gao
  • for: To improve the accuracy of entity alignment by eliminating the impact of pseudo-labeling errors (confirmation bias)
  • methods: Proposes a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA), combining Optimal Transport (OT)-based pseudo-labeling with cross-iteration pseudo-label calibration
  • results: Experiments show that the approach achieves competitive performance with limited prior alignment seeds, and its effectiveness in eliminating pseudo-labeling errors is theoretically supported by an analysis of Type I and Type II errors
    Abstract Entity alignment (EA) aims at identifying equivalent entity pairs across different knowledge graphs (KGs) that refer to the same real-world identity. To systematically combat confirmation bias for pseudo-labeling-based entity alignment, we propose a Unified Pseudo-Labeling framework for Entity Alignment (UPL-EA) that explicitly eliminates pseudo-labeling errors to boost the accuracy of entity alignment. UPL-EA consists of two complementary components: (1) The Optimal Transport (OT)-based pseudo-labeling uses discrete OT modeling as an effective means to enable more accurate determination of entity correspondences across two KGs and to mitigate the adverse impact of erroneous matches. A simple but highly effective criterion is further devised to derive pseudo-labeled entity pairs that satisfy one-to-one correspondences at each iteration. (2) The cross-iteration pseudo-label calibration operates across multiple consecutive iterations to further improve the pseudo-labeling precision rate by reducing the local pseudo-label selection variability with a theoretical guarantee. The two components are respectively designed to eliminate Type I and Type II pseudo-labeling errors identified through our analyse. The calibrated pseudo-labels are thereafter used to augment prior alignment seeds to reinforce subsequent model training for alignment inference. The effectiveness of UPL-EA in eliminating pseudo-labeling errors is both theoretically supported and experimentally validated. The experimental results show that our approach achieves competitive performance with limited prior alignment seeds.
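The one-to-one pseudo-labeling step can be sketched with a discrete assignment between entity embeddings; UPL-EA's actual OT formulation and selection criterion may differ from this simplification:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_pseudo_labels(emb_a, emb_b, threshold=0.8):
    # Solve a one-to-one assignment between the two KGs' entity embeddings and
    # keep only confident pairs, enforcing one-to-one correspondences.
    sim = emb_a @ emb_b.T                        # cosine sims for unit-norm rows
    rows, cols = linear_sum_assignment(-sim)     # maximize total similarity
    return [(i, j) for i, j in zip(rows, cols) if sim[i, j] >= threshold]

rng = np.random.default_rng(0)
e1 = rng.normal(size=(50, 16))
e1 /= np.linalg.norm(e1, axis=1, keepdims=True)
e2 = e1 + 0.05 * rng.normal(size=e1.shape)       # noisy copy: the second KG
e2 /= np.linalg.norm(e2, axis=1, keepdims=True)
print(len(one_to_one_pseudo_labels(e1, e2)))     # most entities confidently matched
```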

Performance Modeling of Data Storage Systems using Generative Models

  • paper_url: http://arxiv.org/abs/2307.02073
  • repo_url: https://github.com/djdprogramming/adfa2
  • paper_authors: Abdalaziz Rashid Al-Maeeni, Aziz Temirkhanov, Artem Ryzhikov, Mikhail Hushchyn
  • For: The paper is written for researchers and practitioners in the field of industrial data analysis, particularly those interested in high-precision modeling of systems and digital twins.
  • Methods: The paper uses machine learning-based generative models to develop models of a storage system, including probabilistic models for each storage component (HDD and SSD) and RAID schemes.
  • Results: The paper reports experiments demonstrating the accuracy of the models' predictions, with errors of 4-10% for IOPS and 3-16% for latency, depending on the components and models used. The predictions show up to 0.99 Pearson correlation with Little's law, which can be used for unsupervised reliability checks of the models. Additionally, the paper presents novel data sets for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods in machine learning.
    Abstract High-precision modeling of systems is one of the main areas of industrial data analysis. Models of systems, their digital twins, are used to predict their behavior under various conditions. We have developed several models of a storage system using machine learning-based generative models. The system consists of several components: hard disk drive (HDD) and solid-state drive (SSD) storage pools with different RAID schemes and cache. Each storage component is represented by a probabilistic model that describes the probability distribution of the component performance in terms of IOPS and latency, depending on their configuration and external data load parameters. The results of the experiments demonstrate the errors of 4-10 % for IOPS and 3-16 % for latency predictions depending on the components and models of the system. The predictions show up to 0.99 Pearson correlation with Little's law, which can be used for unsupervised reliability checks of the models. In addition, we present novel data sets that can be used for benchmarking regression algorithms, conditional generative models, and uncertainty estimation methods in machine learning.
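The unsupervised reliability check mentioned in the abstract follows from Little's law (mean queue depth = arrival rate × mean latency, here IOPS × latency). A small illustration on synthetic numbers, not the paper's data:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
iops_pred = rng.uniform(1e3, 5e4, 200)        # predicted throughput, ops/s
lat_pred = rng.uniform(1e-4, 5e-3, 200)       # predicted latency, s
# measured mean queue depth, here synthesized with 5% noise
queue_depth = iops_pred * lat_pred * (1 + 0.05 * rng.standard_normal(200))

r, _ = pearsonr(iops_pred * lat_pred, queue_depth)
print(f"Pearson r against Little's law: {r:.3f}")  # near 1.0 for a consistent model
```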

A Comparison of Machine Learning Methods for Data with High-Cardinality Categorical Variables

  • paper_url: http://arxiv.org/abs/2307.02071
  • repo_url: https://github.com/fabsig/compare_ml_highcardinality_categorical_variables
  • paper_authors: Fabio Sigrist
  • for: Studying the effect of high-cardinality categorical variables on machine learning models.
  • methods: Empirically compares several versions of tree-boosting and deep neural networks, as well as linear mixed effects models, on multiple tabular data sets with high-cardinality categorical variables.
  • results: Machine learning models with random effects achieve higher prediction accuracy than their classical counterparts without random effects, and tree-boosting with random effects outperforms deep neural networks with random effects.
    Abstract High-cardinality categorical variables are variables for which the number of different levels is large relative to the sample size of a data set, or in other words, there are few data points per level. Machine learning methods can have difficulties with high-cardinality variables. In this article, we empirically compare several versions of two of the most successful machine learning methods, tree-boosting and deep neural networks, and linear mixed effects models using multiple tabular data sets with high-cardinality categorical variables. We find that, first, machine learning models with random effects have higher prediction accuracy than their classical counterparts without random effects, and, second, tree-boosting with random effects outperforms deep neural networks with random effects.
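The random-effects idea can be sketched with a plain linear mixed effects model: the high-cardinality categorical variable enters as a random intercept rather than hundreds of one-hot fixed effects. The snippet below uses statsmodels on synthetic data; the paper's stronger variants combine the same principle with tree-boosting:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with a high-cardinality grouping variable: 500 levels,
# only a few rows per level (sizes are illustrative).
rng = np.random.default_rng(0)
n_groups, n = 500, 2000
g = rng.integers(0, n_groups, n)
group_effect = rng.normal(0, 1.0, n_groups)
x = rng.normal(size=n)
y = 2.0 * x + group_effect[g] + rng.normal(0, 0.5, n)
df = pd.DataFrame({"y": y, "x": x, "g": g.astype(str)})

# The mixed model treats the categorical variable as a random effect,
# shrinking per-level estimates instead of fitting 500 fixed dummies.
fit = smf.mixedlm("y ~ x", df, groups=df["g"]).fit()
print(fit.summary())
```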

Line Graphics Digitization: A Step Towards Full Automation

  • paper_url: http://arxiv.org/abs/2307.02065
  • repo_url: https://github.com/moured/document-graphics-digitization
  • paper_authors: Omar Moured, Jiaming Zhang, Alina Roitberg, Thorsten Schwarz, Rainer Stiefelhagen
  • for: This paper aims to improve the digitization of mathematical graphics, specifically statistical plots, to increase accessibility and reproducibility.
  • methods: The authors introduce the Line Graphics (LG) dataset, which includes pixel-wise annotations of 5 coarse and 10 fine-grained categories, and explore 7 state-of-the-art models to benchmark the dataset.
  • results: The dataset covers 520 images of mathematical graphics from 450 documents across different disciplines, and can support two computer vision tasks: semantic segmentation and object detection. The authors plan to make the dataset, code, and models publicly available to the community.
    Abstract The digitization of documents allows for wider accessibility and reproducibility. While automatic digitization of document layout and text content has been a long-standing focus of research, this problem in regard to graphical elements, such as statistical plots, has been under-explored. In this paper, we introduce the task of fine-grained visual understanding of mathematical graphics and present the Line Graphics (LG) dataset, which includes pixel-wise annotations of 5 coarse and 10 fine-grained categories. Our dataset covers 520 images of mathematical graphics collected from 450 documents from different disciplines. Our proposed dataset can support two different computer vision tasks, i.e., semantic segmentation and object detection. To benchmark our LG dataset, we explore 7 state-of-the-art models. To foster further research on the digitization of statistical graphs, we will make the dataset, code, and models publicly available to the community.

Emoji Prediction using Transformer Models

  • paper_url: http://arxiv.org/abs/2307.02054
  • repo_url: https://github.com/mnusrat786/emoji-prediction-with-transformer
  • paper_authors: Muhammad Osama Nusrat, Zeeshan Habib, Mehreen Alam, Saad Ahmed Jamal
  • for: Understanding the meaning of emojis in social media communication.
  • methods: Fine-tunes BERT, a pre-trained transformer language model, to predict the most appropriate emoji for a given text.
  • results: Experiments show an accuracy of over 75% in emoji prediction, outperforming several state-of-the-art models.
    Abstract In recent years, the use of emojis in social media has increased dramatically, making them an important element in understanding online communication. However, predicting the meaning of emojis in a given text is a challenging task due to their ambiguous nature. In this study, we propose a transformer-based approach for emoji prediction using BERT, a widely-used pre-trained language model. We fine-tuned BERT on a large corpus of text containing both text and emojis to predict the most appropriate emoji for a given text. Our experimental results demonstrate that our approach outperforms several state-of-the-art models in predicting emojis with an accuracy of over 75 percent. This work has potential applications in natural language processing, sentiment analysis, and social media marketing.
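A compact fine-tuning sketch of the described approach using Hugging Face Transformers; the checkpoint, the tiny emoji label set, and the two-example batch are assumptions for illustration, not the paper's setup:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

EMOJIS = ["😂", "❤️", "🔥", "😍", "😭"]          # illustrative label set
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(EMOJIS))

texts = ["this concert is amazing", "i miss you so much"]
labels = torch.tensor([2, 3])                    # indices into EMOJIS

batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
optim = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
loss = model(**batch, labels=labels).loss        # cross-entropy over emoji classes
loss.backward()
optim.step()

model.eval()
with torch.no_grad():
    pred = model(**batch).logits.argmax(-1)
print([EMOJIS[i] for i in pred])
```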

Flowchase: a Mobile Application for Pronunciation Training

  • paper_url: http://arxiv.org/abs/2307.02051
  • repo_url: None
  • paper_authors: Noé Tits, Zoé Broisson
  • for: Providing personalized and instant feedback to English learners.
  • methods: A mobile application, Flowchase, connected to a speech-technology pipeline that segments and analyzes segmental and supra-segmental speech features.
  • results: A joint forced-alignment and phonetic recognition approach built on machine learning models for speech representation learning, providing the information needed to design individualized pronunciation feedback.
    Abstract In this paper, we present a solution for providing personalized and instant feedback to English learners through a mobile application, called Flowchase, that is connected to a speech technology able to segment and analyze speech segmental and supra-segmental features. The speech processing pipeline receives linguistic information corresponding to an utterance to analyze along with a speech sample. After validation of the speech sample, a joint forced-alignment and phonetic recognition is performed thanks to a combination of machine learning models based on speech representation learning that provides necessary information for designing a feedback on a series of segmental and supra-segmental pronunciation aspects.

Recommender Systems in the Era of Large Language Models (LLMs)

  • paper_url: http://arxiv.org/abs/2307.02046
  • repo_url: None
  • paper_authors: Wenqi Fan, Zihuai Zhao, Jiatong Li, Yunqing Liu, Xiaowei Mei, Yiqi Wang, Zhen Wen, Fei Wang, Xiangyu Zhao, Jiliang Tang, Qing Li
  • for: Providing a systematic overview of recommender systems empowered by large language models (LLMs), to give researchers in related fields an in-depth understanding of this rapidly evolving direction.
  • methods: Reviews how LLMs are harnessed for recommendation: as feature encoders for learning user and item representations, and through the pre-training, fine-tuning, and prompting paradigms.
  • results: Summarizes representative LLM-empowered recommendation techniques and comprehensively discusses future directions in this emerging field.
    Abstract With the prosperity of e-commerce and web applications, Recommender Systems (RecSys) have become an important component of our daily life, providing personalized suggestions that cater to user preferences. While Deep Neural Networks (DNNs) have made significant advancements in enhancing recommender systems by modeling user-item interactions and incorporating textual side information, DNN-based methods still face limitations, such as difficulties in understanding users' interests and capturing textual side information, inabilities in generalizing to various recommendation scenarios and reasoning on their predictions, etc. Meanwhile, the emergence of Large Language Models (LLMs), such as ChatGPT and GPT4, has revolutionized the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI), due to their remarkable abilities in fundamental responsibilities of language understanding and generation, as well as impressive generalization and reasoning capabilities. As a result, recent studies have attempted to harness the power of LLMs to enhance recommender systems. Given the rapid evolution of this research direction in recommender systems, there is a pressing need for a systematic overview that summarizes existing LLM-empowered recommender systems, to provide researchers in relevant fields with an in-depth understanding. Therefore, in this paper, we conduct a comprehensive review of LLM-empowered recommender systems from various aspects including Pre-training, Fine-tuning, and Prompting. More specifically, we first introduce representative methods to harness the power of LLMs (as a feature encoder) for learning representations of users and items. Then, we review recent techniques of LLMs for enhancing recommender systems from three paradigms, namely pre-training, fine-tuning, and prompting. Finally, we comprehensively discuss future directions in this emerging field.
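The first paradigm surveyed, LLMs as feature encoders, can be sketched as follows: embed item descriptions and a user profile with a pre-trained transformer and rank items by similarity. The model choice and mean pooling are illustrative assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = enc(**batch).last_hidden_state          # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)
    vecs = (hidden * mask).sum(1) / mask.sum(1)          # mean pooling
    return torch.nn.functional.normalize(vecs, dim=-1)

items = ["wireless noise-cancelling headphones",
         "stainless steel chef's knife",
         "sci-fi novel about first contact"]
user_profile = "recently browsed audio gear and bluetooth speakers"

scores = embed([user_profile]) @ embed(items).T          # cosine similarity
print(items[scores.argmax()])
```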

Fisher-Weighted Merge of Contrastive Learning Models in Sequential Recommendation

  • paper_url: http://arxiv.org/abs/2307.05476
  • repo_url: None
  • paper_authors: Jung Hyun Ryu, Jaeheyoung Jeon, Jewoong Cho, Myungjoo Kang
  • for: Sequential recommendation on online platforms and services, where evolving user preferences must be captured over time.
  • methods: Applies the Fisher-Merging method to contrastive-learning-based sequential recommendation models, merging the parameters of multiple models to address data sparsity and ensure robust fine-tuning.
  • results: Extensive experiments demonstrate improved overall recommendation performance, highlighting the method's potential to advance the state of the art in sequential recommendation.
    Abstract Along with the exponential growth of online platforms and services, recommendation systems have become essential for identifying relevant items based on user preferences. The domain of sequential recommendation aims to capture evolving user preferences over time. To address dynamic preference, various contrastive learning methods have been proposed to target data sparsity, a challenge in recommendation systems due to the limited user-item interactions. In this paper, we are the first to apply the Fisher-Merging method to Sequential Recommendation, addressing and resolving practical challenges associated with it. This approach ensures robust fine-tuning by merging the parameters of multiple models, resulting in improved overall performance. Through extensive experiments, we demonstrate the effectiveness of our proposed methods, highlighting their potential to advance the state-of-the-art in sequential learning and recommendation systems.
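A sketch of Fisher-weighted parameter merging, the core operation applied here to sequential recommenders: each model's parameters are averaged with weights given by an estimate of its diagonal Fisher information. The helper names are illustrative, and this is the generic recipe rather than the paper's exact procedure:

```python
import torch

def diagonal_fisher(model, loss_fn, data_loader):
    """Estimate the diagonal Fisher information as the mean squared gradient
    over a data loader (a common approximation)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in data_loader:
        model.zero_grad()
        loss_fn(model(x), y).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(data_loader), 1) for n, f in fisher.items()}

def fisher_weighted_merge(models, fishers, eps=1e-8):
    """Fisher-weighted average of parameters across fine-tuned models."""
    params = [dict(m.named_parameters()) for m in models]
    merged = {}
    for name in fishers[0]:
        num = sum(f[name] * p[name].detach() for p, f in zip(params, fishers))
        den = sum(f[name] for f in fishers) + eps
        merged[name] = num / den
    return merged
```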

VertiBench: Advancing Feature Distribution Diversity in Vertical Federated Learning Benchmarks

  • paper_url: http://arxiv.org/abs/2307.02040
  • repo_url: None
  • paper_authors: Zhaomin Wu, Junyi Hou, Bingsheng He
  • for: Improving research and evaluation in Vertical Federated Learning (VFL): owing to privacy restrictions, few public real-world VFL datasets exist, and they reflect only a limited range of feature distributions.
  • methods: Identifies two key factors affecting VFL performance, feature importance and feature correlation, and proposes associated evaluation metrics and dataset splitting methods; also introduces a real VFL dataset for image-image scenarios.
  • results: A comprehensive evaluation of cutting-edge VFL algorithms, providing valuable insights for future research in the field.
    Abstract Vertical Federated Learning (VFL) is a crucial paradigm for training machine learning models on feature-partitioned, distributed data. However, due to privacy restrictions, few public real-world VFL datasets exist for algorithm evaluation, and these represent a limited array of feature distributions. Existing benchmarks often resort to synthetic datasets, derived from arbitrary feature splits from a global set, which only capture a subset of feature distributions, leading to inadequate algorithm performance assessment. This paper addresses these shortcomings by introducing two key factors affecting VFL performance - feature importance and feature correlation - and proposing associated evaluation metrics and dataset splitting methods. Additionally, we introduce a real VFL dataset to address the deficit in image-image VFL scenarios. Our comprehensive evaluation of cutting-edge VFL algorithms provides valuable insights for future research in the field.
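A toy sketch of synthesizing a vertical (feature-wise) split from a global dataset; the Dirichlet-controlled imbalance parameter is an assumption standing in for the paper's importance- and correlation-driven splitters:

```python
import numpy as np

def split_features(X, n_parties=3, alpha=1.0, seed=0):
    """Allocate feature columns to parties with Dirichlet-controlled
    imbalance; each party keeps all samples but only its own columns."""
    rng = np.random.default_rng(seed)
    probs = rng.dirichlet([alpha] * n_parties)   # small alpha -> skewed split
    owner = rng.choice(n_parties, size=X.shape[1], p=probs)
    return [X[:, owner == p] for p in range(n_parties)]

X = np.random.default_rng(1).normal(size=(100, 20))
parts = split_features(X)
print([p.shape for p in parts])   # per-party feature blocks sharing sample IDs
```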

EHRSHOT: An EHR Benchmark for Few-Shot Evaluation of Foundation Models

  • paper_url: http://arxiv.org/abs/2307.02028
  • repo_url: https://github.com/som-shahlab/ehrshot-benchmark
  • paper_authors: Michael Wornow, Rahul Thapa, Ethan Steinberg, Jason Fries, Nigam Shah
  • For: The paper is written to address the challenges of applying machine learning (ML) in healthcare, specifically the lack of shared assets such as datasets, tasks, and models.
  • Methods: The paper makes three main contributions: (1) a new dataset, EHRSHOT, containing de-identified structured data from the electronic health records (EHRs) of 6,712 patients at Stanford Medicine; (2) the weights of a 141M-parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients; and (3) 15 few-shot clinical prediction tasks for evaluating the performance of foundation models.
  • Results: The paper provides an end-to-end pipeline for the community to validate and build upon the clinical foundation model's performance, and defines 15 few-shot clinical prediction tasks to evaluate the model's ability to adapt to new tasks with limited training data.
    Abstract While the general machine learning (ML) community has benefited from public datasets, tasks, and models, the progress of ML in healthcare has been hampered by a lack of such shared assets. The success of foundation models creates new challenges for healthcare ML by requiring access to shared pretrained models to validate performance benefits. We help address these challenges through three contributions. First, we publish a new dataset, EHRSHOT, containing de-identified structured data from the electronic health records (EHRs) of 6,712 patients from Stanford Medicine. Unlike MIMIC-III/IV and other popular EHR datasets, EHRSHOT is longitudinal and not restricted to ICU/ED patients. Second, we publish the weights of a 141M parameter clinical foundation model pretrained on the structured EHR data of 2.57M patients. We are one of the first to fully release such a model for coded EHR data; in contrast, most prior models released for clinical data (e.g. GatorTron, ClinicalBERT) only work with unstructured text and cannot process the rich, structured data within an EHR. We provide an end-to-end pipeline for the community to validate and build upon its performance. Third, we define 15 few-shot clinical prediction tasks, enabling evaluation of foundation models on benefits such as sample efficiency and task adaption. The code to reproduce our results, as well as the model and dataset (via a research data use agreement), are available at our Github repo here: https://github.com/som-shahlab/ehrshot-benchmark
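A generic few-shot evaluation loop of the kind such benchmarks enable: fit a light head on frozen foundation-model features with k examples per class and score on the rest. The random features, binary task, and helper names are illustrative only:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def few_shot_eval(feats, labels, k=16, seed=0):
    """k-shot probe: sample k training examples per class, fit a linear
    head on the frozen embeddings, and report AUROC on the remainder."""
    rng = np.random.default_rng(seed)
    train_idx = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train_idx.extend(rng.choice(idx, size=min(k, len(idx)), replace=False))
    train = np.array(train_idx)
    test = np.setdiff1d(np.arange(len(labels)), train)
    clf = LogisticRegression(max_iter=1000).fit(feats[train], labels[train])
    return roc_auc_score(labels[test], clf.predict_proba(feats[test])[:, 1])

feats = np.random.default_rng(1).normal(size=(500, 128))   # frozen embeddings
labels = np.random.default_rng(2).integers(0, 2, 500)      # one binary task
print(f"16-shot AUROC: {few_shot_eval(feats, labels):.3f}")
```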

Generative Adversarial Networks for Dental Patient Identity Protection in Orthodontic Educational Imaging

  • paper_url: http://arxiv.org/abs/2307.02019
  • repo_url: None
  • paper_authors: Mingchuan Tian, Wilson Weixun Lu, Kelvin Weng Chiong Foong, Eugene Loh
  • For: This research aims to develop a novel area-preserving Generative Adversarial Network (GAN) inversion technique for effectively de-identifying dental patient images, addressing privacy concerns while preserving key dental features.
  • Methods: The enhanced GAN inversion methodology incorporates several deep learning models to provide end-to-end development guidance and practical application for image de-identification, adapting the context from one image to another while preserving dental features essential for oral diagnosis and dental education.
  • Results: The approach was assessed with varied facial pictures and achieved effective de-identification while maintaining the realism of important dental features, and was deemed useful for dental diagnostics and education by a panel of five clinicians. The generated images can streamline the de-identification of dental patient images, enhance efficiency in dental education, and support the creation of de-identified datasets for broader 2D image research.
    Abstract Objectives: This research introduces a novel area-preserving Generative Adversarial Networks (GAN) inversion technique for effectively de-identifying dental patient images. This innovative method addresses privacy concerns while preserving key dental features, thereby generating valuable resources for dental education and research. Methods: We enhanced the existing GAN Inversion methodology to maximize the preservation of dental characteristics within the synthesized images. A comprehensive technical framework incorporating several deep learning models was developed to provide end-to-end development guidance and practical application for image de-identification. Results: Our approach was assessed with varied facial pictures, extensively used for diagnosing skeletal asymmetry and facial anomalies. Results demonstrated our model's ability to adapt the context from one image to another, maintaining compatibility, while preserving dental features essential for oral diagnosis and dental education. A panel of five clinicians conducted an evaluation on a set of original and GAN-processed images. The generated images achieved effective de-identification, maintaining the realism of important dental features and were deemed useful for dental diagnostics and education. Clinical Significance: Our GAN model and the encompassing framework can streamline the de-identification process of dental patient images, enhancing efficiency in dental education. This method improves students' diagnostic capabilities by offering more exposure to orthodontic malocclusions. Furthermore, it facilitates the creation of de-identified datasets for broader 2D image research at major research institutions.
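The area-preserving inversion idea can be sketched as latent optimization with a masked reconstruction loss, so the dental region is matched while the rest of the face is free to drift away from the original identity. Everything below (the generator interface, loss weights, step counts) is an assumption, not the paper's method:

```python
import torch

def invert_preserving_region(G, target, mask, steps=300, lr=0.05):
    """Optimize a latent code so G(z) matches `target` only inside `mask`
    (e.g., the dental area); a regularizer keeps z near the prior.
    G is any pretrained generator with a `latent_dim` attribute (assumed)."""
    z = torch.randn(1, G.latent_dim, requires_grad=True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        img = G(z)
        keep = ((img - target) * mask).pow(2).mean()   # preserve dental area
        prior = z.pow(2).mean()                        # stay near the prior
        (keep + 1e-3 * prior).backward()
        opt.step()
    return G(z).detach()                               # de-identified image
```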

Comparative Analysis of GPT-4 and Human Graders in Evaluating Praise Given to Students in Synthetic Dialogues

  • paper_url: http://arxiv.org/abs/2307.02018
  • repo_url: None
  • paper_authors: Dollaya Hirunyasiri, Danielle R. Thomas, Jionghao Lin, Kenneth R. Koedinger, Vincent Aleven
  • for: Evaluating whether a large language model can provide specific feedback on the praise tutors give to students, as a step toward improving tutor performance.
  • methods: Uses two prompting approaches, zero-shot chain of thought and few-shot chain of thought, to have GPT-4 identify specific components of effective praise in 30 GPT-4-generated tutor-student dialogues, judged against five criteria and compared with human graders.
  • results: GPT-4 performs moderately well at identifying specific and immediate praise, but underperforms at identifying sincere praise, especially in the zero-shot prompting scenario.
    Abstract Research suggests that providing specific and timely feedback to human tutors enhances their performance. However, it presents challenges due to the time-consuming nature of assessing tutor performance by human evaluators. Large language models, such as the AI-chatbot ChatGPT, hold potential for offering constructive feedback to tutors in practical settings. Nevertheless, the accuracy of AI-generated feedback remains uncertain, with scant research investigating the ability of models like ChatGPT to deliver effective feedback. In this work-in-progress, we evaluate 30 dialogues generated by GPT-4 in a tutor-student setting. We use two different prompting approaches, the zero-shot chain of thought and the few-shot chain of thought, to identify specific components of effective praise based on five criteria. These approaches are then compared to the results of human graders for accuracy. Our goal is to assess the extent to which GPT-4 can accurately identify each praise criterion. We found that both zero-shot and few-shot chain of thought approaches yield comparable results. GPT-4 performs moderately well in identifying instances when the tutor offers specific and immediate praise. However, GPT-4 underperforms in identifying the tutor's ability to deliver sincere praise, particularly in the zero-shot prompting scenario where examples of sincere tutor praise statements were not provided. Future work will focus on enhancing prompt engineering, developing a more general tutoring rubric, and evaluating our method using real-life tutoring dialogues.
    摘要 In this study, we evaluated 30 dialogues generated by GPT-4 in a tutor-student setting using two different prompting approaches: the zero-shot chain of thought and the few-shot chain of thought. We used five criteria to identify specific components of effective praise. We compared the results of human graders to assess the extent to which GPT-4 can accurately identify each praise criterion.We found that both zero-shot and few-shot chain of thought approaches yielded comparable results. GPT-4 performed moderately well in identifying instances when the tutor offers specific and immediate praise. However, GPT-4 underperformed in identifying the tutor's ability to deliver sincere praise, particularly when examples of sincere tutor praise statements were not provided in the zero-shot prompting scenario.Future work will focus on enhancing prompt engineering, developing a more general tutoring rubric, and evaluating our method using real-life tutoring dialogues.

Evaluating the Effectiveness of Large Language Models in Representing Textual Descriptions of Geometry and Spatial Relations

  • paper_url: http://arxiv.org/abs/2307.03678
  • repo_url: None
  • paper_authors: Yuhan Ji, Song Gao
  • for: Assessing the ability of large language models (LLMs) to represent geometries and their spatial relations.
  • methods: Encodes the well-known text (WKT) format of geometries with LLMs such as GPT-2 and BERT, then feeds the embeddings into classifiers and regressors to evaluate their effectiveness for geometric attributes.
  • results: The LLM-generated embeddings preserve geometry types and capture some spatial relations (up to 73% accuracy), but estimating numeric values and retrieving spatially related objects remain challenging.
    Abstract This research focuses on assessing the ability of large language models (LLMs) in representing geometries and their spatial relations. We utilize LLMs including GPT-2 and BERT to encode the well-known text (WKT) format of geometries and then feed their embeddings into classifiers and regressors to evaluate the effectiveness of the LLMs-generated embeddings for geometric attributes. The experiments demonstrate that while the LLMs-generated embeddings can preserve geometry types and capture some spatial relations (up to 73% accuracy), challenges remain in estimating numeric values and retrieving spatially related objects. This research highlights the need for improvement in terms of capturing the nuances and complexities of the underlying geospatial data and integrating domain knowledge to support various GeoAI applications using foundation models.
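A minimal version of the probing setup: encode WKT strings with a pre-trained language model and train a simple classifier on the embeddings to predict geometry type. The checkpoint, pooling, and toy data are illustrative assumptions:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = AutoModel.from_pretrained("bert-base-uncased").eval()

wkts = ["POINT (30 10)",
        "LINESTRING (30 10, 10 30, 40 40)",
        "POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))",
        "POINT (5 5)"]
labels = [0, 1, 2, 0]   # 0=point, 1=linestring, 2=polygon

with torch.no_grad():
    batch = tok(wkts, padding=True, return_tensors="pt")
    embs = enc(**batch).last_hidden_state.mean(1).numpy()   # mean pooling

# Probe how much geometry-type information survives in the embedding
clf = LogisticRegression(max_iter=1000).fit(embs, labels)
print(clf.predict(embs))
```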

STS-CCL: Spatial-Temporal Synchronous Contextual Contrastive Learning for Urban Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2307.02507
  • repo_url: None
  • paper_authors: Lincan Li, Kaixiang Yang, Fengji Luo, Jichao Bi
  • for: Efficiently capturing complex spatiotemporal representations from large-scale unlabeled traffic data.
  • methods: Proposes a Spatial-Temporal Synchronous Contextual Contrastive Learning (STS-CCL) model: basic and strong augmentations for spatiotemporal graph data with a learning-based dynamic graph view generator; a Spatial-Temporal Synchronous Contrastive Module (STS-CM) for graph-level contrasting; a Semantic Contextual Contrastive method for node-level contrasting with negative filtering; and a hard mutual-view contrastive training scheme with an integrated objective function.
  • results: A predictor built on the STS-CCL model outperforms existing traffic forecasting benchmarks, and the approach is well suited to large datasets with few labels and other spatiotemporal tasks with data scarcity.
    Abstract Efficiently capturing the complex spatiotemporal representations from large-scale unlabeled traffic data remains to be a challenging task. In considering of the dilemma, this work employs the advanced contrastive learning and proposes a novel Spatial-Temporal Synchronous Contextual Contrastive Learning (STS-CCL) model. First, we elaborate the basic and strong augmentation methods for spatiotemporal graph data, which not only perturb the data in terms of graph structure and temporal characteristics, but also employ a learning-based dynamic graph view generator for adaptive augmentation. Second, we introduce a Spatial-Temporal Synchronous Contrastive Module (STS-CM) to simultaneously capture the decent spatial-temporal dependencies and realize graph-level contrasting. To further discriminate node individuals in negative filtering, a Semantic Contextual Contrastive method is designed based on semantic features and spatial heterogeneity, achieving node-level contrastive learning along with negative filtering. Finally, we present a hard mutual-view contrastive training scheme and extend the classic contrastive loss to an integrated objective function, yielding better performance. Extensive experiments and evaluations demonstrate that building a predictor upon STS-CCL contrastive learning model gains superior performance than existing traffic forecasting benchmarks. The proposed STS-CCL is highly suitable for large datasets with only a few labeled data and other spatiotemporal tasks with data scarcity issue.
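The graph-level contrasting between the basic and strong augmented views can be sketched with a standard InfoNCE objective; the paper's integrated objective adds node-level semantic contrasting and negative filtering on top of this basic form:

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    """Embeddings of the two augmented views of the same graph/time step are
    positives; all other pairs in the batch are negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / tau                      # (B, B) similarity matrix
    targets = torch.arange(z1.size(0))            # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

z_basic = torch.randn(32, 64)    # view from basic augmentation
z_strong = torch.randn(32, 64)   # view from strong augmentation
print(info_nce(z_basic, z_strong))
```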

The KiTS21 Challenge: Automatic segmentation of kidneys, renal tumors, and renal cysts in corticomedullary-phase CT

  • paper_url: http://arxiv.org/abs/2307.01984
  • repo_url: https://github.com/neheller/kits21
  • paper_authors: Nicholas Heller, Fabian Isensee, Dasha Trofimova, Resha Tejpaul, Zhongchen Zhao, Huai Chen, Lisheng Wang, Alex Golts, Daniel Khapun, Daniel Shats, Yoel Shoshan, Flora Gilboa-Solomon, Yasmeen George, Xi Yang, Jianpeng Zhang, Jing Zhang, Yong Xia, Mengran Wu, Zhiyang Liu, Ed Walczak, Sean McSweeney, Ranveer Vasdev, Chris Hornung, Rafat Solaiman, Jamee Schoephoerster, Bailey Abernathy, David Wu, Safa Abdulkadir, Ben Byun, Justice Spriggs, Griffin Struyk, Alexandra Austin, Ben Simpson, Michael Hagstrom, Sierra Virnig, John French, Nitin Venkatesh, Sarah Chan, Keenan Moore, Anna Jacobsen, Susan Austin, Mark Austin, Subodh Regmi, Nikolaos Papanikolopoulos, Christopher Weight
  • for: The paper is written for the 2021 Kidney and Kidney Tumor Segmentation Challenge (KiTS21) and the 2021 international conference on Medical Image Computing and Computer Assisted Interventions (MICCAI).
  • methods: The paper uses a novel annotation method that collects three separate annotations for each region of interest, and these annotations are performed in a fully transparent setting using a web-based annotation tool.
  • results: The top-performing teams achieved a significant improvement over the state of the art set in 2019, and this performance is shown to inch ever closer to human-level performance.
    Abstract This paper presents the challenge report for the 2021 Kidney and Kidney Tumor Segmentation Challenge (KiTS21) held in conjunction with the 2021 international conference on Medical Image Computing and Computer Assisted Interventions (MICCAI). KiTS21 is a sequel to its first edition in 2019, and it features a variety of innovations in how the challenge was designed, in addition to a larger dataset. A novel annotation method was used to collect three separate annotations for each region of interest, and these annotations were performed in a fully transparent setting using a web-based annotation tool. Further, the KiTS21 test set was collected from an outside institution, challenging participants to develop methods that generalize well to new populations. Nonetheless, the top-performing teams achieved a significant improvement over the state of the art set in 2019, and this performance is shown to inch ever closer to human-level performance. An in-depth meta-analysis is presented describing which methods were used and how they faired on the leaderboard, as well as the characteristics of which cases generally saw good performance, and which did not. Overall KiTS21 facilitated a significant advancement in the state of the art in kidney tumor segmentation, and provides useful insights that are applicable to the field of semantic segmentation as a whole.
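Segmentation challenges in this family are typically scored with overlap metrics such as the Sørensen-Dice coefficient, computed over hierarchical regions of interest (e.g., kidney plus masses, masses, tumor). A generic sketch on random masks:

```python
import numpy as np

def dice(pred, gt, eps=1e-8):
    """Sørensen-Dice overlap between binary masks (a generic sketch;
    KiTS-style challenges also report surface-based metrics)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return (2.0 * np.logical_and(pred, gt).sum() + eps) / (pred.sum() + gt.sum() + eps)

pred = np.random.default_rng(0).integers(0, 2, (64, 64, 64))  # predicted mask
gt = np.random.default_rng(1).integers(0, 2, (64, 64, 64))    # reference mask
print(f"Dice: {dice(pred, gt):.3f}")
```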

SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis

  • paper_url: http://arxiv.org/abs/2307.01952
  • repo_url: https://github.com/stability-ai/generative-models
  • paper_authors: Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, Robin Rombach
  • For: Proposes SDXL, a latent diffusion model for text-to-image synthesis; compared with previous versions of Stable Diffusion, SDXL uses a larger UNet backbone, mainly due to more attention blocks and a larger cross-attention context from a second text encoder.
  • Methods: Designs multiple novel conditioning schemes, trains SDXL on multiple aspect ratios, and introduces a refinement model that improves the visual fidelity of SDXL's samples via a post-hoc image-to-image technique.
  • Results: Experiments show that SDXL markedly improves on previous versions of Stable Diffusion and achieves image quality competitive with black-box state-of-the-art generators; the model code and weights are released for further research and application.
    Abstract We present SDXL, a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL leverages a three times larger UNet backbone: The increase of model parameters is mainly due to more attention blocks and a larger cross-attention context as SDXL uses a second text encoder. We design multiple novel conditioning schemes and train SDXL on multiple aspect ratios. We also introduce a refinement model which is used to improve the visual fidelity of samples generated by SDXL using a post-hoc image-to-image technique. We demonstrate that SDXL shows drastically improved performance compared the previous versions of Stable Diffusion and achieves results competitive with those of black-box state-of-the-art image generators. In the spirit of promoting open research and fostering transparency in large model training and evaluation, we provide access to code and model weights at https://github.com/Stability-AI/generative-models
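With the released weights, generation can be run through the diffusers library; a minimal sketch using the base checkpoint (the optional refinement model and scheduler choices are omitted for brevity):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a watercolor painting of a lighthouse at dawn",
    num_inference_steps=30,
    guidance_scale=7.0,
).images[0]
image.save("lighthouse.png")
```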

A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.01951
  • repo_url: https://github.com/kvignesh1420/gnn_collapse
  • paper_authors: Vignesh Kothapalli, Tom Tirer, Joan Bruna
  • for: Node-wise classification on graph-structured data, illustrated with community detection on stochastic block model graphs.
  • methods: Studies feature evolution in graph neural networks (GNNs) through the lens of the Neural Collapse (NC) phenomenon, combining an empirical study with a theoretical analysis of a mathematical model and its gradient dynamics.
  • results: A decrease in within-class feature variability also occurs in node-wise classification, though less pronounced than in the instance-wise case; exact collapse requires the graphs to obey a strict structural condition, one that is viable also for heterophilic graphs and relates to recent empirical studies on improved GNN generalization.
    Abstract Graph neural networks (GNNs) have become increasingly popular for classification tasks on graph-structured data. Yet, the interplay between graph topology and feature evolution in GNNs is not well understood. In this paper, we focus on node-wise classification, illustrated with community detection on stochastic block model graphs, and explore the feature evolution through the lens of the "Neural Collapse" (NC) phenomenon. When training instance-wise deep classifiers (e.g. for image classification) beyond the zero training error point, NC demonstrates a reduction in the deepest features' within-class variability and an increased alignment of their class means to certain symmetric structures. We start with an empirical study that shows that a decrease in within-class variability is also prevalent in the node-wise classification setting, however, not to the extent observed in the instance-wise case. Then, we theoretically study this distinction. Specifically, we show that even an "optimistic" mathematical model requires that the graphs obey a strict structural condition in order to possess a minimizer with exact collapse. Interestingly, this condition is viable also for heterophilic graphs and relates to recent empirical studies on settings with improved GNNs' generalization. Furthermore, by studying the gradient dynamics of the theoretical model, we provide reasoning for the partial collapse observed empirically. Finally, we present a study on the evolution of within- and between-class feature variability across layers of a well-trained GNN and contrast the behavior with spectral methods.
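The within-class variability collapse discussed above is commonly quantified with the NC1 metric, the trace of the within-class covariance measured against the pseudo-inverse of the between-class covariance; a sketch on placeholder GNN features:

```python
import torch

def nc1_within_class_variability(features, labels):
    """NC1 ~ trace(Sigma_W @ pinv(Sigma_B)) / C; smaller values indicate
    stronger collapse of same-class features toward their class mean."""
    classes = labels.unique()
    mu_g = features.mean(0)
    d = features.size(1)
    sw, sb = torch.zeros(d, d), torch.zeros(d, d)
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(0)
        sw += (fc - mu_c).T @ (fc - mu_c) / features.size(0)
        sb += torch.outer(mu_c - mu_g, mu_c - mu_g) * fc.size(0) / features.size(0)
    return torch.trace(sw @ torch.linalg.pinv(sb)) / len(classes)

feats = torch.randn(200, 16)            # e.g., penultimate-layer GNN features
labels = torch.randint(0, 4, (200,))    # community assignments
print(nc1_within_class_variability(feats, labels))
```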

Causal Video Summarizer for Video Exploration

  • paper_url: http://arxiv.org/abs/2307.01947
  • repo_url: None
  • paper_authors: Jia-Hong Huang, Chao-Han Huck Yang, Pin-Yu Chen, Andrew Brown, Marcel Worring
  • for: Improving video exploration via multi-modal video summarization, since traditional models generate a fixed summary independent of user-specific needs.
  • methods: Proposes a causality-based method, the Causal Video Summarizer (CVS), consisting of a probabilistic encoder and a probabilistic decoder, to capture the interactive information between the video and the query.
  • results: On an existing multi-modal video summarization dataset, the approach improves accuracy by +5.4% and F1-score by +4.92% over the state-of-the-art method.
    Abstract Recently, video summarization has been proposed as a method to help video exploration. However, traditional video summarization models only generate a fixed video summary which is usually independent of user-specific needs and hence limits the effectiveness of video exploration. Multi-modal video summarization is one of the approaches utilized to address this issue. Multi-modal video summarization has a video input and a text-based query input. Hence, effective modeling of the interaction between a video input and text-based query is essential to multi-modal video summarization. In this work, a new causality-based method named Causal Video Summarizer (CVS) is proposed to effectively capture the interactive information between the video and query to tackle the task of multi-modal video summarization. The proposed method consists of a probabilistic encoder and a probabilistic decoder. Based on the evaluation of the existing multi-modal video summarization dataset, experimental results show that the proposed approach is effective with an increase of +5.4% in accuracy and +4.92% in F1-score, compared with the state-of-the-art method.

Query-based Video Summarization with Pseudo Label Supervision

  • paper_url: http://arxiv.org/abs/2307.01945
  • repo_url: None
  • paper_authors: Jia-Hong Huang, Luka Murn, Marta Mrak, Marcel Worring
  • for: This paper aims to improve the performance of deep video summarization models by using self-supervised learning and pseudo labels to address the data sparsity challenge.
  • methods: The paper proposes using segment-level pseudo labels generated based on existing human-defined frame-level labels to pre-train a supervised deep model. It also introduces a semantics booster to generate context-aware query representations and mutual attention to capture interactive information between visual and textual modalities.
  • results: The proposed video summarization algorithm achieves state-of-the-art performance on three commonly-used video summarization benchmarks.
    Abstract Existing datasets for manually labelled query-based video summarization are costly and thus small, limiting the performance of supervised deep video summarization models. Self-supervision can address the data sparsity challenge by using a pretext task and defining a method to acquire extra data with pseudo labels to pre-train a supervised deep model. In this work, we introduce segment-level pseudo labels from input videos to properly model both the relationship between a pretext task and a target task, and the implicit relationship between the pseudo label and the human-defined label. The pseudo labels are generated based on existing human-defined frame-level labels. To create more accurate query-dependent video summaries, a semantics booster is proposed to generate context-aware query representations. Furthermore, we propose mutual attention to help capture the interactive information between visual and textual modalities. Three commonly-used video summarization benchmarks are used to thoroughly validate the proposed approach. Experimental results show that the proposed video summarization algorithm achieves state-of-the-art performance.
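The segment-level pseudo labels can be sketched as pooling existing human-defined frame-level labels within each segment; the mean-pooling rule and threshold below are assumptions, not the paper's exact construction:

```python
import numpy as np

def segment_pseudo_labels(frame_scores, boundaries, thresh=0.5):
    """Pool frame-level scores within each segment and threshold the mean
    to obtain a binary segment-level pseudo label."""
    labels = []
    for start, end in boundaries:
        labels.append(int(frame_scores[start:end].mean() >= thresh))
    return labels

frame_scores = np.array([0.9, 0.8, 0.1, 0.0, 0.2, 0.7, 0.9, 0.6])
segments = [(0, 2), (2, 5), (5, 8)]          # frame-index ranges per segment
print(segment_pseudo_labels(frame_scores, segments))   # -> [1, 0, 1]
```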

Concept2Box: Joint Geometric Embeddings for Learning Two-View Knowledge Graphs

  • paper_url: http://arxiv.org/abs/2307.01933
  • repo_url: None
  • paper_authors: Zijie Huang, Daheng Wang, Binxuan Huang, Chenwei Zhang, Jingbo Shang, Yan Liang, Zhengyang Wang, Xian Li, Christos Faloutsos, Yizhou Sun, Wei Wang
  • for: Embedding large-scale relational data that contains two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities.
  • methods: Jointly embeds the two views with dual geometric representations: concepts as box embeddings, which capture the hierarchy and complex relations such as overlap and disjointness (box volume reflects concept granularity), and entities as vectors, with a novel vector-to-box distance metric bridging the two and both embeddings learned jointly.
  • results: Experiments on the public DBpedia KG and a newly-created industrial KG demonstrate the effectiveness of Concept2Box.
    Abstract Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between two views and lacks probabilistic semantics towards concepts' granularity. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model concepts with box embeddings, which learn the hierarchy structure and complex relations such as overlap and disjoint among them. Box volumes can be interpreted as concepts' granularity. Different from concepts, we model entities as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG and a newly-created industrial KG showed the effectiveness of Concept2Box.
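A sketch of a vector-to-box distance of the kind used to bridge entity vectors and concept boxes: distance to the box from outside, plus a down-weighted distance to the box center once inside. The exact functional form and weight here are assumptions, not the paper's definition:

```python
import torch

def vector_to_box_distance(v, box_min, box_max, alpha=0.5):
    """Outside term: how far v lies beyond the box faces.
    Inside term: distance from the clamped point to the box center,
    down-weighted by alpha so points inside are only mildly penalized."""
    outside = torch.clamp(box_min - v, min=0) + torch.clamp(v - box_max, min=0)
    center = (box_min + box_max) / 2
    inside = torch.min(torch.max(v, box_min), box_max) - center
    return outside.norm(dim=-1) + alpha * inside.norm(dim=-1)

entity = torch.tensor([0.2, 0.9])        # instance-view entity vector
concept_min = torch.tensor([0.0, 0.0])   # concept box lower corner
concept_max = torch.tensor([0.5, 0.5])   # concept box upper corner
print(vector_to_box_distance(entity, concept_min, concept_max))
```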

MDI+: A Flexible Random Forest-Based Feature Importance Framework

  • paper_url: http://arxiv.org/abs/2307.01932
  • repo_url: https://github.com/csinva/imodels
  • paper_authors: Abhineet Agarwal, Ana M. Kenney, Yan Shuo Tan, Tiffany M. Tang, Bin Yu
  • for: Proposing a flexible feature importance framework, MDI+, that generalizes MDI (Mean Decrease in Impurity) for random forests.
  • methods: Builds on the interpretation that the MDI of a feature X_k in each tree equals the unnormalized R^2 of a linear regression of the response on the decision stumps splitting on X_k; MDI+ lets the analyst replace that regression and metric with regularized generalized linear models (GLMs) and metrics suited to the data structure, and adds features that mitigate known biases of decision trees.
  • results: Data-inspired simulations show MDI+ significantly outperforms popular feature importance measures at identifying signal features; in case studies on drug response prediction and breast cancer subtype classification, MDI+ extracts well-established predictive genes with significantly greater stability than existing measures.
    Abstract Mean decrease in impurity (MDI) is a popular feature importance measure for random forests (RFs). We show that the MDI for a feature $X_k$ in each tree in an RF is equivalent to the unnormalized $R^2$ value in a linear regression of the response on the collection of decision stumps that split on $X_k$. We use this interpretation to propose a flexible feature importance framework called MDI+. Specifically, MDI+ generalizes MDI by allowing the analyst to replace the linear regression model and $R^2$ metric with regularized generalized linear models (GLMs) and metrics better suited for the given data structure. Moreover, MDI+ incorporates additional features to mitigate known biases of decision trees against additive or smooth models. We further provide guidance on how practitioners can choose an appropriate GLM and metric based upon the Predictability, Computability, Stability framework for veridical data science. Extensive data-inspired simulations show that MDI+ significantly outperforms popular feature importance measures in identifying signal features. We also apply MDI+ to two real-world case studies on drug response prediction and breast cancer subtype classification. We show that MDI+ extracts well-established predictive genes with significantly greater stability compared to existing feature importance measures. All code and models are released in a full-fledged python package on Github.
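For reference, the quantity MDI+ generalizes is what scikit-learn exposes as feature_importances_ on a random forest; MDI+ swaps the implicit stump-regression/R^2 pair for a regularized GLM and a task-appropriate metric. A baseline MDI computation:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.3, 500)  # features 0, 1 are signal

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for k, imp in enumerate(rf.feature_importances_):   # normalized MDI per feature
    print(f"feature {k}: MDI = {imp:.3f}")
```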

Learning ECG signal features without backpropagation

  • paper_url: http://arxiv.org/abs/2307.01930
  • repo_url: None
  • paper_authors: Péter Pósfay, Marcell T. Kurbucz, Péter Kovács, Antal Jakovác
  • for: Proposing a new representation learning method for time-series data, to increase the effectiveness, scope, and applicability of downstream tasks such as classification and prediction.
  • methods: Draws on ideas from theoretical physics to construct a compact representation in a data-driven way; the method identifies linear laws that capture a shared characteristic of samples within a class, then uses these laws to generate a classifier-agnostic representation in a forward manner, without backpropagation, while remaining intuitive, interpretable, and verifiable.
  • results: Achieves state-of-the-art performance on ECG signal classification.
    Abstract Representation learning has become a crucial area of research in machine learning, as it aims to discover efficient ways of representing raw data with useful features to increase the effectiveness, scope and applicability of downstream tasks such as classification and prediction. In this paper, we propose a novel method to generate representations for time series-type data. This method relies on ideas from theoretical physics to construct a compact representation in a data-driven way, and it can capture both the underlying structure of the data and task-specific information while still remaining intuitive, interpretable and verifiable. This novel methodology aims to identify linear laws that can effectively capture a shared characteristic among samples belonging to a specific class. By subsequently utilizing these laws to generate a classifier-agnostic representation in a forward manner, they become applicable in a generalized setting. We demonstrate the effectiveness of our approach on the task of ECG signal classification, achieving state-of-the-art performance.
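One simple, backprop-free way to realize the "linear law" idea is to take, per class, the least-variance direction of time-delay windows: a vector w with w·x ≈ 0 on windows x drawn from that class, whose violation then serves as a feature for unseen signals. This construction is an assumption in the spirit of the paper, not its exact method:

```python
import numpy as np

def class_linear_law(signals, window=16):
    """Stack time-delay windows of a class's signals and return the
    covariance eigenvector with the smallest eigenvalue: a linear relation
    that class members approximately satisfy."""
    rows = [s[i:i + window] for s in signals for i in range(len(s) - window)]
    cov = np.cov(np.array(rows), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, 0]                  # direction of least variance

def law_residual(signal, w):
    """Feature: how strongly a new signal violates a class's linear law."""
    window = len(w)
    rows = np.array([signal[i:i + window] for i in range(len(signal) - window)])
    return float(np.mean((rows @ w) ** 2))

t = np.linspace(0, 8 * np.pi, 1024)
train = [np.sin(t + ph) for ph in np.random.default_rng(0).uniform(0, np.pi, 5)]
w = class_linear_law(train)
# In-class signal yields a small residual; noise yields a large one.
print(law_residual(np.sin(t), w),
      law_residual(np.random.default_rng(1).normal(size=1024), w))
```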

Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review

  • paper_url: http://arxiv.org/abs/2307.02503
  • repo_url: None
  • paper_authors: Man Fai Wong, Shangxin Guo, Ching Nam Hang, Siu Wai Ho, Chee Wei Tan
  • for: Reviewing the use of natural language processing (NLP) techniques, with a focus on transformer-based large language models (LLMs) trained on Big Code, for AI-assisted programming tasks.
  • methods: Surveys the major LLMs and their applications in downstream tasks such as code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection.
  • results: Covers applications such as GitHub Copilot (powered by OpenAI's Codex) and DeepMind AlphaCode, and discusses the challenges and opportunities of combining NLP techniques with software naturalness, including extending AI-assisted programming to Apple's Xcode for mobile software development.
    Abstract This paper provides a comprehensive review of the literature concerning the utilization of Natural Language Processing (NLP) techniques, with a particular focus on transformer-based large language models (LLMs) trained using Big Code, within the domain of AI-assisted programming tasks. LLMs, augmented with software naturalness, have played a crucial role in facilitating AI-assisted programming applications, including code generation, code completion, code translation, code refinement, code summarization, defect detection, and clone detection. Notable examples of such applications include the GitHub Copilot powered by OpenAI's Codex and DeepMind AlphaCode. This paper presents an overview of the major LLMs and their applications in downstream tasks related to AI-assisted programming. Furthermore, it explores the challenges and opportunities associated with incorporating NLP techniques with software naturalness in these applications, with a discussion on extending AI-assisted programming capabilities to Apple's Xcode for mobile software development. This paper also presents the challenges of and opportunities for incorporating NLP techniques with software naturalness, empowering developers with advanced coding assistance and streamlining the software development process.
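As a concrete instance of the code-completion application surveyed here, the sketch below prompts an open-source code LLM through the Hugging Face `transformers` pipeline; the choice of checkpoint is an assumption, and any causal code model would serve.

```python
from transformers import pipeline

# Greedy code completion with a small open-source code LM.
generator = pipeline("text-generation", model="Salesforce/codegen-350M-mono")

prompt = 'def fibonacci(n):\n    """Return the n-th Fibonacci number."""\n'
completion = generator(prompt, max_new_tokens=48, do_sample=False)
print(completion[0]["generated_text"])
```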

Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners

  • paper_url: http://arxiv.org/abs/2307.01928
  • repo_url: None
  • paper_authors: Allen Z. Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, Zhenjia Xu, Dorsa Sadigh, Andy Zeng, Anirudha Majumdar
  • for: This work provides a framework for measuring and aligning the uncertainty of large language model (LLM)-based planners, so that they can complete tasks reliably in real-world deployments.
  • methods: Building on conformal prediction theory, it provides statistical guarantees on task completion while calibrating the planner's uncertainty in complex multi-step planning settings.
  • results: Experiments show that KnowNo performs favorably across a variety of simulated and real robot setups, improving efficiency and autonomy while providing formal guarantees. KnowNo works with LLMs out of the box without model fine-tuning, suggesting a lightweight approach to modeling uncertainty that scales with foundation models.
    Abstract Large language models (LLMs) exhibit a wide range of promising capabilities -- from step-by-step planning to commonsense reasoning -- that may provide utility for robots, but remain prone to confidently hallucinated predictions. In this work, we present KnowNo, which is a framework for measuring and aligning the uncertainty of LLM-based planners such that they know when they don't know and ask for help when needed. KnowNo builds on the theory of conformal prediction to provide statistical guarantees on task completion while minimizing human help in complex multi-step planning settings. Experiments across a variety of simulated and real robot setups that involve tasks with different modes of ambiguity (e.g., from spatial to numeric uncertainties, from human preferences to Winograd schemas) show that KnowNo performs favorably over modern baselines (which may involve ensembles or extensive prompt tuning) in terms of improving efficiency and autonomy, while providing formal assurances. KnowNo can be used with LLMs out of the box without model-finetuning, and suggests a promising lightweight approach to modeling uncertainty that can complement and scale with the growing capabilities of foundation models. Website: https://robot-help.github.io
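The core conformal-prediction mechanism can be sketched in a few lines: calibrate a nonconformity threshold on held-out examples, form a prediction set over candidate next steps at test time, and ask for help whenever the set is not a singleton. The score definition and the calibration data below are synthetic stand-ins, not KnowNo's exact setup.

```python
import numpy as np

def conformal_threshold(cal_scores, epsilon=0.15):
    """Split conformal prediction: a quantile of nonconformity scores on a
    calibration set so prediction sets cover the true option w.p. >= 1-eps."""
    n = len(cal_scores)
    q = np.ceil((n + 1) * (1 - epsilon)) / n
    return np.quantile(cal_scores, min(q, 1.0))

def prediction_set(option_probs, qhat):
    """Keep every candidate next step whose nonconformity (1 - LLM
    probability) is below the calibrated threshold."""
    return [i for i, p in enumerate(option_probs) if 1 - p <= qhat]

# calibration: nonconformity = 1 - prob the LLM assigned to the true option
rng = np.random.default_rng(0)
cal_scores = 1 - rng.beta(8, 2, size=400)   # stand-in for calibration data
qhat = conformal_threshold(cal_scores, epsilon=0.15)

# at test time: if more than one option survives, the robot asks for help
option_probs = [0.55, 0.38, 0.05, 0.02]     # LLM scores for 4 candidate steps
pset = prediction_set(option_probs, qhat)
if len(pset) > 1:
    print("ambiguous, ask human to choose among:", pset)
else:
    print("execute option", pset[0])
```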

Stranding Risk for Underactuated Vessels in Complex Ocean Currents: Analysis and Controllers

  • paper_url: http://arxiv.org/abs/2307.01917
  • repo_url: None
  • paper_authors: Andreas Doering, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F. J. Lermusiaux, Claire J. Tomlin
  • for: The paper aims to improve the safety of low-propulsion vessels navigating in high-risk regions by developing a feedback control policy that accounts for the uncertainty of ocean currents.
  • methods: The authors combine analytical and numerical methods, including the Hamilton-Jacobi Multi-Time Reachability (HJ-MTR) framework, to synthesize a feedback policy that guarantees safe operation in the presence of forecast errors.
  • results: Large-scale simulations show that the method significantly improves safety over baseline methods while still achieving timely arrival at the destination.
    Abstract Low-propulsion vessels can take advantage of powerful ocean currents to navigate towards a destination. Recent results demonstrated that vessels can reach their destination with high probability despite forecast errors. However, these results do not consider the critical aspect of safety of such vessels: because of their low propulsion which is much smaller than the magnitude of currents, they might end up in currents that inevitably push them into unsafe areas such as shallow areas, garbage patches, and shipping lanes. In this work, we first investigate the risk of stranding for free-floating vessels in the Northeast Pacific. We find that at least 5.04% would strand within 90 days. Next, we encode the unsafe sets as hard constraints into Hamilton-Jacobi Multi-Time Reachability (HJ-MTR) to synthesize a feedback policy that is equivalent to re-planning at each time step at low computational cost. While applying this policy closed-loop guarantees safe operation when the currents are known, in realistic situations only imperfect forecasts are available. We demonstrate the safety of our approach in such realistic situations empirically with large-scale simulations of a vessel navigating in high-risk regions in the Northeast Pacific. We find that applying our policy closed-loop with daily re-planning on new forecasts can ensure safety with high probability even under forecast errors that exceed the maximal propulsion. Our method significantly improves safety over the baselines and still achieves a timely arrival of the vessel at the destination.
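A rough picture of how a reachability value function becomes a cheap feedback policy: once V is precomputed from a forecast, the control at each state simply points the bounded thrust against the spatial gradient of V, and daily re-planning swaps in a new V from the fresh forecast. Everything below (the value function, the current field, the step sizes) is a toy stand-in for the HJ-MTR machinery, not the paper's implementation.

```python
import numpy as np

def feedback_control(V_grad, u_max):
    """Point maximal thrust along -grad V, the optimal direction for this
    class of reachability problems; the currents enter via the dynamics."""
    g = np.asarray(V_grad, dtype=float)
    norm = np.linalg.norm(g)
    return -u_max * g / norm if norm > 0 else np.zeros_like(g)

def step(x, current_fn, grad_fn, u_max=0.1, dt=0.1):
    u = feedback_control(grad_fn(x), u_max)
    return x + dt * (current_fn(x) + u)   # dx/dt = current + thrust

# usage: re-plan daily by swapping in grad_fn from a new forecast's V
x = np.array([0.0, 0.0])
current = lambda x: np.array([0.3, -0.1])    # forecast current (stand-in)
gradV = lambda x: x - np.array([2.0, 1.0])   # toy quadratic value function
for _ in range(100):
    x = step(x, current, gradV)
print("final position:", x)
```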

Maximizing Seaweed Growth on Autonomous Farms: A Dynamic Programming Approach for Underactuated Systems Navigating on Uncertain Ocean Currents

  • paper_url: http://arxiv.org/abs/2307.01916
  • repo_url: None
  • paper_authors: Matthias Killer, Marius Wiggert, Hanna Krasowski, Manan Doshi, Pierre F. J. Lermusiaux, Claire J. Tomlin
  • for: This paper designs a controller that maximizes seaweed growth on large-scale autonomous seaweed farms in the open ocean.
  • methods: It formulates growth-optimal control as a dynamic programming problem and presents three extensions for handling imperfect forecasts.
  • results: In simulation, the controller achieves 95.8% of the best possible growth using only 5-day forecasts, showing that low-power propulsion with optimal control can enhance seaweed growth on floating farms under realistic conditions.
    Abstract Seaweed biomass offers significant potential for climate mitigation, but large-scale, autonomous open-ocean farms are required to fully exploit it. Such farms typically have low propulsion and are heavily influenced by ocean currents. We want to design a controller that maximizes seaweed growth over months by taking advantage of the non-linear time-varying ocean currents for reaching high-growth regions. The complex dynamics and underactuation make this challenging even when the currents are known. This is even harder when only short-term imperfect forecasts with increasing uncertainty are available. We propose a dynamic programming-based method to efficiently solve for the optimal growth value function when true currents are known. We additionally present three extensions when as in reality only forecasts are known: (1) our methods resulting value function can be used as feedback policy to obtain the growth-optimal control for all states and times, allowing closed-loop control equivalent to re-planning at every time step hence mitigating forecast errors, (2) a feedback policy for long-term optimal growth beyond forecast horizons using seasonal average current data as terminal reward, and (3) a discounted finite-time Dynamic Programming (DP) formulation to account for increasing ocean current estimate uncertainty. We evaluate our approach through 30-day simulations of floating seaweed farms in realistic Pacific Ocean current scenarios. Our method demonstrates an achievement of 95.8% of the best possible growth using only 5-day forecasts. This confirms the feasibility of using low-power propulsion and optimal control for enhanced seaweed growth on floating farms under real-world conditions.
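The backbone of the approach, a finite-horizon dynamic program with a terminal reward and a discount to account for growing forecast uncertainty, can be sketched on a toy 1-D grid; the growth field, current, action set, and discount below are illustrative assumptions, not the paper's ocean model.

```python
import numpy as np

T, S = 30, 50                               # horizon (days), grid cells
growth = np.sin(np.linspace(0, np.pi, S))   # high-growth region mid-grid
current = 1                                 # cells drifted per step
actions = [-1, 0, 1]                        # bounded thrust, cells per step
gamma = 0.95                                # discount for forecast uncertainty

# V[t, s] = best discounted accumulated growth from cell s at step t
V = np.zeros((T + 1, S))
V[T] = growth                               # terminal reward (e.g. seasonal average)
policy = np.zeros((T, S), dtype=int)
for t in range(T - 1, -1, -1):              # backward recursion
    for s in range(S):
        best = -np.inf
        for k, a in enumerate(actions):
            s2 = np.clip(s + current + a, 0, S - 1)
            val = growth[s] + gamma * V[t + 1, s2]
            if val > best:
                best, policy[t, s] = val, k
        V[t, s] = best
print("optimal first action from cell 10:", actions[policy[0, 10]])
```

Used closed-loop, the stored value function doubles as a feedback policy: re-evaluating the argmax at the current state each day is equivalent to re-planning on the newest forecast, which is the first of the three extensions described above.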

ClimateLearn: Benchmarking Machine Learning for Weather and Climate Modeling

  • paper_url: http://arxiv.org/abs/2307.01909
  • repo_url: https://github.com/aditya-grover/climate-learn
  • paper_authors: Tung Nguyen, Jason Jewik, Hritik Bansal, Prakhar Sharma, Aditya Grover
  • for: This work provides a large-scale, open-source machine learning library for advancing climate science research.
  • methods: The library is implemented in PyTorch and includes data-processing pipelines (e.g., ERA5, CMIP6, PRISM), state-of-the-art deep learning models (e.g., Transformers, ResNets), and evaluation methods.
  • results: The library supports core problems in climate science such as weather forecasting and climate downscaling, and ships with extensive documentation, contribution guides, and quickstart tutorials to expand access and promote community growth.
    Abstract Modeling weather and climate is an essential endeavor to understand the near- and long-term impacts of climate change, as well as inform technology and policymaking for adaptation and mitigation efforts. In recent years, there has been a surging interest in applying data-driven methods based on machine learning for solving core problems such as weather forecasting and climate downscaling. Despite promising results, much of this progress has been impaired due to the lack of large-scale, open-source efforts for reproducibility, resulting in the use of inconsistent or underspecified datasets, training setups, and evaluations by both domain scientists and artificial intelligence researchers. We introduce ClimateLearn, an open-source PyTorch library that vastly simplifies the training and evaluation of machine learning models for data-driven climate science. ClimateLearn consists of holistic pipelines for dataset processing (e.g., ERA5, CMIP6, PRISM), implementation of state-of-the-art deep learning models (e.g., Transformers, ResNets), and quantitative and qualitative evaluation for standard weather and climate modeling tasks. We supplement these functionalities with extensive documentation, contribution guides, and quickstart tutorials to expand access and promote community growth. We have also performed comprehensive forecasting and downscaling experiments to showcase the capabilities and key features of our library. To our knowledge, ClimateLearn is the first large-scale, open-source effort for bridging research in weather and climate modeling with modern machine learning systems. Our library is available publicly at https://github.com/aditya-grover/climate-learn.
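To show the shape of the forecasting task the library benchmarks (without reproducing its API, which its documentation describes), here is a generic PyTorch sketch: a small residual CNN mapping gridded fields at time t to the same fields at a later lead time.

```python
import torch
import torch.nn as nn

class SimpleForecaster(nn.Module):
    """Toy gridded forecaster: predict the change in the fields."""
    def __init__(self, channels=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, channels, 3, padding=1),
        )

    def forward(self, x):          # x: (batch, variables, lat, lon)
        return x + self.net(x)     # residual connection

model = SimpleForecaster()
x = torch.randn(8, 3, 32, 64)     # stand-in for ERA5 variables on a grid
y = torch.randn(8, 3, 32, 64)     # fields at t + lead_time
loss = nn.functional.mse_loss(model(x), y)  # latitude-weighted RMSE in practice
loss.backward()
print(float(loss))
```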

Math Agents: Computational Infrastructure, Mathematical Embedding, and Genomics

  • paper_url: http://arxiv.org/abs/2307.02502
  • repo_url: None
  • paper_authors: Melanie Swan, Takashi Kido, Eric Roland, Renato P. dos Santos
  • for: This project aims to advance generative AI with Math Agents and mathematical embeddings, and to apply them to the aging problem in information systems biology.
  • methods: It uses a GPT-based workflow to convert equations from the literature into LaTeX and Python formats, with LLMs serving as linguistic user interfaces: natural language access for human-AI chat and formal languages for large-scale AI-assisted computational infrastructure.
  • results: It proposes applying multiscalar physics mathematics to disease models and genomic data, and using generative AI with episodic memory to analyze causal relations in longitudinal health records via SIR Precision Health models.
    Abstract The advancement in generative AI could be boosted with more accessible mathematics. Beyond human-AI chat, large language models (LLMs) are emerging in programming, algorithm discovery, and theorem proving, yet their genomics application is limited. This project introduces Math Agents and mathematical embedding as fresh entries to the "Moore's Law of Mathematics", using a GPT-based workflow to convert equations from literature into LaTeX and Python formats. While many digital equation representations exist, there's a lack of automated large-scale evaluation tools. LLMs are pivotal as linguistic user interfaces, providing natural language access for human-AI chat and formal languages for large-scale AI-assisted computational infrastructure. Given the infinite formal possibility spaces, Math Agents, which interact with math, could potentially shift us from "big data" to "big math". Math, unlike the more flexible natural language, has properties subject to proof, enabling its use beyond traditional applications like high-validation math-certified icons for AI alignment aims. This project aims to use Math Agents and mathematical embeddings to address the ageing issue in information systems biology by applying multiscalar physics mathematics to disease models and genomic data. Generative AI with episodic memory could help analyse causal relations in longitudinal health records, using SIR Precision Health models. Genomic data is suggested for addressing the unsolved Alzheimer's disease problem.
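One step of the described workflow, turning LaTeX emitted for an equation into executable Python, can be illustrated with SymPy's LaTeX parser. This is a hedged sketch: it needs the optional `antlr4-python3-runtime` dependency, and the SIR term below is just an example equation, not the project's pipeline.

```python
from sympy import lambdify, symbols
from sympy.parsing.latex import parse_latex

# Full equation for context: dS/dt = -beta * S * I (SIR susceptible term)
latex_eq = r"\frac{dS}{dt} = -\beta S I"

expr = parse_latex(r"-\beta S I")          # parse the right-hand side
beta, S, I = symbols("beta S I")
dS_dt = lambdify((beta, S, I), expr)       # compile to a Python callable
print(dS_dt(0.3, 0.99, 0.01))              # approx -0.00297
```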

Concept-Based Explanations to Test for False Causal Relationships Learned by Abusive Language Classifiers

  • paper_url: http://arxiv.org/abs/2307.01900
  • repo_url: https://github.com/isarnejad/global-sufficiency
  • paper_authors: Isar Nejadgholi, Svetlana Kiritchenko, Kathleen C. Fraser, Esma Balkır
  • for: The paper is written to address the issue of classifiers learning false causal relationships between concepts and labels, specifically the concept of negative emotions in abusive language.
  • methods: The paper uses three well-known abusive language classifiers trained on large English datasets, and assesses their accuracy on a challenge set across all decision thresholds. Additionally, the paper introduces concept-based explanation metrics to assess the influence of the concept on the labels.
  • results: The paper finds that the classifiers have learned unwanted dependencies on the concept of negative emotions, and that these dependencies can compromise classification accuracy. The paper also introduces concept-based explanation metrics to compare classifiers regarding the degree of false global sufficiency they have learned between a concept and a label.
    Abstract Classifiers tend to learn a false causal relationship between an over-represented concept and a label, which can result in over-reliance on the concept and compromised classification accuracy. It is imperative to have methods in place that can compare different models and identify over-reliances on specific concepts. We consider three well-known abusive language classifiers trained on large English datasets and focus on the concept of negative emotions, which is an important signal but should not be learned as a sufficient feature for the label of abuse. Motivated by the definition of global sufficiency, we first examine the unwanted dependencies learned by the classifiers by assessing their accuracy on a challenge set across all decision thresholds. Further, recognizing that a challenge set might not always be available, we introduce concept-based explanation metrics to assess the influence of the concept on the labels. These explanations allow us to compare classifiers regarding the degree of false global sufficiency they have learned between a concept and a label.
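The threshold sweep described above is easy to sketch: on a challenge set of texts that carry negative emotion but are not abusive, a classifier that learned the concept as globally sufficient will flag almost everything at every decision threshold. The score distributions below are synthetic stand-ins for real model outputs.

```python
import numpy as np

def challenge_accuracy(scores_on_challenge, thresholds):
    """Challenge items are all non-abusive, so accuracy at threshold T is
    the fraction scored below T (i.e., not flagged as abusive)."""
    s = np.asarray(scores_on_challenge)
    return {t: float(np.mean(s < t)) for t in thresholds}

rng = np.random.default_rng(1)
reliant = rng.beta(6, 2, 500)  # concept-reliant model scores these high
robust = rng.beta(2, 6, 500)   # model without the false dependency

for name, sc in [("concept-reliant", reliant), ("robust", robust)]:
    print(name, challenge_accuracy(sc, [0.3, 0.5, 0.7]))
```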

KDSTM: Neural Semi-supervised Topic Modeling with Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2307.01878
  • repo_url: None
  • paper_authors: Weijie Xu, Xiaoyu Jiang, Jay Desai, Bin Han, Fuqin Yan, Francis Iannacci
  • for: In text classification tasks, fine-tuning pretrained language models such as BERT and GPT-3 yields competitive accuracy, but both approaches require pretraining on large text corpora.
  • methods: The authors develop Knowledge Distillation Semi-supervised Topic Modeling (KDSTM), which requires no pretrained embeddings, needs only a few labeled documents, and is efficient to train.
  • results: Across multiple datasets, the method outperforms existing supervised topic modeling methods in classification accuracy, robustness, and efficiency, and matches state-of-the-art weakly supervised text classification methods.
    Abstract In text classification tasks, fine tuning pretrained language models like BERT and GPT-3 yields competitive accuracy; however, both methods require pretraining on large text datasets. In contrast, general topic modeling methods possess the advantage of analyzing documents to extract meaningful patterns of words without the need of pretraining. To leverage topic modeling's unsupervised insights extraction on text classification tasks, we develop the Knowledge Distillation Semi-supervised Topic Modeling (KDSTM). KDSTM requires no pretrained embeddings, few labeled documents and is efficient to train, making it ideal under resource constrained settings. Across a variety of datasets, our method outperforms existing supervised topic modeling methods in classification accuracy, robustness and efficiency and achieves similar performance compare to state of the art weakly supervised text classification methods.
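As a hedged sketch of what a distillation-based semi-supervised topic model could optimize (the architecture and weighting below are assumptions, not KDSTM's exact design): an unsupervised reconstruction term over all documents plus a temperature-scaled distillation term pulling a topic-based classifier toward a teacher on the few labeled documents.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopicStudent(nn.Module):
    def __init__(self, vocab, n_topics, n_classes):
        super().__init__()
        self.encoder = nn.Linear(vocab, n_topics)   # doc -> topic mixture
        self.decoder = nn.Linear(n_topics, vocab)   # topics -> word logits
        self.clf = nn.Linear(n_topics, n_classes)   # topics -> label logits

    def forward(self, bow):
        theta = F.softmax(self.encoder(bow), dim=-1)
        return self.decoder(theta), self.clf(theta)

def semi_supervised_distill_loss(model, bow_all, bow_lab, teacher_logits,
                                 T=2.0, lam=0.5):
    recon_all, _ = model(bow_all)
    # reconstruction: negative log-likelihood of words under topic mixture
    recon = -(F.log_softmax(recon_all, -1) * bow_all).sum(-1).mean()
    _, student_logits = model(bow_lab)
    distill = F.kl_div(F.log_softmax(student_logits / T, -1),
                       F.softmax(teacher_logits / T, -1),
                       reduction="batchmean") * T * T
    return recon + lam * distill

model = TopicStudent(vocab=2000, n_topics=20, n_classes=4)
loss = semi_supervised_distill_loss(model, torch.rand(32, 2000),
                                    torch.rand(8, 2000), torch.randn(8, 4))
loss.backward()
```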

Exploring Non-Verbal Predicates in Semantic Role Labeling: Challenges and Opportunities

  • paper_url: http://arxiv.org/abs/2307.01870
  • repo_url: None
  • paper_authors: Riccardo Orlando, Simone Conia, Roberto Navigli
  • for: This paper studies non-verbal predicates in semantic role labeling (SRL), i.e., cases where nouns and adjectives act as predicates.
  • methods: The authors release a new PropBank dataset with wide coverage of multiple predicate types, and use it to probe existing benchmarks and systems.
  • results: They find that standard benchmarks do not give an accurate picture of the current state of SRL, and that existing systems cannot transfer knowledge across predicate types. They also introduce a new, manually annotated challenge set to investigate whether different linguistic resources can promote knowledge transfer.
    Abstract Although we have witnessed impressive progress in Semantic Role Labeling (SRL), most of the research in the area is carried out assuming that the majority of predicates are verbs. Conversely, predicates can also be expressed using other parts of speech, e.g., nouns and adjectives. However, non-verbal predicates appear in the benchmarks we commonly use to measure progress in SRL less frequently than in some real-world settings -- newspaper headlines, dialogues, and tweets, among others. In this paper, we put forward a new PropBank dataset which boasts wide coverage of multiple predicate types. Thanks to it, we demonstrate empirically that standard benchmarks do not provide an accurate picture of the current situation in SRL and that state-of-the-art systems are still incapable of transferring knowledge across different predicate types. Having observed these issues, we also present a novel, manually-annotated challenge set designed to give equal importance to verbal, nominal, and adjectival predicate-argument structures. We use such dataset to investigate whether we can leverage different linguistic resources to promote knowledge transfer. In conclusion, we claim that SRL is far from "solved", and its integration with other semantic tasks might enable significant improvements in the future, especially for the long tail of non-verbal predicates, thereby facilitating further research on SRL for non-verbal predicates.
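To see why non-verbal predicates matter, compare a verbal and a nominal predicate-argument structure for the same event; the sentences and frame names below are invented illustrations in PropBank style, not items from the paper's dataset.

```python
# Illustrative predicate-argument structures: the nominal "decision"
# carries the same roles as the verb "decide".
verbal = {
    "sentence": "The board decided to raise prices yesterday.",
    "predicate": ("decided", "VERB", "decide.01"),
    "arguments": {"ARG0": "The board", "ARG1": "to raise prices",
                  "ARGM-TMP": "yesterday"},
}
nominal = {
    "sentence": "The board's decision to raise prices surprised investors.",
    "predicate": ("decision", "NOUN", "decision.01"),
    "arguments": {"ARG0": "The board's", "ARG1": "to raise prices"},
}
for s in (verbal, nominal):
    print(s["predicate"][1], "->", s["arguments"])
```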

Self-Consuming Generative Models Go MAD

  • paper_url: http://arxiv.org/abs/2307.01850
  • repo_url: None
  • paper_authors: Sina Alemohammad, Josue Casco-Rodriguez, Lorenzo Luzi, Ahmed Imtiaz Humayun, Hossein Babaei, Daniel LeJeune, Ali Siahkoohi, Richard G. Baraniuk
  • for: This paper aims to investigate the properties of autophagous loops in generative AI algorithms and their impact on the quality and diversity of future generative models.
  • methods: The authors use state-of-the-art generative image models of three families to conduct an analytical and empirical analysis of autophagous loops with varying levels of fresh real data availability and bias.
  • results: The primary conclusion is that without enough fresh real data in each generation of an autophagous loop, future generative models are likely to experience a decline in quality (precision) or diversity (recall), which the authors term “Model Autophagy Disorder” (MAD).
    Abstract Seismic advances in generative AI algorithms for imagery, text, and other data types has led to the temptation to use synthetic data to train next-generation models. Repeating this process creates an autophagous (self-consuming) loop whose properties are poorly understood. We conduct a thorough analytical and empirical analysis using state-of-the-art generative image models of three families of autophagous loops that differ in how fixed or fresh real training data is available through the generations of training and in whether the samples from previous generation models have been biased to trade off data quality versus diversity. Our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease. We term this condition Model Autophagy Disorder (MAD), making analogy to mad cow disease.
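The autophagous loop is easy to caricature in one dimension: fit a Gaussian "model" to data, sample from it, refit on the samples, and repeat. With no fresh real data, the fitted parameters random-walk away from the real distribution, and the spread shrinks in expectation since the sample standard deviation is biased low, a toy version of the decay described above.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, 1000)        # the original real data

data, n = real, 200                      # each generation "trains" on n samples
for gen in range(10):
    mu, sigma = data.mean(), data.std()  # fit the Gaussian "model"
    data = rng.normal(mu, sigma, n)      # fully synthetic next generation
    print(f"gen {gen}: mean={mu:+.3f} std={sigma:.3f}")

# Mixing fresh real data each generation, e.g.
# data = np.concatenate([rng.choice(real, n // 2), data[: n // 2]]),
# anchors the parameters, which is the paper's central point.
```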

Embodied Task Planning with Large Language Models

  • paper_url: http://arxiv.org/abs/2307.01848
  • repo_url: https://github.com/Gary3410/TaPA
  • paper_authors: Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan
  • for: This work aims to equip embodied agents with commonsense so that they can complete complex human instructions in real-world environments.
  • methods: It combines large language models (LLMs) with physical scene constraints, aligning them with visual perception models, to generate executable plans grounded in the real world.
  • results: Experiments show that the proposed Task Planning Agent (TaPA) framework achieves markedly higher success rates on complex tasks in realistic environments than LLaVA and GPT-3.5.
    Abstract Equipping embodied agents with commonsense is important for robots to successfully complete complex human instructions in general environments. Recent large language models (LLMs) can embed rich semantic knowledge for agents in plan generation of complex tasks, but they lack information about the realistic world and usually yield infeasible action sequences. In this paper, we propose a TAsk Planning Agent (TaPA) for grounded planning in embodied tasks under physical scene constraints, where the agent generates executable plans according to the objects that exist in the scene by aligning LLMs with visual perception models. Specifically, we first construct a multimodal dataset containing triplets of indoor scenes, instructions, and action plans, where we provide designed prompts and the list of existing objects in the scene for GPT-3.5 to generate a large number of instructions and corresponding planned actions. The generated data is leveraged for grounded plan tuning of pre-trained LLMs. During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected at different achievable locations. Experimental results show that the generated plan from our TaPA framework achieves a higher success rate than LLaVA and GPT-3.5 by a sizable margin, which indicates the practicality of embodied task planning in general and complex environments.
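The grounding step, folding detected scene objects into the planner's prompt so generated steps only reference objects that exist, can be sketched as below; the prompt wording and the commented-out `llm` call are placeholders, not TaPA's exact prompt or interface.

```python
def build_grounded_prompt(instruction, scene_objects):
    """Constrain the planner to the objects the detector actually found."""
    objs = ", ".join(sorted(scene_objects))
    return (
        "You are a robot in an indoor scene.\n"
        f"Objects visible in the scene: {objs}.\n"
        f"Instruction: {instruction}\n"
        "Produce a numbered list of executable steps that only use the "
        "objects listed above."
    )

scene = {"mug", "coffee machine", "counter", "fridge"}
prompt = build_grounded_prompt("Make me a coffee.", scene)
print(prompt)
# plan = llm(prompt)  # hypothetical call to a GPT-3.5-style model
```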

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation

  • paper_url: http://arxiv.org/abs/2307.01831
  • repo_url: https://github.com/DiT-3D/DiT-3D
  • paper_authors: Shentong Mo, Enze Xie, Ruihang Chu, Lewei Yao, Lanqing Hong, Matthias Nießner, Zhenguo Li
  • for: This work investigates whether plain Diffusion Transformers (DiT) can generate 3D shapes effectively, since previous 3D diffusion methods mostly adopted U-Net architectures.
  • methods: The authors propose DiT-3D, a Diffusion Transformer for 3D shape generation that performs the denoising process directly on voxelized point clouds with plain Transformers. DiT-3D adapts the DiT design with 3D positional and patch embeddings to aggregate input from voxelized point clouds, adds 3D window attention to the Transformer blocks to reduce the computational cost of self-attention, and uses linear and devoxelization layers to predict the denoised point clouds.
  • results: The architecture supports efficient 2D-to-3D fine-tuning, allowing a pretrained DiT-2D checkpoint to improve DiT-3D on ShapeNet. Experiments show that DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation, reducing the 1-Nearest Neighbor Accuracy of the previous state of the art by 4.59 and improving the Coverage metric by 3.51 when evaluated on Chamfer Distance.
    Abstract Recent Diffusion Transformers (e.g., DiT) have demonstrated their powerful effectiveness in generating high-quality 2D images. However, it is still being determined whether the Transformer architecture performs equally well in 3D shape generation, as previous 3D diffusion methods mostly adopted the U-Net architecture. To bridge this gap, we propose a novel Diffusion Transformer for 3D shape generation, namely DiT-3D, which can directly operate the denoising process on voxelized point clouds using plain Transformers. Compared to existing U-Net approaches, our DiT-3D is more scalable in model size and produces much higher quality generations. Specifically, the DiT-3D adopts the design philosophy of DiT but modifies it by incorporating 3D positional and patch embeddings to adaptively aggregate input from voxelized point clouds. To reduce the computational cost of self-attention in 3D shape generation, we incorporate 3D window attention into Transformer blocks, as the increased 3D token length resulting from the additional dimension of voxels can lead to high computation. Finally, linear and devoxelization layers are used to predict the denoised point clouds. In addition, our transformer architecture supports efficient fine-tuning from 2D to 3D, where the pre-trained DiT-2D checkpoint on ImageNet can significantly improve DiT-3D on ShapeNet. Experimental results on the ShapeNet dataset demonstrate that the proposed DiT-3D achieves state-of-the-art performance in high-fidelity and diverse 3D point cloud generation. In particular, our DiT-3D decreases the 1-Nearest Neighbor Accuracy of the state-of-the-art method by 4.59 and increases the Coverage metric by 3.51 when evaluated on Chamfer Distance.
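Two of the ingredients named above, a 3D patch embedding over voxels and the resulting token count that motivates windowed attention, can be sketched in a few lines of PyTorch; the sizes are illustrative, not DiT-3D's configuration.

```python
import torch
import torch.nn as nn

class VoxelPatchEmbed(nn.Module):
    """3D patch embedding: a Conv3d whose stride equals the patch size."""
    def __init__(self, voxel=32, patch=4, in_ch=1, dim=384):
        super().__init__()
        self.proj = nn.Conv3d(in_ch, dim, kernel_size=patch, stride=patch)
        self.n_tokens = (voxel // patch) ** 3

    def forward(self, v):                     # v: (B, 1, 32, 32, 32)
        x = self.proj(v)                      # (B, dim, 8, 8, 8)
        return x.flatten(2).transpose(1, 2)   # (B, 512, dim) token sequence

emb = VoxelPatchEmbed()
tokens = emb(torch.randn(2, 1, 32, 32, 32))
print(tokens.shape)  # 8**3 = 512 tokens per shape; this cubic growth in
                     # sequence length is why attention is windowed in 3D
```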

Strictly Low Rank Constraint Optimization – An Asymptotically $\mathcal{O}(\frac{1}{t^2})$ Method

  • paper_url: http://arxiv.org/abs/2307.14344
  • repo_url: None
  • paper_authors: Mengyuan Zhang, Kai Liu
  • for: Promoting sparsity (low rank) in the optimal solutions of rank-regularized problems.
  • methods: Proximal gradient descent combined with a support set projection operation on the singular values of the intermediate updates.
  • results: The algorithms achieve an $O(\frac{1}{t^2})$ convergence rate, with the support set of singular values shrinking monotonically across updates, a property that is novel among momentum-based algorithms.
    Abstract We study a class of non-convex and non-smooth problems with rank regularization to promote sparsity in the optimal solution. We propose to apply the proximal gradient descent method to solve the problem and accelerate the process with a novel support set projection operation on the singular values of the intermediate update. We show that our algorithms are able to achieve a convergence rate of $O(\frac{1}{t^2})$, which is exactly the same as Nesterov's optimal convergence rate for first-order methods on smooth and convex problems. Strict sparsity can be expected, and the support set of singular values during each update is monotonically shrinking, which, to the best of our knowledge, is novel in momentum-based algorithms.
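A plain (unaccelerated) version of the underlying scheme is easy to sketch: a gradient step on the smooth loss followed by projection onto rank-k matrices via singular-value truncation. The paper's accelerated variant and its exact support-set projection are not reproduced here.

```python
import numpy as np

def prox_rank(X, k):
    """Project onto matrices of rank <= k by truncating the spectrum."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[k:] = 0.0
    return (U * s) @ Vt

def low_rank_recover(Y, k=3, lr=0.5, iters=200):
    """Proximal gradient descent on (1/2)||X - Y||_F^2 with a rank constraint."""
    X = np.zeros_like(Y)
    for _ in range(iters):
        grad = X - Y                     # gradient of the smooth term
        X = prox_rank(X - lr * grad, k)  # gradient step, then projection
    return X

rng = np.random.default_rng(0)
truth = rng.normal(size=(40, 3)) @ rng.normal(size=(3, 40))  # rank-3 matrix
Y = truth + 0.1 * rng.normal(size=truth.shape)               # noisy observation
X = low_rank_recover(Y, k=3)
print("relative error:", np.linalg.norm(X - truth) / np.linalg.norm(truth))
```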

Human Trajectory Forecasting with Explainable Behavioral Uncertainty

  • paper_url: http://arxiv.org/abs/2307.01817
  • repo_url: None
  • paper_authors: Jiangbei Yue, Dinesh Manocha, He Wang
  • for: This paper proposes a new human trajectory forecasting method that both explains and predicts human behavior, for applications such as social robots and self-driving cars.
  • methods: The method combines Bayesian neural networks (BNNs) with a stochastic differential equation (SDE) behavior model, giving strong explainability together with high predictive accuracy.
  • results: Compared with 11 state-of-the-art methods, the approach improves prediction accuracy by up to 50% and generalizes better to scenes with different environments and crowd densities. It also produces predictions with confidence estimates that better explain the potential causes of behaviors.
    Abstract Human trajectory forecasting helps to understand and predict human behaviors, enabling applications from social robots to self-driving cars, and therefore has been heavily investigated. Most existing methods can be divided into model-free and model-based methods. Model-free methods offer superior prediction accuracy but lack explainability, while model-based methods provide explainability but cannot predict well. Combining both methodologies, we propose a new Bayesian Neural Stochastic Differential Equation model BNSP-SFM, where a behavior SDE model is combined with Bayesian neural networks (BNNs). While the NNs provide superior predictive power, the SDE offers strong explainability with quantifiable uncertainty in behavior and observation. We show that BNSP-SFM achieves up to a 50% improvement in prediction accuracy, compared with 11 state-of-the-art methods. BNSP-SFM also generalizes better to drastically different scenes with different environments and crowd densities (~ 20 times higher than the testing data). Finally, BNSP-SFM can provide predictions with confidence to better explain potential causes of behaviors. The code will be released upon acceptance.
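The SDE half of such a model can be sketched with Euler-Maruyama integration: a social-force-style drift toward a goal plus diffusion whose scale would, in the full method, come from the Bayesian network. Here the gains and the noise scale are fixed stand-ins, not the paper's learned quantities.

```python
import numpy as np

def euler_maruyama(x0, goal, steps=50, dt=0.1, tau=0.5, sigma=0.1, rng=None):
    """Integrate dx = v dt, dv = drift dt + sigma dW for one pedestrian."""
    rng = rng or np.random.default_rng(0)
    x, v = np.array(x0, float), np.zeros(2)
    traj = [x.copy()]
    for _ in range(steps):
        desired = (goal - x) / (np.linalg.norm(goal - x) + 1e-8)
        drift = (desired - v) / tau                  # relax toward goal direction
        v = v + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=2)
        x = x + v * dt
        traj.append(x.copy())
    return np.array(traj)

traj = euler_maruyama([0, 0], goal=np.array([5.0, 2.0]))
print(traj[-1])  # ends near the goal; sigma quantifies behavioral uncertainty
```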

DeepFlorist: Rethinking Deep Neural Networks and Ensemble Learning as A Meta-Classifier For Object Classification

  • paper_url: http://arxiv.org/abs/2307.01806
  • repo_url: None
  • paper_authors: Afshin Khadangi
  • for: This paper proposes a new learning paradigm, "DeepFlorist", which uses ensemble learning as a meta-classifier for accurate and reliable flower classification.
  • methods: The network architecture combines deep learning with ensemble methods: dense convolutional neural networks (DCNNs) and convolutional neural networks (CNNs) extract high-level features, followed by a fully connected layer for classification.
  • results: Experiments on benchmark flower datasets show that DeepFlorist outperforms state-of-the-art methods in accuracy and robustness.
    Abstract In this paper, we propose a novel learning paradigm called "DeepFlorist" for flower classification using ensemble learning as a meta-classifier. DeepFlorist combines the power of deep learning with the robustness of ensemble methods to achieve accurate and reliable flower classification results. The proposed network architecture leverages a combination of dense convolutional and convolutional neural networks (DCNNs and CNNs) to extract high-level features from flower images, followed by a fully connected layer for classification. To enhance the performance and generalization of DeepFlorist, an ensemble learning approach is employed, incorporating multiple diverse models to improve the classification accuracy. Experimental results on benchmark flower datasets demonstrate the effectiveness of DeepFlorist, outperforming state-of-the-art methods in terms of accuracy and robustness. The proposed framework holds significant potential for automated flower recognition systems in real-world applications, enabling advancements in plant taxonomy, conservation efforts, and ecological studies.
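Ensemble learning as a meta-classifier reduces, in its simplest soft-voting form, to averaging the member networks' softmax outputs; the member models below are random stand-ins for the trained DCNN/CNN members, and the class count is an assumption (102 as in the Oxford-102 flower benchmark).

```python
import torch

def ensemble_predict(models, x):
    """Soft-voting meta-classifier: average the members' class probabilities."""
    probs = [torch.softmax(m(x), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)

# three stand-in "members"; in the paper these are deep feature extractors
models = [torch.nn.Sequential(torch.nn.Flatten(),
                              torch.nn.Linear(3 * 32 * 32, 102))
          for _ in range(3)]
x = torch.randn(4, 3, 32, 32)
print(ensemble_predict(models, x).argmax(dim=-1))
```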