cs.LG - 2023-07-02

Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling

  • paper_url: http://arxiv.org/abs/2307.05382
  • repo_url: None
  • paper_authors: Ziyue Li, Yuchen Fang, You Li, Kan Ren, Yansen Wang, Xufang Luo, Juanyong Duan, Congrui Huang, Dongsheng Li, Lili Qiu
  • for: aims to provide an automated neonatal seizure detection method as a substitute for manual monitoring.
  • methods: uses the deep learning framework STATENet, featuring tailored designs at the temporal, spatial, and model levels to better match the characteristics of neonatal seizures.
  • results: experiments on a large-scale real-world neonatal EEG dataset show that the framework achieves significantly higher seizure detection accuracy.
    Abstract Timely detection of seizures in newborn infants with electroencephalogram (EEG) is a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires considerable human effort for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, current automated methods focusing on adult epilepsy monitoring often fail due to (i) dynamic seizure onset locations in human brains; (ii) different montages on neonates; and (iii) a large distribution shift among subjects. In this paper, we propose a deep learning framework, namely STATENet, to address these specific challenges with dedicated designs at the temporal, spatial and model levels. Experiments on a large-scale real-world neonatal EEG dataset illustrate that our framework achieves significantly better seizure detection performance.
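No code is linked for this entry (repo_url is None). Purely as a hedged illustration of the general idea of pairing temporal (within-channel) and spatial (across-channel) modelling for multi-channel EEG, and not the authors' STATENet architecture, a minimal PyTorch sketch might look like the following; the channel count, kernel sizes, and attention-style channel pooling are all assumptions:

```python
import torch
import torch.nn as nn

class TemporalSpatialEEGNet(nn.Module):
    """Toy seizure detector: per-channel temporal convolutions, then attention over channels."""
    def __init__(self, hidden=32):
        super().__init__()
        # Temporal level: 1-D convolutions applied to each EEG channel independently.
        self.temporal = nn.Sequential(
            nn.Conv1d(1, hidden, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        # Spatial level: attention weights over channels, so the pooled summary can
        # emphasise whichever electrodes the seizure appears on.
        self.attn = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(hidden, 2)  # seizure / non-seizure

    def forward(self, x):                        # x: (batch, channels, time)
        b, c, t = x.shape
        feats = self.temporal(x.reshape(b * c, 1, t)).squeeze(-1).reshape(b, c, -1)
        weights = torch.softmax(self.attn(feats), dim=1)    # (batch, channels, 1)
        pooled = (weights * feats).sum(dim=1)               # channel-weighted summary
        return self.classifier(pooled)

logits = TemporalSpatialEEGNet()(torch.randn(4, 18, 512))   # 4 clips, 18 channels, 512 samples
print(logits.shape)                                         # torch.Size([4, 2])
```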

IoT-Based Air Quality Monitoring System with Machine Learning for Accurate and Real-time Data Analysis

  • paper_url: http://arxiv.org/abs/2307.00580
  • repo_url: https://github.com/Hemanth-Karnati-HK/AQMS-ML
  • paper_authors: Hemanth Karnati
  • for: addresses the issue of air pollution awareness in urban areas
  • methods: uses two sensors (MQ135 and MQ3) to detect harmful gases and measure air quality in PPM, and employs machine learning analysis on the collected data
  • results: provides real-time data specific to the user’s location, and visualizes the data using a cloud-based web app called ThinkSpeak
    Abstract Air pollution in urban areas has severe consequences for both human health and the environment, predominantly caused by exhaust emissions from vehicles. To address the issue of air pollution awareness, Air Pollution Monitoring systems are used to measure the concentration of gases like CO2, smoke, alcohol, benzene, and NH3 present in the air. However, current mobile applications are unable to provide users with real-time data specific to their location. In this paper, we propose the development of a portable air quality detection device that can be used anywhere. The data collected will be stored and visualized using the cloud-based web app ThinkSpeak. The device utilizes two sensors, MQ135 and MQ3, to detect harmful gases and measure air quality in parts per million (PPM). Additionally, machine learning analysis will be employed on the collected data.
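The abstract only states that "machine learning analysis will be employed on the collected data" without naming a model. As a hedged sketch of what such an analysis step could look like, the snippet below fits a classifier to (MQ135, MQ3) readings in PPM; the synthetic data, the labelling thresholds, and the choice of a random forest are illustrative assumptions, not details taken from the paper or its repository:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for logged sensor readings: columns are [MQ135 ppm, MQ3 ppm].
X = rng.uniform(low=[10, 0], high=[400, 50], size=(1000, 2))
# Assumed labelling rule (placeholder, not from the paper): flag air quality as "poor"
# when the MQ135 reading exceeds 200 ppm or the MQ3 reading exceeds 25 ppm.
y = ((X[:, 0] > 200) | (X[:, 1] > 25)).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
print("prediction for a [250 ppm, 5 ppm] reading:", clf.predict([[250.0, 5.0]])[0])
```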

Mode-wise Principal Subspace Pursuit and Matrix Spiked Covariance Model

  • paper_url: http://arxiv.org/abs/2307.00575
  • repo_url: None
  • paper_authors: Runshi Tang, Ming Yuan, Anru R. Zhang
  • for: this paper proposes a new framework, Mode-wise Principal Subspace Pursuit (MOP-UP), for extracting hidden variations in the row and column dimensions of matrix data.
  • methods: the algorithm consists of two steps, Average Subspace Capture (ASC) and Alternating Projection (AP), which are specifically designed to capture the row-wise and column-wise dimension-reduced subspaces that contain the most informative features of the data.
  • results: The proposed framework is demonstrated to be effective through experiments on both simulated and real datasets, and the authors also discuss generalizations of their approach to higher-order data.
    Abstract This paper introduces a novel framework called Mode-wise Principal Subspace Pursuit (MOP-UP) to extract hidden variations in both the row and column dimensions for matrix data. To enhance the understanding of the framework, we introduce a class of matrix-variate spiked covariance models that serve as inspiration for the development of the MOP-UP algorithm. The MOP-UP algorithm consists of two steps: Average Subspace Capture (ASC) and Alternating Projection (AP). These steps are specifically designed to capture the row-wise and column-wise dimension-reduced subspaces which contain the most informative features of the data. ASC utilizes a novel average projection operator as initialization and achieves exact recovery in the noiseless setting. We analyze the convergence and non-asymptotic error bounds of MOP-UP, introducing a blockwise matrix eigenvalue perturbation bound that proves the desired bound, where classic perturbation bounds fail. The effectiveness and practical merits of the proposed framework are demonstrated through experiments on both simulated and real datasets. Lastly, we discuss generalizations of our approach to higher-order data.
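To make the two-step structure concrete, here is a hedged numpy sketch in the spirit of ASC followed by Alternating Projection: row- and column-wise subspaces are refit in turn from matrix samples. The random initialization, the fixed iteration count, and the simulated data are assumptions, and this is a simplified re-derivation rather than the authors' exact MOP-UP procedure:

```python
import numpy as np

def alternating_subspace_pursuit(X, r1, r2, n_iter=20, seed=0):
    """Estimate row (U) and column (V) subspaces for matrix samples X[i] of size p1 x p2.

    Alternately refits U (p1 x r1) and V (p2 x r2) so that each sample is well
    approximated by U @ A_i @ V.T; a simplified take on the ASC + AP idea.
    """
    n, p1, p2 = X.shape
    rng = np.random.default_rng(seed)
    V, _ = np.linalg.qr(rng.standard_normal((p2, r2)))   # crude random initialization
    for _ in range(n_iter):
        # Row step: top-r1 eigenvectors of sum_i X_i V V^T X_i^T.
        M1 = sum(x @ V @ V.T @ x.T for x in X)
        U = np.linalg.eigh(M1)[1][:, -r1:]
        # Column step: top-r2 eigenvectors of sum_i X_i^T U U^T X_i.
        M2 = sum(x.T @ U @ U.T @ x for x in X)
        V = np.linalg.eigh(M2)[1][:, -r2:]
    return U, V

# Simulated spiked-covariance-style data: shared low-dimensional row/column structure plus noise.
rng = np.random.default_rng(1)
U0, _ = np.linalg.qr(rng.standard_normal((30, 3)))
V0, _ = np.linalg.qr(rng.standard_normal((20, 2)))
X = np.stack([U0 @ rng.standard_normal((3, 2)) @ V0.T + 0.05 * rng.standard_normal((30, 20))
              for _ in range(200)])
U, V = alternating_subspace_pursuit(X, r1=3, r2=2)
print("row-subspace alignment:", np.linalg.norm(U.T @ U0))   # near sqrt(3) when recovered
```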

Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity

  • paper_url: http://arxiv.org/abs/2307.00558
  • repo_url: None
  • paper_authors: Hananeh Aliee, Ferdinand Kapl, Soroor Hediyeh-Zadeh, Fabian J. Theis
  • for: this paper aims to learn representations that are conditionally invariant to unwanted variability, removing noise by enforcing independence and enabling the construction of interpretable models.
  • methods: the method leverages domain variability to learn representations that are invariant to unwanted variability, placing distinct conditional priors on latent features and enforcing independence so that an interpretable model can be built.
  • results: the method enables dataset integration for large-scale single-cell genomics data and supports deeper insights into cellular heterogeneity and the identification of disease cell states.
    Abstract This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors. Our approach identifies both spurious and invariant latent features necessary for achieving accurate reconstruction by placing distinct conditional priors on latent features. The invariant signals are disentangled from noise by enforcing independence which facilitates the construction of an interpretable model with a causal semantic. By exploiting the interplay between data domains and labels, our method simultaneously identifies invariant features and builds invariant predictors. We apply our method to grand biological challenges, such as data integration in single-cell genomics with the aim of capturing biological variations across datasets with many samples, obtained from different conditions or multiple laboratories. Our approach allows for the incorporation of specific biological mechanisms, including gene programs, disease states, or treatment conditions into the data integration process, bridging the gap between the theoretical assumptions and real biological applications. Specifically, the proposed approach helps to disentangle biological signals from data biases that are unrelated to the target task or the causal explanation of interest. Through extensive benchmarking using large-scale human hematopoiesis and human lung cancer data, we validate the superiority of our approach over existing methods and demonstrate that it can empower deeper insights into cellular heterogeneity and the identification of disease cell states.

Partial-label Learning with Mixed Closed-set and Open-set Out-of-candidate Examples

  • paper_url: http://arxiv.org/abs/2307.00553
  • repo_url: None
  • paper_authors: Shuo He, Lei Feng, Guowu Yang
  • for: this work addresses a restrictive assumption in partial-label learning (PLL), namely that the true label of each training example must lie within its candidate label set.
  • methods: two types of out-of-candidate (OOC) examples are considered, i.e., closed-set/open-set OOC examples whose true label is inside/outside the known label space. The method first computes the wooden cross-entropy loss from candidate and non-candidate labels separately and dynamically differentiates the two OOC types with specially designed criteria; closed-set OOC examples undergo reversed label disambiguation in the non-candidate label set, while open-set OOC examples are exploited through an effective regularization strategy that dynamically assigns random labels from the candidate label set. In this way, both types of OOC examples can be differentiated and leveraged for model training.
  • results: extensive experiments comparing against existing partial-label learning methods show that the proposed approach outperforms state-of-the-art PLL methods.
    Abstract Partial-label learning (PLL) relies on a key assumption that the true label of each training example must be in the candidate label set. This restrictive assumption may be violated in complex real-world scenarios, and thus the true label of some collected examples could be unexpectedly outside the assigned candidate label set. In this paper, we term the examples whose true label is outside the candidate label set OOC (out-of-candidate) examples, and pioneer a new PLL study to learn with OOC examples. We consider two types of OOC examples in reality, i.e., the closed-set/open-set OOC examples whose true label is inside/outside the known label space. To solve this new PLL problem, we first calculate the wooden cross-entropy loss from candidate and non-candidate labels respectively, and dynamically differentiate the two types of OOC examples based on specially designed criteria. Then, for closed-set OOC examples, we conduct reversed label disambiguation in the non-candidate label set; for open-set OOC examples, we leverage them for training by utilizing an effective regularization strategy that dynamically assigns random candidate labels from the candidate label set. In this way, the two types of OOC examples can be differentiated and further leveraged for model training. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art PLL methods.
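The abstract's "wooden cross-entropy loss" and the exact differentiation criteria are specific to the paper and not defined here. The hedged PyTorch sketch below only illustrates the generic idea of scoring the probability mass a model assigns to candidate versus non-candidate labels and flagging examples whose true label may lie outside the candidate set; the particular loss form and comparison rule are assumptions, not the authors' method:

```python
import torch
import torch.nn.functional as F

def candidate_losses(logits, candidate_mask):
    """Per-example losses on the candidate and non-candidate label sets.

    logits:         (batch, num_classes) raw model outputs
    candidate_mask: (batch, num_classes) with 1 where the class is in the candidate set
    """
    probs = F.softmax(logits, dim=1)
    p_candidate = (probs * candidate_mask).sum(dim=1).clamp_min(1e-12)
    p_non_candidate = (probs * (1 - candidate_mask)).sum(dim=1).clamp_min(1e-12)
    return -p_candidate.log(), -p_non_candidate.log()

logits = torch.randn(8, 10)
mask = torch.zeros(8, 10)
mask[torch.arange(8), torch.randint(0, 10, (8,))] = 1   # one random candidate per example
mask[:, :3] = 1                                          # plus a few shared candidate labels
loss_candidate, loss_non_candidate = candidate_losses(logits, mask)
# A large candidate loss together with a small non-candidate loss hints that the true
# label may lie outside the candidate set, i.e. a possible OOC example.
print(loss_candidate > loss_non_candidate)
```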

Adaptive reinforcement learning of multi-agent ethically-aligned behaviours: the QSOM and QDSOM algorithms

  • paper_url: http://arxiv.org/abs/2307.00552
  • repo_url: None
  • paper_authors: Rémy Chaput, Olivier Boissier, Mathieu Guillermin
  • for: this paper addresses the problem of aligning artificial intelligence systems with ethical considerations that evolve over time, since society and its social mores are not fixed, which makes adaptation difficult for existing AI systems.
  • methods: two algorithms, QSOM and QDSOM, are proposed; they adapt to changes in the environment and in the reward function that encodes the ethical considerations, combining the well-known Q-Table with (Dynamic) Self-Organizing Maps to handle continuous and multi-dimensional state and action spaces.
  • results: on a use case of multi-agent energy repartition within a small Smart Grid neighborhood, the authors demonstrate the adaptability of QSOM and QDSOM and their higher performance compared to baseline reinforcement learning algorithms.
    Abstract The numerous deployed Artificial Intelligence systems need to be aligned with our ethical considerations. However, such ethical considerations might change as time passes: our society is not fixed, and our social mores evolve. This makes it difficult for these AI systems; in the Machine Ethics field especially, it has remained an under-studied challenge. In this paper, we present two algorithms, named QSOM and QDSOM, which are able to adapt to changes in the environment, and especially in the reward function, which represents the ethical considerations that we want these systems to be aligned with. They associate the well-known Q-Table to (Dynamic) Self-Organizing Maps to handle the continuous and multi-dimensional state and action spaces. We evaluate them on a use-case of multi-agent energy repartition within a small Smart Grid neighborhood, and prove their ability to adapt, and their higher performance compared to baseline Reinforcement Learning algorithms.
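No repository is linked, so the following is only a rough, hedged sketch of the core idea the abstract describes, a Q-Table coupled to a self-organizing map that quantizes a continuous state space; it omits the action-side map, the SOM neighbourhood function, and all other details of the actual QSOM and QDSOM algorithms:

```python
import numpy as np

class QSOMSketch:
    """Rough sketch (not the authors' code): Q-learning over SOM-quantized continuous states."""

    def __init__(self, n_units, state_dim, n_actions, lr_q=0.1, lr_som=0.05, gamma=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.prototypes = rng.uniform(size=(n_units, state_dim))  # SOM unit weight vectors
        self.q = np.zeros((n_units, n_actions))                   # Q-Table over (unit, action)
        self.lr_q, self.lr_som, self.gamma = lr_q, lr_som, gamma

    def unit(self, state):
        return int(np.argmin(np.linalg.norm(self.prototypes - state, axis=1)))

    def act(self, state, eps=0.1):
        if np.random.rand() < eps:
            return np.random.randint(self.q.shape[1])
        return int(np.argmax(self.q[self.unit(state)]))

    def update(self, state, action, reward, next_state):
        u, u_next = self.unit(state), self.unit(next_state)
        # Move the winning prototype toward the observed state (simplified SOM update,
        # with no neighbourhood function).
        self.prototypes[u] += self.lr_som * (state - self.prototypes[u])
        # Standard Q-learning update on the discretized state.
        target = reward + self.gamma * self.q[u_next].max()
        self.q[u, action] += self.lr_q * (target - self.q[u, action])

agent = QSOMSketch(n_units=25, state_dim=3, n_actions=4)
s = np.random.rand(3)
a = agent.act(s)
agent.update(s, a, reward=1.0, next_state=np.random.rand(3))
```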

Is Risk-Sensitive Reinforcement Learning Properly Resolved?

  • paper_url: http://arxiv.org/abs/2307.00547
  • repo_url: None
  • paper_authors: Ruiwen Zhou, Minghuan Liu, Kan Ren, Xufang Luo, Weinan Zhang, Dongsheng Li
  • for: this work aims to resolve the risk-sensitivity problem in distributional reinforcement learning and proposes a new algorithm, Trajectory Q-Learning (TQL), to properly optimize policies under risk-sensitive objectives.
  • methods: within the distributional reinforcement learning framework, the paper studies learning risk-sensitive objective functions and introduces the new learning algorithm TQL, which can learn distinct risk-sensitive policies under different risk measures.
  • results: experiments show that TQL effectively optimizes risk-sensitive objectives and achieves better performance under various risk measures.
    Abstract Due to the nature of risk management in learning applicable policies, risk-sensitive reinforcement learning (RSRL) has been realized as an important direction. RSRL is usually achieved by learning risk-sensitive objectives characterized by various risk measures, under the framework of distributional reinforcement learning. However, it remains unclear if the distributional Bellman operator properly optimizes the RSRL objective in the sense of risk measures. In this paper, we prove that the existing RSRL methods do not achieve unbiased optimization and can not guarantee optimality or even improvements regarding risk measures over accumulated return distributions. To remedy this issue, we further propose a novel algorithm, namely Trajectory Q-Learning (TQL), for RSRL problems with provable convergence to the optimal policy. Based on our new learning architecture, we are free to introduce a general and practical implementation for different risk measures to learn disparate risk-sensitive policies. In the experiments, we verify the learnability of our algorithm and show how our method effectively achieves better performances toward risk-sensitive objectives.

Defending Against Malicious Behaviors in Federated Learning with Blockchain

  • paper_url: http://arxiv.org/abs/2307.00543
  • repo_url: None
  • paper_authors: Nanqing Dong, Zhipeng Wang, Jiahao Sun, Michael Kampffmeyer, Yizhe Wen, Shuoying Zhang, William Knottenbelt, Eric Xing
  • for: improves the security and reliability of federated learning systems so that multiple institutional data owners (clients) can more safely collaborate on model training.
  • methods: proposes a secure and reliable federated learning system based on blockchain and distributed ledger technology, including a peer-to-peer voting mechanism and a reward-and-slash mechanism, powered by on-chain smart contracts, to detect and deter malicious client behaviors.
  • results: theoretical and empirical analyses demonstrate that the proposed approach is effective and robust against malicious client-side behaviors.
    Abstract In the era of deep learning, federated learning (FL) presents a promising approach that allows multi-institutional data owners, or clients, to collaboratively train machine learning models without compromising data privacy. However, most existing FL approaches rely on a centralized server for global model aggregation, leading to a single point of failure. This makes the system vulnerable to malicious attacks when dealing with dishonest clients. In this work, we address this problem by proposing a secure and reliable FL system based on blockchain and distributed ledger technology. Our system incorporates a peer-to-peer voting mechanism and a reward-and-slash mechanism, which are powered by on-chain smart contracts, to detect and deter malicious behaviors. Both theoretical and empirical analyses are presented to demonstrate the effectiveness of the proposed approach, showing that our framework is robust against malicious client-side behaviors.

Collaborative Policy Learning for Dynamic Scheduling Tasks in Cloud-Edge-Terminal IoT Networks Using Federated Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.00541
  • repo_url: None
  • paper_authors: Do-Yup Kim, Da-Eun Lee, Ji-Wan Kim, Hyun-Suk Lee
  • for: this paper studies cloud-edge-terminal IoT networks in which edges carry out a range of typical dynamic scheduling tasks.
  • methods: proposes a collaborative policy learning framework for dynamic scheduling tasks based on federated reinforcement learning; a central policy for each task is learned collaboratively at the cloud server by aggregating local experiences from the edges, so that edges conducting a task do not have to learn a policy from scratch. The framework adaptively selects the tasks for collaborative learning in each round, accounting for fairness among tasks, and uses an edge-agnostic policy structure that enables aggregation of local policies from different edges.
  • results: simulations show that, compared with approaches without collaborative policy learning, the framework accelerates policy learning and allows newly arrived edges to adapt to their tasks more easily.
    Abstract In this paper, we examine cloud-edge-terminal IoT networks, where edges undertake a range of typical dynamic scheduling tasks. In these IoT networks, a central policy for each task can be constructed at a cloud server. The central policy can be then used by the edges conducting the task, thereby mitigating the need for them to learn their own policy from scratch. Furthermore, this central policy can be collaboratively learned at the cloud server by aggregating local experiences from the edges, thanks to the hierarchical architecture of the IoT networks. To this end, we propose a novel collaborative policy learning framework for dynamic scheduling tasks using federated reinforcement learning. For effective learning, our framework adaptively selects the tasks for collaborative learning in each round, taking into account the need for fairness among tasks. In addition, as a key enabler of the framework, we propose an edge-agnostic policy structure that enables the aggregation of local policies from different edges. We then provide the convergence analysis of the framework. Through simulations, we demonstrate that our proposed framework significantly outperforms the approaches without collaborative policy learning. Notably, it accelerates the learning speed of the policies and allows newly arrived edges to adapt to their tasks more easily.

Shared Growth of Graph Neural Networks via Free-direction Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2307.00534
  • repo_url: None
  • paper_authors: Kaituo Feng, Yikun Miao, Changsheng Li, Ye Yuan, Guoren Wang
  • for: improves the performance of Graph Neural Networks (GNNs) while avoiding the over-parameterization and over-smoothing issues that make deep teacher GNNs hard to train.
  • methods: proposes FreeKD, the first reinforcement-learning-based free-direction knowledge distillation framework that no longer requires a deep, well-optimized teacher GNN. The core idea is to let two shallower GNNs collaboratively learn and exchange knowledge in a hierarchical way. Since a typical GNN often performs better at some nodes and worse at others during training, a dynamic, free-direction knowledge transfer strategy is designed with two levels of actions: 1) node-level actions determine the direction of knowledge transfer between corresponding nodes of the two networks; 2) structure-level actions determine which of the local structures generated by the node-level actions are propagated.
  • results: extensive experiments on five benchmark datasets show that the approach improves the base GNNs by a large margin and is applicable to various GNN models; the authors also propose FreeKD++ to enable free-direction knowledge transfer among multiple shallow GNNs operating on multi-view inputs.
    Abstract Knowledge distillation (KD) has been shown to be effective in boosting the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often quite challenging to train a satisfactory deeper GNN due to the well-known over-parametrized and over-smoothing issues, leading to invalid knowledge transfer in practical applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via reinforcement learning for GNNs, called FreeKD, which no longer requires a deeper well-optimized teacher GNN. Our core idea is to collaboratively learn two shallower GNNs in an effort to exchange knowledge between them via reinforcement learning in a hierarchical way. As we observe that one typical GNN model often exhibits better and worse performances at different nodes during training, we devise a dynamic and free-direction knowledge transfer strategy that involves two levels of actions: 1) a node-level action determines the directions of knowledge transfer between the corresponding nodes of two networks; and then 2) a structure-level action determines which of the local structures generated by the node-level actions are to be propagated. Furthermore, considering the diverse knowledge present in different GNNs when dealing with multi-view inputs, we introduce FreeKD++ as a solution to enable free-direction knowledge transfer among multiple shallow GNNs operating on multi-view inputs. Extensive experiments on five benchmark datasets demonstrate that our approaches outperform the base GNNs by a large margin, and show their efficacy across various GNNs. More surprisingly, our FreeKD has comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.

New intelligent defense systems to reduce the risks of Selfish Mining and Double-Spending attacks using Learning Automata

  • paper_url: http://arxiv.org/abs/2307.00529
  • repo_url: None
  • paper_authors: Seyed Ardalan Ghoreishi, Mohammad Reza Meybodi
  • for: this paper addresses the challenges of double-spending and selfish mining attacks in blockchain-based digital currencies.
  • methods: introduces a new attack that combines double-spending and selfish mining, and proposes a machine learning-based mitigation; specifically, the learning automaton, a powerful online learning method, is used to develop two models, SDTLA and WVBM, that can effectively defend against selfish mining attacks.
  • results: experiments show that SDTLA raises the profitability threshold of selfish mining to up to 47%, while WVBM performs even better and in many cases comes close to the ideal situation in which each miner's revenue is proportional to their shared hash processing power; in addition, both methods can effectively reduce the risk of double-spending by tuning the Z parameter.
    Abstract In this paper, we address the critical challenges of double-spending and selfish mining attacks in blockchain-based digital currencies. Double-spending is a problem where the same tender is spent multiple times during a digital currency transaction, while selfish mining is an intentional alteration of a blockchain to increase rewards to one miner or a group of miners. We introduce a new attack that combines both these attacks and propose a machine learning-based solution to mitigate the risks associated with them. Specifically, we use the learning automaton, a powerful online learning method, to develop two models, namely the SDTLA and WVBM, which can effectively defend against selfish mining attacks. Our experimental results show that the SDTLA method increases the profitability threshold of selfish mining up to 47$\%$, while the WVBM method performs even better and is very close to the ideal situation where each miner's revenue is proportional to their shared hash processing power. Additionally, we demonstrate that both methods can effectively reduce the risks of double-spending by tuning the $Z$ Parameter. Our findings highlight the potential of SDTLA and WVBM as promising solutions for enhancing the security and efficiency of blockchain networks.

Graph Neural Network based Log Anomaly Detection and Explanation

  • paper_url: http://arxiv.org/abs/2307.00527
  • repo_url: None
  • paper_authors: Zhong Li, Jiayang Shi, Matthijs van Leeuwen
  • for: this work aims to improve the accuracy of log anomaly detection for monitoring high-tech systems by using graph-based models to detect anomalies in event logs.
  • methods: the method first converts event logs into attributed, directed, and weighted graphs, and then applies graph neural networks to perform graph-level anomaly detection.
  • results: experiments on five benchmark datasets show that Logs2Graphs performs at least on par with state-of-the-art anomaly detection methods on simple datasets and largely outperforms them on complicated datasets.
    Abstract Event logs are widely used to record the status of high-tech systems, making log anomaly detection important for monitoring those systems. Most existing log anomaly detection methods take a log event count matrix or log event sequences as input, exploiting quantitative and/or sequential relationships between log events to detect anomalies. Unfortunately, only considering quantitative or sequential relationships may result in many false positives and/or false negatives. To alleviate this problem, we propose a graph-based method for unsupervised log anomaly detection, dubbed Logs2Graphs, which first converts event logs into attributed, directed, and weighted graphs, and then leverages graph neural networks to perform graph-level anomaly detection. Specifically, we introduce One-Class Digraph Inception Convolutional Networks, abbreviated as OCDiGCN, a novel graph neural network model for detecting graph-level anomalies in a collection of attributed, directed, and weighted graphs. By coupling the graph representation and anomaly detection steps, OCDiGCN can learn a representation that is especially suited for anomaly detection, resulting in a high detection accuracy. Importantly, for each identified anomaly, we additionally provide a small subset of nodes that play a crucial role in OCDiGCN's prediction as explanations, which can offer valuable cues for subsequent root cause diagnosis. Experiments on five benchmark datasets show that Logs2Graphs performs at least on par with state-of-the-art log anomaly detection methods on simple datasets while largely outperforming state-of-the-art log anomaly detection methods on complicated datasets.
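The abstract describes converting event logs into attributed, directed, and weighted graphs before the GNN stage. Below is a hedged sketch of one plausible conversion for a single log window; choosing event counts as node attributes and immediate-successor counts as edge weights is an assumption, and the OCDiGCN model itself is not reproduced here:

```python
from collections import Counter, defaultdict

def log_window_to_graph(event_ids):
    """Turn one window of parsed log events into an attributed, directed, weighted graph.

    Nodes are event templates, attributed by their count in the window; a directed edge
    u -> v is weighted by how often event v immediately follows event u.
    """
    node_attr = Counter(event_ids)
    edge_weight = defaultdict(int)
    for u, v in zip(event_ids, event_ids[1:]):
        edge_weight[(u, v)] += 1
    return dict(node_attr), dict(edge_weight)

# Toy window of log-event template IDs (as produced by a log parser such as Drain).
window = ["E5", "E22", "E5", "E11", "E9", "E11", "E9", "E26"]
nodes, edges = log_window_to_graph(window)
print(nodes)   # {'E5': 2, 'E22': 1, 'E11': 2, 'E9': 2, 'E26': 1}
print(edges)   # {('E5', 'E22'): 1, ('E22', 'E5'): 1, ('E5', 'E11'): 1, ('E11', 'E9'): 2, ...}
```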

TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition

  • paper_url: http://arxiv.org/abs/2307.00526
  • repo_url: None
  • paper_authors: Mingxue Xu, Yao Lei Xu, Danilo P. Mandic
  • for: reduces the model storage cost of Large Language Models (LLMs) by compressing the embedding layer while preserving performance
  • methods: uses the Tensor-Train Decomposition (TTD) to treat each token embedding as a Matrix Product State (MPS) that can be efficiently computed in a distributed manner
  • results: experiments on GPT-2 show that the embedding layer can be compressed by a factor of up to 38.40, and at a compression factor of 3.31 the compressed model even outperforms the original GPT-2 model.
    Abstract High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, the associated high dimensionality also introduces considerable model parameters, and a prohibitively high model storage. To address this issue, this work proposes an approach based on the Tensor-Train Decomposition (TTD), where each token embedding is treated as a Matrix Product State (MPS) that can be efficiently computed in a distributed manner. The experimental results on GPT-2 demonstrate that, through our approach, the embedding layer can be compressed by a factor of up to 38.40 times, and when the compression factor is 3.31 times, even produced a better performance than the original GPT-2 model.
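As a hedged illustration of treating one token embedding as a Matrix Product State, the numpy sketch below runs a plain TT-SVD on a single 768-dimensional vector; the 4 x 4 x 4 x 12 reshaping and the TT rank are arbitrary assumptions, and this toy does not reproduce the paper's distributed procedure or its compression figures:

```python
import numpy as np

def tt_decompose(vector, shape, rank):
    """TT-SVD: factor a vector, reshaped to `shape`, into a train of low-rank cores."""
    cores, rest, r_prev = [], vector.reshape(shape), 1
    for mode in shape[:-1]:
        mat = rest.reshape(r_prev * mode, -1)
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        r = min(rank, len(S))
        cores.append(U[:, :r].reshape(r_prev, mode, r))
        rest, r_prev = np.diag(S[:r]) @ Vt[:r], r
    cores.append(rest.reshape(r_prev, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze()

# One 768-dimensional token embedding (GPT-2 size), reshaped to a 4 x 4 x 4 x 12 tensor.
emb = np.random.randn(768)
cores = tt_decompose(emb, shape=(4, 4, 4, 12), rank=4)
approx = tt_reconstruct(cores).reshape(768)
n_params = sum(core.size for core in cores)
print(f"original 768 parameters -> {n_params} TT parameters, "
      f"relative error {np.linalg.norm(emb - approx) / np.linalg.norm(emb):.3f}")
```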

LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

  • paper_url: http://arxiv.org/abs/2307.00522
  • repo_url: https://github.com/adham-elarabawy/ledits
  • paper_authors: Linoy Tsaban, Apolinário Passos
  • for: this paper proposes a lightweight approach for real-image editing that modifies images using text only, without changing the model architecture.
  • methods: the method combines the edit-friendly DDPM inversion technique with semantic guidance: the real image is inverted into the domain of the pre-trained DDPM model and then edited via text prompts.
  • results: the method achieves high-quality, versatile edits, from subtle to extensive changes as well as alterations in composition and style, without requiring optimization or extensions to the architecture.
    Abstract Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Currently, a significant effort is given to enable the modification of these images using text only as means to offer intuitive and versatile editing. However, editing proves to be difficult for these generative models due to the inherent nature of editing techniques, which involves preserving certain content from the original image. Conversely, in text-based models, even minor modifications to the text prompt frequently result in an entirely distinct result, making attaining one-shot generation that accurately corresponds to the user's intent exceedingly challenging. In addition, to edit a real image using these state-of-the-art tools, one must first invert the image into the pre-trained model's domain - adding another factor affecting the edit quality, as well as latency. In this exploratory report, we propose LEDITS - a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real image editing, while harnessing the editing capabilities of DDPM inversion as well. This approach achieves versatile edits, both subtle and extensive as well as alterations in composition and style, while requiring no optimization nor extensions to the architecture.

DSTCGCN: Learning Dynamic Spatial-Temporal Cross Dependencies for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2307.00518
  • repo_url: https://github.com/water-wbq/dstcgcn
  • paper_authors: Binqing Wu, Ling Chen
  • for: this work proposes DSTCGCN, a dynamic spatial-temporal cross graph convolutional network that learns spatial and temporal dependencies jointly for traffic forecasting in intelligent transportation systems.
  • methods: the model uses a fast Fourier transform (FFT) based attentive selector to choose relevant time steps for each time step from time-varying traffic data, and feeds the selected steps into a dynamic cross graph construction module that learns dynamic spatial-temporal cross dependencies.
  • results: experiments on six real-world datasets show that DSTCGCN achieves state-of-the-art traffic forecasting performance.
    Abstract Traffic forecasting is essential to intelligent transportation systems, which is challenging due to the complicated spatial and temporal dependencies within a road network. Existing works usually learn spatial and temporal dependencies separately, ignoring the dependencies crossing spatial and temporal dimensions. In this paper, we propose DSTCGCN, a dynamic spatial-temporal cross graph convolution network to learn dynamic spatial and temporal dependencies jointly via graphs for traffic forecasting. Specifically, we introduce a fast Fourier transform (FFT) based attentive selector to choose relevant time steps for each time step based on time-varying traffic data. Given the selected time steps, we introduce a dynamic cross graph construction module, consisting of the spatial graph construction, temporal connection graph construction, and fusion modules, to learn dynamic spatial-temporal cross dependencies without pre-defined priors. Extensive experiments on six real-world datasets demonstrate that DSTCGCN achieves the state-of-the-art performance.
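The authors' implementation is in the linked repository; the sketch below is not taken from it. It only gives a hedged picture of what an FFT-based attentive selector over time steps might do: represent each step by the FFT magnitude of the window ending at it, then keep the most similar past steps. The window length, padding, similarity score, and top-k rule are all assumptions:

```python
import torch

def fft_attentive_select(series, window=12, k=3):
    """For each time step, score earlier steps by the similarity of the FFT magnitude
    of the windows ending at them, and keep the top-k as the 'relevant' steps.

    series: (num_steps, num_nodes) traffic readings.
    """
    T = series.shape[0]
    padded = torch.cat([series[:1].repeat(window - 1, 1), series], dim=0)  # pad the start
    windows = padded.unfold(0, window, 1)              # (T, num_nodes, window)
    spectra = torch.fft.rfft(windows, dim=-1).abs()    # frequency signature per step
    feats = spectra.reshape(T, -1)
    feats = feats / (feats.norm(dim=1, keepdim=True) + 1e-8)
    scores = feats @ feats.T                           # cosine-style step-to-step similarity
    return {t: torch.topk(scores[t, :t], k=min(k, t)).indices.tolist()
            for t in range(window, T)}

series = torch.randn(48, 10).cumsum(dim=0)     # 48 steps of readings from 10 road sensors
print(fft_attentive_select(series)[47])        # the 3 past steps deemed most relevant to step 47
```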

SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration

  • paper_url: http://arxiv.org/abs/2307.00511
  • repo_url: None
  • paper_authors: Jianxun Ren, Ning An, Youjia Zhang, Danyang Wang, Zhenyu Sun, Cong Lin, Weigang Cui, Weiwei Wang, Ying Zhou, Wei Zhang, Qingyu Hu, Ping Zhang, Dan Hu, Danhong Wang, Hesheng Liu
  • for: SUGAR is designed to improve cortical surface registration, which is crucial for aligning cortical functional and anatomical features across individuals.
  • methods: SUGAR is a deep-learning framework that incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. It also includes a similarity loss, fold loss, and multiple distortion losses to preserve topology and minimize distortions.
  • results: SUGAR achieves high registration performance and accelerated processing times, making it a promising solution for large-scale neuroimaging studies. It exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods, and processes data much faster than conventional methods.
    Abstract Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a learning-based method that exceeds the state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address the challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses, to preserve topology and minimize various types of distortions. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration, enhancing the registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we showed that our framework exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods. Additionally, SUGAR achieves remarkable sub-second processing times, offering a notable speed-up of approximately 12,000 times in registering 9,000 subjects from the UK Biobank dataset in just 32 minutes. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.

HeGeL: A Novel Dataset for Geo-Location from Hebrew Text

  • paper_url: http://arxiv.org/abs/2307.00509
  • repo_url: https://github.com/onlplab/hegel
  • paper_authors: Tzuf Paz-Argaman, Tal Bauman, Itai Mondshine, Itzhak Omer, Sagi Dalyot, Reut Tsarfaty
  • for: this paper aims to support textual geolocation, i.e., retrieving the coordinates of a place based on a free-form language description, in Hebrew.
  • methods: the authors crowdsourced 5,649 literal Hebrew place descriptions of various place types in three cities in Israel to build the HeGeL corpus.
  • results: qualitative and empirical analysis shows that the descriptions make abundant use of geospatial reasoning and require a novel environmental representation.
    Abstract The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning. Even though there are quite a few datasets in English used for geolocation, they are currently based on open-source data (Wikipedia and Twitter), where the location of the described place is mostly implicit, such that the location retrieval resolution is limited. Furthermore, there are no datasets available for addressing the problem of textual geolocation in morphologically rich and resource-poor languages, such as Hebrew. In this paper, we present the Hebrew Geo-Location (HeGeL) corpus, designed to collect literal place descriptions and analyze lingual geospatial reasoning. We crowdsourced 5,649 literal Hebrew place descriptions of various place types in three cities in Israel. Qualitative and empirical analysis show that the data exhibits abundant use of geospatial reasoning and requires a novel environmental representation.

Cloud Ensemble Learning for Fault Diagnosis of Rolling Bearings with Stochastic Configuration Networks

  • paper_url: http://arxiv.org/abs/2307.00507
  • repo_url: None
  • paper_authors: Wei Dai, Jiang Liu, Lanhao Wang
  • for: rolling bearing fault diagnosis in few shot scenarios
  • methods: stochastic configuration network (SCN) based cloud ensemble learning
  • results: accurate fault diagnosis with few training samples
    Abstract Fault diagnosis of rolling bearings is of great significance for post-maintenance in rotating machinery, but it is challenging to diagnose faults efficiently with only a few samples. Additionally, faults commonly occur with randomness and fuzziness due to the complexity of the external environment and the structure of rolling bearings, hindering effective mining of fault characteristics and eventually restricting the accuracy of fault diagnosis. To overcome these problems, stochastic configuration network (SCN) based cloud ensemble learning, called SCN-CEL, is developed in this work. Concretely, a cloud feature extraction method is first developed by using a backward cloud generator of the normal cloud model to mine the uncertainty of fault information. Then, a cloud sampling method, which generates enough cloud droplets using a bidirectional cloud generator, is proposed to extend the cloud feature samples. Finally, an ensemble model with SCNs is developed to comprehensively characterize the uncertainty of fault information and advance the generalization performance of the fault diagnosis machine. Experimental results demonstrate that the proposed method indeed performs favorably for distinguishing fault categories of rolling bearings in few-shot scenarios.

On efficient computation in active inference

  • paper_url: http://arxiv.org/abs/2307.00504
  • repo_url: https://github.com/aswinpaul/dpefe_2023
  • paper_authors: Aswin Paul, Noor Sajid, Lancelot Da Costa, Adeel Razi
  • for: improves the computational efficiency of active inference and simplifies the specification of an appropriate target distribution for the agent.
  • methods: proposes two complementary solutions: a novel dynamic-programming-based planning algorithm for finite temporal horizons, and a method inspired by Z-learning from the control theory literature for setting an appropriate target distribution in new and existing active inference planning schemes.
  • results: simulations on standard grid-world tasks demonstrate the effectiveness and practicality of these methods.
    Abstract Despite being recognized as neurobiologically plausible, active inference faces difficulties when employed to simulate intelligent behaviour in complex environments due to its computational cost and the difficulty of specifying an appropriate target distribution for the agent. This paper introduces two solutions that work in concert to address these limitations. First, we present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity. Second, inspired by Z-learning from control theory literature, we simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes. Our first approach leverages the dynamic programming algorithm, known for its computational efficiency, to minimize the cost function used in planning through the Bellman-optimality principle. Accordingly, our algorithm recursively assesses the expected free energy of actions in the reverse temporal order. This improves computational efficiency by orders of magnitude and allows precise model learning and planning, even under uncertain conditions. Our method simplifies the planning process and shows meaningful behaviour even when specifying only the agent's final goal state. The proposed solutions make defining a target distribution from a goal state straightforward compared to the more complicated task of defining a temporally informed target distribution. The effectiveness of these methods is tested and demonstrated through simulations in standard grid-world tasks. These advances create new opportunities for various applications.
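The authors' code is in the linked repository; the snippet below is not that implementation. It is a hedged, stripped-down illustration of the backward, Bellman-style recursion over an expected-free-energy-like cost that the abstract describes: the epistemic (information-gain) term is dropped, observations are assumed to identify states, and the chain-world model is a toy assumption:

```python
import numpy as np

def backward_efe_planning(B, cost, horizon):
    """Backward (Bellman-style) recursion over a simplified expected-free-energy cost.

    B:    (num_actions, num_states, num_states) transition model, B[a, s_next, s]
    cost: (num_states,) per-state cost, e.g. -log preference over the outcome observed there
          (the epistemic / information-gain term of the full expected free energy is omitted)
    Returns G of shape (horizon, num_actions, num_states): the expected cumulative cost of
    taking action a in state s at step t and then acting greedily.
    """
    n_actions, n_states, _ = B.shape
    G = np.zeros((horizon, n_actions, n_states))
    for t in reversed(range(horizon)):
        future = np.zeros(n_states) if t == horizon - 1 else G[t + 1].min(axis=0)
        for a in range(n_actions):
            G[t, a] = B[a].T @ (cost + future)   # expected next-state cost plus best cost-to-go
    return G

# Tiny 4-state chain world: action 0 moves left, action 1 moves right, preference for state 3.
n_states, n_actions = 4, 2
B = np.zeros((n_actions, n_states, n_states))
for s in range(n_states):
    B[0, max(s - 1, 0), s] = 1.0
    B[1, min(s + 1, n_states - 1), s] = 1.0
cost = -np.log(np.array([0.05, 0.05, 0.05, 0.85]))
G = backward_efe_planning(B, cost, horizon=5)
print(G[0].argmin(axis=0))   # best first action from each state -> [1 1 1 1] (always move right)
```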

Classifying World War II Era Ciphers with Machine Learning

  • paper_url: http://arxiv.org/abs/2307.00501
  • repo_url: None
  • paper_authors: Brooke Dalton, Mark Stamp
  • for: studies how accurately machine learning and deep learning techniques can classify selected World War II era ciphers when only ciphertext is available.
  • methods: experiments with three classic machine learning models, namely Support Vector Machines (SVM), $k$-Nearest Neighbors ($k$-NN), and Random Forest (RF), as well as four deep learning neural network models: Multi-Layer Perceptrons (MLP), Long Short-Term Memory (LSTM), Extreme Learning Machines (ELM), and Convolutional Neural Networks (CNN).
  • results: the five ciphers (Enigma, M-209, Sigaba, Purple, and Typex) are classified under four scenarios and with ciphertexts of varying lengths; in the most realistic scenario, given 1000 characters per ciphertext, the ciphers can be distinguished with greater than 97% accuracy, and the classic machine learning models perform at least as well as the deep learning models.
    Abstract We determine the accuracy with which machine learning and deep learning techniques can classify selected World War II era ciphers when only ciphertext is available. The specific ciphers considered are Enigma, M-209, Sigaba, Purple, and Typex. We experiment with three classic machine learning models, namely, Support Vector Machines (SVM), $k$-Nearest Neighbors ($k$-NN), and Random Forest (RF). We also experiment with four deep learning neural network-based models: Multi-Layer Perceptrons (MLP), Long Short-Term Memory (LSTM), Extreme Learning Machines (ELM), and Convolutional Neural Networks (CNN). Each model is trained on features consisting of histograms, digrams, and raw ciphertext letter sequences. Furthermore, the classification problem is considered under four distinct scenarios: Fixed plaintext with fixed keys, random plaintext with fixed keys, fixed plaintext with random keys, and random plaintext with random keys. Under the most realistic scenario, given 1000 characters per ciphertext, we are able to distinguish the ciphers with greater than 97% accuracy. In addition, we consider the accuracy of a subset of the learning techniques as a function of the length of the ciphertext messages. Somewhat surprisingly, our classic machine learning models perform at least as well as our deep learning models. We also find that ciphers that are more similar in design are somewhat more challenging to distinguish, but not as difficult as might be expected.
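The feature side of the study, letter histograms and digram counts computed from ciphertext alone, is easy to sketch; the snippet below pairs those features with an SVM. The "ciphertexts" are random stand-ins with different letter biases, since no Enigma, M-209, Sigaba, Purple, or Typex simulator is included, so the toy accuracy printed here says nothing about the paper's results:

```python
import numpy as np
from sklearn.svm import SVC
from string import ascii_uppercase

ALPHABET = {c: i for i, c in enumerate(ascii_uppercase)}

def features(ciphertext):
    """Letter-histogram plus digram-count features for an A-Z ciphertext string."""
    idx = [ALPHABET[c] for c in ciphertext if c in ALPHABET]
    hist = np.bincount(idx, minlength=26).astype(float)
    digram = np.zeros((26, 26))
    for a, b in zip(idx, idx[1:]):
        digram[a, b] += 1
    return np.concatenate([hist, digram.ravel()]) / max(len(idx), 1)

rng = np.random.default_rng(0)

def fake_ciphertext(bias, n=1000):
    """Random stand-in for a real cipher's output, with one letter made more frequent."""
    probs = np.full(26, 1.0)
    probs[bias] += 2.0
    return "".join(rng.choice(list(ascii_uppercase), size=n, p=probs / probs.sum()))

X = np.stack([features(fake_ciphertext(bias=i % 5)) for i in range(200)])
y = np.array([i % 5 for i in range(200)])
clf = SVC(kernel="rbf").fit(X[:150], y[:150])
print("toy accuracy:", clf.score(X[150:], y[150:]))
```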

Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning

  • paper_url: http://arxiv.org/abs/2307.00498
  • repo_url: None
  • paper_authors: Jun Chen, Shipeng Bai, Tianxin Huang, Mengmeng Wang, Guanzhong Tian, Yong Liu
  • for: The paper aims to propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data and fine-tuning process.
  • methods: The proposed DF-MPC method assumes that the quantized error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, and minimizes the reconstruction loss of the feature maps to achieve the closed-form solution.
  • results: The experimental results show that the proposed DF-MPC method is able to achieve higher accuracy for an ultra-low precision quantized model compared to the recent methods, without any data and fine-tuning process.
    Abstract Neural network quantization is a very promising solution in the field of model compression, but its resulting accuracy highly depends on a training/fine-tuning process and requires the original data. This not only brings heavy computation and time costs but also is not conducive to privacy and sensitive information protection. Therefore, a few recent works are starting to focus on data-free quantization. However, data-free quantization does not perform well while dealing with ultra-low precision quantization. Although researchers utilize generative methods of synthetic data to address this problem partially, data synthesis needs to take a lot of computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data and fine-tuning process. By assuming the quantized error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized model. Based on our formulation, we theoretically deduce the closed-form solution by minimizing the reconstruction loss of the feature maps. Since DF-MPC does not require any original/synthetic data, it is a more efficient method to approximate the full-precision model. Experimentally, our DF-MPC is able to achieve higher accuracy for an ultra-low precision quantized model compared to the recent methods without any data and fine-tuning process.
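The snippet below is a hedged toy rather than the paper's formulation: it only conveys the flavour of compensating an ultra-low-precision layer with a higher-precision branch by minimizing a feature-map reconstruction loss in closed form. The single scalar mixing coefficient, the uniform quantizer, and the random probe inputs are all assumptions made for the sketch:

```python
import torch

def uniform_quantize(w, bits):
    """Symmetric uniform weight quantization at the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.round(w / scale).clamp(-qmax, qmax) * scale

torch.manual_seed(0)
W_fp = torch.randn(64, 128)                # pre-trained full-precision layer weights
x = torch.randn(256, 128)                  # random probes: no real data is required here

W_low = uniform_quantize(W_fp, bits=2)     # ultra-low precision branch
W_high = uniform_quantize(W_fp, bits=8)    # higher-precision compensation branch

# Closed-form choice of the scalar alpha minimizing the feature-map reconstruction loss
# || x W_fp^T - (x W_low^T + alpha * x (W_high - W_low)^T) ||^2.
out_fp, out_low = x @ W_fp.T, x @ W_low.T
delta = x @ (W_high - W_low).T
alpha = ((out_fp - out_low) * delta).sum() / (delta * delta).sum()
recon = out_low + alpha * delta

print("feature-map loss without compensation:", torch.mean((out_fp - out_low) ** 2).item())
print("feature-map loss with compensation:   ", torch.mean((out_fp - recon) ** 2).item())
```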

Don’t Memorize; Mimic The Past: Federated Class Incremental Learning Without Episodic Memory

  • paper_url: http://arxiv.org/abs/2307.00497
  • repo_url: None
  • paper_authors: Sara Babakniya, Zalan Fabian, Chaoyang He, Mahdi Soltanolkotabi, Salman Avestimehr
  • for: this paper addresses the problem of deep learning models forgetting previously learned information when trained on new data, especially in federated learning (FL), where data is decentralized and changes independently for each user.
  • methods: proposes a federated class incremental learning framework that uses a generative model to synthesize samples from past distributions, so that clients can mitigate catastrophic forgetting locally without storing past data; the generative model is trained on the server with data-free methods at the end of each task, reducing the risk of data leakage.
  • results: on the CIFAR-100 dataset, the method shows significant improvements over existing baselines.
    Abstract Deep learning models are prone to forgetting information learned in the past when trained on new data. This problem becomes even more pronounced in the context of federated learning (FL), where data is decentralized and subject to independent changes for each user. Continual Learning (CL) studies this so-called \textit{catastrophic forgetting} phenomenon primarily in centralized settings, where the learner has direct access to the complete training dataset. However, applying CL techniques to FL is not straightforward due to privacy concerns and resource limitations. This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data. Then, clients can leverage the generative model to mitigate catastrophic forgetting locally. The generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. Therefore, it reduces the risk of data leakage as opposed to training it on the client's private data. We demonstrate significant improvements for the CIFAR-100 dataset compared to existing baselines.
    摘要 深度学习模型容易忘记过去学习的信息,特别在 federated learning(FL)上,数据分散化和每个用户独立变化。 kontinual learning(CL)在中心化Setting中主要研究这种称为“极端忘记”现象,但是在隐私和资源限制的情况下应用CL技术到FL不是直接的。这篇文章介绍了一种基于生成模型的联邦分类逐步学习框架,可以在客户端使用生成模型来避免极端忘记。生成模型在服务器端使用数据free方法在每个任务结束时训练,因此减少了数据泄露的风险,与在客户端私有数据上训练生成模型不同。我们对 CIFAR-100 数据集进行了比较,与现有基eline相比,我们得到了显著的改进。
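
A sketch of the client-side update the abstract describes: each client trains on its current-task data while "replaying" earlier classes with samples drawn from the server-trained generator, so no past data is stored locally. The generator's class-conditional interface, the loss weighting, and the optimizer are assumptions; the paper's actual objective (and its data-free generator training on the server) is not reproduced here.

```python
import torch
import torch.nn.functional as F

def client_update(model, generator, loader, old_classes, device,
                  epochs=1, lr=0.01, replay_ratio=1.0):
    """One client's local step: real data for the current task plus synthetic
    'replay' of past classes sampled from the server-provided generator."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            loss = F.cross_entropy(model(x), y)
            if old_classes and generator is not None:
                n_syn = int(replay_ratio * x.size(0))
                y_syn = torch.tensor(old_classes, device=device)[
                    torch.randint(len(old_classes), (n_syn,), device=device)]
                with torch.no_grad():
                    x_syn = generator(y_syn)   # assumed class-conditional sampler
                # Replay loss keeps decision boundaries for past classes alive.
                loss = loss + F.cross_entropy(model(x_syn), y_syn)
            opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict()
```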

STG4Traffic: A Survey and Benchmark of Spatial-Temporal Graph Neural Networks for Traffic Prediction

  • paper_url: http://arxiv.org/abs/2307.00495
  • repo_url: https://github.com/trainingl/stg4traffic
  • paper_authors: Xunlian Luo, Chunjiang Zhu, Detian Zhang, Qing Li
  • for: 这个论文主要是为了提出一种基于图学习的实时交通预测方法,以提高智能城市系统的安全、稳定性和多样性。
  • methods: 这篇论文使用了图学习策略和常见的图卷积网络来模型交通系统的空间时间相关性。
  • results: 研究人员通过设计了一个标准化和可扩展的benchmark，并对两种交通数据集进行了比较性评估，发现这种方法可以提供更高的准确率和更好的可扩展性。
    Abstract Traffic prediction has been an active research topic in the domain of spatial-temporal data mining. Accurate real-time traffic prediction is essential to improve the safety, stability, and versatility of smart city systems, i.e., traffic control and optimal routing. The complex and highly dynamic spatial-temporal dependencies make effective predictions still face many challenges. Recent studies have shown that spatial-temporal graph neural networks exhibit great potential applied to traffic prediction, which combines sequential models with graph convolutional networks to jointly model temporal and spatial correlations. However, a survey study of graph learning, spatial-temporal graph models for traffic, as well as a fair comparison of baseline models are pending and unavoidable issues. In this paper, we first provide a systematic review of graph learning strategies and commonly used graph convolution algorithms. Then we conduct a comprehensive analysis of the strengths and weaknesses of recently proposed spatial-temporal graph network models. Furthermore, we build a study called STG4Traffic using the deep learning framework PyTorch to establish a standardized and scalable benchmark on two types of traffic datasets. We can evaluate their performance by personalizing the model settings with uniform metrics. Finally, we point out some problems in the current study and discuss future directions. Source codes are available at https://github.com/trainingl/STG4Traffic.
    摘要 宽泛研究领域:预测交通流量是智能城市系统中的一个活跃研究话题。实时准确的交通预测可以提高智能城市系统的安全性、稳定性和多样性,如交通控制和优化Routing。距离Complex and highly dynamic spatial-temporal dependencies make effective predictions still face many challenges. Recent studies have shown that spatial-temporal graph neural networks exhibit great potential applied to traffic prediction, which combines sequential models with graph convolutional networks to jointly model temporal and spatial correlations. However, a survey study of graph learning, spatial-temporal graph models for traffic, as well as a fair comparison of baseline models are pending and unavoidable issues.在这篇论文中,我们首先提供了系统性的图学策略和通用的图 convolution 算法的评估。然后,我们进行了广泛的 spatial-temporal graph network 模型的分析,探讨其优劣点。其次,我们使用 PyTorch 深度学习框架建立了一个标准化和可扩展的 benchmark,并在两种交通数据集上进行了研究。通过个性化模型设置,我们可以评估其性能。最后,我们指出了当前研究中的一些问题,并讨论了未来的方向。源代码可以在 https://github.com/trainingl/STG4Traffic 上获取。
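
Most of the surveyed spatial-temporal graph models share one building block: a graph convolution that mixes information across the road network, followed by a sequence model that mixes information across time. The minimal PyTorch block below shows that pattern as a generic illustration, not any specific model from the STG4Traffic benchmark; the adjacency normalization and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class STBlock(nn.Module):
    """Graph convolution across sensors, then a GRU across time steps."""
    def __init__(self, num_nodes, in_dim, hidden_dim):
        super().__init__()
        self.theta = nn.Linear(in_dim, hidden_dim)   # shared GCN weights
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, x, a_norm):
        # x: (batch, time, nodes, in_dim); a_norm: (nodes, nodes) normalized adjacency
        b, t, n, _ = x.shape
        h = torch.relu(torch.einsum("ij,btjf->btif", a_norm, self.theta(x)))
        h = h.permute(0, 2, 1, 3).reshape(b * n, t, -1)   # one sequence per node
        out, _ = self.gru(h)
        y = self.head(out[:, -1])                          # next-step prediction
        return y.view(b, n)

# toy usage: 12 past steps of 2 features at 8 sensors -> next-step speed per sensor
n_nodes = 8
a = torch.eye(n_nodes) + torch.rand(n_nodes, n_nodes).round()
a_norm = a / a.sum(dim=1, keepdim=True)                    # simple row normalization
model = STBlock(n_nodes, in_dim=2, hidden_dim=16)
print(model(torch.randn(4, 12, n_nodes, 2), a_norm).shape)  # torch.Size([4, 8])
```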

Optimizing protein fitness using Gibbs sampling with Graph-based Smoothing

  • paper_url: http://arxiv.org/abs/2307.00494
  • repo_url: https://github.com/kirjner/ggs
  • paper_authors: Andrew Kirjner, Jason Yim, Raman Samusevich, Tommi Jaakkola, Regina Barzilay, Ila Fiete
  • for: 本研究旨在设计高适应性蛋白质,以满足医学多种领域的需求。
  • methods: 本文提出了 Gibbs sampling with Graph-based Smoothing(GGS)方法,可以有效地探索蛋白质设计空间。GGS方法通过 iteratively 应用 Gibbs 与梯度来提出有利变化,并使用图基的平滑来消除干扰梯度导致的假阳性。
  • results: 对 GFP 和 AAV 设计问题,以及简单的ablations 和基线,本文进行了研究,并得到了 state-of-the-art 的结果,能够找到高适应性蛋白质的Up to 8 个变化。
    Abstract The ability to design novel proteins with higher fitness on a given task would be revolutionary for many fields of medicine. However, brute-force search through the combinatorially large space of sequences is infeasible. Prior methods constrain search to a small mutational radius from a reference sequence, but such heuristics drastically limit the design space. Our work seeks to remove the restriction on mutational distance while enabling efficient exploration. We propose Gibbs sampling with Graph-based Smoothing (GGS) which iteratively applies Gibbs with gradients to propose advantageous mutations using graph-based smoothing to remove noisy gradients that lead to false positives. Our method is state-of-the-art in discovering high-fitness proteins with up to 8 mutations from the training set. We study the GFP and AAV design problems, ablations, and baselines to elucidate the results. Code: https://github.com/kirjner/GGS
    摘要 能够设计新的蛋白质,其适应能力更高,将对医学多个领域带来革命。然而,简单地通过枚举空间的搜索是不可能的。先前的方法受限于小距离的参照序列,但这些决策限制了设计空间。我们的工作是要 removes 这种距离限制,同时允许有效探索。我们提议使用 Gibbs 抽象法和图像缓和(GGS),每步应用 Gibbs 与梯度相结合,提出有利мутации,并使用图像缓和来消除干扰梯度,以避免干扰梯度导致的假阳性。我们的方法目前为止在使用训练集上达到高适应能力蛋白质的设计问题上占据了国际领先地位。我们在 GFP 和 AAV 设计问题、ablations 和基准值进行了研究,以便解释结果。代码可以在 GitHub 上找到:https://github.com/kirjner/GGS。
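
The "graph-based smoothing" half of GGS is the easier piece to sketch: build a graph over sequences (here a toy Hamming-distance graph) and diffuse the noisy fitness signal over it, so the gradients that guide the Gibbs proposals are less noisy. Everything below, from the graph construction to the synthetic fitness, is an illustrative assumption rather than the paper's implementation, and the gradient-guided Gibbs proposal step itself is omitted.

```python
import numpy as np

def hamming_graph(seqs, max_dist=4):
    """Connect sequences within a small Hamming distance (a toy mutation graph)."""
    seqs = np.asarray(seqs)
    d = (seqs[:, None, :] != seqs[None, :, :]).sum(-1)
    return ((d > 0) & (d <= max_dist)).astype(float)

def graph_smooth(values, adj, alpha=0.5, iters=10):
    """Graph-based smoothing: repeatedly average each sequence's noisy fitness
    signal with its neighbors', damping spurious spikes (one plausible reading
    of the smoothing idea; details are assumed)."""
    deg = adj.sum(axis=1, keepdims=True)
    f = values.astype(float).copy()
    for _ in range(iters):
        neigh = np.divide(adj @ f, deg, out=f.copy(), where=deg > 0)
        f = (1 - alpha) * f + alpha * neigh
    return f

# toy demo: random length-10 sequences over a 4-letter alphabet with noisy fitness
rng = np.random.default_rng(0)
seqs = rng.integers(0, 4, size=(300, 10))
true_fit = (seqs == 0).mean(axis=1)                 # pretend fitness: fraction of residue 0
noisy_fit = true_fit + rng.normal(scale=0.3, size=300)
adj = hamming_graph(seqs)
smoothed = graph_smooth(noisy_fit[:, None], adj)[:, 0]
print("mean abs error, raw vs. smoothed:",
      round(np.abs(noisy_fit - true_fit).mean(), 3),
      round(np.abs(smoothed - true_fit).mean(), 3))
```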

Fourier-Mixed Window Attention: Accelerating Informer for Long Sequence Time-Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.00493
  • repo_url: https://github.com/nhatthanhtran/fwin2023
  • paper_authors: Nhat Thanh Tran, Jack Xin
  • for: 用于加速长序列时间预测 Informer 的快本地全球窗口基于注意力方法。
  • methods: 使用本地窗口注意力,而不是假设查询稀疏性,并使用 fourier 变换块来补做全球token信息。
  • results: 通过在单变量和多变量 datasets 上进行实验,我们表明 FWin 变换器可以提高 Informer 的总预测精度,同时提高其推理速度,比如40-50%。此外,我们还在非线性回归模型中显示了一种学习 FWin 类型注意力的方法可以超过 softmax 全注意力。
    Abstract We study a fast local-global window-based attention method to accelerate Informer for long sequence time-series forecasting. While window attention is local and a considerable computational saving, it lacks the ability to capture global token information which is compensated by a subsequent Fourier transform block. Our method, named FWin, does not rely on query sparsity hypothesis and an empirical approximation underlying the ProbSparse attention of Informer. Through experiments on univariate and multivariate datasets, we show that FWin transformers improve the overall prediction accuracies of Informer while accelerating its inference speeds by 40 to 50 %. We also show in a nonlinear regression model that a learned FWin type attention approaches or even outperforms softmax full attention based on key vectors extracted from an Informer model's full attention layer acting on time series data.
    摘要 我们研究一种快速的本地-全局窗口基于注意方法,以加速Informer进行长序时间序列预测。虽然窗口注意是本地的,可以获得较大的计算时间 saving,但是它缺乏捕捉全局Token信息的能力,这是通过后续的傅里叶变换块补做。我们的方法,名为FWin,不依赖于查询稀缺假设和Informer中的ProbSparse注意力的经验近似。通过对单variate和多variate数据集进行实验,我们显示了FWin变换器可以提高Informer的总预测精度,同时加速其推理速度 by 40% to 50%. 我们还在非线性回归模型中表明,一种学习的FWin类型注意力可以与Softmax全注意力相当或者超过从Informer模型的全注意层中提取的键vector。
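
The abstract's recipe is local window attention for efficiency plus a Fourier block to restore global token mixing. The sketch below follows that shape, using an FNet-style double FFT as the global mixer; the exact placement of the norms, the choice of mixing operator, and all dimensions are assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class FourierWindowBlock(nn.Module):
    """Local window self-attention + FFT-based global mixing (a sketch of the idea)."""
    def __init__(self, d_model, n_heads=4, window=16):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # x: (batch, seq_len, d_model); seq_len assumed divisible by the window size
        b, t, d = x.shape
        w = x.reshape(b * t // self.window, self.window, d)     # independent local windows
        local, _ = self.attn(w, w, w, need_weights=False)
        x = self.norm1(x + local.reshape(b, t, d))
        # Fourier block: cheap global mixing across the whole sequence, compensating
        # for the window attention's missing global receptive field.
        global_mix = torch.fft.fft(torch.fft.fft(x, dim=1), dim=2).real
        return self.norm2(x + global_mix)

blk = FourierWindowBlock(d_model=32, window=16)
print(blk(torch.randn(2, 96, 32)).shape)   # torch.Size([2, 96, 32])
```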

Pricing European Options with Google AutoML, TensorFlow, and XGBoost

  • paper_url: http://arxiv.org/abs/2307.00476
  • repo_url: https://github.com/juan-esteban-berger/options_pricing_automl_tensorflow_xgboost
  • paper_authors: Juan Esteban Berger
  • for: 该文章用于比较使用Google Cloud AutoML Regressor、TensorFlow神经网络和XGBoost分类决策树来估算欧洲Option价格。
  • methods: 该文章使用了三种不同的机器学习算法来估算欧洲Option价格,即Google Cloud AutoML Regressor、TensorFlow神经网络和XGBoost分类决策树。
  • results: 三种模型都能够超越黑色熊模型,具体来说,使用历史数据来估算欧洲Option价格尤其有效,尤其是使用机器学习算法来学习复杂的模式,传统参数模型不能考虑。
    Abstract Researchers have been using Neural Networks and other related machine-learning techniques to price options since the early 1990s. After three decades of improvements in machine learning techniques, computational processing power, cloud computing, and data availability, this paper is able to provide a comparison of using Google Cloud's AutoML Regressor, TensorFlow Neural Networks, and XGBoost Gradient Boosting Decision Trees for pricing European Options. All three types of models were able to outperform the Black Scholes Model in terms of mean absolute error. These results showcase the potential of using historical data from an option's underlying asset for pricing European options, especially when using machine learning algorithms that learn complex patterns that traditional parametric models do not take into account.
    摘要 自20世纪90年代初以来，研究人员一直在使用神经网络及相关的机器学习技术为期权定价。经过三十年来机器学习技术、计算处理能力、云计算和数据可用性的进步，本文得以比较Google Cloud的AutoML Regressor、TensorFlow神经网络和XGBoost梯度提升决策树在欧式期权定价上的表现。三类模型在平均绝对误差上均优于Black-Scholes模型。这些结果展示了利用期权标的资产历史数据为欧式期权定价的潜力，特别是当使用能够学习传统参数模型无法刻画的复杂模式的机器学习算法时。
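
A sketch of the comparison pipeline: price a set of European calls with the Black-Scholes formula as the baseline and fit an XGBoost regressor on the same features, then compare mean absolute errors. The synthetic data below is generated from Black-Scholes itself (so the baseline is nearly unbeatable); it only illustrates the pipeline, not the paper's historical dataset or results.

```python
import numpy as np
import xgboost as xgb
from scipy.stats import norm

def black_scholes_call(S, K, T, r, sigma):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

rng = np.random.default_rng(0)
n = 20_000
S = rng.uniform(50, 150, n); K = rng.uniform(50, 150, n)
T = rng.uniform(0.05, 2.0, n); r = rng.uniform(0.0, 0.05, n)
sigma = rng.uniform(0.1, 0.6, n)
price = black_scholes_call(S, K, T, r, sigma) + rng.normal(scale=0.5, size=n)

X = np.column_stack([S, K, T, r, sigma])
idx = rng.permutation(n); train, test = idx[:16_000], idx[16_000:]
model = xgb.XGBRegressor(n_estimators=400, max_depth=6, learning_rate=0.05)
model.fit(X[train], price[train])

mae_ml = np.abs(model.predict(X[test]) - price[test]).mean()
mae_bs = np.abs(black_scholes_call(*X[test].T) - price[test]).mean()
print(f"MAE  XGBoost: {mae_ml:.3f}   Black-Scholes: {mae_bs:.3f}")
```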

Moments, Random Walks, and Limits for Spectrum Approximation

  • paper_url: http://arxiv.org/abs/2307.00474
  • repo_url: None
  • paper_authors: Yujia Jin, Christopher Musco, Aaron Sidford, Apoorv Vikram Singh
  • for: 本文研究一维分布近似问题中的下界问题,具体来说是对于已知多项式级别的一维分布,可以达到伪随机性水平。
  • methods: 本文使用了induced by eigenvalue spectra of carefully constructed graph adjacency matrices的hard instance,以及Cohen-Steiner et al. [KDD 2018]提供的spectral moments approximations using $2^{O(1/\epsilon)}$ random walks initiated at uniformly random nodes in the graph。
  • results: 本文证明了一些分布在[-1,1]上不能够在Wasserstein-1 distance上准确地近似,即使知道所有多项式级别的分布。这个结果与Kong和Valiant [Annals of Statistics, 2017]的Upper bound相符。
    Abstract We study lower bounds for the problem of approximating a one dimensional distribution given (noisy) measurements of its moments. We show that there are distributions on $[-1,1]$ that cannot be approximated to accuracy $\epsilon$ in Wasserstein-1 distance even if we know \emph{all} of their moments to multiplicative accuracy $(1\pm2^{-\Omega(1/\epsilon)})$; this result matches an upper bound of Kong and Valiant [Annals of Statistics, 2017]. To obtain our result, we provide a hard instance involving distributions induced by the eigenvalue spectra of carefully constructed graph adjacency matrices. Efficiently approximating such spectra in Wasserstein-1 distance is a well-studied algorithmic problem, and a recent result of Cohen-Steiner et al. [KDD 2018] gives a method based on accurately approximating spectral moments using $2^{O(1/\epsilon)}$ random walks initiated at uniformly random nodes in the graph. As a strengthening of our main result, we show that improving the dependence on $1/\epsilon$ in this result would require a new algorithmic approach. Specifically, no algorithm can compute an $\epsilon$-accurate approximation to the spectrum of a normalized graph adjacency matrix with constant probability, even when given the transcript of $2^{\Omega(1/\epsilon)}$ random walks of length $2^{\Omega(1/\epsilon)}$ started at random nodes.
    摘要 我们研究一维分布的下界,对于受到杂度的测量的矩形几何。我们证明有在[-1,1]上的分布,不能够在 Wasserstein-1 距离下对准确度 $\epsilon$ 的测量。这个结果与 Kong 和 Valiant 的上界相匹配 [Annals of Statistics, 2017]。我们使用具有精确的数值矩阵的对角线几何来提供一个困难的实例。现有一个由 Cohen-Steiner 等人提出的方法 [KDD 2018],可以使用 $2^{O(1/\epsilon)}$ 步骤的随机步进行精确地测量几何的spectrum。作为我们主要结果的强化,我们证明 improvving 这个结果中的 $1/\epsilon$ 取决因数需要一个新的算法方法。具体来说,无法使用现有的算法,在 givent 矩阵的转换矩阵上 compute 一个 $\epsilon$-精确的测量,即使被给定 $2^{\Omega(1/\epsilon)}$ 步骤的随机步进行测量。

Equal Confusion Fairness: Measuring Group-Based Disparities in Automated Decision Systems

  • paper_url: http://arxiv.org/abs/2307.00472
  • repo_url: https://github.com/furkangursoy/equalconfusion
  • paper_authors: Furkan Gursoy, Ioannis A. Kakadiaris
  • for: This paper focuses on evaluating the fairness of automated decision systems, specifically in terms of group fairness.
  • methods: The paper proposes a new equal confusion fairness test and a new confusion parity error metric to measure unfairness, as well as an appropriate post hoc analysis methodology to identify the source of potential unfairness.
  • results: The proposed methods and metrics are demonstrated on the case of COMPAS, an automated decision system used in the US to assess recidivism risks, and show their usefulness in assessing fairness as part of a more extensive accountability assessment.
    Abstract As artificial intelligence plays an increasingly substantial role in decisions affecting humans and society, the accountability of automated decision systems has been receiving increasing attention from researchers and practitioners. Fairness, which is concerned with eliminating unjust treatment and discrimination against individuals or sensitive groups, is a critical aspect of accountability. Yet, for evaluating fairness, there is a plethora of fairness metrics in the literature that employ different perspectives and assumptions that are often incompatible. This work focuses on group fairness. Most group fairness metrics desire a parity between selected statistics computed from confusion matrices belonging to different sensitive groups. Generalizing this intuition, this paper proposes a new equal confusion fairness test to check an automated decision system for fairness and a new confusion parity error to quantify the extent of any unfairness. To further analyze the source of potential unfairness, an appropriate post hoc analysis methodology is also presented. The usefulness of the test, metric, and post hoc analysis is demonstrated via a case study on the controversial case of COMPAS, an automated decision system employed in the US to assist judges with assessing recidivism risks. Overall, the methods and metrics provided here may assess automated decision systems' fairness as part of a more extensive accountability assessment, such as those based on the system accountability benchmark.
    摘要 随着人工智能在影响人类和社会决策中的角色变得越来越重要,决策系统的负责任得到了更多的关注。公平是考虑消除不公正对待和歧视的一个关键方面,它是负责任的一部分。然而,为了评估公平,在文献中有很多不同的公平指标,这些指标常有不同的视角和假设,导致它们之间不兼容。这项工作将关注于群体公平。大多数群体公平指标寻求在不同敏感群体的决策系统中计算的混乱矩阵中实现平衡。总之,这篇论文提出了一种新的平衡公平测试,用于检测自动决策系统的公平,以及一种新的混乱平衡错误来衡量任何不公正。此外,为了分析潜在的不公正来源,我们还提供了一种适用的后续分析方法。这些方法和指标在一个关于 COMPAS 案例的案例研究中被证明了其用于评估自动决策系统的公平。总之,提供的方法和指标可以用于评估自动决策系统的公平,并作为更广泛的负责任评估的一部分,如基于系统负责任指标。
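
The underlying intuition is parity between sensitive groups' confusion matrices. The sketch below computes per-group normalized confusion matrices and a simple average pairwise gap between them; this placeholder metric and the toy data stand in for the paper's equal confusion test and confusion parity error, whose exact definitions are not reproduced here.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def confusion_parity_error(y_true, y_pred, group, labels=(0, 1)):
    """Average pairwise distance between groups' normalized confusion matrices
    (a placeholder definition standing in for the paper's metric)."""
    mats = []
    for g in np.unique(group):
        m = confusion_matrix(y_true[group == g], y_pred[group == g], labels=labels)
        mats.append(m / m.sum())
    gaps = [np.abs(a - b).sum() / 2 for i, a in enumerate(mats) for b in mats[i + 1:]]
    return float(np.mean(gaps))

# toy example: a classifier that errs more often on group 1
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 5_000)
y_true = rng.integers(0, 2, 5_000)
flip = rng.random(5_000) < np.where(group == 1, 0.30, 0.10)   # unequal error rates
y_pred = np.where(flip, 1 - y_true, y_true)
print("confusion parity error:", confusion_parity_error(y_true, y_pred, group))
```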

Data-Driven Probabilistic Energy Consumption Estimation for Battery Electric Vehicles with Model Uncertainty

  • paper_url: http://arxiv.org/abs/2307.00469
  • repo_url: None
  • paper_authors: Ayan Maity, Sudeshna Sarkar
  • for: 这个论文是为了提出一种基于概率数据驱动的电动汽车(BEV)行程级能耗估计方法。
  • methods: 该方法使用概率神经网络,并通过 Монテ卡洛分布来确定模型uncertainty。
  • results: 实验结果表明,该方法可以准确地估计电动汽车行程级能耗,并且与其他现有的电动汽车能耗模型相比,具有较高的准确率。
    Abstract This paper presents a novel probabilistic data-driven approach to trip-level energy consumption estimation of battery electric vehicles (BEVs). As there are very few electric vehicle (EV) charging stations, EV trip energy consumption estimation can make EV routing and charging planning easier for drivers. In this research article, we propose a new driver behaviour-centric EV energy consumption estimation model using probabilistic neural networks with model uncertainty. By incorporating model uncertainty into neural networks, we have created an ensemble of neural networks using Monte Carlo approximation. Our method comprehensively considers various vehicle dynamics, driver behaviour and environmental factors to estimate EV energy consumption for a given trip. We propose relative positive acceleration (RPA), average acceleration and average deceleration as driver behaviour factors in EV energy consumption estimation and this paper shows that the use of these driver behaviour features improves the accuracy of the EV energy consumption model significantly. Instead of predicting a single-point estimate for EV trip energy consumption, this proposed method predicts a probability distribution for the EV trip energy consumption. The experimental results of our approach show that our proposed probabilistic neural network with weight uncertainty achieves a mean absolute percentage error of 9.3% and outperforms other existing EV energy consumption models in terms of accuracy.
    摘要 本文提出了一种新的概率数据驱动方法，用于估算纯电动汽车（BEV）的行程级能耗。由于电动汽车充电站仍然稀少，准确的行程能耗估算可以帮助驾驶员更容易地进行路径规划和充电规划。我们提出了一种以驾驶行为为中心的能耗估算模型，使用带模型不确定性的概率神经网络：通过将模型不确定性引入神经网络，并借助蒙特卡洛近似构建神经网络集成。该方法综合考虑车辆动力学、驾驶行为和环境因素，以估算给定行程的能耗。我们将相对正加速度（RPA）、平均加速度和平均减速度作为驾驶行为特征，并表明引入这些特征能显著提高能耗模型的准确性。该方法预测的不是单点估计，而是行程能耗的概率分布。实验结果显示，所提出的带权重不确定性的概率神经网络取得了9.3%的平均绝对百分比误差，在准确性上优于现有的电动汽车能耗模型。
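
A sketch of the uncertainty-aware regressor described in the bullets: a small network over engineered trip features, with Monte Carlo dropout at inference so that repeated stochastic forward passes yield a predictive distribution rather than a point estimate. The architecture, feature count, and the use of MC dropout as the approximation are assumptions consistent with, but not identical to, the paper's ensemble.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Small regressor over trip features such as distance, RPA, average
    acceleration/deceleration and temperature (feature choice follows the abstract)."""
    def __init__(self, n_features, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(64, 1))

    def forward(self, x):
        return self.net(x)

@torch.no_grad()
def predict_distribution(model, x, n_samples=100):
    """Monte Carlo dropout: keep dropout active at inference and treat the spread
    of repeated forward passes as (approximate) model uncertainty."""
    model.train()                       # keeps Dropout layers active
    preds = torch.stack([model(x) for _ in range(n_samples)], dim=0)
    return preds.mean(dim=0), preds.std(dim=0)

model = EnergyNet(n_features=6)
trip = torch.randn(4, 6)                # 4 trips, 6 engineered features
mean_kwh, std_kwh = predict_distribution(model, trip)
print(mean_kwh.squeeze(-1), std_kwh.squeeze(-1))
```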

MissDiff: Training Diffusion Models on Tabular Data with Missing Values

  • paper_url: http://arxiv.org/abs/2307.00467
  • repo_url: None
  • paper_authors: Yidong Ouyang, Liyan Xie, Chongxuan Li, Guang Cheng
  • for: 模型学习从数据中缺失值的数据分布,以便在各种实际应用中处理缺失数据。
  • methods: 提出了一种统一的涉及原理的扩展 diffusion 模型,可以在不同的缺失机制下学习数据分布。
  • results: 与 state-of-the-art diffusion model 比较,在多个实际的 tabular 数据集上达到了较大的性能提升。
    Abstract The diffusion model has shown remarkable performance in modeling data distributions and synthesizing data. However, the vanilla diffusion model requires complete or fully observed data for training. Incomplete data is a common issue in various real-world applications, including healthcare and finance, particularly when dealing with tabular datasets. This work presents a unified and principled diffusion-based framework for learning from data with missing values under various missing mechanisms. We first observe that the widely adopted "impute-then-generate" pipeline may lead to a biased learning objective. Then we propose to mask the regression loss of Denoising Score Matching in the training phase. We prove the proposed method is consistent in learning the score of data distributions, and the proposed training objective serves as an upper bound for the negative likelihood in certain cases. The proposed framework is evaluated on multiple tabular datasets using realistic and efficacious metrics and is demonstrated to outperform state-of-the-art diffusion model on tabular data with "impute-then-generate" pipeline by a large margin.
    摘要 diffusion 模型在数据分布模型和数据合成方面表现出色,但是普通的扩散模型需要完整或完全观察到的数据进行训练。实际应用中,包括医疗和金融等领域,数据中存在缺失数据是一个常见的问题。这项工作提出了一种统一的、原则正的扩散模型基于缺失数据学习框架,用于处理不同的缺失机制。我们首先发现,通过"填充然后生成"管道可能会导致偏向的学习目标。然后,我们提议在训练阶段对杜邦Score匹配的损失进行遮盖。我们证明了我们提议的方法是学习数据分布的分数的一种一致的方法,并且我们提议的训练目标为certain cases中的负概率下界。我们的框架在多个实际的 tabular 数据集上进行了实际和有效的评估,并与state-of-the-art扩散模型在 tabular 数据上"填充然后生成"管道的比较中表现出了大幅度的超越。
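
The training change the abstract highlights is masking the denoising-score-matching regression loss so that missing entries never contribute to it. The sketch below shows that masked loss for a single fixed noise level with a toy score network; the noise schedule, the network, and the handling of categorical columns are all simplifications of the paper's method.

```python
import torch
import torch.nn as nn

def masked_dsm_loss(score_net, x, obs_mask, sigma=0.5):
    """Denoising score matching with the regression loss masked to observed entries.
    x: (batch, features) with arbitrary values at missing positions;
    obs_mask: 1 where an entry is observed, 0 where it is missing."""
    noise = torch.randn_like(x)
    x_noisy = x + sigma * noise
    target = -noise / sigma                       # score of the Gaussian perturbation
    pred = score_net(x_noisy)
    per_entry = (pred - target) ** 2 * obs_mask   # missing entries do not contribute
    return per_entry.sum() / obs_mask.sum().clamp(min=1)

# toy usage with a tiny score network
d = 8
score_net = nn.Sequential(nn.Linear(d, 64), nn.SiLU(), nn.Linear(64, d))
x = torch.randn(32, d)
obs_mask = (torch.rand(32, d) > 0.3).float()      # ~30% of entries missing
loss = masked_dsm_loss(score_net, x * obs_mask, obs_mask)
loss.backward()
print(float(loss))
```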

Towards Unbiased Exploration in Partial Label Learning

  • paper_url: http://arxiv.org/abs/2307.00465
  • repo_url: None
  • paper_authors: Zsolt Zombori, Agapi Rissaki, Kristóf Szabó, Wolfgang Gatterbauer, Michael Benedikt
  • for: 这篇论文是用于描述一种基于半标注的学习方法,该方法可以在标准神经网络架构中使用softmax层进行推理,并且可以减轻softmax层中的偏见现象,以便更好地探索多个可能性。
  • methods: 这篇论文使用了一种新的损失函数,该损失函数可以减轻softmax层中的偏见现象,并且可以在标准神经网络架构中进行不偏的探索。
  • results: 论文通过对синтетиче数据、标准半标注benchmark和一个新的规则学习挑战中的数据进行广泛的评估,证明了该损失函数的有效性。
    Abstract We consider learning a probabilistic classifier from partially-labelled supervision (inputs denoted with multiple possibilities) using standard neural architectures with a softmax as the final layer. We identify a bias phenomenon that can arise from the softmax layer in even simple architectures that prevents proper exploration of alternative options, making the dynamics of gradient descent overly sensitive to initialisation. We introduce a novel loss function that allows for unbiased exploration within the space of alternative outputs. We give a theoretical justification for our loss function, and provide an extensive evaluation of its impact on synthetic data, on standard partially labelled benchmarks and on a contributed novel benchmark related to an existing rule learning challenge.
    摘要 我们考虑从受限的指导下学习一个概率分类器,使用标准的神经网络架构,并在最终层使用softmax层。我们发现在简单的架构中,softmax层可能会带来偏袋现象,使得梯度下降的几何对初始化的敏感性。我们介绍了一个新的损失函数,允许不偏的探索多个出力空间中的选项。我们提供了理论上的说明,并对实验数据、标准受限 benchmark和一个新的规则学习挑战中的贡献 benchmark进行了广泛的评估。

FedDefender: Backdoor Attack Defense in Federated Learning

  • paper_url: http://arxiv.org/abs/2307.08672
  • repo_url: https://github.com/warisgill/FedDefender
  • paper_authors: Waris Gill, Ali Anwar, Muhammad Ali Gulzar
  • for: 防止targeted poisoning攻击在 Federated Learning (FL) 中
  • methods: 利用差异测试方法来识别可疑客户端(包含后门)
  • results: 在 MNIST 和 FashionMNIST 数据集上,FedDefender 有效地 mitigates 攻击,从而降低攻击成功率至 10%,无需对全球模型性能产生负面影响。
    Abstract Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL by leveraging differential testing. Our proposed method fingerprints the neuron activations of clients' models on the same input and uses differential testing to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10\% without deteriorating the global model performance.
    摘要 federated 学习(FL)是一种隐私保护的分布式机器学习技术,允许个体客户端(例如用户参与者、边缘设备或组织)在安全环境中使用本地数据进行模型训练,然后将训练好的模型分享给汇总器以建立全球模型。在这项工作中,我们提议了 FedDefender,一种针对攻击型投毒攻击的防御机制,通过对客户端模型的神经元活动进行差异测试来识别可能有恶意后门的客户端。我们使用 MNIST 和 FashionMNIST 数据集,并与 20 和 30 个客户端进行评估,结果表明,FedDefender 有效地防止了这类攻击,攻击成功率降低至 10%,而全球模型性能不受影响。
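
A sketch of the differential-testing idea: run every client model on the same probe inputs, fingerprint a layer's activations, and flag the client whose fingerprint deviates most from the rest. The fingerprint choice, the median-distance rule, and the threshold are stand-in heuristics; FedDefender's actual test and how it localizes backdoored behavior are not reproduced.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def activation_fingerprint(model, probe, layer):
    """Mean activation of one layer over a probe batch shared across clients."""
    acts = []
    handle = layer.register_forward_hook(lambda m, i, o: acts.append(o.detach()))
    model(probe)
    handle.remove()
    return acts[0].mean(dim=0).flatten()

def flag_suspicious_clients(client_models, probe, layer_getter, ratio=3.0):
    """Differential testing sketch: flag clients whose fingerprint sits unusually
    far from the median fingerprint (heuristic rule, not the paper's test)."""
    fps = torch.stack([activation_fingerprint(m, probe, layer_getter(m))
                       for m in client_models])
    dists = (fps - fps.median(dim=0).values).norm(dim=1)
    cutoff = ratio * dists.median()
    return [i for i, d in enumerate(dists) if d > cutoff]

# toy demo: eight clients, one with drifted first-layer biases standing in for a backdoor
torch.manual_seed(0)
make = lambda: nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
models = [make() for _ in range(8)]
with torch.no_grad():
    models[3][0].bias += 3.0
probe = torch.randn(64, 10)
print(flag_suspicious_clients(models, probe, layer_getter=lambda m: m[0]))  # should flag client 3
```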

Conformer LLMs – Convolution Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2307.00461
  • repo_url: None
  • paper_authors: Prateek Verma
  • for: 这个研究旨在将两种受欢迎的神经架构combined,即卷积层和Transformers,以应用于大型语言模型(LLMs)。
  • methods: 这个研究使用了非 causal 的卷积层,并将它们应用于 causal 的训练架构中。transformers decoder 优化了跨多modalities的长距离依赖,并成为现代机器学习的核心进步。
  • results: 这个研究获得了显著的性能提升,并显示了一个可靠的语音架构,可以在 causal 设置中进行集成和适应。
    Abstract This work builds together two popular blocks of neural architecture, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims to adapt these architectures in a causal setup for training LLMs. Transformers decoders effectively capture long-range dependencies over several modalities and form a core backbone of modern advancements in machine learning. Convolutional architectures have been popular in extracting features in domains such as raw 1-D signals, speech, and images, to name a few. In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformer, we achieve significant gains in performance. This work showcases a robust speech architecture that can be integrated and adapted in a causal setup beyond speech applications for large-scale language modeling.

GenRec: Large Language Model for Generative Recommendation

  • paper_url: http://arxiv.org/abs/2307.00457
  • repo_url: https://github.com/rutgerswiselab/genrec
  • paper_authors: Jianchao Ji, Zelong Li, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Juntao Tan, Yongfeng Zhang
  • for: This paper presents a novel approach to recommendation systems using large language models (LLMs) based on text data, which can directly generate the target item to recommend rather than calculating ranking scores for each candidate item.
  • methods: The proposed approach leverages the vast knowledge encoded in large language models to accomplish recommendation tasks, and uses specialized prompts to enhance the ability of the LLM to comprehend recommendation tasks.
  • results: The proposed GenRec approach achieves significantly better results on large datasets, and the experiments show the potential of LLM-based generative recommendation to revolutionize the domain of recommendation systems.
    Abstract In recent years, large language models (LLM) have emerged as powerful tools for diverse natural language processing tasks. However, their potential for recommender systems under the generative recommendation paradigm remains relatively unexplored. This paper presents an innovative approach to recommendation systems using large language models (LLMs) based on text data. In this paper, we present a novel LLM for generative recommendation (GenRec) that utilized the expressive power of LLM to directly generate the target item to recommend, rather than calculating ranking score for each candidate item one by one as in traditional discriminative recommendation. GenRec uses LLM's understanding ability to interpret context, learn user preferences, and generate relevant recommendation. Our proposed approach leverages the vast knowledge encoded in large language models to accomplish recommendation tasks. We first we formulate specialized prompts to enhance the ability of LLM to comprehend recommendation tasks. Subsequently, we use these prompts to fine-tune the LLaMA backbone LLM on a dataset of user-item interactions, represented by textual data, to capture user preferences and item characteristics. Our research underscores the potential of LLM-based generative recommendation in revolutionizing the domain of recommendation systems and offers a foundational framework for future explorations in this field. We conduct extensive experiments on benchmark datasets, and the experiments shows that our GenRec has significant better results on large dataset.

3D-IDS: Doubly Disentangled Dynamic Intrusion Detection

  • paper_url: http://arxiv.org/abs/2307.11079
  • repo_url: None
  • paper_authors: Chenyang Qiu, Yingsheng Geng, Junrui Lu, Kaida Chen, Shitong Zhu, Ya Su, Guoshun Nan, Can Zhang, Junsong Fu, Qimei Cui, Xiaofeng Tao
  • for: 提高网络入侵检测系统(NIDS)的检测精度和可解释性,帮助旁减不良攻击对信息基础设施的威胁。
  • methods: 提出了一种基于两步特征分解和动态图diffusion算法的新方法(3D-IDS),通过自动对复杂的攻击特征进行非参数化优化,生成攻击特征表示,并使用动态图diffusion方法进行空间时间聚合,有效地识别多种攻击,包括未知威胁和已知威胁。
  • results: 实验表明,3D-IDS可以有效地识别多种攻击,包括未知威胁和已知威胁,并且比现有方法更高的检测精度和可解释性。
    Abstract Network-based intrusion detection system (NIDS) monitors network traffic for malicious activities, forming the frontline defense against increasing attacks over information infrastructures. Although promising, our quantitative analysis shows that existing methods perform inconsistently in declaring various unknown attacks (e.g., 9% and 35% F1 respectively for two distinct unknown threats for an SVM-based method) or detecting diverse known attacks (e.g., 31% F1 for the Backdoor and 93% F1 for DDoS by a GCN-based state-of-the-art method), and reveals that the underlying cause is entangled distributions of flow features. This motivates us to propose 3D-IDS, a novel method that aims to tackle the above issues through two-step feature disentanglements and a dynamic graph diffusion scheme. Specifically, we first disentangle traffic features by a non-parameterized optimization based on mutual information, automatically differentiating tens and hundreds of complex features of various attacks. Such differentiated features will be fed into a memory model to generate representations, which are further disentangled to highlight the attack-specific features. Finally, we use a novel graph diffusion method that dynamically fuses the network topology for spatial-temporal aggregation in evolving data streams. By doing so, we can effectively identify various attacks in encrypted traffics, including unknown threats and known ones that are not easily detected. Experiments show the superiority of our 3D-IDS. We also demonstrate that our two-step feature disentanglements benefit the explainability of NIDS.

An Adaptive Optimization Approach to Personalized Financial Incentives in Mobile Behavioral Weight Loss Interventions

  • paper_url: http://arxiv.org/abs/2307.00444
  • repo_url: None
  • paper_authors: Qiaomei Li, Kara L. Gavin, Corrine I. Voils, Yonatan Mintz
  • for: 本研究旨在设计个性化的营养干预,使用直接金钱奖励来鼓励身高减轻,同时保持在研究预算内。
  • methods: 本研究使用机器学习方法预测参与者如何响应不同奖励计划,并在Behavioral 干预中使用这些预测来定制奖励。
  • results: 研究结果表明,个性化奖励设计可以提高营养干预的效果和经济性。
    Abstract Obesity is a critical healthcare issue affecting the United States. The least risky treatments available for obesity are behavioral interventions meant to promote diet and exercise. Often these interventions contain a mobile component that allows interventionists to collect participants level data and provide participants with incentives and goals to promote long term behavioral change. Recently, there has been interest in using direct financial incentives to promote behavior change. However, adherence is challenging in these interventions, as each participant will react differently to different incentive structure and amounts, leading researchers to consider personalized interventions. The key challenge for personalization, is that the clinicians do not know a priori how best to administer incentives to participants, and given finite intervention budgets how to disburse costly resources efficiently. In this paper, we consider this challenge of designing personalized weight loss interventions that use direct financial incentives to motivate weight loss while remaining within a budget. We create a machine learning approach that is able to predict how individuals may react to different incentive schedules within the context of a behavioral intervention. We use this predictive model in an adaptive framework that over the course of the intervention computes what incentives to disburse to participants and remain within the study budget. We provide both theoretical guarantees for our modeling and optimization approaches as well as demonstrate their performance in a simulated weight loss study. Our results highlight the cost efficiency and effectiveness of our personalized intervention design for weight loss.
    摘要 肥胖是美国医疗系统中的一个严重问题。最安全有效的肥胖治疗方法是行为改变方法,包括提倡饮食和运动。这些方法经常包括移动组件,允许 interveners 收集参与者的数据并为参与者提供激励和目标,以促进长期行为变化。在最近,有兴趣使用直接金钱激励来促进行为变化。然而,遵循性困难,因为每个参与者都会不同地对不同的激励结构和金额响应不同。这导致研究人员考虑个性化 intervención。个性化挑战是,临床医生不知道在先知道如何向参与者分配激励,以及如何有效地分配有限的投资资源。在这篇论文中,我们考虑这个个性化肥胖损重优化问题。我们开发了一种机器学习方法,可以预测参与者如何响应不同的激励计划。我们使用这个预测模型,在行为改变方法中进行adaptive框架,在训练期间计算怎样分配激励,以保持在研究预算内。我们提供了理论保证和优化方法的实践表现,并在模拟的肥胖损重研究中证明了我们的个性化 intervención的成本效果。我们的结果表明,我们的个性化 intervención设计可以有效地促进肥胖损重。

One Copy Is All You Need: Resource-Efficient Streaming of Medical Imaging Data at Scale

  • paper_url: http://arxiv.org/abs/2307.00438
  • repo_url: https://github.com/um2ii/openjphpy
  • paper_authors: Pranav Kulkarni, Adway Kanhere, Eliot Siegel, Paul H. Yi, Vishwa S. Parekh
  • for: 这篇论文是为了解决医疗影像数据集大量化问题,并且提高人工智能工具的开发速度。
  • methods: 这篇论文使用了一个开源框架called MIST,实现了进度分辨率的运算过程,允许用户在不同的分辨率下载取医疗影像。
  • results: 这篇论文的结果显示,使用MIST可以将医疗影像集中存储和流式处理的设备不足问题降低>90%,并且维持深度学习应用中的诊断质量。
    Abstract Large-scale medical imaging datasets have accelerated development of artificial intelligence tools for clinical decision support. However, the large size of these datasets is a bottleneck for users with limited storage and bandwidth. Many users may not even require such large datasets as AI models are often trained on lower resolution images. If users could directly download at their desired resolution, storage and bandwidth requirements would significantly decrease. However, it is impossible to anticipate every users' requirements and impractical to store the data at multiple resolutions. What if we could store images at a single resolution but send them at different ones? We propose MIST, an open-source framework to operationalize progressive resolution for streaming medical images at multiple resolutions from a single high-resolution copy. We demonstrate that MIST can dramatically reduce imaging infrastructure inefficiencies for hosting and streaming medical images by >90%, while maintaining diagnostic quality for deep learning applications.
    摘要 大规模医疗影像数据集的扩大已经推动了面向临床决策支持的人工智能工具的发展。然而，这些数据集的庞大规模对储存和带宽有限的用户构成了瓶颈。许多用户可能并不需要如此大的数据集，因为人工智能模型通常是在较低分辨率的图像上训练的。如果用户可以直接按所需分辨率下载，储存和带宽需求将大幅降低。然而，预先满足每个用户的需求是不可能的，而以多种分辨率分别存储数据也不现实。我们提出了MIST，一个开源框架，用于从单一的高分辨率副本中以多种分辨率渐进式地流式传输医疗影像。我们展示了MIST可以将托管和流式传输医疗影像的基础设施低效开销降低90%以上，同时保持深度学习应用所需的诊断质量。

Data-Driven Design for Metamaterials and Multiscale Systems: A Review

  • paper_url: http://arxiv.org/abs/2307.05506
  • repo_url: None
  • paper_authors: Doksoo Lee, Wei Wayne Chen, Liwei Wang, Yu-Chin Chan, Wei Chen
  • for: 这篇论文旨在探讨数据驱动设计方法在Meta材料设计中的潜力。
  • methods: 该论文使用数据收集、机器学习基于单元细胞设计和数据驱动多尺度优化等方法来实现数据驱动设计。
  • results: 论文提出了一种束缚数据驱动设计的总体方法,并将现有研究分为数据驱动模块,包括数据收集、机器学习基于单元细胞设计和数据驱动多尺度优化等方法。
    Abstract Metamaterials are artificial materials designed to exhibit effective material parameters that go beyond those found in nature. Composed of unit cells with rich designability that are assembled into multiscale systems, they hold great promise for realizing next-generation devices with exceptional, often exotic, functionalities. However, the vast design space and intricate structure-property relationships pose significant challenges in their design. A compelling paradigm that could bring the full potential of metamaterials to fruition is emerging: data-driven design. In this review, we provide a holistic overview of this rapidly evolving field, emphasizing the general methodology instead of specific domains and deployment contexts. We organize existing research into data-driven modules, encompassing data acquisition, machine learning-based unit cell design, and data-driven multiscale optimization. We further categorize the approaches within each module based on shared principles, analyze and compare strengths and applicability, explore connections between different modules, and identify open research questions and opportunities.
    摘要 美特材料是人造材料,旨在实现自然界之外的效果。它们由单元细胞组合而成,单元细胞具有丰富的设计性,可以组成多尺度系统。这些材料具有极高的潜在功能,但是设计困难重大,因为设计空间庞大,结构-性能关系复杂。一种吸引人的思想是数据驱动设计,这种思想在这篇文章中得到了详细的介绍。我们将现有的研究分为三个数据驱动模块:数据收集、机器学习基于单元细胞设计和数据驱动多尺度优化。每个模块都包含不同的方法,我们根据共同原则分类和分析它们。我们还探讨了不同模块之间的连接,并评估了各模块的优劣和适用范围。最后,我们还提出了一些未解决的研究问题和机遇。

Sparsity-aware generalization theory for deep neural networks

  • paper_url: http://arxiv.org/abs/2307.00426
  • repo_url: None
  • paper_authors: Ramchandran Muthukumar, Jeremias Sulam
  • for: 本研究旨在探讨深度人工神经网络的泛化能力,并提出一种新的分析方法来解释这种泛化能力。
  • methods: 本研究使用了深度循环神经网络,并开发了一种基于隐藏层活动的度量方法来衡量模型的泛化能力。
  • results: 研究发现,隐藏层活动的度量可以用于衡量模型的泛化能力,并且可以提供非虚假的下界,即使模型具有较高的参数数量。
    Abstract Deep artificial neural networks achieve surprising generalization abilities that remain poorly understood. In this paper, we present a new approach to analyzing generalization for deep feed-forward ReLU networks that takes advantage of the degree of sparsity that is achieved in the hidden layer activations. By developing a framework that accounts for this reduced effective model size for each input sample, we are able to show fundamental trade-offs between sparsity and generalization. Importantly, our results make no strong assumptions about the degree of sparsity achieved by the model, and it improves over recent norm-based approaches. We illustrate our results numerically, demonstrating non-vacuous bounds when coupled with data-dependent priors in specific settings, even in over-parametrized models.
    摘要 深度人工神经网络实现了奇异的泛化能力,这些能力尚未得到充分理解。在这篇论文中,我们提出了一种新的分析泛化方法,利用隐藏层活动的稀畴程度来考虑。我们开发了一个考虑这个减少的有效模型大小的框架,以便为每个输入样本表示基准。我们可以显示泛化和稀畴之间存在基本的负面关系,这些结果不假设模型达到了哪个水平的稀畴程度。我们的结果超过了最近的 нор-based方法。我们通过数值计算示出了非虚无关的下界,即使在过参数化模型中。

Adaptive Algorithms for Relaxed Pareto Set Identification

  • paper_url: http://arxiv.org/abs/2307.00424
  • repo_url: None
  • paper_authors: Cyrille Kone, Emilie Kaufmann, Laura Richert
  • for: 本文研究了一种多目标多枪支牛津模型中的固定信任性识别Pareto优点集的问题。由于确定精确的Pareto集可能需要很大的样本量,因此研究了一种允许输出一些近似优点枪支的放松。此外,本文还研究了其他放松方法,允许Identify一个相关的Pareto集子集。
  • methods: 本文提出了一种单一的抽样策略,called Adaptive Pareto Exploration,可以与不同的停止规则结合使用,以满足不同的放松。本文还分析了不同组合的抽样复杂度,特别是在寻找最多$k$ Pareto优点枪支时的减少样本复杂度。
  • results: 本文在一个实际应用中展示了Adaptive Pareto Exploration的良好实践性,在考虑多个免疫力标准时选择 Covid-19 疫苗策略的问题上。
    Abstract In this paper we revisit the fixed-confidence identification of the Pareto optimal set in a multi-objective multi-armed bandit model. As the sample complexity to identify the exact Pareto set can be very large, a relaxation allowing to output some additional near-optimal arms has been studied. In this work we also tackle alternative relaxations that allow instead to identify a relevant subset of the Pareto set. Notably, we propose a single sampling strategy, called Adaptive Pareto Exploration, that can be used in conjunction with different stopping rules to take into account different relaxations of the Pareto Set Identification problem. We analyze the sample complexity of these different combinations, quantifying in particular the reduction in sample complexity that occurs when one seeks to identify at most $k$ Pareto optimal arms. We showcase the good practical performance of Adaptive Pareto Exploration on a real-world scenario, in which we adaptively explore several vaccination strategies against Covid-19 in order to find the optimal ones when multiple immunogenicity criteria are taken into account.
    摘要 本文重新审视了多目标多臂赌博机模型中固定置信度下帕累托最优集合的识别问题。由于精确识别帕累托集所需的样本量可能非常大，已有研究允许额外输出一些近似最优的臂作为放松。本文还研究了另一类放松，允许只识别帕累托集中一个相关的子集。特别地，我们提出了一种统一的采样策略，称为Adaptive Pareto Exploration，可与不同的停止规则结合使用，以适应帕累托集识别问题的不同放松形式。我们分析了这些组合的样本复杂度，特别量化了当只需识别至多$k$个帕累托最优臂时样本复杂度的降低。我们在一个真实场景中展示了Adaptive Pareto Exploration的良好实用性能：在考虑多个免疫原性指标的情况下，自适应地探索多种新冠疫苗接种策略以找出最优方案。
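
A compact sketch of the fixed-confidence Pareto set identification loop for multi-objective bandits: maintain confidence intervals on each arm's mean vector, keep sampling, and stop when the empirical Pareto set is confidently separated from the remaining arms. The confidence radius, the "pull the widest interval" rule, and the stopping test are deliberate simplifications of the paper's Adaptive Pareto Exploration and its relaxations.

```python
import numpy as np

def dominates(a, b):
    """a Pareto-dominates b: at least as good everywhere, strictly better somewhere."""
    return np.all(a >= b) and np.any(a > b)

def pareto_set_identification(true_means, sigma=0.3, delta=0.05, horizon=100_000, seed=0):
    """Fixed-confidence PSI sketch with Gaussian arms (all rules simplified)."""
    rng = np.random.default_rng(seed)
    k, d = true_means.shape
    counts = np.ones(k)
    sums = rng.normal(true_means, sigma)                 # one pull of every arm
    for t in range(k, horizon):
        mu = sums / counts[:, None]
        rad = sigma * np.sqrt(2 * np.log(4 * k * d * (t + 1) ** 2 / delta) / counts)
        pareto = [i for i in range(k)
                  if not any(dominates(mu[j], mu[i]) for j in range(k) if j != i)]
        # Stop once every non-Pareto arm is confidently beaten by every Pareto arm
        # on some objective (a conservative, simplified separation test).
        done = 0 < len(pareto) < k and all(
            (mu[i] - rad[i] > mu[j] + rad[j]).any()
            for i in pareto for j in range(k) if j not in pareto)
        if done:
            return sorted(pareto), int(counts.sum())
        a = int(np.argmax(rad))                          # pull the most uncertain arm
        sums[a] += rng.normal(true_means[a], sigma)
        counts[a] += 1
    return sorted(pareto), int(counts.sum())

means = np.array([[0.9, 0.2], [0.2, 0.9], [0.6, 0.6], [0.5, 0.5], [0.1, 0.1]])
print(pareto_set_identification(means))   # arms 0, 1, 2 form the Pareto set here
```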

JoinBoost: Grow Trees Over Normalized Data Using Only SQL

  • paper_url: http://arxiv.org/abs/2307.00422
  • repo_url: None
  • paper_authors: Zezhou Huang, Rathijit Sen, Jiaxiang Liu, Eugene Wu
  • for: 这种论文的目的是提出一种基于 SQL 的内存中 ML 系统,以避免数据移动和提供数据管理。
  • methods: 这种系统使用了纯 SQL 语句来训练树型模型,并且可以在任何 DBMS 上运行。
  • results: 实验表明,JoinBoost 比特有限的 LightGBM 快三倍(1.1倍),并且与现有的内存中 ML 系统相比,速度超过一个数量级。此外,JoinBoost 可以跨越 LightGBM 的特点,包括特性数、数据库大小和Join图复杂度。
    Abstract Although dominant for tabular data, ML libraries that train tree models over normalized databases (e.g., LightGBM, XGBoost) require the data to be denormalized as a single table, materialized, and exported. This process is not scalable, slow, and poses security risks. In-DB ML aims to train models within DBMSes to avoid data movement and provide data governance. Rather than modify a DBMS to support In-DB ML, is it possible to offer competitive tree training performance to specialized ML libraries...with only SQL? We present JoinBoost, a Python library that rewrites tree training algorithms over normalized databases into pure SQL. It is portable to any DBMS, offers performance competitive with specialized ML libraries, and scales with the underlying DBMS capabilities. JoinBoost extends prior work from both algorithmic and systems perspectives. Algorithmically, we support factorized gradient boosting, by updating the $Y$ variable to the residual in the non-materialized join result. Although this view update problem is generally ambiguous, we identify addition-to-multiplication preserving, the key property of variance semi-ring to support rmse, the most widely used criterion. System-wise, we identify residual updates as a performance bottleneck. Such overhead can be natively minimized on columnar DBMSes by creating a new column of residual values and adding it as a projection. We validate this with two implementations on DuckDB, with no or minimal modifications to its internals for portability. Our experiment shows that JoinBoost is 3x (1.1x) faster for random forests (gradient boosting) compared to LightGBM, and over an order magnitude faster than state-of-the-art In-DB ML systems. Further, JoinBoost scales well beyond LightGBM in terms of the # features, DB size (TPC-DS SF=1000), and join graph complexity (galaxy schemas).
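
The systems point in the abstract, keeping residuals as a column in the DBMS and refreshing them with a column update after each boosting round, can be shown end-to-end in a few SQL statements. The sketch below runs depth-1 gradient boosting over a single numeric feature inside DuckDB; the table, the crude split-scoring SQL, and the hyperparameters are illustrative assumptions, not JoinBoost's factorized, multi-table implementation.

```python
import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE train AS
    SELECT i / 100.0 AS x,
           3.0 * (i / 100.0) + 0.5 + 0.1 * random() AS y,
           0.0 AS pred
    FROM range(0, 1000) AS t(i);
""")

lr, n_rounds = 0.3, 20
for _ in range(n_rounds):
    # Residuals never leave the database: they are simply y - pred. Fit a depth-1
    # "tree" (one split on x) to the residual entirely in SQL; an unweighted
    # variance score keeps the query short.
    s, left_val, right_val = con.execute("""
        WITH cand AS (SELECT DISTINCT round(x, 1) AS s FROM train),
        scored AS (
            SELECT s,
                   avg(CASE WHEN x <= s THEN y - pred END) AS lv,
                   avg(CASE WHEN x >  s THEN y - pred END) AS rv,
                   var_pop(CASE WHEN x <= s THEN y - pred END)
                     + var_pop(CASE WHEN x >  s THEN y - pred END) AS score
            FROM train, cand GROUP BY s)
        SELECT s, lv, rv FROM scored
        WHERE lv IS NOT NULL AND rv IS NOT NULL
        ORDER BY score LIMIT 1;
    """).fetchone()
    # Gradient-boosting update as a column update: pred (and hence the residual)
    # is refreshed in place inside the DBMS.
    con.execute("UPDATE train SET pred = pred + CASE WHEN x <= ? THEN ? ELSE ? END",
                [s, lr * left_val, lr * right_val])

print(con.execute("SELECT sqrt(avg((y - pred) * (y - pred))) AS rmse FROM train").fetchone())
```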

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

  • paper_url: http://arxiv.org/abs/2307.00405
  • repo_url: None
  • paper_authors: Ruiquan Huang, Yingbin Liang, Jing Yang
  • for: 本研究旨在提高累积奖励的策略选择问题,包括Markov决策过程(MDPs)和部分可见MDPs(POMDPs)为特殊情况。
  • methods: 该研究提出了首个已知的UCB类型方法,基于预测状态表示(PSRs),其中包括一个新的奖励项来Upper bound total variation distance between estimated和true模型。
  • results: 我们计算出了在线和离线PSRs的样本复杂性下界,并证明了我们的设计的UCB类型算法具有计算效率、最后一轮保证近似优策、和模型准确性的优点。
    Abstract The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are not computationally efficient. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational efficiency, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
    摘要 一般的序贯决策问题（包括马尔可夫决策过程（MDP）和部分可观测MDP（POMDP）作为特例）旨在基于历史观测和动作做出一系列决策，以最大化累积奖励。近期研究表明，若该问题具有由预测状态表示（PSR）刻画的低秩结构，则它在统计上是可学习的。然而，现有方法通常依赖oracle或计算上低效的步骤。另一方面，在多臂赌博机和MDP中被证明计算高效的置信上界（UCB）类方法，由于在更一般的PSR设定下难以设计乐观奖励项，尚未得到研究。本文提出了首个已知的面向PSR的UCB类方法，其中包含一个新的奖励项，用于上界约束估计模型与真实模型之间的总变差距离。我们进一步刻画了所设计的UCB类算法在在线和离线PSR设定下的样本复杂度界。与现有PSR方法不同，我们的UCB类算法具有计算高效、最后一轮迭代保证近似最优策略以及模型精度保证等优点。

ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models

  • paper_url: http://arxiv.org/abs/2307.00398
  • repo_url: https://github.com/ExplainableML/ProbVLM
  • paper_authors: Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata
  • for: 本研究旨在提高大规模视觉语言模型(VLM)的表现,以实现更好的协同运算和模型选择。
  • methods: 本研究提出了一种 probabilistic adapter,可以在posts-hoc方式中对已经预训练的 VLM 进行概率调整,以估计嵌入空间中的概率分布。
  • results: 在四个挑战性 dataset 上,包括 COCO、Flickr、CUB 和 Oxford-flowers,研究人员可以通过估计嵌入空间中的概率分布,评估 VLM 的嵌入不确定性,并证明 ProbVLM 在回归任务中表现出色。此外,研究人员还提出了两个现实世界下沉浸任务,即活动学习和模型选择,并证明在这些任务中,估计嵌入空间中的概率分布具有很好的帮助作用。最后,研究人员还介绍了一种基于大规模预训练的潜在扩散模型,用于可见化嵌入分布。
    Abstract Large-scale vision-language models (VLMs) like CLIP successfully find correspondences between images and text. Through the standard deterministic mapping process, an image or a text sample is mapped to a single vector in the embedding space. This is problematic: as multiple samples (images or text) can abstract the same concept in the physical world, deterministic embeddings do not reflect the inherent ambiguity in the embedding space. We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing. On four challenging datasets, i.e., COCO, Flickr, CUB, and Oxford-flowers, we estimate the multi-modal embedding uncertainties for two VLMs, i.e., CLIP and BLIP, quantify the calibration of embedding uncertainties in retrieval tasks and show that ProbVLM outperforms other methods. Furthermore, we propose active learning and model selection as two real-world downstream tasks for VLMs and show that the estimated uncertainty aids both tasks. Lastly, we present a novel technique for visualizing the embedding distributions using a large-scale pre-trained latent diffusion model.
    摘要 大规模视力语言模型(VLM)如CLIP成功地找到图像和文本之间的对应关系。通过标准排定 mapping 过程,一个图像或文本样本将映射到 embedding 空间中的单个向量上。这是一个问题:多个样本(图像或文本)可以抽象 Physical 世界中的同一个概念,因此排定 embedding 不会反映 embedding 空间中的内在含义。我们提议 ProbVLM,一种 probabilistic adapter,通过对 pre-trained VLM 的嵌入进行概率分布的估计,在后续方式中无需大规模数据或计算。在四个具有挑战性的 datasets 上,我们估算 pre-trained VLM 的嵌入不确定性,衡量嵌入不确定性的准确性在检索任务中,并示出 ProbVLM 超过其他方法。此外,我们提出了基于 VLM 的活动学习和模型选择两个实际应用任务,并证明估计不确定性可以 aid 这两个任务。最后,我们介绍了一种使用大规模预训练的潜在扩散模型来可见 embedding 分布的新技术。
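
A sketch of the post-hoc probabilistic adapter idea: small heads on top of frozen embeddings output distribution parameters instead of a point, trained with a heteroscedastic likelihood so that the predicted scale doubles as an uncertainty estimate. The Gaussian likelihood, the cross-modal-only objective, and the stand-in random embeddings are simplifications; ProbVLM's actual distribution family and intra-modal terms are not reproduced.

```python
import torch
import torch.nn as nn

class ProbAdapter(nn.Module):
    """Post-hoc adapter over frozen embeddings: predicts a mean and a per-dimension
    scale, turning a point embedding into a distribution (Gaussian here for simplicity)."""
    def __init__(self, dim):
        super().__init__()
        self.mu = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
        self.log_sigma = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, z):
        return self.mu(z), self.log_sigma(z).clamp(-5, 5)

def nll_loss(mu, log_sigma, target):
    """Heteroscedastic Gaussian negative log-likelihood against the paired embedding."""
    return (log_sigma + 0.5 * ((target - mu) / log_sigma.exp()) ** 2).mean()

# usage sketch with frozen (here: random stand-in) image/text embeddings
dim = 512
img_adapter, txt_adapter = ProbAdapter(dim), ProbAdapter(dim)
opt = torch.optim.Adam(list(img_adapter.parameters()) + list(txt_adapter.parameters()), lr=1e-4)
img_emb, txt_emb = torch.randn(32, dim), torch.randn(32, dim)   # would come from a frozen VLM
mu_i, ls_i = img_adapter(img_emb)
mu_t, ls_t = txt_adapter(txt_emb)
loss = nll_loss(mu_i, ls_i, txt_emb) + nll_loss(mu_t, ls_t, img_emb)   # cross-modal alignment
opt.zero_grad(); loss.backward(); opt.step()
# The predicted scales give a per-sample uncertainty score, e.g. ls_i.exp().mean(dim=1)
print(ls_i.exp().mean(dim=1).shape)   # torch.Size([32])
```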

MobileViG: Graph-Based Sparse Attention for Mobile Vision Applications

  • paper_url: http://arxiv.org/abs/2307.00395
  • repo_url: https://github.com/sldgroup/mobilevig
  • paper_authors: Mustafa Munir, William Avery, Radu Marculescu
  • for: This paper proposes a new graph-based sparse attention mechanism (SVGA) and a hybrid CNN-GNN architecture (MobileViG) for vision tasks on mobile devices.
  • methods: The proposed SVGA mechanism is designed to reduce the computational cost of representing images as graph structures, while the MobileViG architecture combines SVGA with a CNN backbone.
  • results: Extensive experiments show that MobileViG outperforms existing ViG models and mobile CNN and ViT architectures in terms of accuracy and/or speed on image classification, object detection, and instance segmentation tasks. The fastest model, MobileViG-Ti, achieves 75.7% top-1 accuracy with 0.78 ms inference latency on iPhone 13 Mini NPU, while the largest model, MobileViG-B, obtains 82.6% top-1 accuracy with only 2.30 ms latency.
    Abstract Traditionally, convolutional neural networks (CNN) and vision transformers (ViT) have dominated computer vision. However, recently proposed vision graph neural networks (ViG) provide a new avenue for exploration. Unfortunately, for mobile applications, ViGs are computationally expensive due to the overhead of representing images as graph structures. In this work, we propose a new graph-based sparse attention mechanism, Sparse Vision Graph Attention (SVGA), that is designed for ViGs running on mobile devices. Additionally, we propose the first hybrid CNN-GNN architecture for vision tasks on mobile devices, MobileViG, which uses SVGA. Extensive experiments show that MobileViG beats existing ViG models and existing mobile CNN and ViT architectures in terms of accuracy and/or speed on image classification, object detection, and instance segmentation tasks. Our fastest model, MobileViG-Ti, achieves 75.7% top-1 accuracy on ImageNet-1K with 0.78 ms inference latency on iPhone 13 Mini NPU (compiled with CoreML), which is faster than MobileNetV2x1.4 (1.02 ms, 74.7% top-1) and MobileNetV2x1.0 (0.81 ms, 71.8% top-1). Our largest model, MobileViG-B obtains 82.6% top-1 accuracy with only 2.30 ms latency, which is faster and more accurate than the similarly sized EfficientFormer-L3 model (2.77 ms, 82.4%). Our work proves that well designed hybrid CNN-GNN architectures can be a new avenue of exploration for designing models that are extremely fast and accurate on mobile devices. Our code is publicly available at https://github.com/SLDGroup/MobileViG.
    摘要 传统上,卷积神经网络(CNN)和视Transformer(ViT)在计算机视觉领域占据主导地位,但最近提出的视图图神经网络(ViG)提供了一个新的探索方向。然而,由于图像表示为图结构所带来的计算开销,ViG在移动设备上是计算昂贵的。在这种情况下,我们提出了一种新的图像 sparse attention机制——图像 sparse vision graph attention(SVGA),用于适应移动设备上的 ViG 运行。此外,我们还提出了首个在移动设备上使用 CNN-GNN 架构的 Hybrid CNN-GNN 模型——MobileViG,该模型使用 SVGA。我们的实验表明,MobileViG 在图像分类、物体检测和实例 segmentation 任务上比现有的 ViG 模型和现有的移动 CNN 和 ViT 架构更高的准确率和/或运行速度。我们的最快模型,MobileViG-Ti,在 ImageNet-1K 上达到了 75.7% 的顶部 1 准确率,并且在 iPhone 13 Mini NPU 上编译 CoreML 时间为 0.78 ms,比 MobileNetV2x1.4 (1.02 ms, 74.7% top-1) 和 MobileNetV2x1.0 (0.81 ms, 71.8% top-1) 更快。我们的最大模型,MobileViG-B,在 82.6% 的顶部 1 准确率下,只需 2.30 ms 的时间,这比 EfficientFormer-L3 模型 (2.77 ms, 82.4%) 更快和更准确。我们的工作证明了,通过设计合适的 Hybrid CNN-GNN 架构,可以在移动设备上设计出EXTREMELY FAST和EXTREMELY ACCURATE的模型。我们的代码可以在 上获取。

CasTGAN: Cascaded Generative Adversarial Network for Realistic Tabular Data Synthesis

  • paper_url: http://arxiv.org/abs/2307.00384
  • repo_url: https://github.com/abedshantti/castgan
  • paper_authors: Abdallah Alshantti, Damiano Varagnolo, Adil Rasheed, Aria Rahmati, Frank Westad
  • for: 本文提出了一种基于生成对抗网络(GAN)的方法,用于生成具有真实性的表格数据,特别是关注Validity问题。
  • methods: 本文提出了一种级联的表格GAN框架(CasTGAN),通过级联的扩展,使生成的数据更加真实地反映原始数据中的特征相互关系。
  • results: 实验结果表明,CasTGAN能够很好地捕捉原始数据中特征之间的相互关系和约束,尤其是高维数据集。此外,对模型进行一些扰动处理可以提高模型对特定攻击的抗性。
    Abstract Generative adversarial networks (GANs) have drawn considerable attention in recent years for their proven capability in generating synthetic data which can be utilized for multiple purposes. While GANs have demonstrated tremendous successes in producing synthetic data samples that replicate the dynamics of the original datasets, the validity of the synthetic data and the underlying privacy concerns represent major challenges which are not sufficiently addressed. In this work, we design a cascaded tabular GAN framework (CasTGAN) for generating realistic tabular data with a specific focus on the validity of the output. In this context, validity refers to the the dependency between features that can be found in the real data, but is typically misrepresented by traditional generative models. Our key idea entails that employing a cascaded architecture in which a dedicated generator samples each feature, the synthetic output becomes more representative of the real data. Our experimental results demonstrate that our model well captures the constraints and the correlations between the features of the real data, especially the high dimensional datasets. Furthermore, we evaluate the risk of white-box privacy attacks on our model and subsequently show that applying some perturbations to the auxiliary learners in CasTGAN increases the overall robustness of our model against targeted attacks.

Residual-based attention and connection to information bottleneck theory in PINNs

  • paper_url: http://arxiv.org/abs/2307.00379
  • repo_url: https://github.com/soanagno/rba-pinns
  • paper_authors: Sokratis J. Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, George Em Karniadakis
  • for: 本研究旨在提高物理学习机制中的数据集成效率和无缝性。
  • methods: 该研究提出了一种高效、不需要梯度的重量规则,用于加速物理学习机制中的动态或静态系统的收敛。该简单 yet effective 的注意力机制是基于系统的演化准确误差,并且不需要额外的计算成本或反向学习。
  • results: 该研究表明,该重量规则可以在标准优化器上实现相对 $L^{2}$ 误差在 $10^{-5}$ 水平。此外,通过分析训练过程中的权重演化,研究人员发现了两个不同的学习阶段,与信息瓶颈理论(IB)中的匹配和扩散阶段相似。
    Abstract Driven by the need for more efficient and seamless integration of physical models and data, physics-informed neural networks (PINNs) have seen a surge of interest in recent years. However, ensuring the reliability of their convergence and accuracy remains a challenge. In this work, we propose an efficient, gradient-less weighting scheme for PINNs, that accelerates the convergence of dynamic or static systems. This simple yet effective attention mechanism is a function of the evolving cumulative residuals and aims to make the optimizer aware of problematic regions at no extra computational cost or adversarial learning. We illustrate that this general method consistently achieves a relative $L^{2}$ error of the order of $10^{-5}$ using standard optimizers on typical benchmark cases of the literature. Furthermore, by investigating the evolution of weights during training, we identify two distinct learning phases reminiscent of the fitting and diffusion phases proposed by the information bottleneck (IB) theory. Subsequent gradient analysis supports this hypothesis by aligning the transition from high to low signal-to-noise ratio (SNR) with the transition from fitting to diffusion regimes of the adopted weights. This novel correlation between PINNs and IB theory could open future possibilities for understanding the underlying mechanisms behind the training and stability of PINNs and, more broadly, of neural operators.
    摘要 驱动了更高效和无缝的物理模型和数据集成的需求,物理学 informed neural networks(PINNs)在最近几年内得到了广泛的关注。然而,保证其减少和精度的可靠性仍然是一个挑战。在这种工作中,我们提出了一种高效的无梯度权重方案,用于加速动态或静态系统的PINNs的收敛。这种简单 yet effective的注意力机制是函数所处的积累差异,并且在不Extra的计算成本或对抗学习的情况下,使得优化器对问题地带出更多的注意。我们示出,这种通用方法可以在典型的文献中的测试案例中实现相对的L2误差为10^-5水平。此外,通过分析训练过程中权重的发展,我们发现了IB理论中的两个不同学习阶段,即“适应阶段”和“扩散阶段”。这种预测和权重的采用支持了这一假设,并且对PINNs和更广泛的神经运算器的稳定性和训练机制的理解带来了新的可能性。
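As a concrete illustration of the residual-based weighting described above, the sketch below folds cumulative-residual attention weights into a PINN-style loss. The exponential-moving-average update rule, the decay factor `gamma`, the scale `eta`, and the toy residuals are assumptions introduced for illustration; they are not the paper's exact formulation.

```python
import numpy as np

def update_rba_weights(weights, residuals, gamma=0.999, eta=0.01):
    """Residual-based attention update (assumed EMA form): each collocation
    point's weight decays by gamma and grows in proportion to its normalized
    absolute residual.  No gradients are required, so the update is cheap."""
    r = np.abs(residuals)
    r_norm = r / (r.max() + 1e-12)          # scale residuals to [0, 1]
    return gamma * weights + eta * r_norm

def weighted_pinn_loss(residuals, weights):
    """Residuals are multiplied by their (fixed, non-trainable) weights before
    the usual mean-squared PINN loss is taken."""
    return np.mean((weights * residuals) ** 2)

# toy usage: residuals of some PDE at N collocation points, shrinking over training
rng = np.random.default_rng(0)
weights = np.ones(1000)
for step in range(100):
    residuals = rng.normal(scale=1.0 / (step + 1), size=1000)  # stand-in for PDE residuals
    weights = update_rba_weights(weights, residuals)
    loss = weighted_pinn_loss(residuals, weights)
print(f"final weighted loss: {loss:.3e}")
```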

eess.IV - 2023-07-02

A multi-task learning framework for carotid plaque segmentation and classification from ultrasound images

  • paper_url: http://arxiv.org/abs/2307.00583
  • repo_url: None
  • paper_authors: Haitao Gan, Ran Zhou, Yanghan Ou, Furong Wang, Xinyao Cheng, Xiaoyan Wu, Aaron Fenster
  • for: 本研究的目的是提出一种多任务学习框架,用于 ultrasound 脉搏凝固板分类和 segmentation,以利用这两个任务之间的相关性。
  • methods: 该方法使用了一个区域权重模块 (RWM) 和一个样本权重模块 (SWM),以利用分类任务中的区域预知知识,并通过学习样本权重来提高分类和 segmentation 的性能。
  • results: 实验结果表明,提出的方法可以significantly提高与单任务网络相比的性能,包括分类精度为 85.82% 和 segmentation 的 Dice 相似度为 84.92%。
    Abstract Carotid plaque segmentation and classification play important roles in the treatment of atherosclerosis and assessment for risk of stroke. Although deep learning methods have been used for carotid plaque segmentation and classification, most focused on a single task and ignored the relationship between the segmentation and classification of carotid plaques. Therefore, we propose a multi-task learning framework for ultrasound carotid plaque segmentation and classification, which utilizes a region-weight module (RWM) and a sample-weight module (SWM) to exploit the correlation between these two tasks. The RWM provides a plaque regional prior knowledge to the classification task, while the SWM is designed to learn the categorical sample weight for the segmentation task. A total of 1270 2D ultrasound images of carotid plaques were collected from Zhongnan Hospital (Wuhan, China) for our experiments. The results of the experiments showed that the proposed method can significantly improve the performance compared to existing networks trained for a single task, with an accuracy of 85.82% for classification and a Dice similarity coefficient of 84.92% for segmentation. In the ablation study, the results demonstrated that both the designed RWM and SWM were beneficial in improving the network's performance. Therefore, we believe that the proposed method could be useful for carotid plaque analysis in clinical trials and practice.
    摘要 卡罗提脂板分割和分类在脉络疾病治疗和风险评估中发挥重要作用。虽然深度学习方法已经用于卡罗提脂板分割和分类,但大多数方法都专注于单一任务,忽略了这两个任务之间的关系。因此,我们提出了一种多任务学习框架 для脉络卡罗提脂板分割和分类,该框架利用区域权重模块(RWM)和样本权重模块(SWM)来利用这两个任务之间的相关性。RWM提供了脉络内分泌区域的知识,以便分类任务中的识别,而SWM是为分割任务学习样本权重。我们在 Zhongnan Hospital(武汉中南医院)收集了1270个2D脉络卡罗提脂板图像进行实验。实验结果表明,我们提出的方法可以明显提高与已有网络单任务培训的性能,具体数据为85.82%的分类精度和84.92%的分割同步率。在减少研究中,结果表明,设计的RWM和SWM都对网络性能的提高做出了贡献。因此,我们认为,我们的方法可以在临床试验和实践中用于脉络卡罗提脂板分析。
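For readers who want to see how the two tasks can share one objective, here is a minimal sketch of a joint segmentation-plus-classification loss in which per-sample weights stand in for the sample-weight module (SWM). The Dice/cross-entropy combination, the weighting scheme, and the three plaque classes are assumptions; the paper's actual RWM/SWM designs are not reproduced.

```python
import torch
import torch.nn.functional as F

def dice_loss_per_sample(pred_mask, target_mask, eps=1e-6):
    """Soft Dice loss per sample for (B, H, W) probability maps and binary targets."""
    inter = (pred_mask * target_mask).sum(dim=(1, 2))
    union = pred_mask.sum(dim=(1, 2)) + target_mask.sum(dim=(1, 2))
    return 1.0 - (2 * inter + eps) / (union + eps)          # shape (B,)

def multitask_loss(seg_prob, seg_target, cls_logits, cls_target, sample_weights, alpha=1.0):
    """Weighted segmentation term plus a classification term; sample_weights
    play the role of learned categorical sample weights (assumed here)."""
    seg_term = (sample_weights * dice_loss_per_sample(seg_prob, seg_target)).mean()
    cls_term = F.cross_entropy(cls_logits, cls_target)
    return seg_term + alpha * cls_term

# toy usage
B, H, W = 4, 64, 64
seg_prob = torch.rand(B, H, W)
seg_target = (torch.rand(B, H, W) > 0.5).float()
cls_logits = torch.randn(B, 3)           # e.g. three plaque categories (assumed)
cls_target = torch.randint(0, 3, (B,))
weights = torch.ones(B)                  # would be produced by the sample-weight module
print(multitask_loss(seg_prob, seg_target, cls_logits, cls_target, weights))
```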

Enhancing Super-Resolution Networks through Realistic Thick-Slice CT Simulation

  • paper_url: http://arxiv.org/abs/2307.10182
  • repo_url: None
  • paper_authors: Zeyu Tang, Xiaodan Xing, Guang Yang
  • for: 这项研究旨在开发一种新的快速CT图像生成算法,以便生成与实际图像更加相似的厚片CT图像。
  • methods: 该研究使用了Peak Signal-to-Noise Ratio (PSNR)和Root Mean Square Error (RMSE)指标来评估提议的算法,并发现该算法能够提供更加与实际图像相似的图像。
  • results: 该研究表明,使用提议的算法可以获得较高的PSNR和较低的RMSE,并且生成的图像更加与实际图像相似。
    Abstract This study aims to develop and evaluate an innovative simulation algorithm for generating thick-slice CT images that closely resemble actual images in the AAPM-Mayo's 2016 Low Dose CT Grand Challenge dataset. The proposed method was evaluated using Peak Signal-to-Noise Ratio (PSNR) and Root Mean Square Error (RMSE) metrics, with the hypothesis that our simulation would produce images more congruent with their real counterparts. Our proposed method demonstrated substantial enhancements in terms of both PSNR and RMSE over other simulation methods. The highest PSNR values were obtained with the proposed method, yielding 49.7369 $\pm$ 2.5223 and 48.5801 $\pm$ 7.3271 for D45 and B30 reconstruction kernels, respectively. The proposed method also registered the lowest RMSE with values of 0.0068 $\pm$ 0.0020 and 0.0108 $\pm$ 0.0099 for D45 and B30, respectively, indicating a distribution more closely aligned with the authentic thick-slice image. Further validation of the proposed simulation algorithm was conducted using the TCIA LDCT-and-Projection-data dataset. The generated images were then leveraged to train four distinct super-resolution (SR) models, which were subsequently evaluated using the real thick-slice images from the 2016 Low Dose CT Grand Challenge dataset. When trained with data produced by our novel algorithm, all four SR models exhibited enhanced performance.
    摘要 这项研究的目的是开发和评估一种创新的thick-slice CT图像生成算法,以便更加准确地模拟实际图像在AAPM-Mayo的2016年低剂量CT挑战数据集中。提出的方法使用PSNR和RMSE度量来评估,假设我们的生成算法可以生成更加与实际图像相似的图像。我们的提出方法在PSNR和RMSE上都达到了显著提高,相比其他生成方法。我们的方法在D45和B30重建器中获得了最高PSNR值,具体值为49.7369 ± 2.5223和48.5801 ± 7.3271。我们的方法还在D45和B30重建器中记录了最低RMSE值,具体值为0.0068 ± 0.0020和0.0108 ± 0.0099。这表明我们的方法生成的图像更加与实际图像相似。我们进一步验证了我们的生成算法使用TCIA LDCT-and-Projection-data dataset。生成的图像然后被用来训练四种不同的super-resolution(SR)模型,并在2016年低剂量CT挑战数据集中使用实际thick-slice图像进行评估。当使用我们的新算法生成数据时,所有四种SR模型均展现出了改进的性能。
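The PSNR and RMSE figures quoted above can be computed with a few lines; the sketch below assumes images normalized to a [0, 1] data range, which is an assumption rather than something stated in the abstract.

```python
import numpy as np

def rmse(reference: np.ndarray, test: np.ndarray) -> float:
    """Root mean square error between two same-shaped images."""
    return float(np.sqrt(np.mean((reference - test) ** 2)))

def psnr(reference: np.ndarray, test: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB: 20*log10(MAX) - 10*log10(MSE)."""
    mse = np.mean((reference - test) ** 2)
    if mse == 0:
        return float("inf")
    return float(20 * np.log10(data_range) - 10 * np.log10(mse))

# toy usage with a simulated thick-slice vs. real thick-slice pair
rng = np.random.default_rng(0)
real = rng.random((64, 64))
simulated = np.clip(real + rng.normal(scale=0.01, size=real.shape), 0, 1)
print(f"PSNR = {psnr(real, simulated):.2f} dB, RMSE = {rmse(real, simulated):.4f}")
```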

ARHNet: Adaptive Region Harmonization for Lesion-aware Augmentation to Improve Segmentation Performance

  • paper_url: http://arxiv.org/abs/2307.01220
  • repo_url: https://github.com/king-haw/arhnet
  • paper_authors: Jiayu Huo, Yang Liu, Xi Ouyang, Alejandro Granados, Sebastien Ourselin, Rachel Sparks
  • for: 提供更好的脑损害诊断和 neuromonitoring 服务
  • methods: 使用增强的数据增强技术和自适应区域协调模块
  • results: 提高 segmentation 性能,在真实和合成图像上达到最佳效果,代码公开在 GitHub
    Abstract Accurately segmenting brain lesions in MRI scans is critical for providing patients with prognoses and neurological monitoring. However, the performance of CNN-based segmentation methods is constrained by the limited training set size. Advanced data augmentation is an effective strategy to improve the model's robustness. However, they often introduce intensity disparities between foreground and background areas and boundary artifacts, which weakens the effectiveness of such strategies. In this paper, we propose a foreground harmonization framework (ARHNet) to tackle intensity disparities and make synthetic images look more realistic. In particular, we propose an Adaptive Region Harmonization (ARH) module to dynamically align foreground feature maps to the background with an attention mechanism. We demonstrate the efficacy of our method in improving the segmentation performance using real and synthetic images. Experimental results on the ATLAS 2.0 dataset show that ARHNet outperforms other methods for image harmonization tasks, and boosts the down-stream segmentation performance. Our code is publicly available at https://github.com/King-HAW/ARHNet.
    摘要 优先级段落:精准分割脑部损害的MRI扫描图像是诊断和脑科监测中非常重要的。然而,基于Convolutional Neural Network(CNN)的分割方法的性能受训练集大小的限制。高级数据增强是一种有效的策略来提高模型的鲁棒性。然而,它们通常会导致背景和前景区域之间的明暗差异和边缘artefacts,这会削弱这些策略的效果。在这篇论文中,我们提出了一种前景协调框架(ARHNet)来解决明暗差异和Synthetic图像的真实性。特别是,我们提出了一种适应区域协调(ARH)模块,通过注意力机制来动态对前景特征图与背景进行对齐。我们通过实验表明,ARHNet可以提高下游分割性能,并在ATLAS 2.0 dataset上超过其他图像协调任务的方法。我们的代码公开在GitHub上,请参考https://github.com/King-HAW/ARHNet。

Domain Transfer Through Image-to-Image Translation for Uncertainty-Aware Prostate Cancer Classification

  • paper_url: http://arxiv.org/abs/2307.00479
  • repo_url: None
  • paper_authors: Meng Zhou, Amoon Jamzad, Jason Izard, Alexandre Menard, Robert Siemens, Parvin Mousavi
  • for: 这个研究是为了提高肝癌诊断的精度和效率,使用深度学习模型来支持医生在诊断过程中。
  • methods: 这个研究使用了对照式图像转换方法,将3.0T MRI图像转换为1.5T MRI图像,以增加训练数据的量。还使用了证据深度学习方法来估计模型的不确定性,并运用数据范围技术来筛选训练数据。最后,这个研究引入了证据类型单元损失,将类型单元损失与证据不确定性结合以训练模型。
  • results: 这个研究的结果显示,使用对照式图像转换方法和证据深度学习方法可以提高肝癌诊断的精度,AUC值提高了20%以上(98.4% vs. 76.2%)。这些结果显示,提供预测不确定性可能会帮助医生更好地处理不确定的案例,并且更快地完成诊断过程。
    Abstract Prostate Cancer (PCa) is often diagnosed using High-resolution 3.0 Tesla(T) MRI, which has been widely established in clinics. However, there are still many medical centers that use 1.5T MRI units in the actual diagnostic process of PCa. In the past few years, deep learning-based models have been proven to be efficient on the PCa classification task and can be successfully used to support radiologists during the diagnostic process. However, training such models often requires a vast amount of data, and sometimes it is unobtainable in practice. Additionally, multi-source MRIs can pose challenges due to cross-domain distribution differences. In this paper, we have presented a novel approach for unpaired image-to-image translation of prostate mp-MRI for classifying clinically significant PCa, to be applied in data-constrained settings. First, we introduce domain transfer, a novel pipeline to translate unpaired 3.0T multi-parametric prostate MRIs to 1.5T, to increase the number of training data. Second, we estimate the uncertainty of our models through an evidential deep learning approach; and leverage the dataset filtering technique during the training process. Furthermore, we introduce a simple, yet efficient Evidential Focal Loss that incorporates the focal loss with evidential uncertainty to train our model. Our experiments demonstrate that the proposed method significantly improves the Area Under ROC Curve (AUC) by over 20% compared to the previous work (98.4% vs. 76.2%). We envision that providing prediction uncertainty to radiologists may help them focus more on uncertain cases and thus expedite the diagnostic process effectively. Our code is available at https://github.com/med-i-lab/DT_UE_PCa
    摘要 丙级尿道癌(PCa)经常通过高分辨率3.0T MRI进行诊断,但是医疗机构中仍有许多使用1.5T MRI单元进行诊断过程。过去几年,深度学习基于模型已经在PCa分类任务上证明效果良好,可以为医生提供支持。然而,训练这些模型通常需要巨量数据,而且在实践中可能无法获得。此外,多源MRIs可能会产生交叉领域分布差异。在这篇论文中,我们提出了一种新的方法,用于无拟合的图像到图像翻译,以便在数据紧张的情况下对丙级尿道癌进行分类。首先,我们引入域传递,一种新的管道,用于将3.0T多参量尿道MRIs翻译成1.5T,以增加训练数据的数量。其次,我们通过证明深度学习方法来估计模型的uncertainty;并在训练过程中运用数据筛选技术。此外,我们引入了一种简单 yet efficient的证明焦点损失,并将其与证明uncertainty相结合,以训练我们的模型。我们的实验表明,我们的方法可以提高ROC曲线面积(AUC)比前一个工作(98.4% vs. 76.2%)。我们认为,为医生提供预测不确定性可能会帮助他们更好地关注不确定的案例,从而更有效地快速诊断。我们的代码可以在https://github.com/med-i-lab/DT_UE_PCa中找到。
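A rough sketch of how focal-style modulation can be combined with evidential (Dirichlet) outputs is given below. The softplus evidence, the Dirichlet mean, and the vacuity-style uncertainty follow the standard evidential deep learning recipe, but the exact Evidential Focal Loss used in the paper may differ from this form.

```python
import torch
import torch.nn.functional as F

def evidential_focal_loss(logits, targets, gamma=2.0):
    """Combine focal weighting with Dirichlet (evidential) class probabilities.
    alpha_k = evidence_k + 1, S = sum_k alpha_k, p_k = alpha_k / S,
    and the vacuity u = K / S is high when total evidence is low."""
    evidence = F.softplus(logits)                      # non-negative evidence per class
    alpha = evidence + 1.0                             # Dirichlet concentration parameters
    strength = alpha.sum(dim=1, keepdim=True)          # S
    probs = alpha / strength                           # expected class probabilities
    num_classes = logits.shape[1]
    uncertainty = num_classes / strength.squeeze(1)    # per-sample predictive uncertainty

    p_t = probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # probability of the true class
    focal_weight = (1.0 - p_t) ** gamma                      # down-weight easy samples
    loss = -(focal_weight * torch.log(p_t.clamp_min(1e-8))).mean()
    return loss, uncertainty

# toy usage on random logits for a binary (clinically significant vs. not) task
logits = torch.randn(8, 2, requires_grad=True)
targets = torch.randint(0, 2, (8,))
loss, u = evidential_focal_loss(logits, targets)
loss.backward()
print(loss.item(), u.detach().numpy().round(3))
```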

Weighted Anisotropic-Isotropic Total Variation for Poisson Denoising

  • paper_url: http://arxiv.org/abs/2307.00439
  • repo_url: https://github.com/kbui1993/official_aitv_poisson_denoising
  • paper_authors: Kevin Bui, Yifei Lou, Fredrick Park, Jack Xin
  • for: 这篇研究旨在提出一种基于weighted anisotropic-isotropic total variation(AITV)的Poisson噪声除除法,以提高图像质量和计算效率。
  • methods: 该研究使用了一种基于替换方法的多值函数,并使用了一种组合 proximal 算法和权重补做法来实现。
  • results: 数值实验表明,该算法比其他Poisson噪声除除法具有更高的图像质量和计算效率。
    Abstract Poisson noise commonly occurs in images captured by photon-limited imaging systems such as in astronomy and medicine. As the distribution of Poisson noise depends on the pixel intensity value, noise levels vary from pixels to pixels. Hence, denoising a Poisson-corrupted image while preserving important details can be challenging. In this paper, we propose a Poisson denoising model by incorporating the weighted anisotropic-isotropic total variation (AITV) as a regularization. We then develop an alternating direction method of multipliers with a combination of a proximal operator for an efficient implementation. Lastly, numerical experiments demonstrate that our algorithm outperforms other Poisson denoising methods in terms of image quality and computational efficiency.
    摘要 Poisson 噪声通常发生在由光子限制的捕捉系统中,如天文学和医学中的图像捕捉。由于噪声分布取决于像素INTENSITY值,噪声水平各像素不同,因此去噪化Poisson受损图像保持重要细节可谓挑战。在这篇论文中,我们提出了包含加重度权重iso-anisotropic total variation(AITV)的Poisson去噪模型。然后,我们开发了一种alternating direction method of multipliers,并使用 proximal operator 实现高效的实现。最后,数值实验表明,我们的算法在图像质量和计算效率方面都超过了其他Poisson去噪方法。
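The weighted anisotropic-isotropic TV term itself is easy to evaluate; the sketch below computes sum(|∇u|_1) − α·sum(|∇u|_2) with forward differences. The boundary handling and the value of α are implementation choices, and the ADMM/proximal solver from the paper is not shown.

```python
import numpy as np

def aitv(u: np.ndarray, alpha: float = 0.5) -> float:
    """Weighted anisotropic-isotropic total variation:
    sum_i (|dx_i| + |dy_i|) - alpha * sum_i sqrt(dx_i^2 + dy_i^2).
    Forward differences with replicated boundary are an implementation choice."""
    dx = np.diff(u, axis=1, append=u[:, -1:])   # horizontal forward difference
    dy = np.diff(u, axis=0, append=u[-1:, :])   # vertical forward difference
    anisotropic = np.abs(dx) + np.abs(dy)       # L1 part
    isotropic = np.sqrt(dx ** 2 + dy ** 2)      # L2 part
    return float(np.sum(anisotropic - alpha * isotropic))

# toy usage: a piecewise-constant image has lower AITV than its noisy version
clean = np.zeros((32, 32)); clean[8:24, 8:24] = 1.0
noisy = clean + np.random.default_rng(0).normal(scale=0.1, size=clean.shape)
print(aitv(clean), aitv(noisy))
```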

Sulcal Pattern Matching with the Wasserstein Distance

  • paper_url: http://arxiv.org/abs/2307.00385
  • repo_url: https://github.com/laplcebeltrami/sulcaltree
  • paper_authors: Zijian Chen, Soumya Das, Moo K. Chung
  • for: 该论文旨在提供一种统一的计算框架,用于模型人脑磁共振图像中的皱槽模式。
  • methods: 论文使用沃asserstein距离来非线性匹配皱槽模式,并开发了梯度下降算法来估计塑形场。
  • results: 该方法可以准确地识别男性和女性皱槽模式之间的差异。
    Abstract We present the unified computational framework for modeling the sulcal patterns of human brain obtained from the magnetic resonance images. The Wasserstein distance is used to align the sulcal patterns nonlinearly. These patterns are topologically different across subjects making the pattern matching a challenge. We work out the mathematical details and develop the gradient descent algorithms for estimating the deformation field. We further quantify the image registration performance. This method is applied in identifying the differences between male and female sulcal patterns.
    摘要 我们提出了一个统一的计算框架,用于模拟人类大脑磁共振成像中的脑隙Pattern。我们使用Wasserstein距离来非线性匹配这些Pattern。由于这些Pattern在不同个体中具有不同的拓扑结构,因此匹配这些Pattern是一个挑战。我们在详细的数学上下文中详细介绍了这些方法,并开发了梯度下降算法来估计扭变场。我们进一步评估了图像匹配性。这种方法可以用于对男女脑隙Pattern之间的差异进行识别。
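As a generic stand-in for the Wasserstein matching of sulcal patterns, the following sketch computes an entropy-regularized optimal-transport cost between two point sets with uniform weights (Sinkhorn iterations). The squared-Euclidean cost, the uniform weights, and the regularization strength are assumptions; the paper's cost function and deformation-field estimation are not reproduced here.

```python
import numpy as np

def sinkhorn_wasserstein(x, y, eps=0.05, n_iter=200):
    """Entropy-regularized optimal transport cost between two point clouds
    with uniform weights.  Costs are rescaled by their maximum so that the
    Gibbs kernel stays well-conditioned."""
    cost = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1) ** 2
    cost = cost / cost.max()
    K = np.exp(-cost / eps)                      # Gibbs kernel
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    u = np.ones_like(a)
    for _ in range(n_iter):                      # Sinkhorn fixed-point iterations
        v = b / (K.T @ u + 1e-12)
        u = a / (K @ v + 1e-12)
    transport = u[:, None] * K * v[None, :]      # optimal coupling
    return float(np.sum(transport * cost))       # cost on the normalized scale

rng = np.random.default_rng(0)
pattern_a = rng.normal(size=(50, 3))             # e.g. points sampled from one sulcal graph
pattern_b = pattern_a + 0.1                      # a slightly shifted pattern
print(sinkhorn_wasserstein(pattern_a, pattern_b))
```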

cs.SD - 2023-07-01

The Human Auditory System and Audio

  • paper_url: http://arxiv.org/abs/2307.00084
  • repo_url: https://github.com/chilldude/stereo-cipher
  • paper_authors: Milind N. Kunchur
  • for: 这篇论文探讨了人类听觉系统,描述了一些特殊化机制和非线性路径,从物理声音的感知过程中。
  • methods: 该论文使用了一些新的技术和方法,包括声音响应的测量和分析,以及计算模型的构建。
  • results: 研究发现,人类听觉系统具有惊人的高精度和多样性,可以在微秒级别听觉和分辨声音细节,并且可以检测到声音的非常小的变化。
    Abstract This work reviews the human auditory system, elucidating some of the specialized mechanisms and non-linear pathways along the chain of events between physical sound and its perception. Customary relationships between frequency, time, and phase--such as the uncertainty principle--that hold for linear systems, do not apply straightforwardly to the hearing process. Auditory temporal resolution for certain processes can be a hundredth of the period of the signal, and can extend down to the microseconds time scale. The astonishingly large number of variations that correspond to the neural excitation pattern of 30000 auditory nerve fibers, originating from 3500 inner hair cells, explicates the vast capacity of the auditory system for the resolution of sonic detail. And the ear is sensitive enough to detect a basilar-membrane amplitude at the level of a picometer, or about a hundred times smaller than an atom. This article surveys and provides new insights into some of the impressive capabilities of the human auditory system and explores their relationship to fidelity in reproduced sound.
    摘要 这篇文章介绍人类听觉系统,描述了从物理声音到感知这一过程中的一些特殊机制和非线性路径。频率、时间和相位之间的传统关系(例如不确定原理)适用于线性系统,但并不直接适用于听觉过程。听觉系统对某些过程的时间分辨率可以达到信号周期的百分之一,并可以降到微秒级别。听觉系统的神经刺激模式来自3500个内毛细胞所连接的30000条听神经纤维,其变化数量之多说明了听觉系统分辨声音细节的巨大容量。而耳朵也足够敏感,可以探测到皮米(picometer)级别的基底膜振幅,约比一个原子小一百倍。这篇文章梳理并提供了对人类听觉系统这些出色能力的新见解,并探讨它们与重放声音保真度之间的关系。

Towards Improving the Performance of Pre-Trained Speech Models for Low-Resource Languages Through Lateral Inhibition

  • paper_url: http://arxiv.org/abs/2306.17792
  • repo_url: None
  • paper_authors: Andrei-Marius Avram, Răzvan-Alexandru Smădu, Vasile Păiş, Dumitru-Clementin Cercel, Radu Ion, Dan Tufiş
  • for: 提高预先训练的语音模型性能
  • methods: 取代细化 dense layer avec lateral inhibition layer
  • results: 在 Romanian 语言下提高了12.5% 字异错率 (WER),并在 Romanian Speech Corpus 和 Robin Technical Acquisition Corpus 上达到了状态机器人的result(1.78% WER 和 29.64% WER)。
    Abstract With the rise of bidirectional encoder representations from Transformer models in natural language processing, the speech community has adopted some of their development methodologies. Therefore, the Wav2Vec models were introduced to reduce the data required to obtain state-of-the-art results. This work leverages this knowledge and improves the performance of the pre-trained speech models by simply replacing the fine-tuning dense layer with a lateral inhibition layer inspired by the biological process. Our experiments on Romanian, a low-resource language, show an average improvement of 12.5% word error rate (WER) using the lateral inhibition layer. In addition, we obtain state-of-the-art results on both the Romanian Speech Corpus and the Robin Technical Acquisition Corpus with 1.78% WER and 29.64% WER, respectively.
    摘要 随着Transformer模型的bidirectional编码器表示法在自然语言处理领域的普及,语音社区开始采纳其开发方法。因此,Wav2Vec模型被引入,以降低需要获得状态对应的数据量。本工作借用这些知识,改进了预训练的语音模型,通过取代精度降低层为 lateral inhibition层,这种层启发自生物过程。我们的实验表明,在罗马尼亚语,一种低资源语言,使用 lateral inhibition 层可以提高语音识别精度,平均提高12.5%词错率(WER)。此外,我们在罗马尼亚语语音库和Robin技术获得 corpus 上达到了状态对应的最佳结果,分别为1.78% WER和29.64% WER。

eess.AS - 2023-07-01

Enhancing the EEG Speech Match Mismatch Tasks With Word Boundaries

  • paper_url: http://arxiv.org/abs/2307.00366
  • repo_url: https://github.com/iiscleap/eegspeech-matchmismatch
  • paper_authors: Akshara Soman, Vidhi Sinha, Sriram Ganapathy
  • for: This paper is written for analyzing the underlying neural mechanisms of human speech comprehension, specifically using a match-mismatch classification of speech stimuli and neural responses.
  • methods: The paper uses a network of convolution layers to process both speech and EEG signals, followed by a word boundary-based average pooling and a recurrent layer to incorporate inter-word context.
  • results: The experiments show that the modeling accuracy can be significantly improved to 93% on a publicly available speech-EEG data set, which is higher than previous efforts that achieved an accuracy of 65-75% for this task.
    Abstract Recent studies have shown that the underlying neural mechanisms of human speech comprehension can be analyzed using a match-mismatch classification of the speech stimulus and the neural response. However, such studies have been conducted for fixed-duration segments without accounting for the discrete processing of speech in the brain. In this work, we establish that word boundary information plays a significant role in sentence processing by relating EEG to its speech input. We process the speech and the EEG signals using a network of convolution layers. Then, a word boundary-based average pooling is performed on the representations, and the inter-word context is incorporated using a recurrent layer. The experiments show that the modeling accuracy can be significantly improved (match-mismatch classification accuracy) to 93% on a publicly available speech-EEG data set, while previous efforts achieved an accuracy of 65-75% for this task.
    摘要 近期研究表明,人类语言理解的下面神经机制可以通过匹配-不匹配分类的语音刺激和神经回快的方式进行分析。然而,这些研究通常是在固定时间段内进行的,没有考虑大脑对语音的精度处理。在这项工作中,我们证明了单词边界信息在句子处理中发挥了重要作用,并使用神经网络进行语音和EEG信号处理。然后,我们对表示进行了单词边界基于的均值抽取,并通过回快层 incorporate 了间隔词上下文。实验结果显示,我们的模型准确率可以提高至 93% 在一个公共可用的语音-EEG数据集上,而之前的努力只能达到 65-75% 的水平。
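The word-boundary-based average pooling is straightforward to implement; the sketch below averages frame-level features inside each word interval, producing one vector per word that a recurrent layer could then consume. The frame-aligned word boundaries are assumed to come from a forced alignment of the speech stimulus.

```python
import numpy as np

def word_boundary_pool(features: np.ndarray, boundaries: list[tuple[int, int]]) -> np.ndarray:
    """Average frame-level features inside each (start, end) word interval,
    yielding one vector per word."""
    return np.stack([features[s:e].mean(axis=0) for s, e in boundaries])

# toy usage: 1000 frames of 64-dim EEG (or speech) embeddings and 3 word spans
rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 64))
word_spans = [(0, 250), (250, 620), (620, 1000)]
pooled = word_boundary_pool(frames, word_spans)
print(pooled.shape)   # (3, 64): one vector per word, ready for a recurrent layer
```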

cs.CV - 2023-07-01

Learning Content-enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation

  • paper_url: http://arxiv.org/abs/2307.00371
  • repo_url: None
  • paper_authors: Qi Bi, Shaodi You, Theo Gevers
  • for: 这个研究目的是为了发展一个可以应对不同城市景观风格的 semantic segmentation 方法(Domain-generalized urban-scene semantic segmentation,USSS)。
  • methods: 这篇研究提出了一个基于 Transformer 的 Content-enhanced Mask TransFormer(CMFormer)方法,它强调了面罩注意力机制的增强,以提高模型的内容识别能力。
  • results: 实验结果显示,CMFormer 在不同城市景观风格下的 semantic segmentation task 中表现出色,与已有的 CNN 方法相比,CMFormer 可以达到14.00% 的 mIoU 提升(mean intersection over union)。
    Abstract Domain-generalized urban-scene semantic segmentation (USSS) aims to learn generalized semantic predictions across diverse urban-scene styles. Unlike domain gap challenges, USSS is unique in that the semantic categories are often similar in different urban scenes, while the styles can vary significantly due to changes in urban landscapes, weather conditions, lighting, and other factors. Existing approaches typically rely on convolutional neural networks (CNNs) to learn the content of urban scenes. In this paper, we propose a Content-enhanced Mask TransFormer (CMFormer) for domain-generalized USSS. The main idea is to enhance the focus of the fundamental component, the mask attention mechanism, in Transformer segmentation models on content information. To achieve this, we introduce a novel content-enhanced mask attention mechanism. It learns mask queries from both the image feature and its down-sampled counterpart, as lower-resolution image features usually contain more robust content information and are less sensitive to style variations. These features are fused into a Transformer decoder and integrated into a multi-resolution content-enhanced mask attention learning scheme. Extensive experiments conducted on various domain-generalized urban-scene segmentation datasets demonstrate that the proposed CMFormer significantly outperforms existing CNN-based methods for domain-generalized semantic segmentation, achieving improvements of up to 14.00\% in terms of mIoU (mean intersection over union). The source code for CMFormer will be made available at this \href{https://github.com/BiQiWHU/domain-generalized-urban-scene-segmentation}{repository}.
    摘要 领域总体化的城市场景semantic segmentation(USSS)目标是学习多样化城市场景风格下的通用semantic预测。与领域差异挑战不同,USSS的semantic类别通常在不同的城市场景中相似,而style可以因城市风貌、天气、照明和其他因素而发生显著变化。现有方法通常采用卷积神经网络(CNN)来学习城市场景的内容。在这篇论文中,我们提出了一种基于Transformer segmentation模型的Content-enhanced Mask TransFormer(CMFormer)。主要思想是在Transformer segmentation模型中增强基本组件的面积注意力,以便更好地利用内容信息。为此,我们提出了一种新的内容增强面积注意力机制。它从图像特征和其下采样后的图像特征中学习面 queries,以便更好地利用图像的内容信息和风格特征。这些特征被混合到Transformer解码器中,并在多resolution content-enhanced面积注意力学习方案中集成。我们对多个领域总体化城市场景semantic segmentation数据集进行了广泛的实验,结果表明,提出的CMFormer显著超过了现有的CNN基于方法,在领域总体化semantic segmentation中实现了14.00%的提升, measured by mean intersection over union(mIoU)。我们将CMFormer的源代码公开在这个\href{https://github.com/BiQiWHU/domain-generalized-urban-scene-segmentation}{存储库}中。

Spatial-Temporal Enhanced Transformer Towards Multi-Frame 3D Object Detection

  • paper_url: http://arxiv.org/abs/2307.00347
  • repo_url: None
  • paper_authors: Yifan Zhang, Zhiyu Zhu, Junhui Hou
  • for: 这篇论文旨在探讨多帧3D物体检测系统中的DETR模型,并提出一个基于DETR的端到端框架STEMD,用于解决多帧3D物体检测中的问题。
  • methods: STEMD使用DETR-like的方法,将多帧3D物体检测视为一个序列到序列的任务,并具有优化的空间-时间对话网络,以实现更好地捕捉物体之间的空间-时间依存性。
  • results: 经过实验证明,STEMD可以在复杂的测试场景下实现更好的多帧3D物体检测效果,并且仅增加了少量的计算负载。
    Abstract The Detection Transformer (DETR) has revolutionized the design of CNN-based object detection systems, showcasing impressive performance. However, its potential in the domain of multi-frame 3D object detection remains largely unexplored. In this paper, we present STEMD, a novel end-to-end framework for multi-frame 3D object detection based on the DETR-like paradigm. Our approach treats multi-frame 3D object detection as a sequence-to-sequence task and effectively captures spatial-temporal dependencies at both the feature and query levels. To model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network. This network represents queries as nodes in a graph and enables effective modeling of object interactions within a social context. In addition, to solve the problem of missing hard cases in the proposed output of the encoder in the current frame, we incorporate the output of the previous frame to initialize the query input of the decoder. Moreover, we tackle the issue of redundant detection results, where the model generates numerous overlapping boxes from similar queries. To mitigate this, we introduce an IoU regularization term in the loss function. This term aids in distinguishing between queries matched with the ground-truth box and queries that are similar but unmatched during the refinement process, leading to reduced redundancy and more accurate detections. Through extensive experiments, we demonstrate the effectiveness of our approach in handling challenging scenarios, while incurring only a minor additional computational overhead. The code will be available at \url{https://github.com/Eaphan/STEMD}.
    摘要 检测变换器(DETR)已经革命化了基于卷积神经网络(CNN)的物体检测系统的设计,展示了出色的性能。然而,它在多帧三维物体检测领域的潜力仍然未能得到充分发挥。在这篇论文中,我们提出了STEMD,一种基于DETR范式的端到端框架,用于多帧三维物体检测。我们的方法将多帧三维物体检测视为一个序列到序列任务,并在特征和查询层次上有效地捕捉空间-时间依赖关系。为了建模对象之间的空间交互和复杂的时间依赖关系,我们引入了空间-时间图注意力网络。这个网络将查询视为图节点,并允许在社交上下文中有效地建模对象间的交互。此外,为了解决当前帧编码器输出中缺失困难样本的问题,我们使用前一帧的输出来初始化解码器的查询输入。此外,我们还解决了模型从相似查询生成大量重叠检测框的冗余问题:为此在损失函数中引入了IoU正则化项,以帮助在细化过程中区分与真实框匹配的查询和相似但未匹配的查询,从而减少冗余并得到更准确的检测。经过广泛的实验,我们证明了我们的方法在面临挑战的场景下表现出色,同时只增加了少量的计算负担。代码将在 \url{https://github.com/Eaphan/STEMD} 上提供。
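One plausible instantiation of the IoU regularization idea, penalizing overlap between boxes from unmatched queries and boxes from matched queries, is sketched below; the exact form used by STEMD is not given in the abstract.

```python
import torch

def box_iou(boxes_a, boxes_b):
    """Pairwise IoU for boxes in (x1, y1, x2, y2) format."""
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    lt = torch.max(boxes_a[:, None, :2], boxes_b[None, :, :2])
    rb = torch.min(boxes_a[:, None, 2:], boxes_b[None, :, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter + 1e-7)

def iou_regularization(pred_boxes, matched_mask):
    """Penalize overlap between boxes of unmatched queries and boxes of
    matched queries, discouraging near-duplicate predictions (an assumed
    instantiation of the paper's IoU regularization term)."""
    matched = pred_boxes[matched_mask]
    unmatched = pred_boxes[~matched_mask]
    if matched.numel() == 0 or unmatched.numel() == 0:
        return pred_boxes.sum() * 0.0            # keep the graph, contribute nothing
    return box_iou(unmatched, matched).max(dim=1).values.mean()

# toy usage: 5 predicted boxes, queries 0 and 3 matched to ground truth
boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [40., 40., 50., 50.],
                      [42., 41., 52., 51.], [80., 80., 90., 90.]])
matched = torch.tensor([True, False, False, True, False])
print(iou_regularization(boxes, matched))
```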

SDRCNN: A single-scale dense residual connected convolutional neural network for pansharpening

  • paper_url: http://arxiv.org/abs/2307.00327
  • repo_url: None
  • paper_authors: Yuan Fang, Yuanzhi Cai, Lei Fan
  • for: 本研究开发了一个单分支、单尺度的轻量级卷积神经网络(SDRCNN),用于融合高分辨率全色影像与低分辨率多光谱影像(即 pansharpening)。
  • methods: SDRCNN使用了一个新的紧密连接的构造和卷积层,以取得更好的精确性和效率的协调。
  • results: 根据四个来自世界视三、世界视二和快鹰镜头的测试数据,SDRCNN在 Visual inspection 和相关统计量评估中表现最佳,与传统方法和轻量级深度学习方法相比。
    Abstract Pansharpening is a process of fusing a high spatial resolution panchromatic image and a low spatial resolution multispectral image to create a high-resolution multispectral image. A novel single-branch, single-scale lightweight convolutional neural network, named SDRCNN, is developed in this study. By using a novel dense residual connected structure and convolution block, SDRCNN achieved a better trade-off between accuracy and efficiency. The performance of SDRCNN was tested using four datasets from the WorldView-3, WorldView-2 and QuickBird satellites. The compared methods include eight traditional methods (i.e., GS, GSA, PRACS, BDSD, SFIM, GLP-CBD, CDIF and LRTCFPan) and five lightweight deep learning methods (i.e., PNN, PanNet, BayesianNet, DMDNet and FusionNet). Based on a visual inspection of the pansharpened images created and the associated absolute residual maps, SDRCNN exhibited least spatial detail blurring and spectral distortion, amongst all the methods considered. The values of the quantitative evaluation metrics were closest to their ideal values when SDRCNN was used. The processing time of SDRCNN was also the shortest among all methods tested. Finally, the effectiveness of each component in the SDRCNN was demonstrated in ablation experiments. All of these confirmed the superiority of SDRCNN.
    摘要 文本翻译:杜邦普兰推算(Pansharpening)是将高空间分辨率粉尘图和低空间分辨率多spectral图像联合成高分辨率多spectral图像的过程。本研究中提出了一种单支持单尺度轻量级卷积神经网络,即SDRCNN。通过使用单支持密集连接结构和卷积块,SDRCNN实现了更好的精度和效率之间的平衡。SDRCNN的性能在四个世界视图-3、世界视图-2和快鸟卫星的四个数据集上进行测试,与传统方法(GS、GSA、PRACS、BDSD、SFIM、GLP-CBD、CDIF和LRTCFPan)和轻量级深度学习方法(PNN、PanNet、概率网络、DMDNet和FusionNet)进行比较。视觉检查照片和相关绝对差异图中的详细信息,SDRCNN表现最好,其他方法中的详细信息均受到了锐化和spectral扭曲的影响。量化评价指标的值最接近理想值时,SDRCNN被使用。SDRCNN的处理时间也是所有方法中最短。最后,SDRCNN的每个组件的效果在减少实验中得到了证明。这些证明了SDRCNN的优越性。
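A generic densely connected residual block, the building pattern that the name SDRCNN points to, might look like the following; the layer count, growth rate, and 1x1 fusion layer are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DenseResidualBlock(nn.Module):
    """Every 3x3 convolution sees the concatenation of all previous feature
    maps (dense connectivity), and the fused output is added back to the
    block input (residual connection)."""
    def __init__(self, channels: int, growth: int = 32, n_layers: int = 3):
        super().__init__()
        self.convs = nn.ModuleList()
        in_ch = channels
        for _ in range(n_layers):
            self.convs.append(nn.Sequential(nn.Conv2d(in_ch, growth, 3, padding=1),
                                            nn.ReLU(inplace=True)))
            in_ch += growth                        # dense connectivity grows the input width
        self.fuse = nn.Conv2d(in_ch, channels, 1)  # 1x1 conv back to the block width

    def forward(self, x):
        feats = [x]
        for conv in self.convs:
            feats.append(conv(torch.cat(feats, dim=1)))
        return x + self.fuse(torch.cat(feats, dim=1))

block = DenseResidualBlock(channels=64)
print(block(torch.randn(1, 64, 32, 32)).shape)     # torch.Size([1, 64, 32, 32])
```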

DeepMediX: A Deep Learning-Driven Resource-Efficient Medical Diagnosis Across the Spectrum

  • paper_url: http://arxiv.org/abs/2307.00324
  • repo_url: None
  • paper_authors: Kishore Babu Nampalle, Pradeep Singh, Uppala Vivek Narayan, Balasubramanian Raman
  • for: 这个研究旨在提出一个高精度 yet computationally efficient 的医疗影像诊断模型,以应对医疗影像诊断中的挑战。
  • methods: 这个模型基于 MobileNetV2 架构,并包括 Federated Learning 的概念,实现了跨院所的学习合作,不需要直接存取敏感患者数据,同时保持数据隐私和完整性。
  • results: 这个研究透过严谨的测试,证明 DeepMediX 具有出色的诊断能力,与现有模型在大多数任务上匹配或超越其表现,并且适合在手持式设备上部署,实现实时诊断支持。
    Abstract In the rapidly evolving landscape of medical imaging diagnostics, achieving high accuracy while preserving computational efficiency remains a formidable challenge. This work presents \texttt{DeepMediX}, a groundbreaking, resource-efficient model that significantly addresses this challenge. Built on top of the MobileNetV2 architecture, DeepMediX excels in classifying brain MRI scans and skin cancer images, with superior performance demonstrated on both binary and multiclass skin cancer datasets. It provides a solution to labor-intensive manual processes, the need for large datasets, and complexities related to image properties. DeepMediX's design also includes the concept of Federated Learning, enabling a collaborative learning approach without compromising data privacy. This approach allows diverse healthcare institutions to benefit from shared learning experiences without the necessity of direct data access, enhancing the model's predictive power while preserving the privacy and integrity of sensitive patient data. Its low computational footprint makes DeepMediX suitable for deployment on handheld devices, offering potential for real-time diagnostic support. Through rigorous testing on standard datasets, including the ISIC2018 for dermatological research, DeepMediX demonstrates exceptional diagnostic capabilities, matching the performance of existing models on almost all tasks and even outperforming them in some cases. The findings of this study underline significant implications for the development and deployment of AI-based tools in medical imaging and their integration into point-of-care settings. The source code and models generated would be released at https://github.com/kishorebabun/DeepMediX.
    摘要 在医疗影像诊断领域中,快速发展的景象中,实现高精度的同时保持计算效率是一项具有挑战性的任务。本研究提出了《DeepMediX》,一种创新的、资源高效的模型,可以有效地解决这个问题。基于MobileNetV2架构,DeepMediX在脑MRI扫描和皮肤癌图像分类方面表现出色,在双类和多类皮肤癌数据集上都达到了优秀的性能。它解决了劳动 INTENSIVE 的手动过程、大量数据的需求以及图像属性的复杂性等问题。DeepMediX的设计还包括联邦学习概念,允许不同的医疗机构共同学习而不需要直接访问敏感 patient 数据,从而提高模型的预测力而保护患者数据的隐私和完整性。它的低计算脚本使得 DeepMediX 适用于手持设备上部署,为实时诊断支持提供了潜在的可能性。经过对标准数据集的严格测试,包括 ISIC2018 皮肤科研数据集,DeepMediX 在大多数任务上表现出了极佳的诊断能力,与现有模型在大多数任务上几乎相当,甚至在一些情况下超越它们。这些发现对医疗影像中的 AI 基于工具的开发和部署以及其集成到点您护Setting 中具有重要的含义。研究所生成的代码和模型将在 上发布。
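The abstract only says that DeepMediX incorporates federated learning; assuming the simplest aggregation rule (FedAvg), server-side weight averaging can be sketched as below, with dataset-size weighting per client as a further assumption.

```python
import copy
import torch

def federated_average(client_state_dicts, client_sizes):
    """Aggregate client model weights by a dataset-size-weighted average
    (standard FedAvg); only weights are shared, never patient data."""
    total = float(sum(client_sizes))
    global_state = copy.deepcopy(client_state_dicts[0])
    for key in global_state:
        global_state[key] = sum(
            sd[key].float() * (n / total)
            for sd, n in zip(client_state_dicts, client_sizes)
        )
    return global_state

# toy usage with two tiny "hospital" models sharing one architecture
net_a = torch.nn.Linear(4, 2)
net_b = torch.nn.Linear(4, 2)
merged = federated_average([net_a.state_dict(), net_b.state_dict()], client_sizes=[120, 380])
net_a.load_state_dict(merged)   # the averaged weights become the new global model
```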

Automatic Solver Generator for Systems of Laurent Polynomial Equations

  • paper_url: http://arxiv.org/abs/2307.00320
  • repo_url: None
  • paper_authors: Evgeniy Martyushev, Snehal Bhayani, Tomas Pajdla
  • for: 解决给定的 Laurent 多项式系统中的一家问题,即在不同的系统中寻找可以快速计算解的方法。
  • methods: 提出了一种新的实用算法,用于检查给定的 Laurent 多项式是否可以构建排除模板。基于这个算法,我们提出了一个自动生成器,可以快速地生成解系统的 Laurent 多项式方程的解。
  • results: 我们的生成器可以快速地生成解系统的 Laurent 多项式方程的解,并且可以自动探测部分 $p$-fold 对称性。我们对各种简单的问题进行了测试,并证明了我们的生成器比现有的方法快速。
    Abstract In computer vision applications, the following problem often arises: Given a family of (Laurent) polynomial systems with the same monomial structure but varying coefficients, find a solver that computes solutions for any family member as fast as possible. Under appropriate genericity assumptions, the dimension and degree of the respective polynomial ideal remain unchanged for each particular system in the same family. The state-of-the-art approach to solving such problems is based on elimination templates, which are the coefficient (Macaulay) matrices that encode the transformation from the initial polynomials to the polynomials needed to construct the action matrix. Knowing an action matrix, the solutions of the system are computed from its eigenvectors. The important property of an elimination template is that it applies to all polynomial systems in the family. In this paper, we propose a new practical algorithm that checks whether a given set of Laurent polynomials is sufficient to construct an elimination template. Based on this algorithm, we propose an automatic solver generator for systems of Laurent polynomial equations. The new generator is simple and fast; it applies to ideals with positive-dimensional components; it allows one to uncover partial $p$-fold symmetries automatically. We test our generator on various minimal problems, mostly in geometric computer vision. The speed of the generated solvers exceeds the state-of-the-art in most cases. In particular, we propose the solvers for the following problems: optimal 3-view triangulation, semi-generalized hybrid pose estimation and minimal time-of-arrival self-calibration. The experiments on synthetic scenes show that our solvers are numerically accurate and either comparable to or significantly faster than the state-of-the-art solvers.
    摘要 在计算机视觉应用中,常遇到以下问题:给定一个 Laurent 多项式系统家族,找到一个可以尽快计算系统解的求解器。在适当的泛化假设下,每个特定系统的多项式理想的维度和度数保持不变。现状的解决方法是基于减法模板,它们是变量多项式系统中的约化矩阵,它们编码了将初始多项式转换成构造动作矩阵的过程中的多项式。知道动作矩阵,系统的解可以从其各自的特征向量中计算出来。减法模板的重要特点是它适用于所有多项式系统家族中的系统。在这篇论文中,我们提出了一个新的实用算法,该算法可以判断给定的 Laurent 多项式集是否具有构建减法模板的能力。基于这个算法,我们提出了一个自动生成器 для Laurent 多项式方程系统的解。新的生成器简单快速,适用于具有正的维度组分的理想;它允许自动找到部分 $p$-次对称性。我们在各种最小问题上进行了测试,包括优化三视图三角形、半总化混合位姿估计和最小时间到达自我校准。实验结果表明,我们的生成器可以在大多数情况下提供更快的解决方案,并且数值精度和状态艺术家的解决方案相当或更高。

Detection of River Sandbank for Sand Mining with the Presence of Other High Mineral Content Regions Using Multi-spectral Images

  • paper_url: http://arxiv.org/abs/2307.00314
  • repo_url: None
  • paper_authors: Jit Mukherjee
  • for: 检测河川砂岸区域,直接影响经济、社会和环境。
  • methods: 使用多Modal分析,包括多spectral成像、Synthetic Aperture Radar(SAR)成像、航空图像和点云数据,但尚未充分利用河川砂岸区域的特征。
  • results: 提出了一种基于多spectral成像的新方法,可以在不使用标注数据的情况下,准确地检测河川砂岸区域。该方法基于河川和矿物质的强烈相关性,可以在不同季节下提供90.75%的准确率、85.47%的精度和73.5%的回归率。
    Abstract Sand mining is a booming industry. The river sandbank is one of the primary sources of sand mining. Detection of potential river sandbank regions for sand mining directly impacts the economy, society, and environment. In the past, semi-supervised and supervised techniques have been used to detect mining regions including sand mining. A few techniques employ multi-modal analysis combining different modalities such as multi-spectral imaging, synthetic aperture radar (\emph{SAR}) imaging, aerial images, and point cloud data. However, the distinguishing spectral characteristics of river sandbank regions are yet to be fully explored. This paper provides a novel method to detect river sandbank regions for sand mining using multi-spectral images without any labeled data over the seasons. Association with a river stream and the abundance of minerals are the most prominent features of such a region. The proposed work uses these distinguishing features to determine the spectral signature of a river sandbank region, which is robust to other high mineral abundance regions. It follows a two-step approach, where first, potential high mineral regions are detected and next, they are segregated using the presence of a river stream. The proposed technique provides average accuracy, precision, and recall of 90.75%, 85.47%, and 73.5%, respectively over the seasons from Landsat 8 images without using any labeled dataset.
    摘要 采砂是一个迅速发展的行业,而河流沙洲(river sandbank)是采砂的主要来源之一。探测潜在的 river sandbank 区域会直接影响经济、社会和环境。在过去,半监督和监督技术已经被用来探测包括采砂在内的采矿区域。一些技术使用多模态分析,结合不同的模式,如多光谱成像、合成孔径雷达(SAR)成像、航空图像和点云数据。然而,river sandbank 区域的区分性光谱特征还未被完全探索。本文提出了一种新的方法,用于利用多光谱影像探测 river sandbank 区域,不需要任何标注数据。与河流的关联以及矿物质的丰度是这类区域最显著的特征,该方法利用这些特征来确定 river sandbank 区域的光谱特征,并对其他高矿物丰度区域具有鲁棒性。该方法采用两步流程:首先检测潜在的高矿物质区域,然后利用河流的存在对其进行区分。提出的方法在不使用任何标注数据的情况下,从不同季节的 Landsat 8 图像上获得了90.75%、85.47%和73.5%的平均准确率、精度和召回率。
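The paper's actual spectral signature is not spelled out in the abstract, so the sketch below only illustrates the two-step idea with generic normalized-difference indices (NDWI for the river stream, a SWIR/NIR contrast for mineral abundance); the indices, thresholds, and adjacency rule are all assumptions.

```python
import numpy as np

def normalized_difference(band_a: np.ndarray, band_b: np.ndarray) -> np.ndarray:
    """Generic normalized-difference index, e.g. NDWI = (Green - NIR) / (Green + NIR)."""
    return (band_a - band_b) / (band_a + band_b + 1e-6)

def sandbank_candidates(green, nir, swir, water_thresh=0.2, mineral_thresh=0.1):
    """Two-step illustration: (1) flag pixels whose SWIR/NIR response suggests
    high mineral abundance, (2) keep only candidates adjacent to a water
    (river) mask derived from NDWI."""
    water = normalized_difference(green, nir) > water_thresh       # river stream mask
    mineral = normalized_difference(swir, nir) > mineral_thresh    # high-mineral candidates
    # dilate the water mask by one pixel so "association with a river" means adjacency
    near_water = np.zeros_like(water)
    near_water[1:, :] |= water[:-1, :]; near_water[:-1, :] |= water[1:, :]
    near_water[:, 1:] |= water[:, :-1]; near_water[:, :-1] |= water[:, 1:]
    near_water |= water
    return mineral & near_water & ~water

# toy usage with random reflectance bands
rng = np.random.default_rng(0)
g, n, s = (rng.random((64, 64)) for _ in range(3))
print(sandbank_candidates(g, n, s).sum(), "candidate pixels")
```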

PM-DETR: Domain Adaptive Prompt Memory for Object Detection with Transformers

  • paper_url: http://arxiv.org/abs/2307.00313
  • repo_url: None
  • paper_authors: Peidong Jia, Jiaming Liu, Senqiao Yang, Jiarui Wu, Xiaodong Xie, Shanghang Zhang
  • for: 本研究旨在适应检测器(DETR)在不同数据分布下的性能下降问题。
  • methods: 我们提出了一种层次Prompt Domain Memory(PDM),用于适应检测器的适应。PDM通过各个提示和其相应的分布值的对应关系,抽象地提取了域特有的知识,并将其作为多级嵌入和DETR输入的一部分进行注入。此外,我们还引入了Prompt Memory Alignment(PMA),用于减少源和目标域之间的差异,并充分利用提取到的域特有的知识。
  • results: 我们的方法在三个测试准则上(场景、 sintetic to real 和天气适应)比靶状态的领域适应检测方法表现出色,得到了更好的性能。
    Abstract The Transformer-based detectors (i.e., DETR) have demonstrated impressive performance on end-to-end object detection. However, transferring DETR to different data distributions may lead to a significant performance degradation. Existing adaptation techniques focus on model-based approaches, which aim to leverage feature alignment to narrow the distribution shift between different domains. In this study, we propose a hierarchical Prompt Domain Memory (PDM) for adapting detection transformers to different distributions. PDM comprehensively leverages the prompt memory to extract domain-specific knowledge and explicitly constructs a long-term memory space for the data distribution, which represents better domain diversity compared to existing methods. Specifically, each prompt and its corresponding distribution value are paired in the memory space, and we inject top M distribution-similar prompts into the input and multi-level embeddings of DETR. Additionally, we introduce the Prompt Memory Alignment (PMA) to reduce the discrepancy between the source and target domains by fully leveraging the domain-specific knowledge extracted from the prompt domain memory. Extensive experiments demonstrate that our method outperforms state-of-the-art domain adaptive object detection methods on three benchmarks, including scene, synthetic to real, and weather adaptation. Codes will be released.
    摘要 基于 Transformer 的检测器(即 DETR)在端到端对象检测方面表现出了很好的性能。然而,在不同数据分布下迁移 DETR 可能会导致性能下降。现有的适应技术主要集中在基于模型的方法上,旨在利用特征对齐来缩小不同域之间的分布差异。在这项研究中,我们提出了层次结构的 Prompt Domain Memory(PDM),用于使检测变换器适应不同分布。PDM 全面利用提示记忆来抽取域特定的知识,并将每个提示和其相应的分布值配对在记忆空间中,同时在 DETR 的输入和多级嵌入中注入 top M 分布相似的提示。此外,我们还引入了 Prompt Memory Alignment(PMA),以减少源域和目标域之间的差异,充分利用从提示域记忆中提取的域特定知识。广泛的实验表明,我们的方法在三个标准检测 benchmark 上(包括场景、Synthetic to Real 和天气适应)都超过了现有的领先方法。代码将会被发布。

Adversarial Attacks and Defenses on 3D Point Cloud Classification: A Survey

  • paper_url: http://arxiv.org/abs/2307.00309
  • repo_url: None
  • paper_authors: Hanieh Naderi, Ivan V. Bajić
  • for: 本文主要探讨了点云分类领域中的 adversarial attack 和 defense 技术的现状,以便鼓励未来的研究。
  • methods: 本文首先介绍了反对攻击的原理和特点,然后总结了过去几年中的反对例生成方法。此外,它还分类了防御策略为输入变换、数据优化和深度模型修改。
  • results: 本文综合梳理了防御策略的效果,并提出了未来研究的一些挑战和方向。
    Abstract Deep learning has successfully solved a wide range of tasks in 2D vision as a dominant AI technique. Recently, deep learning on 3D point clouds is becoming increasingly popular for addressing various tasks in this field. Despite remarkable achievements, deep learning algorithms are vulnerable to adversarial attacks. These attacks are imperceptible to the human eye but can easily fool deep neural networks in the testing and deployment stage. To encourage future research, this survey summarizes the current progress on adversarial attack and defense techniques on point cloud classification. This paper first introduces the principles and characteristics of adversarial attacks and summarizes and analyzes the adversarial example generation methods in recent years. Besides, it classifies defense strategies as input transformation, data optimization, and deep model modification. Finally, it presents several challenging issues and future research directions in this domain.
    摘要 深度学习在2D视觉任务中已经成为当今AI技术的主导者,最近它在3D点云任务中也越来越受欢迎。尽管它们已经取得了很多成就,但深度学习算法却容易受到抗击攻击。这些攻击可以让人类不可见,但可以轻松地让深度神经网络在测试和部署阶段出现错误。为鼓励未来的研究,本文将summarize了当前在点云分类领域中的抗击攻击和防御技术。本文首先介绍了抗击攻击的原则和特点,然后总结和分析了过去几年中的抗击示例生成方法。此外,它还分类了防御策略为输入转换、数据优化和深度模型修改。最后,它提出了一些挑战性的问题和未来研究方向。

DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation

  • paper_url: http://arxiv.org/abs/2307.00300
  • repo_url: None
  • paper_authors: Zhuowei Chen, Shancheng Fang, Wei Liu, Qian He, Mengqi Huang, Yongdong Zhang, Zhendong Mao
  • for: 本研究旨在提高文本到图像模型的可编辑性,同时保持人脸identität的稳定性。
  • methods: 我们提出了一种无需优化的方法,通过学习多比例人脸特征并应用多个映射项目来直接生成文本空间中的pseudo字。此外,我们还提出了一种自适应可编辑学习方法,通过使用名人名称来构建对应的生成和修改的人脸图像对。
  • results: 我们的方法可以在不同的场景下生成快速速度下生成identity-保持的图像,并且可以增强模型的可编辑性。
    Abstract While large-scale pre-trained text-to-image models can synthesize diverse and high-quality human-centric images, an intractable problem is how to preserve the face identity for conditioned face images. Existing methods either require time-consuming optimization for each face-identity or learning an efficient encoder at the cost of harming the editability of models. In this work, we present an optimization-free method for each face identity, meanwhile keeping the editability for text-to-image models. Specifically, we propose a novel face-identity encoder to learn an accurate representation of human faces, which applies multi-scale face features followed by a multi-embedding projector to directly generate the pseudo words in the text embedding space. Besides, we propose self-augmented editability learning to enhance the editability of models, which is achieved by constructing paired generated face and edited face images using celebrity names, aiming at transferring mature ability of off-the-shelf text-to-image models in celebrity faces to unseen faces. Extensive experiments show that our methods can generate identity-preserved images under different scenes at a much faster speed.
    摘要 大规模预训练的文本到图像模型可以生成多样且高质量的以人为中心的图像,但一个棘手的问题是如何在以人脸为条件的生成中保持人脸身份。现有方法要么需要对每个人脸身份进行耗时的优化,要么以损害模型可编辑性为代价学习一个高效的编码器。在这项工作中,我们提出了一种针对每个人脸身份的无需优化的方法,同时保持文本到图像模型的可编辑性。具体来说,我们提出了一种新的人脸身份编码器,该编码器利用多尺度人脸特征,再经过一个多嵌入投影器,直接在文本嵌入空间中生成伪词(pseudo words)。此外,我们还提出了自增强的可编辑性学习,通过使用名人姓名构造成对的生成人脸与编辑人脸图像,旨在将现成文本到图像模型在名人脸上的成熟能力迁移到未见过的人脸上。广泛的实验表明,我们的方法可以在不同的场景下以更快的速度生成保持人脸身份的图像。

AutoST: Training-free Neural Architecture Search for Spiking Transformers

  • paper_url: http://arxiv.org/abs/2307.00293
  • repo_url: None
  • paper_authors: Ziqing Wang, Qidong Zhao, Jinku Cui, Xu Liu, Dongkuan Xu
  • for: AutoST is designed to rapidly identify high-performance and energy-efficient Spiking Transformer architectures, addressing the limitations of traditional approaches.
  • methods: AutoST uses Floating-Point Operations (FLOPs) as a performance metric, which is independent of model computations and training dynamics, leading to a stronger correlation with performance. Additionally, activation patterns are leveraged during initialization to estimate the energy consumption of Spiking Transformers.
  • results: AutoST models outperform state-of-the-art manually or automatically designed SNN architectures on static and neuromorphic datasets, while significantly reducing energy consumption.
    Abstract Spiking Transformers have gained considerable attention because they achieve both the energy efficiency of Spiking Neural Networks (SNNs) and the high capacity of Transformers. However, the existing Spiking Transformer architectures, derived from ANNs, exhibit a notable architectural gap, resulting in suboptimal performance compared to their ANN counterparts. Traditional approaches to discovering optimal architectures primarily rely on either manual procedures, which are time-consuming, or Neural Architecture Search (NAS) methods, which are usually expensive in terms of memory footprints and computation time. To address these limitations, we introduce AutoST, a training-free NAS method for Spiking Transformers, to rapidly identify high-performance and energy-efficient Spiking Transformer architectures. Unlike existing training-free NAS methods, which struggle with the non-differentiability and high sparsity inherent in SNNs, we propose to utilize Floating-Point Operations (FLOPs) as a performance metric, which is independent of model computations and training dynamics, leading to a stronger correlation with performance. Moreover, to enable the search for energy-efficient architectures, we leverage activation patterns during initialization to estimate the energy consumption of Spiking Transformers. Our extensive experiments show that AutoST models outperform state-of-the-art manually or automatically designed SNN architectures on static and neuromorphic datasets, while significantly reducing energy consumption.
    摘要 脉冲变换器(Spiking Transformer)在过去几年中获得了广泛关注,因为它们可以同时实现脉冲神经网络(SNN)的能量效率和变换器的高容量。然而,现有的脉冲变换器架构源自人工神经网络(ANN),存在明显的结构差距,导致其性能低于对应的 ANN 架构。传统的最佳架构发现方法主要依靠人工流程,这些方法需要很长时间,或者使用神经架构搜索(NAS)方法,这些方法通常需要大量的存储空间和计算资源。为了解决这些限制,我们介绍了AutoST,一种无需训练的NAS方法,用于快速发现高性能和能效的脉冲变换器架构。与现有的无需训练NAS方法不同,我们使用浮点运算次数(FLOPs)作为性能指标,这个指标与模型计算和训练动态无关,从而与性能具有更强的相关性。此外,为了搜索能效的架构,我们利用初始化时的激活模式来估算脉冲变换器的能耗。我们的广泛实验表明,AutoST模型在静态和 neuromorphic 数据集上都高于当前最佳手动或自动设计的 SNN 架构,同时显著降低能耗。
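A training-free, FLOPs-based ranking of candidate architectures, the core scoring idea described above, can be sketched as follows; the candidate specs are hypothetical and the energy estimation from activation patterns is omitted.

```python
from dataclasses import dataclass

@dataclass
class ConvSpec:
    in_ch: int
    out_ch: int
    kernel: int
    out_h: int
    out_w: int

def conv_flops(layer: ConvSpec) -> int:
    """Multiply-accumulate count of a conv layer:
    out_h * out_w * out_ch * in_ch * k * k (biases and activations ignored)."""
    return layer.out_h * layer.out_w * layer.out_ch * layer.in_ch * layer.kernel ** 2

def rank_by_flops(candidates: dict[str, list[ConvSpec]]) -> list[tuple[str, int]]:
    """Training-free ranking: score each candidate architecture by its total
    FLOPs and sort descending, following the idea that FLOPs correlate with
    final performance.  (AutoST additionally estimates energy, not shown here.)"""
    scores = {name: sum(conv_flops(l) for l in layers) for name, layers in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# toy usage: two hypothetical candidate backbones
search_space = {
    "cand_small": [ConvSpec(3, 32, 3, 56, 56), ConvSpec(32, 64, 3, 28, 28)],
    "cand_large": [ConvSpec(3, 64, 3, 56, 56), ConvSpec(64, 128, 3, 28, 28)],
}
print(rank_by_flops(search_space))
```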

All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning

  • paper_url: http://arxiv.org/abs/2307.00290
  • repo_url: None
  • paper_authors: Can Cui, Ruining Deng, Quan Liu, Tianyuan Yao, Shunxing Bao, Lucas W. Remedios, Yucheng Tang, Yuankai Huo
  • for: This paper aims to improve the efficiency of the Segment Anything Model (SAM) for biomedical image segmentation tasks by eliminating the need for manual prompts during the inference stage.
  • methods: The proposed pipeline utilizes SAM to generate pixel-level annotations from weak prompts (e.g., points, bounding boxes), which are then used to finetune the SAM segmentation model without requiring manual prompts during the inference stage.
  • results: The proposed pipeline achieved competitive performance compared to using strong pixel-wise annotated data, and surpassed the state-of-the-art (SOTA) methods in a nuclei segmentation task on the public Monuseg dataset.
    Abstract The Segment Anything Model (SAM) is a recently proposed prompt-based segmentation model in a generic zero-shot segmentation approach. With the zero-shot segmentation capacity, SAM achieved impressive flexibility and precision on various segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which is still resource intensive for biomedical image segmentation. In this paper, instead of using prompts during the inference stage, we introduce a pipeline that utilizes the SAM, called all-in-SAM, through the entire AI development workflow (from annotation generation to model finetuning) without requiring manual prompts during the inference stage. Specifically, SAM is first employed to generate pixel-level annotations from weak prompts (e.g., points, bounding box). Then, the pixel-level annotations are used to finetune the SAM segmentation model rather than training from scratch. Our experimental results reveal two key findings: 1) the proposed pipeline surpasses the state-of-the-art (SOTA) methods in a nuclei segmentation task on the public Monuseg dataset, and 2) the utilization of weak and few annotations for SAM finetuning achieves competitive performance compared to using strong pixel-wise annotated data.
    摘要 Segment Anything Model (SAM) 是一种最近提出的批处理基于的分割模型,可以在无预料分割的情况下实现出色的灵活性和精度。然而,现有的管道仍然需要在推理阶段手动提供批处理,这对生物医学图像分割而言仍然是资源浪费。在这篇文章中,我们提出一个管道,使用 SAM 来实现从注解生成到模型精度调整的整个人工智能开发工程,无需在推理阶段手动提供批处理。具体来说,SAM 首先用弱提示(例如,点和 bounding box)生成像素级注解。然后,这些像素级注解被用来调整 SAM 分割模型,而不是从 scratch retrained。我们的实验结果表明了两点:1)我们的管道超过了目前最佳方法(SOTA)在公共的 Monuseg 数据集上核心分割任务中的性能;2)使用弱和少量的注解来调整 SAM 模型达到了与使用强像素级注解数据相同的性能。
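Generating pixel-level pseudo-labels from weak point prompts with the public segment_anything package might look like the sketch below; the checkpoint filename, the ViT-B variant, and the best-score mask selection are examples, and the subsequent prompt-free fine-tuning stage of the pipeline is not shown.

```python
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

# Load a pre-trained SAM (the checkpoint path and "vit_b" variant are examples;
# the checkpoint must be downloaded separately).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
predictor = SamPredictor(sam)

def weak_prompts_to_masks(image: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Turn weak point prompts (one click per nucleus) into a pixel-level
    pseudo-label with SAM, as in the annotation-generation stage described
    above.  The highest-scoring mask per prompt is kept."""
    predictor.set_image(image)                       # image: HxWx3 uint8 RGB
    pseudo_label = np.zeros(image.shape[:2], dtype=bool)
    for xy in points:
        masks, scores, _ = predictor.predict(
            point_coords=xy[None, :],                # a single foreground click
            point_labels=np.array([1]),
            multimask_output=True,
        )
        pseudo_label |= masks[int(scores.argmax())]  # keep the best proposal
    return pseudo_label

# usage sketch (the image and click coordinates would come from the weak annotations):
# label = weak_prompts_to_masks(image, points=np.array([[120, 85], [240, 190]]))
```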

Common Knowledge Learning for Generating Transferable Adversarial Examples

  • paper_url: http://arxiv.org/abs/2307.00274
  • repo_url: None
  • paper_authors: Ruijie Yang, Yuanfang Guo, Junfu Wang, Jiantao Zhou, Yunhong Wang
  • for: This paper focuses on improving the adversarial transferability of transfer-based black-box attacks, where the attacker uses a substitute (source) model to generate adversarial examples for an unseen target model.
  • methods: The proposed method uses a common knowledge learning (CKL) framework to learn better network weights for generating adversarial examples with better transferability, under fixed network architectures. The CKL framework involves constructing a multi-teacher framework where the knowledge is distilled from different teacher architectures into one student network, and imposing constraints on the gradients between the student and teacher models to alleviate the output inconsistency problem.
  • results: The proposed method significantly improves the adversarial transferability, as demonstrated by extensive experiments.
    Abstract This paper focuses on an important type of black-box attacks, i.e., transfer-based adversarial attacks, where the adversary generates adversarial examples by a substitute (source) model and utilize them to attack an unseen target model, without knowing its information. Existing methods tend to give unsatisfactory adversarial transferability when the source and target models are from different types of DNN architectures (e.g. ResNet-18 and Swin Transformer). In this paper, we observe that the above phenomenon is induced by the output inconsistency problem. To alleviate this problem while effectively utilizing the existing DNN models, we propose a common knowledge learning (CKL) framework to learn better network weights to generate adversarial examples with better transferability, under fixed network architectures. Specifically, to reduce the model-specific features and obtain better output distributions, we construct a multi-teacher framework, where the knowledge is distilled from different teacher architectures into one student network. By considering that the gradient of input is usually utilized to generated adversarial examples, we impose constraints on the gradients between the student and teacher models, to further alleviate the output inconsistency problem and enhance the adversarial transferability. Extensive experiments demonstrate that our proposed work can significantly improve the adversarial transferability.
    摘要 这篇论文关注了一种重要的黑盒攻击方法,即传输基于敌意攻击,敌对方通过一个卷积神经网络(源模型)生成攻击示例,然后使用这些示例攻击一个未知的目标模型。现有的方法在不同类型的深度神经网络(DNN)模型之间存在很大的不满意的攻击传输性能。在这篇论文中,我们发现这种问题是由输出不一致问题引起的。为了解决这个问题并有效利用现有的DNN模型,我们提出了一个共同知识学习(CKL)框架,用于学习更好的网络权重,以生成更好的攻击示例,并且能够更好地传输到目标模型。具体来说,我们构建了一个多教程框架,其中知识从不同的教师模型中提取到一个学生网络中。由于通常通过输入的梯度来生成攻击示例,我们在学生和教师模型之间受限制梯度,以进一步缓解输出不一致问题,提高攻击传输性能。广泛的实验表明,我们的提议可以明显提高攻击传输性能。
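A sketch of the two ingredients described above, multi-teacher soft-label distillation plus a constraint tying the student's input gradients to the teachers', is given below; the averaged-teacher KL term and the cosine form of the gradient constraint are assumptions rather than the paper's exact losses.

```python
import torch
import torch.nn.functional as F

def ckl_style_loss(student_logits, teacher_logits_list, student_grad, teacher_grads,
                   temperature=4.0, beta=1.0):
    """(1) Distill the averaged soft predictions of several teacher
    architectures into one student; (2) penalize mismatch between the
    student's and teachers' input gradients, since input gradients drive
    adversarial example generation."""
    soft_teacher = torch.stack([F.softmax(t / temperature, dim=1)
                                for t in teacher_logits_list]).mean(0)
    kd = F.kl_div(F.log_softmax(student_logits / temperature, dim=1),
                  soft_teacher, reduction="batchmean") * temperature ** 2
    grad_terms = [1.0 - F.cosine_similarity(student_grad.flatten(1), g.flatten(1), dim=1).mean()
                  for g in teacher_grads]
    return kd + beta * sum(grad_terms) / len(grad_terms)

# toy usage with random stand-ins for logits and input gradients
B, C = 8, 10
student_logits = torch.randn(B, C)
teachers = [torch.randn(B, C) for _ in range(2)]
g_student = torch.randn(B, 3, 32, 32)
g_teachers = [torch.randn(B, 3, 32, 32) for _ in range(2)]
print(ckl_style_loss(student_logits, teachers, g_student, g_teachers))
```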

HrSegNet : Real-time High-Resolution Neural Network with Semantic Guidance for Crack Segmentation

  • paper_url: http://arxiv.org/abs/2307.00270
  • repo_url: https://github.com/CHDyshli/HrSegNet4CrackSegmentation
  • paper_authors: Yongshang Li, Ronggui Ma, Han Liu, Gaoli Cheng
  • for: 这个研究是为了提高建筑物破损检测的精度和效率,特别是在实时应用中。
  • methods: 我们提出了一个高分辨率模型,具有semantic guidance,专门用于实时破损分类。我们的模型保持高分辨率 throughout the entire process,并使用low-resolution semantic features导引高分辨率 features的重建。我们还设计了一个简单 yet effective的方法来控制模型的计算成本,以提高效率。
  • results: 我们的模型HrSegNet在 crack dataset CrackSeg9k 上 achieves the best trade-off between efficiency and effectiveness。我们的最快模型HrSegNet-B16 在 182 FPS 上获得 78.43% mIoU,而我们的最精准模型HrSegNet-B48 在 140.3 FPS 上获得 80.32% mIoU。
    Abstract Through extensive research on deep learning in recent years and its application in construction, crack detection has evolved rapidly from rough detection at the image-level and patch-level to fine-grained detection at the pixel-level, which better suits the nature of this field. Despite numerous existing studies utilizing off-the-shelf deep learning models or enhancing them, these models are not always effective or efficient in real-world applications. In order to bridge this gap, we propose a High-resolution model with Semantic guidance, specifically designed for real-time crack segmentation, referred to as HrSegNet. Our model maintains high resolution throughout the entire process, as opposed to recovering from low-resolution features to high-resolution ones, thereby maximizing the preservation of crack details. Moreover, to enhance the context information, we use low-resolution semantic features to guide the reconstruction of high-resolution features. To ensure the efficiency of the algorithm, we design a simple yet effective method to control the computation cost of the entire model by controlling the capacity of high-resolution channels, while providing the model with extremely strong scalability. Extensive quantitative and qualitative evaluations demonstrate that our proposed HrSegNet has exceptional crack segmentation capabilities, and that maintaining high resolution and semantic guidance are crucial to the final prediction. Compared to state-of-the-art segmentation models, HrSegNet achieves the best trade-off between efficiency and effectiveness. Specifically, on the crack dataset CrackSeg9k, our fastest model HrSegNet-B16 achieves a speed of 182 FPS with 78.43% mIoU, while our most accurate model HrSegNet-B48 achieves 80.32% mIoU with an inference speed of 140.3 FPS.
    摘要 随着深度学习在近年的研究和应用于建筑领域的扩展,从图像级和补充级到像素级的裂隙检测技术快速发展,更加适应建筑领域的特点。

AE-RED: A Hyperspectral Unmixing Framework Powered by Deep Autoencoder and Regularization by Denoising

  • paper_url: http://arxiv.org/abs/2307.00269
  • repo_url: None
  • paper_authors: Min Zhao, Jie Chen, Nicolas Dobigeon
  • for: This paper is written for the purpose of proposing a novel framework for spectral unmixing, which integrates autoencoder networks with regularization by denoising (RED) to enhance the unmixing performance.
  • methods: The paper uses a generic unmixing framework that combines deep autoencoder networks with regularization by denoising (RED) to solve the blind unmixing problem. The framework consists of two subproblems: the first one is solved using deep autoencoders to implicitly regularize the estimates and model the mixture mechanism, while the second one leverages denoising techniques to bring in explicit information.
  • results: The paper reports superior unmixing performance compared to state-of-the-art approaches on both synthetic and real data sets. The proposed framework is able to effectively integrate the advantages of deep autoencoder based unmixing methods and priors provided by denoisers, leading to improved unmixing results.
    Abstract Spectral unmixing has been extensively studied with a variety of methods and used in many applications. Recently, data-driven techniques with deep learning methods have obtained great attention to spectral unmixing for its superior learning ability to automatically learn the structure information. In particular, autoencoder based architectures are elaborately designed to solve blind unmixing and model complex nonlinear mixtures. Nevertheless, these methods perform unmixing task as blackboxes and lack of interpretability. On the other hand, conventional unmixing methods carefully design the regularizer to add explicit information, in which algorithms such as plug-and-play (PnP) strategies utilize off-the-shelf denoisers to plug powerful priors. In this paper, we propose a generic unmixing framework to integrate the autoencoder network with regularization by denoising (RED), named AE-RED. More specially, we decompose the unmixing optimized problem into two subproblems. The first one is solved using deep autoencoders to implicitly regularize the estimates and model the mixture mechanism. The second one leverages the denoiser to bring in the explicit information. In this way, both the characteristics of the deep autoencoder based unmixing methods and priors provided by denoisers are merged into our well-designed framework to enhance the unmixing performance. Experiment results on both synthetic and real data sets show the superiority of our proposed framework compared with state-of-the-art unmixing approaches.
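To make the two-subproblem split described above concrete, the sketch below alternates an autoencoder fit (the implicit prior) with a regularization-by-denoising fixed-point update (the explicit prior). It is a minimal illustration under assumed interfaces: `autoencoder` and `denoiser` are placeholders for whatever reconstruction network and off-the-shelf denoiser are available, and the splitting weight `mu` is an illustrative hyper-parameter, not the authors' setting.

```python
import torch

def ae_red_unmix(Y, autoencoder, denoiser, n_outer=20, n_inner=50, n_red=5, mu=0.5):
    """Alternating scheme in the spirit of AE-RED: sub-problem 1 fits a deep
    autoencoder to the data while staying close to an auxiliary variable X;
    sub-problem 2 updates X with a RED-style fixed-point step that pulls it
    toward the denoiser output. Y: (pixels, bands) hyperspectral matrix."""
    X = Y.clone()
    opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
    for _ in range(n_outer):
        # Sub-problem 1: implicit prior via the autoencoder
        for _ in range(n_inner):
            opt.zero_grad()
            recon = autoencoder(Y)
            loss = ((recon - Y) ** 2).mean() + mu * ((recon - X) ** 2).mean()
            loss.backward()
            opt.step()
        # Sub-problem 2: explicit prior via regularization by denoising
        with torch.no_grad():
            Z = autoencoder(Y)
            for _ in range(n_red):
                X = (mu * Z + denoiser(X)) / (mu + 1.0)  # RED fixed-point update
    return autoencoder, X
```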

Deep Angiogram: Trivializing Retinal Vessel Segmentation

  • paper_url: http://arxiv.org/abs/2307.00245
  • repo_url: None
  • paper_authors: Dewei Hu, Xing Yao, Jiacheng Wang, Yuankai K. Tao, Ipek Oguz
  • for: 本研究旨在提出一种能够稳健分割视网膜血管的深度学习模型,以便在不同的域中进行血管分割。
  • methods: 该模型使用对比变分自编码器,过滤不相关的特征并合成一幅仅表示血管的潜在图像(深度血管造影);随后通过阈值化即可实现血管分割。
  • results: 与基线模型相比,该模型在不同的目标域上能够提供更高的分割性能,并能生成稳定的血管造影,为荧光血管造影提供一种非侵入、安全的替代方案。
    Abstract Among the research efforts to segment the retinal vasculature from fundus images, deep learning models consistently achieve superior performance. However, this data-driven approach is very sensitive to domain shifts. For fundus images, such data distribution changes can easily be caused by variations in illumination conditions as well as the presence of disease-related features such as hemorrhages and drusen. Since the source domain may not include all possible types of pathological cases, a model that can robustly recognize vessels on unseen domains is desirable but remains elusive, despite many proposed segmentation networks of ever-increasing complexity. In this work, we propose a contrastive variational auto-encoder that can filter out irrelevant features and synthesize a latent image, named deep angiogram, representing only the retinal vessels. Then segmentation can be readily accomplished by thresholding the deep angiogram. The generalizability of the synthetic network is improved by the contrastive loss that makes the model less sensitive to variations of image contrast and noisy features. Compared to baseline deep segmentation networks, our model achieves higher segmentation performance via simple thresholding. Our experiments show that the model can generate stable angiograms on different target domains, providing excellent visualization of vessels and a non-invasive, safe alternative to fluorescein angiography.
    摘要 在从眼底图像分割视网膜血管的研究中,深度学习模型一直表现出优越的性能。然而,这种数据驱动的方法对域偏移非常敏感。对于眼底图像,这种数据分布变化很容易由照明条件的变化以及出血、玻璃膜疣等疾病相关特征引起。由于源域可能不包含所有可能的病理情况,一个能在未见域中稳健识别血管的模型非常必要,但尽管已提出许多日益复杂的分割网络,这一目标仍未实现。在这项工作中,我们提出了一种对比变分自编码器,可以过滤无关特征,并生成一个仅包含视网膜血管的潜在图像,称为深度血管造影(deep angiogram)。然后,只需对深度血管造影进行阈值化即可完成分割。对比损失使模型对图像对比度变化和噪声特征不敏感,从而提升了生成网络的泛化能力。与基线深度分割网络相比,我们的模型通过简单的阈值化实现了更高的分割性能。实验表明,模型能在不同目标域上稳定生成血管造影,提供出色的血管可视化,并为荧光血管造影提供一种非侵入、安全的替代方案。

S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture

  • paper_url: http://arxiv.org/abs/2307.00226
  • repo_url: None
  • paper_authors: Ye Xue, Diego Klabjan, Jean Utke
  • for: 这篇论文主要是关于多模态多任务学习,强调在不同频谱和结构数据上学习多种任务的能力。
  • methods: 该论文提出了一种名为Structured-data-enhanced Omninet(S-Omninet)的新型多模态模型,通过跨缓存注意力和嵌入patch来增强视觉特征表示,同时支持结构数据。
  • results: 对多个多模态数据集进行测试,S-Omninet模型表现出了明显的改善,较基eline模型Omninet具有更好的学习能力和灵活性。
    Abstract Multimodal multitask learning has attracted an increasing interest in recent years. Singlemodal models have been advancing rapidly and have achieved astonishing results on various tasks across multiple domains. Multimodal learning offers opportunities for further improvements by integrating data from multiple modalities. Many methods are proposed to learn on a specific type of multimodal data, such as vision and language data. A few of them are designed to handle several modalities and tasks at a time. In this work, we extend and improve Omninet, an architecture that is capable of handling multiple modalities and tasks at a time, by introducing cross-cache attention, integrating patch embeddings for vision inputs, and supporting structured data. The proposed Structured-data-enhanced Omninet (S-Omninet) is a universal model that is capable of learning from structured data of various dimensions effectively with unstructured data through cross-cache attention, which enables interactions among spatial, temporal, and structured features. We also enhance spatial representations in a spatial cache with patch embeddings. We evaluate the proposed model on several multimodal datasets and demonstrate a significant improvement over the baseline, Omninet.
    摘要 多modal多任务学习在最近几年内受到了越来越多的关注。单modal模型在不同领域的多种任务上取得了非常出色的成绩。多modal学习具有融合多种数据的机会,可以进一步提高模型的性能。许多方法是为了处理特定类型的多modal数据而提出,其中一些可以同时处理多个模式和任务。在这个工作中,我们扩展并改进了Omninet架构,该架构可以同时处理多个模式和任务。我们引入了交叉缓存注意力,将视觉输入集成到缓存中,并支持结构化数据。我们称之为Structured-data-enhanced Omninet(S-Omninet)。该模型可以有效地从结构化数据中学习,并且可以通过交叉缓存注意力和覆盖式嵌入来实现视觉特征的增强。我们在多个多modal数据集上评估了提议模型,并证明了与基线模型Omninet相比有显著的提高。
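The central addition in S-Omninet is letting structured-data features interact with cached spatial/temporal features. The snippet below is only a sketch of such a cross-attention block; the dimensions and the single-direction attention are assumptions for illustration, not the model's actual wiring.

```python
import torch.nn as nn

class CrossCacheAttention(nn.Module):
    """Structured-data tokens attend over cached spatial/temporal tokens;
    a symmetric block (the cache attending over structured tokens) could be
    stacked the other way. Illustrative sketch only."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, structured_tokens, cache_tokens):
        # structured_tokens: (B, S, dim), cache_tokens: (B, T, dim)
        attended, _ = self.attn(query=structured_tokens,
                                key=cache_tokens, value=cache_tokens)
        return self.norm(structured_tokens + attended)
```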

StyleStegan: Leak-free Style Transfer Based on Feature Steganography

  • paper_url: http://arxiv.org/abs/2307.00225
  • repo_url: None
  • paper_authors: Xiujian Liang, Bingshan Liu, Qichao Ying, Zhenxing Qian, Xinpeng Zhang
  • for: 解决现代社交媒体中存在的内容泄露问题,以便实现序列和反向的风格传递。
  • methods: 基于特征隐写技术的风格传递方法,包括两个主要组成部分:风格传递方法和图像隐写方法。
  • results: 对公共可用数据集MS-COCO和Wikiart进行了全面的实验验证,结果显示StyleStegan成功解决了风格传递中的内容泄露问题,相比于一个不优化的基线模型,SSIM表现指标在序列和反向风格传递任务中分别提高了14.98%和7.28%。
    Abstract In modern social networks, existing style transfer methods suffer from a serious content leakage issue, which hampers the ability to achieve serial and reversible stylization, thereby hindering the further propagation of stylized images in social networks. To address this problem, we propose a leak-free style transfer method based on feature steganography. Our method consists of two main components: a style transfer method that accomplishes artistic stylization on the original image and an image steganography method that embeds content feature secrets on the stylized image. The main contributions of our work are as follows: 1) We identify and explain the phenomenon of content leakage and its underlying causes, which arise from content inconsistencies between the original image and its subsequent stylized image. 2) We design a neural flow model for achieving loss-free and biased-free style transfer. 3) We introduce steganography to hide content feature information on the stylized image and control the subsequent usage rights. 4) We conduct comprehensive experimental validation using publicly available datasets MS-COCO and Wikiart. The results demonstrate that StyleStegan successfully mitigates the content leakage issue in serial and reversible style transfer tasks. The SSIM performance metrics for these tasks are 14.98% and 7.28% higher, respectively, compared to a suboptimal baseline model.
    摘要 现代社交媒体中,现有的风格传输方法受到严重的内容泄露问题困扰,这限制了串行和可逆风格化的实现,从而阻碍了风格化图像在社交媒体上的进一步传播。为解决这一问题,我们提出了基于特征隐写的无泄露风格传输方法。我们的方法包括两个主要组成部分:一个实现艺术风格化的风格传输方法和一个在风格化图像上嵌入内容特征信息的图像隐写方法。我们的主要贡献如下:1. 我们识别并解释了内容泄露现象及其根本原因,这些原因来自于原始图像与其后续风格化图像之间的内容不一致。2. 我们设计了一个无损且无偏的风格传输模型。3. 我们引入隐写技术在风格化图像上隐藏内容特征信息,以控制后续使用权限。4. 我们在公开可用的MS-COCO和Wikiart数据集上进行了广泛的实验验证。结果表明,StyleStegan成功解决了串行和可逆风格传输任务中的内容泄露问题。在这两个任务上,我们的SSIM性能指标分别比次优的基线模型高出14.98%和7.28%。

Q-YOLO: Efficient Inference for Real-time Object Detection

  • paper_url: http://arxiv.org/abs/2307.04816
  • repo_url: None
  • paper_authors: Mingze Wang, Huixin Sun, Jun Shi, Xuhui Liu, Baochang Zhang, Xianbin Cao
  • for: 本研究旨在提高资源有限的边缘设备上部署实时物体检测模型的效率,以实现实时检测而减少计算和内存占用。
  • methods: 本文提出了一种低位数量化方法,称为Q-YOLO,以构建高效的一stage检测器。Q-YOLO使用了一个完整的Post-Training Quantization(PTQ)管道,并采用了一种名为Unilateral Histogram-based(UH)活动量化方案,通过 histogram 分析,确定最大 truncation 值,以最小化 Mean Squared Error(MSE)量化错误。
  • results: 对COCO数据集进行了广泛的实验, demonstarted Q-YOLO的有效性,并在其他PTQ方法之上具有更好的平衡性,同时实现了减少计算和内存占用的实时检测。这些研究启动了资源有限边缘设备上部署物体检测模型的高效部署,以实现实时检测而减少计算和内存占用。
    Abstract Real-time object detection plays a vital role in various computer vision applications. However, deploying real-time object detectors on resource-constrained platforms poses challenges due to high computational and memory requirements. This paper describes a low-bit quantization method to build a highly efficient one-stage detector, dubbed as Q-YOLO, which can effectively address the performance degradation problem caused by activation distribution imbalance in traditional quantized YOLO models. Q-YOLO introduces a fully end-to-end Post-Training Quantization (PTQ) pipeline with a well-designed Unilateral Histogram-based (UH) activation quantization scheme, which determines the maximum truncation values through histogram analysis by minimizing the Mean Squared Error (MSE) quantization errors. Extensive experiments on the COCO dataset demonstrate the effectiveness of Q-YOLO, outperforming other PTQ methods while achieving a more favorable balance between accuracy and computational cost. This research contributes to advancing the efficient deployment of object detection models on resource-limited edge devices, enabling real-time detection with reduced computational and memory overhead.
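The Unilateral Histogram-based step above boils down to choosing an activation clipping maximum that minimizes the MSE introduced by uniform quantization. The following is a rough sketch of that calibration search on post-ReLU activations; the bin count, search range, and per-layer handling are assumptions, not the paper's exact procedure.

```python
import numpy as np

def uh_truncation(activations, n_bits=8, n_bins=2048):
    """Pick the clipping maximum that minimises the MSE between the (unsigned)
    activations and their uniformly quantised version, using a histogram so the
    search only touches bin centers. Illustrative sketch, not the paper's code."""
    hist, edges = np.histogram(activations.ravel(), bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    levels = 2 ** n_bits - 1
    best_max, best_mse = edges[-1], np.inf
    for i in range(int(0.3 * n_bins), n_bins):     # coarse search range (assumption)
        t = edges[i + 1]                           # candidate truncation value
        clipped = np.clip(centers, 0.0, t)         # unilateral clip for post-ReLU acts
        scale = t / levels
        q = np.round(clipped / scale) * scale      # fake-quantise the bin centers
        mse = np.sum(hist * (centers - q) ** 2) / max(hist.sum(), 1)
        if mse < best_mse:
            best_mse, best_max = mse, t
    return best_max, best_mse

# Usage: calibrate one layer from a batch of collected activations
# t_max, err = uh_truncation(act_batch.numpy(), n_bits=8)
```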

More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data

  • paper_url: http://arxiv.org/abs/2307.00213
  • repo_url: None
  • paper_authors: Andrew Kean Gao
  • for: 这篇研究旨在测试Compact Convolutional Transformers(CCT)在生物医学影像分类 зада中的可行性,以扩展Transformers在生物医学领域的应用。
  • methods: 本研究使用了一种混合了 transformers 和卷积层的方法——CCT,以扩展Transformers的应用范围。
  • results: 研究获得了92.49%的类别准确率和0.9935的微 averaged ROC AUC,表明CCT在有限数据情况下可以实现高度的类别准确率。
    Abstract Transformers are very powerful tools for a variety of tasks across domains, from text generation to image captioning. However, transformers require substantial amounts of training data, which is often a challenge in biomedical settings, where high quality labeled data can be challenging or expensive to obtain. This study investigates the efficacy of Compact Convolutional Transformers (CCT) for robust medical image classification with limited data, addressing a key issue faced by conventional Vision Transformers - their requirement for large datasets. A hybrid of transformers and convolutional layers, CCTs demonstrate high accuracy on modestly sized datasets. We employed a benchmark dataset of peripheral blood cell images of eight distinct cell types, each represented by approximately 2,000 low-resolution (28x28x3 pixel) samples. Despite the dataset size being smaller than those typically used with Vision Transformers, we achieved a commendable classification accuracy of 92.49% and a micro-average ROC AUC of 0.9935. The CCT also learned quickly, exceeding 80% validation accuracy after five epochs. Analysis of per-class precision, recall, F1, and ROC showed that performance was strong across cell types. Our findings underscore the robustness of CCTs, indicating their potential as a solution to data scarcity issues prevalent in biomedical imaging. We substantiate the applicability of CCTs in data-constrained areas and encourage further work on CCTs.
    摘要 它们(transformers)是非常强大的工具,可以在多个领域中进行多种任务,从文本生成到图像描述。然而,transformers需要大量的训练数据,而在生物医学领域,高质量的标注数据可能困难或昂贵。这项研究探讨了Compact Convolutional Transformers(CCT)在医学图像分类中的可靠性,解决了传统的视图变换器(Vision Transformers)的问题,即它们需要大量的数据。CCT是将transformers和卷积层结合在一起的混合模型。我们使用了一个标准的血液细胞图像数据集,包含8种不同的血液细胞类型,每种类型都有约2000个低分辨率(28x28x3像素)的样本。尽管数据集的大小较小于传统使用的Vision Transformers,但我们达到了92.49%的分类精度和0.9935的微均ROC AUC。CCT还快速学习,在5个epoch后超过80%的验证精度。我们的分析表明,CCT在每个类型中的精度、回归、F1和ROC都具有强大的表现。我们的发现证明了CCT的可靠性,表明它们在数据缺乏的情况下可以作为解决方案。我们鼓励进一步的研究和应用CCTs。
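For readers unfamiliar with Compact Convolutional Transformers, the toy model below shows the two ingredients that make them work on small datasets: a convolutional tokenizer instead of patch embedding and attention-based sequence pooling instead of a class token. The sizes follow the 28x28x3 blood-cell images mentioned above; everything else is an illustrative guess, not the study's configuration.

```python
import torch
import torch.nn as nn

class MiniCCT(nn.Module):
    """A small CCT-style classifier: conv tokenizer + transformer encoder +
    attention-based sequence pooling. Illustrative sketch only."""
    def __init__(self, num_classes=8, dim=128, depth=4, heads=4):
        super().__init__()
        self.tokenizer = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                        # 28x28 -> 14x14 token grid
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=2 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.attn_pool = nn.Linear(dim, 1)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):                           # x: (B, 3, 28, 28)
        tokens = self.tokenizer(x).flatten(2).transpose(1, 2)   # (B, 196, dim)
        tokens = self.encoder(tokens)
        weights = torch.softmax(self.attn_pool(tokens), dim=1)  # sequence pooling
        pooled = (weights * tokens).sum(dim=1)
        return self.head(pooled)
```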

Internal-External Boundary Attention Fusion for Glass Surface Segmentation

  • paper_url: http://arxiv.org/abs/2307.00212
  • repo_url: None
  • paper_authors: Dongshen Han, Seungkyu Lee
  • for: 本研究旨在提出一种基于深度学习的玻璃表面特征描述方法,以便从单色图像中提取玻璃区域。
  • methods: 我们提出了一种基于Semantic Segmentation的方法,其中包括内部和外部边界注意模块,通过分别学习和选择玻璃表面内和外部的视觉特征来描述玻璃区域。
  • results: 我们在六个公共评测 dataset 上进行了比较,与现状的方法进行了比较,得到了出色的结果。
    Abstract Glass surfaces of transparent objects and mirrors are not able to be uniquely and explicitly characterized by their visual appearances because they contain the visual appearance of other reflected or transmitted surfaces as well. Detecting glass regions from a single-color image is a challenging task. Recent deep-learning approaches have paid attention to the description of glass surface boundary where the transition of visual appearances between glass and non-glass surfaces are observed. In this work, we analytically investigate how glass surface boundary helps to characterize glass objects. Inspired by prior semantic segmentation approaches with challenging image types such as X-ray or CT scans, we propose separated internal-external boundary attention modules that individually learn and selectively integrate visual characteristics of the inside and outside region of glass surface from a single color image. Our proposed method is evaluated on six public benchmarks comparing with state-of-the-art methods showing promising results.
    摘要 透明物体的玻璃表面和镜面无法仅凭其视觉外观被唯一、明确地刻画,因为它们还包含其他反射或透射表面的视觉外观。从单幅彩色图像中检测玻璃区域是一项具有挑战性的任务。近期的深度学习方法着重描述玻璃表面边界,即玻璃与非玻璃表面之间视觉外观发生转变的位置。在这项工作中,我们分析了玻璃表面边界如何帮助刻画玻璃物体。受以往针对X光、CT等高难度图像类型的语义分割方法的启发,我们提出了相互分离的内部-外部边界注意力模块,分别学习并有选择地融合单幅彩色图像中玻璃表面内部与外部区域的视觉特征。我们的方法在六个公开基准上进行了评估,与最新方法相比取得了可观的结果。

AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence

  • paper_url: http://arxiv.org/abs/2307.00211
  • repo_url: https://github.com/wangjiarui153/aigciqa2023
  • paper_authors: Jiarui Wang, Huiyu Duan, Jing Liu, Shi Chen, Xiongkuo Min, Guangtao Zhai
  • for: 这研究的目的是为了更好地了解人类对AI生成图像的视觉偏好。
  • methods: 这些研究使用了6种现状最佳的文本到图像生成模型,生成了2000多个图像,并通过一组有组织的主观试验评估人类对每个图像的质量、真实性和准确性。
  • results: 这些研究结果显示了人类对AI生成图像的视觉偏好,并对现有的IQA指标进行了评估。
    Abstract In this paper, in order to get a better understanding of the human visual preferences for AIGIs, a large-scale IQA database for AIGC is established, which is named as AIGCIQA2023. We first generate over 2000 images based on 6 state-of-the-art text-to-image generation models using 100 prompts. Based on these images, a well-organized subjective experiment is conducted to assess the human visual preferences for each image from three perspectives including quality, authenticity and correspondence. Finally, based on this large-scale database, we conduct a benchmark experiment to evaluate the performance of several state-of-the-art IQA metrics on our constructed database.
    摘要 在这篇论文中,为了更好地理解人类对AIGI的视觉偏好,我们建立了一个大规模的AIGC图像评价数据库,命名为AIGCIQA2023。我们首先生成了超过2000个图像,使用100个文本生成模型,并对这些图像进行了三个视觉偏好的主观测试,包括图像质量、准确性和匹配度。最后,基于这个大规模数据库,我们进行了多种现代IQA指标的 benchmark测试,以评估它们在我们构建的数据库中的表现。

Filter Pruning for Efficient CNNs via Knowledge-driven Differential Filter Sampler

  • paper_url: http://arxiv.org/abs/2307.00198
  • repo_url: https://github.com/osilly/kdfs
  • paper_authors: Shaohui Lin, Wenxuan Huang, Jiao Xie, Baochang Zhang, Yunhang Shen, Zhou Yu, Jungong Han, David Doermann
  • for: 这 paper 是为了提高 Convolutional Neural Networks (CNNs) 的计算速度和内存占用量,并且可以应用于 Edge 设备和云服务。
  • methods: 这 paper 提出了一种新的 Knowledge-driven Differential Filter Sampler (KDFS) 框架,用于对 CNNs 进行filter pruning。KDFS 使用一个 learnable 抽象 sampler 来生成一个 binary mask vector для每个层,以确定该层的 filters 是否为 redundant。
  • results: 对多个 dataset 进行了广泛的实验,显示了 KDFS 的效果性。例如,使用 KDFS 压缩 Base Model 在 ImageNet 上 achieve $55.36%$ 计算减少,$42.86%$ 参数减少,仅 dropped $0.35%$ Top-1 准确率,significantly outperforming 现有方法。
    Abstract Filter pruning simultaneously accelerates the computation and reduces the memory overhead of CNNs, which can be effectively applied to edge devices and cloud services. In this paper, we propose a novel Knowledge-driven Differential Filter Sampler~(KDFS) with Masked Filter Modeling~(MFM) framework for filter pruning, which globally prunes the redundant filters based on the prior knowledge of a pre-trained model in a differential and non-alternative optimization. Specifically, we design a differential sampler with learnable sampling parameters to build a binary mask vector for each layer, determining whether the corresponding filters are redundant. To learn the mask, we introduce masked filter modeling to construct PCA-like knowledge by aligning the intermediate features from the pre-trained teacher model and the outputs of the student decoder taking sampling features as the input. The mask and sampler are directly optimized by the Gumbel-Softmax Straight-Through Gradient Estimator in an end-to-end manner in combination with global pruning constraint, MFM reconstruction error, and dark knowledge. Extensive experiments demonstrate the proposed KDFS's effectiveness in compressing the base models on various datasets. For instance, the pruned ResNet-50 on ImageNet achieves $55.36\%$ computation reduction, and $42.86\%$ parameter reduction, while only dropping $0.35\%$ Top-1 accuracy, significantly outperforming the state-of-the-art methods. The code is available at \url{https://github.com/Osilly/KDFS}.
    摘要 滤波器剪枝能同时加速卷积神经网络(CNN)的计算并减少其内存开销,可有效应用于边缘设备和云服务。在这篇论文中,我们提出了一种新的知识驱动差分滤波器采样(KDFS)框架,结合掩码滤波器建模(MFM)用于滤波器剪枝。该框架基于预训练模型的先验知识,在差分、非交替的优化下全局剪除冗余的滤波器。我们设计了一种差分采样器,用于生成每层的二进制掩码向量,以确定对应的滤波器是否冗余。为了学习掩码,我们引入了掩码滤波器建模,通过对齐预训练教师模型的中间特征与以采样特征为输入的学生解码器输出,构建类似PCA的知识。我们使用Gumbel-Softmax Straight-Through Gradient Estimator以端到端方式直接优化采样器和掩码,并结合全局剪枝约束、MFM重建误差和暗知识。我们在多个数据集上进行了广泛的实验,证明了KDFS能有效压缩基础模型。例如,经KDFS剪枝的ResNet-50在ImageNet上实现了$55.36\%$的计算量减少和$42.86\%$的参数量减少,而Top-1准确率仅下降$0.35\%$,显著优于现有方法。代码可以在 \url{https://github.com/Osilly/KDFS} 上获取。
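A minimal version of the differential filter sampler can be written with PyTorch's Gumbel-Softmax straight-through estimator, which the abstract names explicitly. The module below is a sketch: the logit initialization, temperature, and the way the mask multiplies a convolution's output are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferentialFilterSampler(nn.Module):
    """Learnable per-filter keep/drop mask drawn with Gumbel-Softmax and a
    straight-through estimator, in the spirit of the sampler described above."""
    def __init__(self, num_filters, tau=1.0):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_filters, 2))  # (keep, drop) logits
        self.tau = tau

    def forward(self):
        if self.training:
            # hard=True gives a binary mask in the forward pass while the
            # backward pass uses the soft Gumbel-Softmax gradients
            sample = F.gumbel_softmax(self.logits, tau=self.tau, hard=True)
        else:
            sample = F.one_hot(self.logits.argmax(dim=-1), 2).float()
        return sample[:, 0]                         # column 0 is treated as "keep"

# Usage on one conv layer's output: zero out channels judged redundant.
# sampler = DifferentialFilterSampler(conv.out_channels)
# mask = sampler()
# y = conv(x) * mask.view(1, -1, 1, 1)
# sparsity_loss = mask.mean()   # combine with a global pruning constraint
```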

Long-Tailed Continual Learning For Visual Food Recognition

  • paper_url: http://arxiv.org/abs/2307.00183
  • repo_url: None
  • paper_authors: Jiangpeng He, Luotao Lin, Jack Ma, Heather A. Eicher-Miller, Fengqing Zhu
  • for: 本研究旨在解决深度学习基于食物识别中的两个主要问题,即随着时间的推移,模型需要不断学习新的食物类型而不导致已知的食物类型忘记,以及食物图像在实际生活中的分布是长尾分布,其中一些食物类型的图像比较罕见。
  • methods: 本研究提出了一种新的终端向量 continual learning 方法,用于解决上述两个问题。该方法包括一个额外的预测器来实现知识储存,以避免在 kontinual learning 过程中的表现不一致,以及一种新的数据增强技术, integrate class-activation-map (CAM) 和 CutMix,以提高对instance-rare食物类型的泛化能力。
  • results: 对比 existed 方法,提出的方法显示了大幅度的提升性能,特别是在instance-rare食物类型上。
    Abstract Deep learning based food recognition has achieved remarkable progress in predicting food types given an eating occasion image. However, there are two major obstacles that hinder deployment in real world scenario. First, as new foods appear sequentially overtime, a trained model needs to learn the new classes continuously without causing catastrophic forgetting for already learned knowledge of existing food types. Second, the distribution of food images in real life is usually long-tailed as a small number of popular food types are consumed more frequently than others, which can vary in different populations. This requires the food recognition method to learn from class-imbalanced data by improving the generalization ability on instance-rare food classes. In this work, we focus on long-tailed continual learning and aim to address both aforementioned challenges. As existing long-tailed food image datasets only consider healthy people population, we introduce two new benchmark food image datasets, VFN-INSULIN and VFN-T2D, which exhibits on the real world food consumption for insulin takers and individuals with type 2 diabetes without taking insulin, respectively. We propose a novel end-to-end framework for long-tailed continual learning, which effectively addresses the catastrophic forgetting by applying an additional predictor for knowledge distillation to avoid misalignment of representation during continual learning. We also introduce a novel data augmentation technique by integrating class-activation-map (CAM) and CutMix, which significantly improves the generalization ability for instance-rare food classes to address the class-imbalance issue. The proposed method show promising performance with large margin improvements compared with existing methods.
    摘要 基于深度学习的食物识别已取得显著进展,能够从就餐场景图像中预测食物类型。然而,实际部署面临两个主要障碍:首先,随着时间推移,新的食物类型不断出现,已训练的模型需要持续学习新类别,同时避免对已学知识的灾难性遗忘;其次,实际生活中食物图像的分布通常呈长尾分布,少数常见食物类型被食用的频率远高于其他类型,且在不同人群中有所差异,这要求食物识别方法能够在类别不均衡的数据上学习,并提升对样本稀少类别的泛化能力。本工作聚焦于长尾持续学习,以同时应对上述两个挑战。由于现有的长尾食物图像数据集仅考虑健康人群,我们引入了两个新的基准食物图像数据集:VFN-INSULIN和VFN-T2D,分别反映服用胰岛素人群与未服用胰岛素的2型糖尿病患者的真实饮食情况。我们提出了一种新的端到端长尾持续学习框架,通过引入额外的预测器进行知识蒸馏,避免持续学习过程中表示错位导致的灾难性遗忘。我们还提出了一种结合类激活图(CAM)与CutMix的新数据增强技术,显著提升对样本稀少食物类别的泛化能力,以缓解类别不均衡问题。与现有方法相比,所提方法取得了大幅度的性能提升。
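The augmentation described above combines class-activation maps with CutMix so that the pasted region actually contains the rare food item. A rough sketch of that idea follows; the fixed box size, the CAM-peak placement, and the tensor shapes are simplifying assumptions rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def cam_guided_cutmix(x_a, y_a, x_b, y_b, cam_b, num_classes):
    """Paste the most class-discriminative region of a rare-class image
    (located by its CAM `cam_b`, shape (B, 1, h, w)) onto another training
    image and mix the one-hot labels by area. Illustrative sketch only."""
    _, _, H, W = x_a.shape
    cam = F.interpolate(cam_b, size=(H, W), mode="bilinear", align_corners=False)
    flat_idx = cam.view(cam.size(0), -1).argmax(dim=1)   # CAM peak per sample
    cy, cx = flat_idx // W, flat_idx % W
    h, w = H // 2, W // 2                                # fixed half-size box (assumption)
    mixed = x_a.clone()
    lam = torch.zeros(x_a.size(0), device=x_a.device)
    for i in range(x_a.size(0)):
        y1 = int(torch.clamp(cy[i] - h // 2, 0, H - h))
        x1 = int(torch.clamp(cx[i] - w // 2, 0, W - w))
        mixed[i, :, y1:y1 + h, x1:x1 + w] = x_b[i, :, y1:y1 + h, x1:x1 + w]
        lam[i] = 1.0 - (h * w) / (H * W)                 # share kept from x_a
    y_a1h = F.one_hot(y_a, num_classes).float()
    y_b1h = F.one_hot(y_b, num_classes).float()
    targets = lam.unsqueeze(1) * y_a1h + (1 - lam).unsqueeze(1) * y_b1h
    return mixed, targets
```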

Single-Stage Heavy-Tailed Food Classification

  • paper_url: http://arxiv.org/abs/2307.00182
  • repo_url: None
  • paper_authors: Jiangpeng He, Fengqing Zhu
  • for: 这个论文旨在解决实际应用中食物分类问题的两个主要障碍。
  • methods: 我们提出了一种新的单阶段(即端到端)食物分类框架,以解决严重的类别不均衡问题和单例问题。
  • results: 我们在两个重点食物数据集上进行了评估,并实现了与现有方法相比的超过5%的提升。
    Abstract Deep learning based food image classification has enabled more accurate nutrition content analysis for image-based dietary assessment by predicting the types of food in eating occasion images. However, there are two major obstacles to apply food classification in real life applications. First, real life food images are usually heavy-tailed distributed, resulting in severe class-imbalance issue. Second, it is challenging to train a single-stage (i.e. end-to-end) framework under heavy-tailed data distribution, which cause the over-predictions towards head classes with rich instances and under-predictions towards tail classes with rare instance. In this work, we address both issues by introducing a novel single-stage heavy-tailed food classification framework. Our method is evaluated on two heavy-tailed food benchmark datasets, Food101-LT and VFN-LT, and achieves the best performance compared to existing work with over 5% improvements for top-1 accuracy.
    摘要 深度学习基于食物图像分类已经使得食物内含营养物质的分析变得更加精准,通过预测餐厅图像中的食物类型。然而,在实际应用中存在两个主要障碍:首先,实际食物图像通常具有极端分布,导致类型分布不均匀的问题。其次,在极端数据分布下训练单 Stage(即端到端)框架是困难的,这会导致头类(即常见食物)的过度预测和尾类(即罕见食物)的下预测。在这种情况下,我们提出了一种新的单Stage极端食物分类框架。我们的方法在两个极端食物标准测试集Food101-LT和VFN-LT上进行评估,并达到了现有方法的最高性能,相对于前一个最佳方法,我们的方法实现了超过5%的提升。

Unsupervised Coordinate-Based Video Denoising

  • paper_url: http://arxiv.org/abs/2307.00179
  • repo_url: None
  • paper_authors: Mary Damilola Aiyetigbo, Dineshchandar Ravichandran, Reda Chalhoub, Peter Kalivas, Nianyi Li
  • for: 这个论文旨在提出一种新的无监督视频干扰深度学习方法,可以帮助解决数据缺乏问题,并且对不同干扰模式具有耐性,因此它的应用范围广泛。
  • methods: 该方法包括三个模块:特征生成器生成特征地图,干扰约束网络生成清晰但微妙的参照帧,以及重新引入高频环境细节。通过坐标基网络,我们可以大幅简化网络结构,保留高频环境细节在干扰视频帧中。
  • results: 我们在 simulated 和实际捕获的 calcium 影像视频序列上进行了广泛的实验,并证明了我们的方法可以有效地去干扰真实世界的 calcium 影像视频序列,不需要对干扰模型和数据增强进行学习和训练。
    Abstract In this paper, we introduce a novel unsupervised video denoising deep learning approach that can help to mitigate data scarcity issues and shows robustness against different noise patterns, enhancing its broad applicability. Our method comprises three modules: a Feature generator creating features maps, a Denoise-Net generating denoised but slightly blurry reference frames, and a Refine-Net re-introducing high-frequency details. By leveraging the coordinate-based network, we can greatly simplify the network structure while preserving high-frequency details in the denoised video frames. Extensive experiments on both simulated and real-captured demonstrate that our method can effectively denoise real-world calcium imaging video sequences without prior knowledge of noise models and data augmentation during training.
    摘要 在这篇论文中,我们介绍了一种新的无监督视频干净深度学习方法,可以帮助解决数据缺乏问题,并且对不同噪声模式具有较好的Robustness,从而扩大其应用范围。我们的方法包括三个模块:一个特征生成器生成特征地图,一个Denoi-Net生成干净但有些模糊的参照帧,以及一个Refin-Net重新引入高频环境细节。通过利用坐标基于网络,我们可以大大简化网络结构,保留高频环境细节在干净视频帧中。我们在模拟和实际捕捉的视频序列上进行了广泛的实验,证明我们的方法可以有效地干净实际 calcium 影像视频序列,不需要先知噪声模型和数据扩展 durante 训练。
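The "coordinate-based network" mentioned above is an implicit representation that maps a normalized (x, y, t) position to an intensity; fitting it to the noisy video is what produces the denoised reference. Below is a minimal sketch with a sine activation, which is one common choice for such networks and an assumption here, not necessarily the paper's design.

```python
import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Maps a normalised (x, y, t) coordinate to a pixel intensity. Regressing
    noisy video intensities with such a network yields an implicitly smoothed,
    i.e. denoised, reconstruction. Sizes are illustrative."""
    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [3] + [hidden] * layers + [1]
        self.net = nn.ModuleList(nn.Linear(a, b) for a, b in zip(dims[:-1], dims[1:]))

    def forward(self, coords):                  # coords: (N, 3) in [-1, 1]
        h = coords
        for layer in self.net[:-1]:
            h = torch.sin(layer(h))             # periodic activation keeps fine detail
        return self.net[-1](h)

# model = CoordinateMLP()
# pred = model(torch.rand(1024, 3) * 2 - 1)     # sample intensities at random coords
```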

Multiscale Progressive Text Prompt Network for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.00174
  • repo_url: https://github.com/codehxj/MPTPN-for--Medical-Image-Segmentation
  • paper_authors: Xianjun Han, Qianqian Chen, Zhaoyang Xie, Xuejun Li, Hongyu Yang
  • for: 本文旨在提出一种使用进步文本提示来引导医学图像分割任务的方法,以便降低需要大量标注数据的需求,从而提高分割结果的准确性。
  • methods: 本方法包括两个阶段。第一阶段使用对自然图像进行对比学习,先使得Prompt Encoder(PPE)学习出力强大的多样性特征。第二阶段将医学图像和文本提示送入PPE,以实现下游医学图像分割任务。一个多尺度特征融合块(MSFF)将PPE中的特征融合,生成多尺度多样性特征。
  • results: 与使用仅图像不同,我们的模型可以在低数据标注成本下获得高质量结果。此外,我们的模型不仅在医学图像上具有优秀可靠性和有效性,还在自然图像上表现良好。在不同的图像数据集上进行了实验,结果表明我们的模型是有效和可靠的。
    Abstract The accurate segmentation of medical images is a crucial step in obtaining reliable morphological statistics. However, training a deep neural network for this task requires a large amount of labeled data to ensure high-accuracy results. To address this issue, we propose using progressive text prompts as prior knowledge to guide the segmentation process. Our model consists of two stages. In the first stage, we perform contrastive learning on natural images to pretrain a powerful prior prompt encoder (PPE). This PPE leverages text prior prompts to generate multimodality features. In the second stage, medical image and text prior prompts are sent into the PPE inherited from the first stage to achieve the downstream medical image segmentation task. A multiscale feature fusion block (MSFF) combines the features from the PPE to produce multiscale multimodality features. These two progressive features not only bridge the semantic gap but also improve prediction accuracy. Finally, an UpAttention block refines the predicted results by merging the image and text features. This design provides a simple and accurate way to leverage multiscale progressive text prior prompts for medical image segmentation. Compared with using only images, our model achieves high-quality results with low data annotation costs. Moreover, our model not only has excellent reliability and validity on medical images but also performs well on natural images. The experimental results on different image datasets demonstrate that our model is effective and robust for image segmentation.
    摘要 “精确的医疗影像分类是医疗影像评估中不可或缺的一步。然而,将深度神经网络训练 для这个任务需要大量的标注数据,以确保高精度结果。为解决这个问题,我们提出使用进步文本提示来导引分类过程。我们的模型由两个阶段组成。在第一个阶段,我们使用对自然影像进行对比学习,以预训一个具有强大能力的文本提示encoder(PPE)。这个PPE通过文本提示来生成多元特征。在第二个阶段,医疗影像和文本提示被送入PPE来完成下游医疗影像分类任务。一个多尺度特征融合块(MSFF)组合PPE生成的特征,以生成多尺度多元特征。这两个进步特征不仅bridges semantic gap,而且提高预测精度。最后,一个UpAttention块精确地调整预测结果,通过融合影像和文本特征。这个设计提供了一个简单而准确的方法,使用多尺度进步文本提示来进行医疗影像分类。相比于仅使用影像,我们的模型可以 дости得高品质结果,并且降低标注数据的成本。此外,我们的模型不仅在医疗影像上表现出色,还能够在自然影像上表现出良好的性能。实验结果表明,我们的模型是可靠和有效的 для影像分类。”

Hierarchical Neural Coding for Controllable CAD Model Generation

  • paper_url: http://arxiv.org/abs/2307.00149
  • repo_url: https://github.com/samxuxiang/hnc-cad
  • paper_authors: Xiang Xu, Pradeep Kumar Jayaraman, Joseph G. Lambourne, Karl D. D. Willis, Yasutaka Furukawa
  • for: 这 paper 的目的是提出一种新的生成模型,用于计算机支持设计 (CAD) 领域。
  • methods: 这 paper 使用了一种三级层次结构的神经网络代码,从全局部件安排下降到本地曲线几何。具体来说,使用了一种新的矢量量化 VAE WITH “masked skip connection” 抽取设计变化。两个阶段归一化变换学习代码树从不完整的 CAD 模型中生成代码树,然后按照设计意图完成 CAD 模型。
  • results: 广泛的实验表明,这种方法在Random generation 和 conditional generation 任务上具有优秀表现,同时允许用户在设计过程中进行新的交互操作。代码可以在 https://github.com/samxuxiang/hnc-cad 上下载。
    Abstract This paper presents a novel generative model for Computer Aided Design (CAD) that 1) represents high-level design concepts of a CAD model as a three-level hierarchical tree of neural codes, from global part arrangement down to local curve geometry; and 2) controls the generation or completion of CAD models by specifying the target design using a code tree. Concretely, a novel variant of a vector quantized VAE with "masked skip connection" extracts design variations as neural codebooks at three levels. Two-stage cascaded auto-regressive transformers learn to generate code trees from incomplete CAD models and then complete CAD models following the intended design. Extensive experiments demonstrate superior performance on conventional tasks such as random generation while enabling novel interaction capabilities on conditional generation tasks. The code is available at https://github.com/samxuxiang/hnc-cad.
    摘要 这篇论文提出了一种新的生成模型,用于计算机支持设计(CAD),它可以1)将高级设计想法表示为三级层次的神经码树,从全局部件安排下到本地曲线几何; 2)通过指定目标设计来控制生成或完成CAD模型。具体来说,这种新的vector量化VAE中的"masked skip connection"抽取出设计变化的神经码书在三级层次。两个阶段的堆叠自动逆Transformers学习从部分完整的CAD模型中生成代码树,然后根据意图的设计完成CAD模型。广泛的实验表明,这种方法在Random Generation任务上具有优秀的性能,同时允许 Conditional Generation任务中的新式交互能力。代码可以在https://github.com/samxuxiang/hnc-cad中找到。

An End-to-End Review of Gaze Estimation and its Interactive Applications on Handheld Mobile Devices

  • paper_url: http://arxiv.org/abs/2307.00122
  • repo_url: None
  • paper_authors: Yaxiong Lei, Shijing He, Mohamed Khamis, Juan Ye
  • for: 本研究目的是审查手持式移动设备上的互动系统,以视线作为单独或辅助互动模式。
  • methods: 本研究使用了视线测量技术,包括深度学习等方法,以提高视线估计精度。
  • results: 本研究给出了互动系统中视线估计的状况,包括视线测量技术、视线估计工作流程、以及深度学习等方法。
    Abstract In recent years we have witnessed an increasing number of interactive systems on handheld mobile devices which utilise gaze as a single or complementary interaction modality. This trend is driven by the enhanced computational power of these devices, higher resolution and capacity of their cameras, and improved gaze estimation accuracy obtained from advanced machine learning techniques, especially in deep learning. As the literature is fast progressing, there is a pressing need to review the state of the art, delineate the boundary, and identify the key research challenges and opportunities in gaze estimation and interaction. This paper aims to serve this purpose by presenting an end-to-end holistic view in this area, from gaze capturing sensors, to gaze estimation workflows, to deep learning techniques, and to gaze interactive applications.

Prompting classes: Exploring the Power of Prompt Class Learning in Weakly Supervised Semantic Segmentation

  • paper_url: http://arxiv.org/abs/2307.00097
  • repo_url: https://github.com/rb080/wss_pole
  • paper_authors: Balamurali Murugesan, Rukhshanda Hussain, Rajarshi Bhattacharya, Ismail Ben Ayed, Jose Dolz
  • for: 这篇论文是关于弱监督semantic segmentation(WSSS)问题的研究,旨在exploring whether prompt tuning可以提高WSSS的性能。
  • methods: 该论文使用CLIP-based模型进行预训练,然后通过 modifying text prompt来进行prompt tuning。
  • results: 研究发现,只需要修改类token的文本提示,可以对Class Activation Map(CAM)产生更大的影响,而且类token与图像真实分类结果不一定相符。基于这些发现, authors提出了一种新的PrOmpt cLass lEarning(POLE)策略,并通过广泛的实验表明该方法可以在一个常见的WSSS benchmark中达到SOTA性能。
    Abstract Recently, CLIP-based approaches have exhibited remarkable performance on generalization and few-shot learning tasks, fueled by the power of contrastive language-vision pre-training. In particular, prompt tuning has emerged as an effective strategy to adapt the pre-trained language-vision models to downstream tasks by employing task-related textual tokens. Motivated by this progress, in this work we question whether other fundamental problems, such as weakly supervised semantic segmentation (WSSS), can benefit from prompt tuning. Our findings reveal two interesting observations that shed light on the impact of prompt tuning on WSSS. First, modifying only the class token of the text prompt results in a greater impact on the Class Activation Map (CAM), compared to arguably more complex strategies that optimize the context. And second, the class token associated with the image ground truth does not necessarily correspond to the category that yields the best CAM. Motivated by these observations, we introduce a novel approach based on a PrOmpt cLass lEarning (POLE) strategy. Through extensive experiments we demonstrate that our simple, yet efficient approach achieves SOTA performance in a well-known WSSS benchmark. These results highlight not only the benefits of language-vision models in WSSS but also the potential of prompt learning for this problem. The code is available at https://github.com/rB080/WSS_POLE.
    摘要 近期,基于CLIP的方法凭借对比式语言-视觉预训练的能力,在泛化和少样本学习任务上表现出色。特别地,提示调优(prompt tuning)已成为一种有效的策略,通过引入与任务相关的文本token,将预训练的语言-视觉模型适配到下游任务。受此进展的启发,我们探究其他基础问题,如弱监督语义分割(WSSS),是否也能从提示调优中获益。我们的发现揭示了两点:首先,仅修改文本提示中的类别token,对类激活图(CAM)的影响比优化上下文等更复杂的策略更大;其次,与图像真实标签对应的类别token并不一定能产生最佳的CAM。基于这些观察,我们提出了一种基于提示类别学习(POLE)策略的新方法。通过广泛的实验,我们证明这一简单而高效的方法能够在知名的WSSS基准上达到SOTA性能。这些结果不仅凸显了语言-视觉模型在WSSS中的优势,也展示了提示学习在该问题上的潜力。代码可以在https://github.com/rB080/WSS_POLE上获取。

A Parts Based Registration Loss for Detecting Knee Joint Areas

  • paper_url: http://arxiv.org/abs/2307.00083
  • repo_url: None
  • paper_authors: Juha Tiirola
  • for: 本 paper 是为了提高膝关节区域进行微调的精度和效率。
  • methods: 本 paper 使用了一种基于部件损失的微调方法,其中部件是从参照图像中自动提取的抽象特征向量,并且在测试图像上鼓励相应的部件具有与参照图像中相对应的空间配置。
  • results: 本 paper 的实验结果表明,使用该微调方法可以提高膝关节区域的匹配精度和效率。
    Abstract In this paper, a parts based loss is considered for finetune registering knee joint areas. Here the parts are defined as abstract feature vectors with location and they are automatically selected from a reference image. For a test image the detected parts are encouraged to have a similar spatial configuration than the corresponding parts in the reference image.
    摘要 在本文中,我们考虑了基于部件的损失函数进行膝关节区域软件定制。在这里,部件被定义为抽象特征向量,其中包含位置信息。这些部件将从参照图像自动选择出来。对测试图像来说,检测到的部件将被鼓励与对应的参照图像中的部件具有类似的空间配置。

Situated Cameras, Situated Knowledges: Towards an Egocentric Epistemology for Computer Vision

  • paper_url: http://arxiv.org/abs/2307.00064
  • repo_url: None
  • paper_authors: Samuel Goree, David Crandall
  • for: 讲述科学知识的 feminist epistemology 和 egocentric 计算机视觉之间的关系
  • methods: 使用质量的、人类中心的方法,以补充性能指标,强调人群工作者的literal和metaphorical视角
  • results: argued for the use of qualitative, human-centric methods as a complement to performance benchmarks, to center both the literal and metaphorical perspective of human crowd workers in CV.
    Abstract In her influential 1988 paper, Situated Knowledges, Donna Haraway uses vision and perspective as a metaphor to discuss scientific knowledge. Today, egocentric computer vision discusses many of the same issues, except in a literal vision context. In this short position paper, we collapse that metaphor, and explore the interactions between feminist epistemology and egocentric CV as "Egocentric Epistemology." Using this framework, we argue for the use of qualitative, human-centric methods as a complement to performance benchmarks, to center both the literal and metaphorical perspective of human crowd workers in CV.
    摘要 哈拉韦(Donna Haraway)在1988年的著名论文《固有知识》中使用视野和视点作为比喻来讨论科学知识。今天, Egocentric Computer Vision(EGOCV)讨论了这些问题,但在literal视点上。在这篇短论文中,我们将这个比喻 collapse,并探讨 feminist epistemology 和 EGOCV 之间的交互。我们 argue 使用质量的、人类中心的方法作为性能指标的补充,以中心 literal 和比喻的人群工作者的视点在 CV 中。

Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing

  • paper_url: http://arxiv.org/abs/2306.17848
  • repo_url: None
  • paper_authors: Ariel N. Lee, Sarah Adel Bargal, Janavi Kasera, Stan Sclaroff, Kate Saenko, Nataniel Ruiz
  • for: The paper aims to investigate whether CNNs can be trained to have the same ability as ViTs to ignore out-of-context information and handle occlusions, using a data augmentation method called Patch Mixing.
  • methods: The paper uses Patch Mixing to train both CNNs and ViTs and assesses their performance on occlusion benchmarks.
  • results: The paper finds that ViTs neither improve nor degrade when trained with Patch Mixing, but CNNs acquire new capabilities to ignore out-of-context information and improve on occlusion benchmarks.
    Abstract Vision transformers (ViTs) have significantly changed the computer vision landscape and have periodically exhibited superior performance in vision tasks compared to convolutional neural networks (CNNs). Although the jury is still out on which model type is superior, each has unique inductive biases that shape their learning and generalization performance. For example, ViTs have interesting properties with respect to early layer non-local feature dependence, as well as self-attention mechanisms which enhance learning flexibility, enabling them to ignore out-of-context image information more effectively. We hypothesize that this power to ignore out-of-context information (which we name $\textit{patch selectivity}$), while integrating in-context information in a non-local manner in early layers, allows ViTs to more easily handle occlusion. In this study, our aim is to see whether we can have CNNs $\textit{simulate}$ this ability of patch selectivity by effectively hardwiring this inductive bias using Patch Mixing data augmentation, which consists of inserting patches from another image onto a training image and interpolating labels between the two image classes. Specifically, we use Patch Mixing to train state-of-the-art ViTs and CNNs, assessing its impact on their ability to ignore out-of-context patches and handle natural occlusions. We find that ViTs do not improve nor degrade when trained using Patch Mixing, but CNNs acquire new capabilities to ignore out-of-context information and improve on occlusion benchmarks, leaving us to conclude that this training method is a way of simulating in CNNs the abilities that ViTs already possess. We will release our Patch Mixing implementation and proposed datasets for public use. Project page: https://arielnlee.github.io/PatchMixing/
    摘要 计算机视觉领域中,视传播器(ViTs)已经发生了重大的变革,并且在视觉任务中 periodic 表现出了更高的性能比 CNNs。虽然哪个模型类型是更好的仍然是一个问题,但每种模型都具有独特的推导偏见,这些偏见会影响它们的学习和泛化性能。例如,ViTs在 early layer 非本地特征依赖方面有许多有趣的性质,同时具有自注意机制,这使得它们可以更好地忽略不相关的图像信息,并且可以更好地处理 occlusion。在本研究中,我们想要看看我们可以使 CNNs 通过有效地硬编码这种偏见来模拟 ViTs 的能力。特别是,我们使用 Patch Mixing 数据增强来训练 state-of-the-art ViTs 和 CNNs,并评估其对不相关的 patch 和自然遮挡的影响。我们发现,ViTs 不会因 Patch Mixing 训练而改善或恶化,但 CNNs 可以通过 Patch Mixing 训练获得忽略不相关信息的新能力,并在 occlusion 标准测试中提高表现。因此,我们结论认为,这种训练方法可以在 CNNs 中模拟 ViTs 所拥有的能力。我们将在项目页面上发布我们的 Patch Mixing 实现和建议的数据集。项目页面:https://arielnlee.github.io/PatchMixing/
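Patch Mixing itself is simple to reproduce: replace a fraction of non-overlapping patches in each training image with patches from another image and interpolate the one-hot labels by the swapped area. The sketch below assumes square patches that tile the image exactly; the mixing ratio and patch size are illustrative values.

```python
import torch
import torch.nn.functional as F

def patch_mixing(images, labels, num_classes, patch_size=16, mix_ratio=0.3):
    """Swap a fraction of the patches in each image with patches from a
    randomly paired image in the batch and mix the labels accordingly.
    Assumes H and W are divisible by patch_size."""
    B, C, H, W = images.shape
    perm = torch.randperm(B)
    gh, gw = H // patch_size, W // patch_size
    n_patches = gh * gw
    n_mix = int(mix_ratio * n_patches)

    mixed = images.clone()
    for b in range(B):
        for p in torch.randperm(n_patches)[:n_mix].tolist():
            r, c = (p // gw) * patch_size, (p % gw) * patch_size
            mixed[b, :, r:r + patch_size, c:c + patch_size] = \
                images[perm[b], :, r:r + patch_size, c:c + patch_size]

    lam = n_mix / n_patches                       # fraction of pixels swapped in
    y = F.one_hot(labels, num_classes).float()
    targets = (1 - lam) * y + lam * y[perm]
    return mixed, targets
```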

Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors

  • paper_url: http://arxiv.org/abs/2306.17843
  • repo_url: https://github.com/guochengqian/magic123
  • paper_authors: Guocheng Qian, Jinjie Mai, Abdullah Hamdi, Jian Ren, Aliaksandr Siarohin, Bing Li, Hsin-Ying Lee, Ivan Skorokhodov, Peter Wonka, Sergey Tulyakov, Bernard Ghanem
  • for: 生成从单个未处理图像中的高质量、有Texture的3D模型
  • methods: 使用两个stage的Coarse-to-fineapproach,首先优化神经辐射场生成粗糙Geometry,然后采用具有内存效率的可微分网格表示方法生成高分辨率网格
  • results: 证明在Synthetic benchmark和多种真实图像上具有显著改进,并且提供了可用于下载代码、模型和生成的3D资产的链接
    Abstract We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D meshes generation from a single unposed image in the wild using both2D and 3D priors. In the first stage, we optimize a neural radiance field to produce a coarse geometry. In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture. In both stages, the 3D content is learned through reference view supervision and novel views guided by a combination of 2D and 3D diffusion priors. We introduce a single trade-off parameter between the 2D and 3D priors to control exploration (more imaginative) and exploitation (more precise) of the generated geometry. Additionally, we employ textual inversion and monocular depth regularization to encourage consistent appearances across views and to prevent degenerate solutions, respectively. Magic123 demonstrates a significant improvement over previous image-to-3D techniques, as validated through extensive experiments on synthetic benchmarks and diverse real-world images. Our code, models, and generated 3D assets are available at https://github.com/guochengqian/Magic123.
    摘要 我们介绍Magic123,一种两阶段粗糙到细节的方法,用单张未处理图像在野外生成高质量、纹理化3D mesh。在第一阶段,我们优化神经辐射场以生成粗糙的几何体。在第二阶段,我们采用内存效率高的可 diffeomorphic mesh表示来生成高分辨率的网格,并且使用参考视图监督和2D和3D扩散约束来学习3D内容。我们引入一个单一的变量来控制2D和3D约束之间的平衡,以便在生成几何体时进行更好的探索和利用。此外,我们采用文本反转和单目深度正则化来促进视图之间的一致性和避免崩溃解决方案。Magic123在前一代图像到3D技术的基础上显著提高了性能,经过了广泛的synthetic benchmark和多种实际图像的测试。我们的代码、模型和生成的3D资产可以在https://github.com/guochengqian/Magic123中下载。
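The single trade-off parameter mentioned in the abstract can be read as a convex weighting between the two score-distillation signals. The two-line sketch below is only an interpretation of that idea; `sds_2d` and `sds_3d` are placeholders for the 2D and 3D diffusion guidance losses, and the exact weighting scheme used by Magic123 may differ.

```python
def joint_prior_loss(rendered_views, sds_2d, sds_3d, lam=0.5):
    """Combine a 2D (text-to-image) and a 3D (view-conditioned) diffusion prior
    with a single trade-off weight. `sds_2d` / `sds_3d` are assumed callables
    returning score-distillation losses on the rendered views."""
    return lam * sds_2d(rendered_views) + (1.0 - lam) * sds_3d(rendered_views)

# lam -> 1 favours the imaginative 2D prior (more exploration);
# lam -> 0 favours the view-consistent 3D prior (more precise geometry).
```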

Federated Ensemble YOLOv5 - A Better Generalized Object Detection Algorithm

  • paper_url: http://arxiv.org/abs/2306.17829
  • repo_url: None
  • paper_authors: Vinit Hegiste, Tatjana Legler, Martin Ruskowski
  • for: 这篇研究旨在探讨 Federated Learning(FL)在物件探测中的应用,以提高模型的通用性,并与中央训练方法进行比较。
  • methods: 本研究使用 Federated Averaging(FED Avg)和 Federated SGD(FED SGD)来训练 YOLOv5 模型,并使用随机抽样法无替换地将客户端上的数据分配给不同的客户端。
  • results: 实验结果显示,使用 FL 训练的 YOLOv5 模型在测试集上显示出更高的精度和更好的通用性,尤其是在测试集中包含了两个不同的客户端数据,并且这些数据不在训练集中。这些结果表明,FL 可以被视为一种聚合算法,类似于 Bagging 和 Boosting 技术的聚合。因此,FL 不只是一种关于隐私的方法,还是一种可以提高机器学习模型的性能的方法。
    Abstract Federated learning (FL) has gained significant traction as a privacy-preserving algorithm, but the underlying resembles of federated learning algorithm like Federated averaging (FED Avg) or Federated SGD (FED SGD) to ensemble learning algorithms has not been fully explored. The purpose of this paper is to examine the application of FL to object detection as a method to enhance generalizability, and to compare its performance against a centralized training approach for an object detection algorithm. Specifically, we investigate the performance of a YOLOv5 model trained using FL across multiple clients and employ a random sampling strategy without replacement, so each client holds a portion of the same dataset used for centralized training. Our experimental results showcase the superior efficiency of the FL object detector's global model in generating accurate bounding boxes for unseen objects, with the test set being a mixture of objects from two distinct clients not represented in the training dataset. These findings suggest that FL can be viewed from an ensemble algorithm perspective, akin to a synergistic blend of Bagging and Boosting techniques. As a result, FL can be seen not only as a method to enhance privacy, but also as a method to enhance the performance of a machine learning model.
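Since the paper builds on Federated Averaging, a minimal FedAvg round is sketched below for reference. The classification loss is a placeholder for the YOLOv5 detection loss, and the optimizer, learning rate, and weighting by client dataset size are conventional choices rather than the paper's exact setup.

```python
import copy
import torch

def federated_average(global_model, client_loaders, local_epochs=1, lr=1e-3, device="cpu"):
    """One FedAvg communication round: each client trains a copy of the global
    model on its own shard, and the server averages the resulting weights,
    weighted by client dataset size. Illustrative sketch only."""
    client_states, client_sizes = [], []
    for loader in client_loaders:
        local = copy.deepcopy(global_model).to(device)
        opt = torch.optim.SGD(local.parameters(), lr=lr)
        local.train()
        for _ in range(local_epochs):
            for x, y in loader:
                opt.zero_grad()
                # placeholder loss: a detection model would use its own loss here
                loss = torch.nn.functional.cross_entropy(local(x.to(device)), y.to(device))
                loss.backward()
                opt.step()
        client_states.append(local.state_dict())
        client_sizes.append(len(loader.dataset))

    total = float(sum(client_sizes))
    avg_state = copy.deepcopy(client_states[0])
    for key in avg_state:
        avg_state[key] = sum(s[key].float() * (n / total)
                             for s, n in zip(client_states, client_sizes))
    global_model.load_state_dict(avg_state)
    return global_model
```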

Look, Remember and Reason: Visual Reasoning with Grounded Rationales

  • paper_url: http://arxiv.org/abs/2306.17778
  • repo_url: None
  • paper_authors: Apratim Bhattacharyya, Sunny Panchal, Mingu Lee, Reza Pourreza, Pulkit Madan, Roland Memisevic
  • for: 本研究旨在探讨大型自然语言模型在视觉理解任务中的表现,以及如何使用人类视觉解决方法来改进模型的表现。
  • methods: 本研究使用了人类视觉解决方法,即“看、记忆、理解”的三步过程,将视觉信息逐步提取并融合到理解过程中。此外,研究还引入了视觉输入的理由,以便将低级视觉能力,如物体识别和跟踪,作为副任务来增强模型的表现。
  • results: 研究表明,通过引入人类视觉解决方法和低级视觉能力,可以使现有的大型自然语言模型在多种视觉理解任务中表现竞争力强,并且在CLEVR、CATER和ACRE数据集上达到了state-of-the-art水平。
    Abstract Large language models have recently shown human level performance on a variety of reasoning tasks. However, the ability of these models to perform complex visual reasoning has not been studied in detail yet. A key challenge in many visual reasoning tasks is that the visual information needs to be tightly integrated in the reasoning process. We propose to address this challenge by drawing inspiration from human visual problem solving which depends on a variety of low-level visual capabilities. It can often be cast as the three step-process of ``Look, Remember, Reason'': visual information is incrementally extracted using low-level visual routines in a step-by-step fashion until a final answer is reached. We follow the same paradigm to enable existing large language models, with minimal changes to the architecture, to solve visual reasoning problems. To this end, we introduce rationales over the visual input that allow us to integrate low-level visual capabilities, such as object recognition and tracking, as surrogate tasks. We show competitive performance on diverse visual reasoning tasks from the CLEVR, CATER, and ACRE datasets over state-of-the-art models designed specifically for these tasks.

MTR++: Multi-Agent Motion Prediction with Symmetric Scene Modeling and Guided Intention Querying

  • paper_url: http://arxiv.org/abs/2306.17770
  • repo_url: https://github.com/sshaoshuai/mtr
  • paper_authors: Shaoshuai Shi, Li Jiang, Dengxin Dai, Bernt Schiele
  • for: 提高自动驾驶系统的预测能力,更好地理解交通参与者的行为和环境 context。
  • methods: 提出了 Motion TRansformer(MTR)框架,使用变换器编码器-解码器结构,并增加了学习目的查询,以提高fficient和准确地预测未来轨迹。
  • results: 在多种 benchmark 上达到了状态zig-zag的表现,而 MTR++ 框架在多个 agent 同时预测多modal motion 方面表现出色,超越了 MTR 框架。
    Abstract Motion prediction is crucial for autonomous driving systems to understand complex driving scenarios and make informed decisions. However, this task is challenging due to the diverse behaviors of traffic participants and complex environmental contexts. In this paper, we propose Motion TRansformer (MTR) frameworks to address these challenges. The initial MTR framework utilizes a transformer encoder-decoder structure with learnable intention queries, enabling efficient and accurate prediction of future trajectories. By customizing intention queries for distinct motion modalities, MTR improves multimodal motion prediction while reducing reliance on dense goal candidates. The framework comprises two essential processes: global intention localization, identifying the agent's intent to enhance overall efficiency, and local movement refinement, adaptively refining predicted trajectories for improved accuracy. Moreover, we introduce an advanced MTR++ framework, extending the capability of MTR to simultaneously predict multimodal motion for multiple agents. MTR++ incorporates symmetric context modeling and mutually-guided intention querying modules to facilitate future behavior interaction among multiple agents, resulting in scene-compliant future trajectories. Extensive experimental results demonstrate that the MTR framework achieves state-of-the-art performance on the highly-competitive motion prediction benchmarks, while the MTR++ framework surpasses its precursor, exhibiting enhanced performance and efficiency in predicting accurate multimodal future trajectories for multiple agents.
    摘要 自动驾驶系统需要有效预测动向,以便更好地理解复杂的驾驶场景和做出 Informed Decisions。然而,这个任务受到交通参与者的多样化行为和环境上下文的复杂性的限制。在这篇论文中,我们提出了 Motion TRansformer(MTR)框架,以解决这些挑战。MTR使用了可学习的意图查询,以提高效率和准确性。通过对不同的动态模式进行定制意图查询,MTR可以提高多模态动向预测,同时减少依赖于稠密目标候选人。MTR框架包括两个基本过程:全局意图划定,用于提高总体效率,以及本地运动细化,用于提高预测的准确性。此外,我们还提出了 MTR++ 框架,它可以同时预测多个 Agent 的多模态动向。MTR++ 框架包括对称上下文模型和互相引导意图查询模块,以便在多个 Agent 之间互动,并生成符合场景的未来轨迹。实验结果表明,MTR 框架在高度竞争的动向预测Benchmark上 achieve state-of-the-art 性能,而 MTR++ 框架在其前一代的基础上,具有更高的性能和效率,可以准确预测多个 Agent 的多模态未来轨迹。

cs.AI - 2023-07-01

Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training

  • paper_url: http://arxiv.org/abs/2307.00368
  • repo_url: None
  • paper_authors: Dario Lazzaro, Antonio Emanuele Cinà, Maura Pintor, Ambra Demontis, Battista Biggio, Fabio Roli, Marcello Pelillo
  • for: 降低深度学习模型的能耗
  • methods: 使用梯度算法对模型训练进行精度补偿,以提高模型的能效性
  • results: 通过三个数据集和两种深度神经网络的实验分析,我们证明了我们的能源意识训练算法EAT可以培育出具有更好的平衡 между分类性能和能效性的网络。
    Abstract Deep learning models undergo a significant increase in the number of parameters they possess, leading to the execution of a larger number of operations during inference. This expansion significantly contributes to higher energy consumption and prediction latency. In this work, we propose EAT, a gradient-based algorithm that aims to reduce energy consumption during model training. To this end, we leverage a differentiable approximation of the $\ell_0$ norm, and use it as a sparse penalty over the training loss. Through our experimental analysis conducted on three datasets and two deep neural networks, we demonstrate that our energy-aware training algorithm EAT is able to train networks with a better trade-off between classification performance and energy efficiency.
    摘要 深度学习模型在推理过程中的参数数量呈现显著增长趋势,导致执行更多的操作,从而增加能耗和预测延迟。在本工作中,我们提出了EAT算法,通过对训练损失使用可导的$\ell_0$范数惩罚,以降低训练过程中的能耗。我们通过对三个数据集和两个深度神经网络进行实验分析,证明了EAT算法可以培育更好的对比性和能效性之间的平衡。
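The abstract describes adding a differentiable approximation of the ℓ0 norm as a sparse penalty on the training loss. One common smooth surrogate is shown below; whether EAT applies it to weights or activations, and which surrogate exactly, is not stated here, so treat this as an illustrative stand-in.

```python
import torch

def smooth_l0(params, beta=1e-3):
    """A common differentiable surrogate of the l0 norm,
    sum_i (1 - exp(-w_i^2 / beta)): close to counting non-zeros while staying
    smooth. The exact approximation used by EAT may differ."""
    return sum((1.0 - torch.exp(-p.pow(2) / beta)).sum() for p in params)

def energy_aware_loss(model, task_loss, lam=1e-5):
    """Task loss plus a sparsity penalty; here the penalty is placed on the
    weights as an illustration, pushing many of them toward zero and thereby
    reducing the number of effective operations at inference."""
    return task_loss + lam * smooth_l0(model.parameters())

# Inside a training step (sketch):
# logits = model(x)
# loss = energy_aware_loss(model, torch.nn.functional.cross_entropy(logits, y))
# loss.backward(); optimizer.step()
```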

CephGPT-4: An Interactive Multimodal Cephalometric Measurement and Diagnostic System with Visual Large Language Model

  • paper_url: http://arxiv.org/abs/2307.07518
  • repo_url: None
  • paper_authors: Lei Ma, Jincong Han, Zhaoxin Wang, Dian Zhang
  • for: 这个研究旨在开发一个基于多modal cephalometric医疗数据的诊断语言模型,以提高颜面医学评估和诊断的精度和效率。
  • methods: 本研究使用了多modal cephalometric数据,包括颜面影像和医生与病人之间的对话数据,并使用了U-net自动分析颜面特征点和生成诊断报告。然后,这些数据被精度地调整在Minigpt-4和VisualGLM上,以提高诊断的精度和可靠性。
  • results: 研究结果显示,CephGPT-4模型在诊断上表现出色,具有巨大的应用潜力,可能将改变颜面医学评估和诊断的方式。这些创新可能将在颜面医学中产生革命性的影响。
    Abstract Large-scale multimodal language models (LMMs) have achieved remarkable success in general domains. However, the exploration of diagnostic language models based on multimodal cephalometric medical data remains limited. In this paper, we propose a novel multimodal cephalometric analysis and diagnostic dialogue model. Firstly, a multimodal orthodontic medical dataset is constructed, comprising cephalometric images and doctor-patient dialogue data, with automatic analysis of cephalometric landmarks using U-net and generation of diagnostic reports. Then, the cephalometric dataset and generated diagnostic reports are separately fine-tuned on Minigpt-4 and VisualGLM. Results demonstrate that the CephGPT-4 model exhibits excellent performance and has the potential to revolutionize orthodontic measurement and diagnostic applications. These innovations hold revolutionary application potential in the field of orthodontics.
    摘要 大规模多Modal语言模型(LMMs)在通用领域已经取得了很大成功。然而,对多Modal cephalometric医疗数据的诊断语言模型的探索仍然受限。在这篇论文中,我们提出了一种新的多Modal cephalometric分析和诊断对话模型。首先,我们构建了一个多Modal cephalometric医疗数据集,包括 Cephalometric图像和医生-病人对话数据,并自动分析 Cephalometric 特征点使用 U-net 生成诊断报告。然后, Cephalometric 数据集和生成的诊断报告分别在 Minigpt-4 和 VisualGLM 上进行了精细调整。结果表明,CephGPT-4 模型在性能方面表现出色,有可能在 ortodontic 测量和诊断应用中引领时代。这些创新拥有在orthodontics 领域的革命性应用潜力。

The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanations

  • paper_url: http://arxiv.org/abs/2307.00364
  • repo_url: None
  • paper_authors: Vinitra Swamy, Jibril Frej, Tanja Käser
  • for: 本文提出了一个呼吁,即从现有的黑盒模型解释方法中扩展到设计可解释的神经网络架构,以解决现有解释器的局限性问题。
  • methods: 本文提出了两种解释器设计方法,包括适应路由的可解释 conditional computation,以及诊断标准的Iterative Model Learning。
  • results: 本文认为,未来的人类中心的XAI不应该仅仅是解释黑obox,而是通过设计可解释的神经网络来实现。
    Abstract Explainable Artificial Intelligence (XAI) plays a crucial role in enabling human understanding and trust in deep learning systems, often defined as determining which features are most important to a model's prediction. As models get larger, more ubiquitous, and pervasive in aspects of daily life, explainability is necessary to avoid or minimize adverse effects of model mistakes. Unfortunately, current approaches in human-centric XAI (e.g. predictive tasks in healthcare, education, or personalized ads) tend to rely on a single explainer. This is a particularly concerning trend when considering that recent work has identified systematic disagreement in explainability methods when applied to the same points and underlying black-box models. In this paper, we therefore present a call for action to address the limitations of current state-of-the-art explainers. We propose to shift from post-hoc explainability to designing interpretable neural network architectures; moving away from approximation techniques in human-centric and high impact applications. We identify five needs of human-centric XAI (real-time, accurate, actionable, human-interpretable, and consistent) and propose two schemes for interpretable-by-design neural network workflows (adaptive routing for interpretable conditional computation and diagnostic benchmarks for iterative model learning). We postulate that the future of human-centric XAI is neither in explaining black-boxes nor in reverting to traditional, interpretable models, but in neural networks that are intrinsically interpretable.
    摘要 可解释人工智能(XAI)在深度学习系统中扮演着关键角色,通常被定义为确定哪些特征对模型预测最重要。随着模型变得更大、更普遍并渗透到日常生活的方方面面,解释性对于避免或减少模型错误带来的不良影响变得必不可少。然而,当前以人为中心的XAI方法(如医疗、教育或个性化广告中的预测任务)往往只依赖单一的解释器。考虑到近期研究发现不同解释方法应用于同一数据点和同一黑盒模型时会出现系统性的不一致,这一趋势尤其令人担忧。因此,本文发出呼吁,以解决当前最先进解释器的局限性:我们建议从事后解释转向设计本身可解释的神经网络架构,在以人为中心、高影响的应用中放弃近似技术。我们归纳了以人为中心XAI的五项需求(实时、准确、可操作、人类可解释和一致),并提出了两种“设计即可解释”的神经网络工作流程(用于可解释条件计算的自适应路由,以及用于迭代模型学习的诊断基准)。我们认为,以人为中心XAI的未来既不在于解释黑盒,也不在于回归传统的可解释模型,而在于本身即可解释的神经网络。
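As a toy illustration of the "adaptive routing for interpretable conditional computation" scheme mentioned above (not the authors' implementation; the module names, sizes, and soft-routing choice are illustrative assumptions), a small gating network can route each sample to a few shallow, inspectable expert heads, so the chosen route itself serves as an explanation:

```python
import torch
import torch.nn as nn

class RoutedInterpretableNet(nn.Module):
    """Each sample is (softly) routed to shallow, inspectable expert heads."""
    def __init__(self, in_dim, n_experts=3, n_classes=2):
        super().__init__()
        self.gate = nn.Linear(in_dim, n_experts)          # routing scores per sample
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, n_classes) for _ in range(n_experts)]
        )                                                  # simple linear experts

    def forward(self, x):
        route = torch.softmax(self.gate(x), dim=-1)                 # (B, E) routing weights
        logits = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, C)
        out = (route.unsqueeze(-1) * logits).sum(dim=1)             # mixture prediction
        return out, route                                           # route doubles as explanation

x = torch.randn(4, 8)
pred, route = RoutedInterpretableNet(8)(x)
print(route.argmax(dim=-1))  # which expert "explains" each sample
```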

A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact

  • paper_url: http://arxiv.org/abs/2307.00361
  • repo_url: None
  • paper_authors: Álvaro Huertas-García, Carlos Martí-González, Rubén García Maezo, Alejandro Echeverría Rey
  • for: 本研究旨在开发一种可持续的人工智能(AI)和机器学习(ML)模型,用于避免异常检测中的高计算需求和相关的环境影响。
  • methods: 本研究采用了多种机器学习算法和不同的多层感知器(MLP)配置,并且对这些模型进行了仔细的评估。
  • results: 研究发现,传统的机器学习算法(如决策树和随机森林)可以达到稳健的性能和效率,而优化后的MLP配置可以提供更高的性能,但同时也增加了资源消耗。
    Abstract In the context of Industry 4.0, the use of artificial intelligence (AI) and machine learning for anomaly detection is being hampered by high computational requirements and associated environmental effects. This study seeks to address the demands of high-performance machine learning models with environmental sustainability, contributing to the emerging discourse on 'Green AI.' An extensive variety of machine learning algorithms, coupled with various Multilayer Perceptron (MLP) configurations, were meticulously evaluated. Our investigation encapsulated a comprehensive suite of evaluation metrics, comprising Accuracy, Area Under the Curve (AUC), Recall, Precision, F1 Score, Kappa Statistic, Matthews Correlation Coefficient (MCC), and F1 Macro. Simultaneously, the environmental footprint of these models was gauged through considerations of time duration, CO2 equivalent, and energy consumption during the training, cross-validation, and inference phases. Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance. However, superior outcomes were obtained with optimised MLP configurations, albeit with a commensurate increase in resource consumption. The study incorporated a multi-objective optimisation approach, invoking Pareto optimality principles, to highlight the trade-offs between a model's performance and its environmental impact. The insights derived underscore the imperative of striking a balance between model performance, complexity, and environmental implications, thus offering valuable directions for future work in the development of environmentally conscious machine learning models for industrial applications.
    摘要 在工业4.0背景下,人工智能(AI)和机器学习(ML)在异常检测中的应用受到高计算需求及其环境影响的制约。本研究旨在兼顾高性能机器学习模型与环境可持续性,为新兴的“绿色AI”讨论做出贡献。我们对多种机器学习算法及多层感知器(MLP)的不同配置进行了细致评估,评估指标包括准确率、曲线下面积(AUC)、召回率、精确率、F1分数、Kappa统计量、马修斯相关系数(MCC)和宏平均F1;同时,我们从训练、交叉验证和推理阶段的耗时、二氧化碳当量和能耗等方面衡量了这些模型的环境足迹。传统机器学习算法(如决策树和随机森林)表现出稳健的效率和性能,而经过优化的MLP配置可获得更优结果,但资源消耗也相应增加。研究采用多目标优化方法并引入帕累托最优原理,以突出模型性能与环境影响之间的权衡。所得结论强调必须在模型性能、复杂性与环境影响之间取得平衡,从而为未来面向工业应用的环境友好型机器学习模型的开发提供有价值的方向。
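The abstract frames the accuracy-vs-footprint trade-off through Pareto optimality. A self-contained sketch, with made-up numbers, of how non-dominated candidates would be selected when each model is scored by accuracy (higher is better) and energy use (lower is better):

```python
# Hypothetical (accuracy, energy_kWh) pairs for candidate models.
models = {
    "decision_tree": (0.91, 0.2),
    "random_forest": (0.92, 1.0),
    "mlp_small":     (0.94, 0.8),
    "mlp_large":     (0.95, 9.0),
}

def pareto_front(candidates):
    """Keep models that are not dominated in both accuracy (max) and energy (min)."""
    front = []
    for name, (acc, eng) in candidates.items():
        dominated = any(
            (a >= acc and e <= eng) and (a > acc or e < eng)
            for other, (a, e) in candidates.items() if other != name
        )
        if not dominated:
            front.append(name)
    return front

print(pareto_front(models))  # random_forest is dominated by mlp_small here
```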

When Synthetic Data Met Regulation

  • paper_url: http://arxiv.org/abs/2307.00359
  • repo_url: None
  • paper_authors: Georgi Ganev
  • for: 本研究论证,由差分隐私生成模型生成的合成数据可以被充分匿名化,因此可被视为匿名数据并符合法规要求。
  • methods: 本研究使用 differentially private generative models,such as 隐私均衡生成器和匿名生成器,来生成伪数据。
  • results: 本研究结果表明,使用 differentially private generative models 生成的伪数据可以具有足够的隐私保护,并且可以满足不同的隐私保护标准。
    Abstract In this paper, we argue that synthetic data produced by Differentially Private generative models can be sufficiently anonymized and, therefore, anonymous data and regulatory compliant.
    摘要 在这篇论文中,我们论证:由差分隐私生成模型产生的合成数据可以被充分匿名化,因此属于匿名数据并符合相关法规。

Variation-aware Vision Transformer Quantization

  • paper_url: http://arxiv.org/abs/2307.00331
  • repo_url: https://github.com/huangowen/vvtq
  • paper_authors: Xijie Huang, Zhiqiang Shen, Kwang-Ting Cheng
  • for: 本研究旨在提高镜像变换器(ViT)的训练和推理效率,通过量化来减少模型的计算量和存储量。
  • methods: 本研究使用量化敏感训练(QAT)和知识塑型学习(KD)来解决ViT量化中的变化问题,并提出了模块依赖量化方案和变化敏感规则来稳定训练。
  • results: 在ImageNet-1K上,我们在2比特 Swin-T 的极低位宽设置下实现了77.66%的Top-1准确率,比此前最优的量化模型高出3.35%。
    Abstract Despite the remarkable performance of Vision Transformers (ViTs) in various visual tasks, the expanding computation and model size of ViTs have increased the demand for improved efficiency during training and inference. To address the heavy computation and parameter drawbacks, quantization is frequently studied in the community as a representative model compression technique and has seen extensive use on CNNs. However, due to the unique properties of CNNs and ViTs, the quantization applications on ViTs are still limited and underexplored. In this paper, we identify the difficulty of ViT quantization on its unique variation behaviors, which differ from traditional CNN architectures. The variations indicate the magnitude of the parameter fluctuations and can also measure outlier conditions. Moreover, the variation behaviors reflect the various sensitivities to the quantization of each module. The quantization sensitivity analysis and comparison of ViTs with CNNs help us locate the underlying differences in variations. We also find that the variations in ViTs cause training oscillations, bringing instability during quantization-aware training (QAT). Correspondingly, we solve the variation problem with an efficient knowledge-distillation-based variation-aware quantization method. The multi-crop knowledge distillation scheme can accelerate and stabilize the training and alleviate the variation's influence during QAT. We also proposed a module-dependent quantization scheme and a variation-aware regularization term to suppress the oscillation of weights. On ImageNet-1K, we obtain a 77.66% Top-1 accuracy on the extremely low-bit scenario of 2-bit Swin-T, outperforming the previous state-of-the-art quantized model by 3.35%.
    摘要 尽管视觉Transformer(ViT)在各类视觉任务中表现出色,但其不断增长的计算量和模型规模使得训练与推理效率的提升需求日益迫切。为了解决计算量与参数量的问题,量化作为一种代表性的模型压缩技术已在CNN上被广泛研究和应用;然而由于CNN与ViT性质不同,针对ViT的量化应用仍然有限且有待探索。本文指出ViT量化的难点在于其独特的变化(variation)行为,这与传统CNN结构不同。这些变化反映了参数波动的幅度,也能够衡量离群情况,并体现了各模块对量化的不同敏感度。通过对ViT与CNN的量化敏感度分析与比较,我们定位了二者在变化上的本质差异,并发现ViT中的变化会引起训练振荡,使量化感知训练(QAT)不稳定。对应地,我们提出了一种高效的、基于知识蒸馏的变化感知量化方法:多裁剪知识蒸馏方案可以加速并稳定训练,减轻变化在QAT过程中的影响;我们还提出了与模块相关的量化方案和变化感知正则项,以抑制权重的振荡。在ImageNet-1K上,我们在2比特 Swin-T 的极低位宽场景下取得了77.66%的Top-1准确率,比此前最优的量化模型高出3.35%。
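A toy sketch of two ingredients named above, a per-module weight-variation measure (as a rough proxy for quantization sensitivity) and uniform symmetric fake quantization as used in QAT-style training. This is an illustrative approximation, not the VVTQ code (see the repo link above):

```python
import torch

def weight_variation(w: torch.Tensor) -> float:
    """Simple variation proxy: coefficient of variation of |weights|."""
    a = w.abs()
    return (a.std() / (a.mean() + 1e-8)).item()

def fake_quantize(w: torch.Tensor, n_bits: int = 2) -> torch.Tensor:
    """Uniform symmetric fake quantization (round to the nearest grid point)."""
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 1 for 2-bit symmetric
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale

w = torch.randn(64, 64)                                # stand-in module weights
print("variation proxy:", weight_variation(w))
print("max 2-bit quant error:", (w - fake_quantize(w, 2)).abs().max().item())
```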

FedCP: Separating Feature Information for Personalized Federated Learning via Conditional Policy

  • paper_url: http://arxiv.org/abs/2307.01217
  • repo_url: https://github.com/tsingz0/fedcp
  • paper_authors: Jianqing Zhang, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Haibing Guan
  • for: 这篇论文主要是为了解决个性化联合学习(pFL)中数据来源的问题,通过生成每个样本的条件政策来分别处理全局信息和个性信息。
  • methods: 该方法提出了基于 Federated Conditional Policy(FedCP)的新方法,它通过为每个样本生成一个条件政策,将其特征分为全局信息和个性信息两部分,然后通过全局头和个性头进行处理。相比之下,现有的pFL方法更加注重全局信息和个性信息的混合。
  • results: 在计算机视觉和自然语言处理领域的大量实验表明,FedCP的性能超过11种最先进方法,最高提升6.69%;此外,当部分客户端意外掉线(这在移动场景中经常发生)时,FedCP仍能保持优势。代码公开于 https://github.com/TsingZ0/FedCP 。
    Abstract Recently, personalized federated learning (pFL) has attracted increasing attention in privacy protection, collaborative learning, and tackling statistical heterogeneity among clients, e.g., hospitals, mobile smartphones, etc. Most existing pFL methods focus on exploiting the global information and personalized information in the client-level model parameters while neglecting that data is the source of these two kinds of information. To address this, we propose the Federated Conditional Policy (FedCP) method, which generates a conditional policy for each sample to separate the global information and personalized information in its features and then processes them by a global head and a personalized head, respectively. FedCP is more fine-grained to consider personalization in a sample-specific manner than existing pFL methods. Extensive experiments in computer vision and natural language processing domains show that FedCP outperforms eleven state-of-the-art methods by up to 6.69%. Furthermore, FedCP maintains its superiority when some clients accidentally drop out, which frequently happens in mobile settings. Our code is public at https://github.com/TsingZ0/FedCP.
    摘要 最近,个性化联合学习(pFL)已经吸引了越来越多的关注,以保护隐私、合作学习以及客户端间的统计差异等方面。现有的大多数pFL方法都是利用客户端级模型参数中的全球信息和个性信息来获得利益,而忽略了数据的来源。为了解决这个问题,我们提出了联合条件策略(FedCP)方法,该方法在每个样本中分离特定信息和个性信息,然后由全球头和个性头进行处理。FedCP比现有的pFL方法更加细化,能够在样本具有特定方式进行个性化。在计算机视觉和自然语言处理领域进行了广泛的实验,并证明FedCP可以与11种现有方法进行比较,在6.69%的提高率下超越其他方法。此外,FedCP在客户端意外退出时仍保持其优势,这经常发生在移动设备上。我们的代码可以在https://github.com/TsingZ0/FedCP中找到。
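A minimal sketch of the sample-wise separation FedCP describes: a conditional policy produces a per-sample soft mask that splits the feature vector into a global part and a personalized part, each handled by its own head. The sigmoid policy and dimensions are simplifying assumptions; the authors' code is at the repo linked above.

```python
import torch
import torch.nn as nn

class ConditionalPolicyHead(nn.Module):
    def __init__(self, feat_dim=16, n_classes=10):
        super().__init__()
        self.policy = nn.Linear(feat_dim, feat_dim)      # produces a per-sample mask
        self.global_head = nn.Linear(feat_dim, n_classes)
        self.personal_head = nn.Linear(feat_dim, n_classes)

    def forward(self, feat):
        mask = torch.sigmoid(self.policy(feat))           # soft split in (0, 1)
        global_part = mask * feat                          # routed to the global head
        personal_part = (1.0 - mask) * feat                # routed to the personalized head
        return self.global_head(global_part) + self.personal_head(personal_part)

logits = ConditionalPolicyHead()(torch.randn(8, 16))
print(logits.shape)  # torch.Size([8, 10])
```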

DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment

  • paper_url: http://arxiv.org/abs/2307.00329
  • repo_url: None
  • paper_authors: Yanjiang Guo, Yen-Jen Wang, Lihan Zha, Zheyuan Jiang, Jianyu Chen
  • For: This paper aims to improve the grounding of language models in robotic tasks to ensure that the sequences generated by the language model are both logically correct and practically executable, and to recover from misalignments between plan and execution.* Methods: The proposed method, DoReMi, leverages large language models (LLMs) for both planning and generating constraints for planned steps, and uses a vision question answering (VQA) model to check constraints during low-level skill execution. If certain misalignment occurs, the method will call the language model to re-plan in order to recover from misalignments.* Results: Experiments on various complex tasks including robot arms and humanoid robots demonstrate that DoReMi can lead to higher task success rates and shorter task completion times. Videos of DoReMi are available at https://sites.google.com/view/doremi-paper.
    Abstract Large language models encode a vast amount of semantic knowledge and possess remarkable understanding and reasoning capabilities. Previous research has explored how to ground language models in robotic tasks to ensure that the sequences generated by the language model are both logically correct and practically executable. However, low-level execution may deviate from the high-level plan due to environmental perturbations or imperfect controller design. In this paper, we propose DoReMi, a novel language model grounding framework that enables immediate Detection and Recovery from Misalignments between plan and execution. Specifically, LLMs are leveraged for both planning and generating constraints for planned steps. These constraints can indicate plan-execution misalignments and we use a vision question answering (VQA) model to check constraints during low-level skill execution. If certain misalignment occurs, our method will call the language model to re-plan in order to recover from misalignments. Experiments on various complex tasks including robot arms and humanoid robots demonstrate that our method can lead to higher task success rates and shorter task completion times. Videos of DoReMi are available at https://sites.google.com/view/doremi-paper.
    摘要 大型语言模型蕴含了海量的语义知识,具备出色的理解与推理能力。先前的研究探讨了如何将语言模型落地到机器人任务中,以确保其生成的动作序列既逻辑正确又切实可执行。然而,由于环境扰动或控制器设计不完善,低层执行可能会偏离高层计划。本文提出 DoReMi,一种新颖的语言模型落地框架,能够即时检测并恢复计划与执行之间的偏差。具体而言,我们利用大语言模型(LLM)同时进行规划并为各计划步骤生成约束;这些约束可以指示计划与执行的失配,并由视觉问答(VQA)模型在低层技能执行过程中进行检查。一旦出现失配,该方法会调用语言模型重新规划,从而从偏差中恢复。在包括机械臂和人形机器人在内的多种复杂任务上的实验表明,我们的方法能够提高任务成功率并缩短任务完成时间。演示视频见 https://sites.google.com/view/doremi-paper 。
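A schematic sketch of the detect-and-recover loop described above; `llm_plan`, `llm_constraints`, `vqa_check`, and `execute_step` are hypothetical stand-ins for the LLM planner, the constraint generator, the VQA checker, and the low-level controller, so only the control flow reflects the paper:

```python
def llm_plan(goal):                 # hypothetical: LLM returns an ordered step list
    return ["pick red block", "place it on the blue block"]

def llm_constraints(step):          # hypothetical: LLM writes checkable constraints
    return [f"is the step '{step}' completed in the current image?"]

def vqa_check(question):            # hypothetical: VQA model answers yes/no
    return True

def execute_step(step):             # hypothetical: low-level skill execution
    print("executing:", step)

def run(goal, max_replans=3):
    plan, replans = llm_plan(goal), 0
    while plan and replans <= max_replans:
        step = plan.pop(0)
        execute_step(step)
        # Detection: check constraints during / after low-level execution.
        if not all(vqa_check(q) for q in llm_constraints(step)):
            replans += 1
            plan = llm_plan(goal)   # Recovery: ask the LLM to re-plan.

run("stack the red block on the blue block")
```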

SHARCS: Shared Concept Space for Explainable Multimodal Learning

  • paper_url: http://arxiv.org/abs/2307.00316
  • repo_url: https://github.com/gabriele-dominici/SHARCS
  • paper_authors: Gabriele Dominici, Pietro Barbiero, Lucie Charlotte Magister, Pietro Liò, Nikola Simidjievski
  • for: 本文旨在提出一种可解释的多模态学习方法,以便在复杂的实际问题中解决问题,而不需要培育大量数据。
  • methods: 本文使用了SHARCS(共享概念空间)approach,它可以将不同的多 modalities 映射到单一的概念 manifold 中,从而实现INTUITIVE的cross-modal概念映射。
  • results: 实验结果表明,SHARCS能够实现本质上可解释的任务预测,同时提升下游预测性能;此外,SHARCS还能在缺失模态检索、跨模态解释等具有实际意义的场景中运行并显著优于其他方法。
    Abstract Multimodal learning is an essential paradigm for addressing complex real-world problems, where individual data modalities are typically insufficient to accurately solve a given modelling task. While various deep learning approaches have successfully addressed these challenges, their reasoning process is often opaque; limiting the capabilities for a principled explainable cross-modal analysis and any domain-expert intervention. In this paper, we introduce SHARCS (SHARed Concept Space) -- a novel concept-based approach for explainable multimodal learning. SHARCS learns and maps interpretable concepts from different heterogeneous modalities into a single unified concept-manifold, which leads to an intuitive projection of semantically similar cross-modal concepts. We demonstrate that such an approach can lead to inherently explainable task predictions while also improving downstream predictive performance. Moreover, we show that SHARCS can operate and significantly outperform other approaches in practically significant scenarios, such as retrieval of missing modalities and cross-modal explanations. Our approach is model-agnostic and easily applicable to different types (and number) of modalities, thus advancing the development of effective, interpretable, and trustworthy multimodal approaches.
    摘要 多模态学习是解决复杂现实问题的重要范式,因为单一模态的数据通常不足以准确完成给定的建模任务。尽管多种深度学习方法已成功应对这些挑战,但它们的推理过程往往不透明,限制了有原则的跨模态可解释分析以及领域专家的介入。本文提出 SHARCS(共享概念空间)——一种新的基于概念的可解释多模态学习方法。SHARCS 从不同的异质模态中学习可解释概念,并将其映射到统一的概念流形中,从而使语义相近的跨模态概念得到直观的投影。我们证明这种方法不仅能带来本质上可解释的任务预测,还能提升下游预测性能。此外,SHARCS 能在缺失模态检索、跨模态解释等具有实际意义的场景中运行并显著优于其他方法。该方法与具体模型无关,可方便地应用于不同类型和数量的模态,从而推动有效、可解释且可信赖的多模态方法的发展。

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD

  • paper_url: http://arxiv.org/abs/2307.00310
  • repo_url: None
  • paper_authors: Anvith Thudi, Hengrui Jia, Casey Meehan, Ilia Shumailov, Nicolas Papernot
  • for: 这个论文是为了研究 differentially private stochastic gradient descent (DP-SGD) 算法的隐私分析。
  • methods: 这个论文使用了修改了的每步隐私分析方法,以考虑模型更新的分布。
  • results: 该论文发现,使用这种新的隐私分析方法,可以更好地证明 DP-SGD 在许多数据点上保持隐私。特别是,正确分类的数据点可以获得更好的隐私保证。
    Abstract Differentially private stochastic gradient descent (DP-SGD) is the canonical algorithm for private deep learning. While it is known that its privacy analysis is tight in the worst-case, several empirical results suggest that when training on common benchmark datasets, the models obtained leak significantly less privacy for many datapoints. In this paper, we develop a new analysis for DP-SGD that captures the intuition that points with similar neighbors in the dataset enjoy better privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Put all together, our evaluation shows that this novel DP-SGD analysis allows us to now formally show that DP-SGD leaks significantly less privacy for many datapoints. In particular, we observe that correctly classified points obtain better privacy guarantees than misclassified points.
    摘要 差分隐私随机梯度下降(DP-SGD)是隐私保护深度学习的标准算法。尽管已知其隐私分析在最坏情况下是紧的,但多项实证结果表明,在常用基准数据集上训练时,所得模型对许多数据点泄露的隐私要少得多。本文为 DP-SGD 提出了一种新的分析方法,用以刻画这样的直觉:在数据集中拥有相似近邻的数据点比离群点享有更好的隐私。形式上,我们修改了 DP-SGD 的逐步隐私分析,使其依赖于由训练数据计算出的模型更新的分布;并进一步提出了新的组合定理,以便利用这种逐步分析来推断整个训练过程的隐私。综合来看,我们的评估表明,这一新的 DP-SGD 分析使我们能够正式证明 DP-SGD 对许多数据点泄露的隐私显著更少;特别地,被正确分类的数据点获得了比被误分类数据点更好的隐私保证。
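For reference, the DP-SGD step whose privacy the paper re-analyzes combines per-example gradient clipping with calibrated Gaussian noise. A minimal numpy sketch under assumed hyper-parameters (the paper tightens the analysis of this mechanism rather than changing it):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One DP-SGD update direction: clip each example's gradient, average, add noise."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))   # per-example clipping
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)                        # calibrated Gaussian noise
    return mean_grad + noise

grads = [np.random.randn(5) for _ in range(32)]    # stand-in per-example gradients
print(dp_sgd_step(grads))
```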

SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation

  • paper_url: http://arxiv.org/abs/2307.00306
  • repo_url: https://github.com/boschresearch/symfm6d
  • paper_authors: Fabian Duffhauss, Sebastian Koch, Hanna Ziesche, Ngo Anh Vien, Gerhard Neumann
  • for: automated systems to interact safely with the environment
  • methods: multi-view 6D pose estimator called SyMFM6D, deep multi-directional fusion network, least-squares fitting
  • results: significantly outperforms the state-of-the-art in both single-view and multi-view 6D pose estimation, robust towards inaccurate camera calibration and dynamic camera setups
    Abstract Detecting objects and estimating their 6D poses is essential for automated systems to interact safely with the environment. Most 6D pose estimators, however, rely on a single camera frame and suffer from occlusions and ambiguities due to object symmetries. We overcome this issue by presenting a novel symmetry-aware multi-view 6D pose estimator called SyMFM6D. Our approach efficiently fuses the RGB-D frames from multiple perspectives in a deep multi-directional fusion network and predicts predefined keypoints for all objects in the scene simultaneously. Based on the keypoints and an instance semantic segmentation, we efficiently compute the 6D poses by least-squares fitting. To address the ambiguity issues for symmetric objects, we propose a novel training procedure for symmetry-aware keypoint detection including a new objective function. Our SyMFM6D network significantly outperforms the state-of-the-art in both single-view and multi-view 6D pose estimation. We furthermore show the effectiveness of our symmetry-aware training procedure and demonstrate that our approach is robust towards inaccurate camera calibration and dynamic camera setups.
    摘要 检测物体并估计其6D位姿,是自动化系统与环境安全交互的关键。然而,大多数6D位姿估计器依赖单帧摄像头图像,并因遮挡以及物体对称性带来的歧义而受到影响。为解决这一问题,我们提出了一种新的对称感知多视角6D位姿估计器 SyMFM6D。该方法通过深度多向融合网络高效地融合来自多个视角的RGB-D帧,并同时预测场景中所有物体的预定义关键点;基于关键点和实例语义分割,再通过最小二乘拟合高效地求解6D位姿。针对对称物体的歧义问题,我们提出了一种新的对称感知关键点检测训练流程,并引入了新的目标函数。SyMFM6D 网络在单视角和多视角6D位姿估计上均显著超越了现有最优方法。我们还验证了对称感知训练流程的有效性,并证明该方法对不准确的相机标定和动态相机设置具有鲁棒性。
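The final step named in the abstract, least-squares fitting of a 6D pose from predicted keypoints, is the classical rigid-alignment problem. A self-contained SVD-based sketch (Kabsch, no scaling), with synthetic keypoints standing in for the network's predictions:

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares R, t such that R @ src_i + t ≈ dst_i (Kabsch via SVD)."""
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))              # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

# Synthetic check: recover a known pose from 8 model keypoints.
rng = np.random.default_rng(0)
model_kpts = rng.normal(size=(8, 3))
angle = np.pi / 5
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.5])
observed = model_kpts @ R_true.T + t_true
R_est, t_est = fit_rigid_transform(model_kpts, observed)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```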

SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency

  • paper_url: http://arxiv.org/abs/2307.00280
  • repo_url: None
  • paper_authors: Yan Wang, Yuhang Li, Ruihao Gong, Aishan Liu, Yanfei Wang, Jian Hu, Yongqiang Yao, Yunchen Zhang, Tianzi Xiao, Fengwei Yu, Xianglong Liu
  • for: 本研究旨在探讨深度学习模型在不同系统实现中的Robustness,即SysNoise问题。
  • methods: 本文首次引入SysNoise,并将其分为三类基于推理阶段。然后,我们构建了一个全面的benchmark,用于量化SysNoise对20多个模型的影响,包括图像分类、物体检测、实例 segmentation和自然语言处理任务。
  • results: 我们的广泛实验显示,SysNoise会对不同任务的模型Robustness产生影响,而常见的mitigation技术如数据增强和对抗训练显示有限的效果。
    Abstract Extensive studies have shown that deep learning models are vulnerable to adversarial and natural noises, yet little is known about model robustness on noises caused by different system implementations. In this paper, we for the first time introduce SysNoise, a frequently occurred but often overlooked noise in the deep learning training-deployment cycle. In particular, SysNoise happens when the source training system switches to a disparate target system in deployments, where various tiny system mismatch adds up to a non-negligible difference. We first identify and classify SysNoise into three categories based on the inference stage; we then build a holistic benchmark to quantitatively measure the impact of SysNoise on 20+ models, comprehending image classification, object detection, instance segmentation and natural language processing tasks. Our extensive experiments revealed that SysNoise could bring certain impacts on model robustness across different tasks and common mitigations like data augmentation and adversarial training show limited effects on it. Together, our findings open a new research topic and we hope this work will raise research attention to deep learning deployment systems accounting for model performance. We have open-sourced the benchmark and framework at https://modeltc.github.io/systemnoise_web.
    摘要 大量研究表明,深度学习模型对对抗噪声和自然噪声十分脆弱,然而对于由不同系统实现引起的噪声下的模型鲁棒性却知之甚少。本文首次提出了 SysNoise——一种在深度学习“训练-部署”周期中频繁出现却常被忽视的噪声。具体而言,当源训练系统在部署时切换到不同的目标系统时,各种微小的系统差异会累积成不可忽略的偏差,即产生 SysNoise。我们首先依据推理阶段将 SysNoise 划分为三类,随后构建了一个整体基准,以定量衡量 SysNoise 对20余个模型的影响,涵盖图像分类、目标检测、实例分割和自然语言处理任务。大量实验表明,SysNoise 会在不同任务上对模型鲁棒性造成一定影响,而数据增强和对抗训练等常见缓解手段对其效果有限。总之,我们的发现开启了一个新的研究方向,希望这项工作能促使学界关注影响模型性能的深度学习部署系统。基准与框架已开源于 https://modeltc.github.io/systemnoise_web 。
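A tiny illustration of how the kind of train-deploy mismatch the benchmark studies can be quantified: compare the logits a model produces under two system implementations (here simulated by a small perturbation) via top-1 flips and mean absolute difference. The numbers are synthetic placeholders:

```python
import numpy as np

def system_drift(logits_train_sys, logits_deploy_sys):
    """Quantify how much a deployment-side change perturbs model outputs."""
    flip_rate = np.mean(
        logits_train_sys.argmax(1) != logits_deploy_sys.argmax(1))   # top-1 prediction flips
    mean_abs = np.abs(logits_train_sys - logits_deploy_sys).mean()   # raw output drift
    return flip_rate, mean_abs

rng = np.random.default_rng(0)
logits_a = rng.normal(size=(1000, 10))                        # "training system" outputs
logits_b = logits_a + rng.normal(scale=0.05, size=logits_a.shape)  # tiny system mismatch
print(system_drift(logits_a, logits_b))
```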

Causing is Achieving – A solution to the problem of causation

  • paper_url: http://arxiv.org/abs/2307.07517
  • repo_url: None
  • paper_authors: Riichiro Mizoguchi
  • for: 本研究以“因果关系是真实存在的”为前提,从应用本体论的角度出发,探讨如何理解和建模因果关系(causation)。
  • methods: 本研究使用系统功能理论来理解和模型 causation,并通过分析 causal 理论中的四个子函数(Achieves、Prevents、Allows和Disallows)来解释 causation。
  • results: 研究结果表明,causation 可以通过系统功能理论来理解,并且 Achieves 函数是 causation 的核心。然而,Achieves 函数的本质还需要进一步阐述。
    Abstract From the standpoint of applied ontology, the problem of understanding and modeling causation has been recently challenged on the premise that causation is real. As a consequence, the following three results were obtained: (1) causation can be understood via the notion of systemic function; (2) any cause can be decomposed using only four subfunctions, namely Achieves, Prevents, Allows, and Disallows; and (3) the last three subfunctions can be defined in terms of Achieves alone. It follows that the essence of causation lies in a single function, namely Achieves. It remains to elucidate the nature of the Achieves function, which has been elaborated only partially in the previous work. In this paper, we first discuss a couple of underlying policies in the above-mentioned causal theory since these are useful in the discussion, then summarize the results obtained in the former paper, and finally reveal the nature of Achieves giving a complete solution to the problem of what causation is.
    摘要 从应用本体论的角度出发,近来有研究以“因果关系是真实存在的”为前提,重新审视了理解与建模因果关系这一难题,并由此得到以下三个结果:(1) 因果关系可以通过系统功能的概念来理解;(2) 任何原因都可以仅用四个子功能来分解,即“达成(Achieves)”“阻止(Prevents)”“允许(Allows)”和“禁止(Disallows)”;(3) 后三个子功能均可仅用“达成”来定义。由此可见,因果关系的本质在于单一的“达成”功能。然而,“达成”功能本身的性质在先前工作中仅得到部分阐述。本文首先讨论上述因果理论中的若干基本设定,随后总结前文所得结果,最后阐明“达成”功能的本质,从而对“因果关系是什么”这一问题给出完整解答。

Finding differences in perspectives between designers and engineers to develop trustworthy AI for autonomous cars

  • paper_url: http://arxiv.org/abs/2307.03193
  • repo_url: None
  • paper_authors: Gustav Jonelid, K. R. Larsson
  • for: 本研究旨在探讨开发可靠的人工智能(AI)系统,特别是自动驾驶车辆中的AI系统,以及如何在不同视角下实现可靠性和伦理原则。
  • methods: 本研究采用了多种方法,包括文献综述、案例研究和问题探讨,以探索不同视角下的差异和关键因素,并提出了bridging gap的策略。
  • results: 本研究发现了开发可靠AI系统的三大支柱:透明度、可靠性和安全性。此外,还提出了一些实践的建议,以帮助开发人员在技术进步和伦理原则之间做出妥协。
    Abstract In the context of designing and implementing ethical Artificial Intelligence (AI), varying perspectives exist regarding developing trustworthy AI for autonomous cars. This study sheds light on the differences in perspectives and provides recommendations to minimize such divergences. By exploring the diverse viewpoints, we identify key factors contributing to the differences and propose strategies to bridge the gaps. This study goes beyond the trolley problem to visualize the complex challenges of trustworthy and ethical AI. Three pillars of trustworthy AI have been defined: transparency, reliability, and safety. This research contributes to the field of trustworthy AI for autonomous cars, providing practical recommendations to enhance the development of AI systems that prioritize both technological advancement and ethical principles.
    摘要 在设计与实现合乎伦理的人工智能(AI)的背景下,关于如何为自动驾驶汽车开发可信AI存在不同的观点。本研究阐明了这些观点上的差异,并提出了减少分歧的建议。通过考察多元视角,我们识别出造成差异的关键因素,并提出了弥合差距的策略。本研究超越了“电车难题”,直观呈现可信且合乎伦理的AI所面临的复杂挑战,并界定了可信AI的三大支柱:透明度、可靠性和安全性。这项研究为自动驾驶汽车的可信AI领域做出了贡献,提供了兼顾技术进步与伦理原则的AI系统开发的实用建议。

Hierarchical Pretraining for Biomedical Term Embeddings

  • paper_url: http://arxiv.org/abs/2307.00266
  • repo_url: None
  • paper_authors: Bryan Cai, Sihang Zeng, Yucong Lin, Zheng Yuan, Doudou Zhou, Lu Tian
  • for: 本研究旨在使用自然语言处理(NLP)技术对医疗记录(EHR)中的诊断和治疗信息进行数字化处理,以便在临床决策和患者轨迹预测等应用中使用。
  • methods: 本研究使用了表示学习来转化医疗术语为含义嵌入,然后使用这些嵌入作为预测模型的输入特征。为了提高表示学习的效果,研究人员使用了生物医学知识图(biomedical knowledge graph)来练化预训练的语言模型。
  • results: 研究人员通过修改对比损失函数,使得模型能够从层次结构中提取信息,从而学习了对于层次结构的词语对之间的距离。这些词语对之间的距离可以用于进一步的生物医学应用。
    Abstract Electronic health records (EHR) contain narrative notes that provide extensive details on the medical condition and management of patients. Natural language processing (NLP) of clinical notes can use observed frequencies of clinical terms as predictive features for downstream applications such as clinical decision making and patient trajectory prediction. However, due to the vast number of highly similar and related clinical concepts, a more effective modeling strategy is to represent clinical terms as semantic embeddings via representation learning and use the low dimensional embeddings as feature vectors for predictive modeling. To achieve efficient representation, fine-tuning pretrained language models with biomedical knowledge graphs may generate better embeddings for biomedical terms than those from standard language models alone. These embeddings can effectively discriminate synonymous pairs of from those that are unrelated. However, they often fail to capture different degrees of similarity or relatedness for concepts that are hierarchical in nature. To overcome this limitation, we propose HiPrBERT, a novel biomedical term representation model trained on additionally complied data that contains hierarchical structures for various biomedical terms. We modify an existing contrastive loss function to extract information from these hierarchies. Our numerical experiments demonstrate that HiPrBERT effectively learns the pair-wise distance from hierarchical information, resulting in a substantially more informative embeddings for further biomedical applications
    摘要 电子健康记录(EHR)中的叙述性病历详细记录了患者的病情与诊疗过程。对临床文本进行自然语言处理(NLP)时,可将临床术语的出现频率作为预测特征,用于临床决策、患者轨迹预测等下游应用。然而,由于高度相似和相关的临床概念数量庞大,更有效的建模策略是通过表示学习将临床术语转化为语义嵌入,并以低维嵌入作为预测模型的特征向量。为获得高效的表示,利用生物医学知识图对预训练语言模型进行微调,往往能够得到比仅用通用语言模型更好的生物医学术语嵌入;这些嵌入能够有效区分同义词对与无关词对,但往往难以刻画层次结构概念之间不同程度的相似性或相关性。为克服这一局限,我们提出 HiPrBERT——一种新的生物医学术语表示模型,其训练数据额外整理了多种生物医学术语的层次结构,并通过修改现有的对比损失函数来从这些层次结构中提取信息。数值实验表明,HiPrBERT 能有效地从层次信息中学习词对之间的距离,从而为后续生物医学应用提供信息量显著更高的嵌入。
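A sketch of the flavour of the modified contrastive objective described above: an InfoNCE-style loss whose per-pair weight depends on the hierarchical distance between two terms, so pairs that are close in the ontology are pulled closer. The weighting scheme and dimensions are assumptions, not the authors' exact objective:

```python
import torch
import torch.nn.functional as F

def hierarchy_weighted_contrastive(z1, z2, hier_dist, temperature=0.1):
    """z1, z2: (N, d) embeddings of paired terms; hier_dist: (N,) hops in the hierarchy."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    sim = z1 @ z2.T / temperature                   # (N, N) similarity matrix
    targets = torch.arange(z1.size(0))              # i-th row should match i-th column
    per_pair = F.cross_entropy(sim, targets, reduction="none")
    weights = 1.0 / (1.0 + hier_dist.float())       # assumption: nearer pairs weigh more
    return (weights * per_pair).mean()

z1, z2 = torch.randn(16, 128), torch.randn(16, 128)
hops = torch.randint(1, 5, (16,))                   # hierarchical distance of each pair
print(hierarchy_weighted_contrastive(z1, z2, hops))
```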

InstructEval: Systematic Evaluation of Instruction Selection Methods

  • paper_url: http://arxiv.org/abs/2307.00259
  • repo_url: None
  • paper_authors: Anirudh Ajith, Chris Pan, Mengzhou Xia, Ameet Deshpande, Karthik Narasimhan
  • for: 本研究目的是对inception learning(ICL)中的指令选择算法进行全面评估。
  • methods: 本研究使用了13种不同规模和四种模型家族的大语言模型(LLM),并对九个任务进行评估。七种受欢迎的指令选择方法在五个关键指标上进行了评估。
  • results: 研究发现,使用手动撰写的指令或简单的指令无任务特定描述可以在总体性能方面获得更高的ICL性能,这指示了自动指令生成方法的不足。
    Abstract In-context learning (ICL) performs tasks by prompting a large language model (LLM) using an instruction and a small set of annotated examples called demonstrations. Recent work has shown that precise details of the inputs used in the ICL prompt significantly impact performance, which has incentivized instruction selection algorithms. The effect of instruction-choice however is severely underexplored, with existing analyses restricted to shallow subsets of models and tasks, limiting the generalizability of their insights. We develop InstructEval, an ICL evaluation suite to conduct a thorough assessment of these techniques. The suite includes 13 open-sourced LLMs of varying scales from four model families, and covers nine tasks across three categories. Using the suite, we evaluate the relative performance of seven popular instruction selection methods over five metrics relevant to ICL. Our experiments reveal that using curated manually-written instructions or simple instructions without any task-specific descriptions often elicits superior ICL performance overall than that of automatic instruction-induction methods, pointing to a lack of generalizability among the latter. We release our evaluation suite for benchmarking instruction selection approaches and enabling more generalizable methods in this space.
    摘要 上下文学习(ICL)通过使用指令和少量带标注的示例(示范)来提示大语言模型(LLM)完成任务。近期研究表明,ICL提示中输入的具体细节会显著影响性能,这促使了指令选择算法的研究。然而,指令选择的影响仍未得到充分探讨,现有分析仅限于少量模型和任务的浅层子集,其结论的可推广性受限。我们开发了 InstructEval——一个用于全面评估这类技术的ICL评估套件。该套件包含来自四个模型家族、规模各异的13个开源LLM,覆盖三大类共九个任务。基于该套件,我们在五个与ICL相关的指标上评估了七种流行的指令选择方法的相对表现。实验表明,使用人工精心撰写的指令,或不含任务特定描述的简单指令,总体上往往比自动指令归纳方法取得更优的ICL性能,这说明后者缺乏可推广性。我们公开了评估套件,以便对指令选择方法进行基准测试,并推动该领域更具泛化能力的方法的发展。

Efficient Subclass Segmentation in Medical Images

  • paper_url: http://arxiv.org/abs/2307.00257
  • repo_url: https://github.com/ovo1111/efficientsubclasslearning
  • paper_authors: Linrui Dai, Wenhui Lei, Xiaofan Zhang
  • for: 降低医疗图像分析成本,使用粗细分类标签进行标注。
  • methods: 提出一种利用类别层级结构设计网络架构,并使用任务驱动的数据生成方法来让网络更容易识别不同的 subclass 类别。
  • results: 在 BraTS2021 和 ACDC 数据集上实验,该方法可以与全 subclass 标签样本一样具有相似的准确性,仅使用有限的 subclass 标签和足够的 superclass 标签。
    Abstract As research interests in medical image analysis become increasingly fine-grained, the cost for extensive annotation also rises. One feasible way to reduce the cost is to annotate with coarse-grained superclass labels while using limited fine-grained annotations as a complement. In this way, fine-grained data learning is assisted by ample coarse annotations. Recent studies in classification tasks have adopted this method to achieve satisfactory results. However, there is a lack of research on efficient learning of fine-grained subclasses in semantic segmentation tasks. In this paper, we propose a novel approach that leverages the hierarchical structure of categories to design network architecture. Meanwhile, a task-driven data generation method is presented to make it easier for the network to recognize different subclass categories. Specifically, we introduce a Prior Concatenation module that enhances confidence in subclass segmentation by concatenating predicted logits from the superclass classifier, a Separate Normalization module that stretches the intra-class distance within the same superclass to facilitate subclass segmentation, and a HierarchicalMix model that generates high-quality pseudo labels for unlabeled samples by fusing only similar superclass regions from labeled and unlabeled images. Our experiments on the BraTS2021 and ACDC datasets demonstrate that our approach achieves comparable accuracy to a model trained with full subclass annotations, with limited subclass annotations and sufficient superclass annotations. Our approach offers a promising solution for efficient fine-grained subclass segmentation in medical images. Our code is publicly available here.
    摘要 随着医学图像分析的研究日益精细,大规模标注的成本也随之上升。一种可行的降本方式是使用粗粒度的超类标签进行标注,同时辅以少量细粒度标注,从而让充足的粗标注协助细粒度数据的学习。近期的分类任务研究已采用这一思路并取得了令人满意的结果,但在语义分割任务中如何高效学习细粒度子类仍缺乏研究。本文提出一种利用类别层级结构来设计网络架构的新方法,并给出一种面向任务的数据生成方法,使网络更容易识别不同的子类。具体而言,我们引入先验拼接(Prior Concatenation)模块,将超类分类器预测的logits拼接到特征上以增强子类分割的置信度;引入分离归一化(Separate Normalization)模块,拉大同一超类内部的类内距离以利于子类分割;并提出 HierarchicalMix 模型,仅融合有标注与无标注图像中相似的超类区域,为无标注样本生成高质量伪标签。在 BraTS2021 和 ACDC 数据集上的实验表明,在仅有有限子类标注和充足超类标注的条件下,该方法可达到与使用完整子类标注训练的模型相当的精度,为医学图像中高效的细粒度子类分割提供了一个有前景的方案。代码已公开。
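A minimal sketch of the Prior Concatenation idea: superclass logits from a coarse head are concatenated to the feature map before the subclass head, so abundant coarse supervision can guide the scarce fine-grained labels. Channel counts are illustrative; the authors' code is at the repo linked above.

```python
import torch
import torch.nn as nn

class PriorConcatSubclassHead(nn.Module):
    def __init__(self, feat_ch=32, n_super=3, n_sub=7):
        super().__init__()
        self.super_head = nn.Conv2d(feat_ch, n_super, kernel_size=1)
        # The subclass head sees features plus superclass logits as a prior.
        self.sub_head = nn.Conv2d(feat_ch + n_super, n_sub, kernel_size=1)

    def forward(self, feat):
        super_logits = self.super_head(feat)
        sub_logits = self.sub_head(torch.cat([feat, super_logits], dim=1))
        return super_logits, sub_logits

feat = torch.randn(2, 32, 64, 64)                   # backbone feature map
s, f = PriorConcatSubclassHead()(feat)
print(s.shape, f.shape)                             # (2, 3, 64, 64) (2, 7, 64, 64)
```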

An ML approach to resolution of singularities

  • paper_url: http://arxiv.org/abs/2307.00252
  • repo_url: None
  • paper_authors: Gergely Bérczi, Honglu Fan, Mingcong Zeng
  • for: 解决系数方程集中的精度问题
  • methods: 使用人工智能学习代理来找到最佳解决方案
  • results: 在某些情形下,训练得到的模型在所需多项式加法次数上优于当前最先进的选择启发式规则,这一概念验证表明,近期的机器学习进展有望提升符号计算中算法的性能。
    Abstract The solution set of a system of polynomial equations typically contains ill-behaved, singular points. Resolution is a fundamental process in geometry in which we replace singular points with smooth points, while keeping the rest of the solution set unchanged. Resolutions are not unique: the usual way to describe them involves repeatedly performing a fundamental operation known as "blowing-up", and the complexity of the resolution highly depends on certain choices. The process can be translated into various versions of a 2-player game, the so-called Hironaka game, and a winning strategy for the first player provides a solution to the resolution problem. In this paper we introduce a new approach to the Hironaka game that uses reinforcement learning agents to find optimal resolutions of singularities. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation.
    摘要 多项式方程组的解集通常包含性质不佳的奇异点。消解(resolution)是几何中的一个基本过程:在保持解集其余部分不变的前提下,用光滑点替换奇异点。消解并不唯一:通常的描述方式是反复执行一种称为“爆破(blowing-up)”的基本操作,而消解的复杂度高度依赖于其中的选择。该过程可以转化为多种双人博弈,即所谓的广中(Hironaka)博弈,先手玩家的必胜策略即给出消解问题的一个解。本文提出一种求解广中博弈的新思路,利用强化学习智能体寻找奇点消解的最优方案。在某些情形下,训练得到的模型在多项式加法总次数上优于现有最先进的选择启发式,这一概念验证表明,机器学习的最新进展有潜力提升符号计算中算法的性能。

THUIR2 at NTCIR-16 Session Search (SS) Task

  • paper_url: http://arxiv.org/abs/2307.00250
  • repo_url: None
  • paper_authors: Weihang Su, Xiangsheng Li, Yiqun Liu, Min Zhang, Shaoping Ma
  • for: 这个论文描述了我们在NTCIR-161 Session Search(SS)任务中FOSS和POSS子任务中的方法和结果。
  • methods: 我们使用学习到排名和微调预训练语言模型来进行提交。我们在预训练语言模型上微调了数据和会话信息,并将它们组装成学习到排名方法。
  • results: 在预liminary评估中,我们的组装模型在FOSS子任务中获得了所有参与者中最好的表现,而在POSS子任务中也获得了最好的表现。
    Abstract Our team(THUIR2) participated in both FOSS and POSS subtasks of the NTCIR-161 Session Search (SS) Task. This paper describes our approaches and results. In the FOSS subtask, we submit five runs using learning-to-rank and fine-tuned pre-trained language models. We fine-tuned the pre-trained language model with ad-hoc data and session information and assembled them by a learning-to-rank method. The assembled model achieves the best performance among all participants in the preliminary evaluation. In the POSS subtask, we used an assembled model which also achieves the best performance in the preliminary evaluation.
    摘要 我们团队(THUIR2)参加了NTCIR-161Session Search(SS)任务的FOSS和POSS子任务。本文描述了我们的方法和成果。在FOSS子任务中,我们提交了五次运行,使用学习到排序和调整的预训练语言模型。我们对预训练语言模型进行了特点数据和会话信息的调整,并使用学习到排序方法将其组装起来。组装后的模型在初步评估中表现最佳。在POSS子任务中,我们使用同样的组装模型,也在初步评估中表现最佳。

VesselMorph: Domain-Generalized Retinal Vessel Segmentation via Shape-Aware Representation

  • paper_url: http://arxiv.org/abs/2307.00240
  • repo_url: None
  • paper_authors: Dewei Hu, Hao Li, Han Liu, Xing Yao, Jiacheng Wang, Ipek Oguz
  • for: 这个论文主要针对的问题是如何提高深度学习算法在医疗图像处理中的普适性,具体来说是解决retinal vessel segmentation任务中的域shift问题。
  • methods: 该论文提出了一种名为VesselMorph的方法,它利用域shift不变的形态特征来提高深度模型的普适性。该方法基于Frangi滤波器和Diffusion Tensor Imaging literatura,引入了一个Hessian基于的二元tensor场来描述血管的形态,并将Intensity图像和tensor场映射到一个隐藏空间进行特征提取。然后,通过一种权重平衡技巧将两个隐藏表示 fusion,并将结果传递给一个分割网络进行分割。
  • results: 该论文在六个公共数据集上进行了测试,并取得了与竞争方法相比的更高的普适性表现。
    Abstract Due to the absence of a single standardized imaging protocol, domain shift between data acquired from different sites is an inherent property of medical images and has become a major obstacle for large-scale deployment of learning-based algorithms. For retinal vessel images, domain shift usually presents as the variation of intensity, contrast and resolution, while the basic tubular shape of vessels remains unaffected. Thus, taking advantage of such domain-invariant morphological features can greatly improve the generalizability of deep models. In this study, we propose a method named VesselMorph which generalizes the 2D retinal vessel segmentation task by synthesizing a shape-aware representation. Inspired by the traditional Frangi filter and the diffusion tensor imaging literature, we introduce a Hessian-based bipolar tensor field to depict the morphology of the vessels so that the shape information is taken into account. We map the intensity image and the tensor field to a latent space for feature extraction. Then we fuse the two latent representations via a weight-balancing trick and feed the result to a segmentation network. We evaluate on six public datasets of fundus and OCT angiography images from diverse patient populations. VesselMorph achieves superior generalization performance compared with competing methods in different domain shift scenarios.
    摘要 Inspired by the traditional Frangi filter and the diffusion tensor imaging literature, we introduce a Hessian-based bipolar tensor field to depict the morphology of the vessels, thereby incorporating shape information. We map the intensity image and the tensor field to a latent space for feature extraction. Then, we fuse the two latent representations via a weight-balancing trick and feed the result to a segmentation network.We evaluate VesselMorph on six public datasets of fundus and OCT angiography images from diverse patient populations. Compared with competing methods, VesselMorph achieves superior generalization performance in different domain shift scenarios.
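A small numpy sketch of the Hessian-eigenvalue machinery underlying Frangi-style, shape-aware vessel descriptors such as the bipolar tensor field mentioned above: on a bright tubular structure, one eigenvalue of the smoothed 2D Hessian is clearly negative (across the vessel) while the other stays near zero (along it). The synthetic image and smoothing scale are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_eigvals(img, sigma=2.0):
    """Per-pixel eigenvalues of the Gaussian-smoothed image Hessian, shape (H, W, 2)."""
    img = gaussian_filter(img.astype(float), sigma)
    gy, gx = np.gradient(img)
    hyy, hyx = np.gradient(gy)
    hxy, hxx = np.gradient(gx)
    H = np.stack([np.stack([hyy, hyx], -1),
                  np.stack([hxy, hxx], -1)], -2)     # (H, W, 2, 2) symmetric matrices
    return np.linalg.eigvalsh(H)                     # eigenvalues sorted ascending

# Synthetic image with one bright vertical "vessel".
img = np.zeros((64, 64))
img[:, 30:33] = 1.0
lam = hessian_eigvals(img)
print(lam[32, 31])   # strongly negative eigenvalue across the vessel, ~0 along it
```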

Forward-Forward Algorithm for Hyperspectral Image Classification: A Preliminary Study

  • paper_url: http://arxiv.org/abs/2307.00231
  • repo_url: None
  • paper_authors: Sidike Paheding, Abel A. Reyes-Angulo
  • for: 本研究探讨了使用前向-前向算法(FFA)来优化神经网络参数,以提升高光谱图像分类的性能。
  • methods: 本研究将传统的反向传播算法与FFA进行比较,以评估FFA在高光谱图像分类中的表现。
  • results: 初步实验结果显示了FFA在高光谱图像分类中的潜力及其应用前景。
    Abstract The back-propagation algorithm has long been the de-facto standard in optimizing weights and biases in neural networks, particularly in cutting-edge deep learning models. Its widespread adoption in fields like natural language processing, computer vision, and remote sensing has revolutionized automation in various tasks. The popularity of back-propagation stems from its ability to achieve outstanding performance in tasks such as classification, detection, and segmentation. Nevertheless, back-propagation is not without its limitations, encompassing sensitivity to initial conditions, vanishing gradients, overfitting, and computational complexity. The recent introduction of a forward-forward algorithm (FFA), which computes local goodness functions to optimize network parameters, alleviates the dependence on substantial computational resources and the constant need for architectural scaling. This study investigates the application of FFA for hyperspectral image classification. Experimental results and comparative analysis are provided with the use of the traditional back-propagation algorithm. Preliminary results show the potential behind FFA and its promises.
    摘要 长期以来,反向传播算法一直是神经网络中优化权重和偏置的事实标准,尤其是在前沿深度学习模型中。它在自然语言处理、计算机视觉和遥感等领域的广泛应用,为各类任务的自动化带来了革命性的变化。反向传播之所以流行,在于其在分类、检测和分割等任务上的出色表现。然而,反向传播也存在局限,包括对初始条件敏感、梯度消失、过拟合以及计算复杂度高等。近期提出的前向-前向算法(FFA)通过计算局部优度(goodness)函数来优化网络参数,减轻了对大量计算资源的依赖以及不断扩大网络结构的需求。本研究探讨了FFA在高光谱图像分类中的应用,并与传统反向传播算法进行了实验与对比分析,初步结果展示了FFA的潜力与前景。
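A compact sketch of the forward-forward idea referenced above, following Hinton's formulation: each layer is trained with a purely local objective that pushes a "goodness" score (here the mean squared activation) above a threshold for positive data and below it for negative data, with no end-to-end backward pass. Sizes, threshold, and the random data are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.fc, self.threshold = nn.Linear(d_in, d_out), threshold
        self.opt = torch.optim.SGD(self.fc.parameters(), lr=lr)

    def goodness(self, x):
        return self.fc(x).relu().pow(2).mean(dim=1)         # per-sample goodness

    def train_step(self, x_pos, x_neg):
        # Local objective only: goodness above threshold for positives, below for negatives.
        loss = F.softplus(torch.cat([self.threshold - self.goodness(x_pos),
                                     self.goodness(x_neg) - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        return loss.item()

    def forward(self, x):
        # Normalize so only the direction of activity is passed to the next layer.
        h = self.fc(x).relu()
        return (h / (h.norm(dim=1, keepdim=True) + 1e-8)).detach()

layer = FFLayer(20, 64)
x_pos, x_neg = torch.randn(32, 20) + 1.0, torch.randn(32, 20) - 1.0
for _ in range(100):
    loss = layer.train_step(x_pos, x_neg)
print("final local loss:", round(loss, 3))
```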

Image Matters: A New Dataset and Empirical Study for Multimodal Hyperbole Detection

  • paper_url: http://arxiv.org/abs/2307.00209
  • repo_url: None
  • paper_authors: Huixuan Zhang, Xiaojun Wan
  • for: 本研究旨在探讨多模态夸大表达的检测问题。
  • methods: 我们使用Weibo(一种中文社交媒体)上的多模态数据创建了检测数据集(将要公开),并使用文本和图像作为两种模态进行检测。此外,我们还评估了不同的预训练多模态编码器在这个下游任务中的表现。
  • results: 我们在五个不同主题的数据集上进行了跨领域性能评估,并发现了不同的模型在不同主题上的表现。这些研究可以作为参考,并指明了未来多模态夸大检测研究的方向。
    Abstract Hyperbole, or exaggeration, is a common linguistic phenomenon. The detection of hyperbole is an important part of understanding human expression. There have been several studies on hyperbole detection, but most of which focus on text modality only. However, with the development of social media, people can create hyperbolic expressions with various modalities, including text, images, videos, etc. In this paper, we focus on multimodal hyperbole detection. We create a multimodal detection dataset\footnote{The dataset will be released to the community.} from Weibo (a Chinese social media) and carry out some studies on it. We treat the text and image from a piece of weibo as two modalities and explore the role of text and image for hyperbole detection. Different pre-trained multimodal encoders are also evaluated on this downstream task to show their performance. Besides, since this dataset is constructed from five different topics, we also evaluate the cross-domain performance of different models. These studies can serve as a benchmark and point out the direction of further study on multimodal hyperbole detection.
    摘要 夸张是人类表达中常见的语言现象,检测夸张是理解人类表达的重要环节。已有不少关于夸张检测的研究,但大多仅关注文本模态。随着社交媒体的发展,人们可以借助文本、图像、视频等多种模态来表达夸张。本文关注多模态夸张检测:我们基于微博(一种中文社交媒体)构建了一个多模态检测数据集(将向社区公开),并在其上开展了一系列研究。我们将一条微博中的文本和图像视为两种模态,探究二者在夸张检测中的作用,并评估了多种预训练多模态编码器在该下游任务上的表现。此外,由于该数据集取自五个不同主题,我们还评估了不同模型的跨领域性能。这些研究可以作为基准,并为多模态夸张检测的后续研究指明方向。

General Part Assembly Planning

  • paper_url: http://arxiv.org/abs/2307.00206
  • repo_url: https://github.com/daria-kafler/GA-SEI56-Project-02
  • paper_authors: Yulong Li, Andy Zeng, Shuran Song
  • for: investigate general part assembly, creating novel target assemblies with unseen part shapes
  • methods: transformer based model architecture, accurately predicts part poses by inferring how each part shape corresponds to the target shape
  • results: generalization abilities to novel and diverse target and part shapes, demonstrated through experiments on 3D CAD models and real-world scans.
    Abstract Most successes in autonomous robotic assembly have been restricted to single target or category. We propose to investigate general part assembly, the task of creating novel target assemblies with unseen part shapes. To tackle the planning of general part assembly, we present General Part Assembly Transformer (GPAT), a transformer based model architecture that accurately predicts part poses by inferring how each part shape corresponds to the target shape. Our experiments on both 3D CAD models and real-world scans demonstrate GPAT's generalization abilities to novel and diverse target and part shapes. Project website: https://general-part-assembly.github.io/
    摘要 以往自主机器人装配的成功大多局限于单一目标或类别。我们提出研究通用部件装配问题,即利用未见过形状的零件构建新的目标装配体。为解决通用部件装配的规划问题,我们提出了通用部件装配Transformer(GPAT)——一种基于Transformer的模型架构,它通过推断每个零件形状与目标形状的对应关系来准确预测零件位姿。在3D CAD模型和真实扫描数据上的实验表明,GPAT 对新颖且多样的目标与零件形状具有良好的泛化能力。项目网站:https://general-part-assembly.github.io/

  • paper_url: http://arxiv.org/abs/2307.01214
  • repo_url: None
  • paper_authors: Rui Song, Fausto Giunchiglia, Yingji Li, Hao Xu
  • for: 提高文本分类模型的稳定性和泛化能力,避免快捷学习问题
  • methods: 提出了一种新的单词组挖掘方法,可以捕捉单词组的 causal 效应,并将其排序以便增强预测的稳定性
  • results: 通过多种任务的实验,证明提出的方法可以提高文本分类模型的泛化能力和稳定性,并且可以避免快捷学习问题
    Abstract Despite large-scale pre-trained language models have achieved striking results for text classificaion, recent work has raised concerns about the challenge of shortcut learning. In general, a keyword is regarded as a shortcut if it creates a superficial association with the label, resulting in a false prediction. Conversely, shortcut learning can be mitigated if the model relies on robust causal features that help produce sound predictions. To this end, many studies have explored post-hoc interpretable methods to mine shortcuts and causal features for robustness and generalization. However, most existing methods focus only on single word in a sentence and lack consideration of word-group, leading to wrong causal features. To solve this problem, we propose a new Word-Group mining approach, which captures the causal effect of any keyword combination and orders the combinations that most affect the prediction. Our approach bases on effective post-hoc analysis and beam search, which ensures the mining effect and reduces the complexity. Then, we build a counterfactual augmentation method based on the multiple word-groups, and use an adaptive voting mechanism to learn the influence of different augmentated samples on the prediction results, so as to force the model to pay attention to effective causal features. We demonstrate the effectiveness of the proposed method by several tasks on 8 affective review datasets and 4 toxic language datasets, including cross-domain text classificaion, text attack and gender fairness test.
    摘要 尽管大规模预训语言模型已经实现了TEXT分类的突出成果,然而最近的研究表明, shortcut learning 是一个挑战。在总的来说,一个关键词被视为一个短cut if it creates a superficial association with the label, resulting in a false prediction。然而, shortcut learning 可以被mitigated if the model relies on robust causal features that help produce sound predictions。为此,许多研究已经explored post-hoc interpretable methods to mine shortcuts and causal features for robustness and generalization。然而,大多数现有方法只focus on single word in a sentence and lack consideration of word-group, leading to wrong causal features。为解决这个问题,我们提出了一种新的Word-Group mining approach,该方法可以捕捉任何关键字组的 causal effect和排序这些组合对于预测的影响。我们的方法基于有效的post-hoc分析和搜索,以确保挖掘的效果并降低复杂性。然后,我们构建了基于多个word-group的 counterfactual augmentation 方法,并使用一种适应性投票机制来学习不同扩展样本对预测结果的影响,以 forced the model to pay attention to有效的 causal features。我们在8种affective review dataset和4种toxic language dataset上进行了多个任务的实验,包括跨频text classification、text attack和性别公平测试。结果表明,我们的方法可以有效地挖掘出有用的 causal features,并且可以在不同的预测任务中提高模型的 robustness和一致性。

An Interpretable Constructive Algorithm for Incremental Random Weight Neural Networks and Its Application

  • paper_url: http://arxiv.org/abs/2307.00185
  • repo_url: None
  • paper_authors: Jing Nan, Wei Dai, Guan Yuan, Ping Zhou
  • for: 提高IRWNN的解释性和推理能力
  • methods: 基于几何关系 между隐藏参数和剩余错误,提出可解释的构建算法(ICA),并采用节点池策略获取更易于整合的隐藏参数。
  • results: ICA比其他构建算法在模型化速度、模型准确率和模型网络结构方面表现出色,并在实际应用中有效地解决了模型化问题。
    Abstract Incremental random weight neural networks (IRWNNs) have gained attention in view of their easy implementation and fast learning. However, a significant drawback of IRWNNs is that the relationship between the hidden parameters (node) and the residual error (model performance) is difficult to interpret. To address the above issue, this article proposes an interpretable constructive algorithm (ICA) with geometric information constraint. First, based on the geometric relationship between the hidden parameters and the residual error, an interpretable geometric information constraint is proposed to randomly assign the hidden parameters. Meanwhile, a node pool strategy is employed to obtain hidden parameters that are more conducive to convergence from hidden parameters satisfying the proposed constraint. Furthermore, the universal approximation property of the ICA is proved. Finally, a lightweight version of ICA is presented for large-scale data modeling tasks. Experimental results on six benchmark datasets and a numerical simulation dataset demonstrate that the ICA outperforms other constructive algorithms in terms of modeling speed, model accuracy, and model network structure. Besides, two practical industrial application cases are used to validate the effectiveness of ICA in practical applications.
    摘要 增量随机权重神经网络(IRWNNs)因实现简单、学习速度快而受到关注。然而,IRWNNs 的一个显著缺点是隐层参数(节点)与残差(模型性能)之间的关系难以解释。为解决这一问题,本文提出了带几何信息约束的可解释构造算法(ICA)。首先,基于隐层参数与残差之间的几何关系,提出一种可解释的几何信息约束,用于随机分配隐层参数;同时采用节点池策略,从满足该约束的隐层参数中挑选更有利于收敛的参数。此外,本文证明了ICA的万能逼近性质。最后,面向大规模数据建模任务给出了ICA的轻量级版本。在六个基准数据集和一个数值仿真数据集上的实验结果表明,ICA在建模速度、模型精度和网络结构方面均优于其他构造算法;两个实际工业应用案例也验证了ICA在实际应用中的有效性。
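For context, the constructive loop that IRWNN-style algorithms share: hidden nodes with random input weights are added one at a time and the output weights are re-solved by least squares. The acceptance rule below (keep a candidate only if it reduces the residual) is a simplified stand-in for the paper's geometric constraint; the node-pool idea is kept, everything else is a generic sketch:

```python
import numpy as np

def incremental_rwnn(X, y, max_nodes=50, candidates=20, seed=0):
    rng = np.random.default_rng(seed)
    H = np.ones((X.shape[0], 1))                        # start with a bias column
    for _ in range(max_nodes):
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)
        best, best_err = None, np.linalg.norm(y - H @ beta)
        for _ in range(candidates):                     # node pool: sample several candidates
            w, b = rng.normal(size=X.shape[1]), rng.normal()
            h = np.tanh(X @ w + b).reshape(-1, 1)
            H_try = np.hstack([H, h])
            beta_try, *_ = np.linalg.lstsq(H_try, y, rcond=None)
            err = np.linalg.norm(y - H_try @ beta_try)
            if err < best_err:                          # simplified constraint: must reduce error
                best, best_err = h, err
        if best is None:
            break
        H = np.hstack([H, best])
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)
    return H, beta, best_err

X = np.random.default_rng(1).uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1] ** 2
H, beta, err = incremental_rwnn(X, y)
print("hidden nodes:", H.shape[1] - 1, "residual norm:", round(err, 4))
```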

Personality Traits in Large Language Models

  • paper_url: http://arxiv.org/abs/2307.00184
  • repo_url: None
  • paper_authors: Mustafa Safdari, Greg Serapio-García, Clément Crepy, Stephen Fitz, Peter Romero, Luning Sun, Marwa Abdulhai, Aleksandra Faust, Maja Matarić
  • for: 这个论文旨在研究和评估大语言模型(LLM)生成的文本中表现出的人性特质,以及如何使用这些模型来模拟和形成人性特质。
  • methods: 这个研究使用了评估人性特质的有效方法,并对多种广泛使用的LLM进行了测试和分析。
  • results: 研究发现:1)某些LLM生成的文本中的人性特质是可靠和有效的;2)更大和更精准地训练的LLM模型具有更强的人性特质可靠性和有效性;3)可以通过特定的配置和训练来形成LLM生成的文本中的人性特质,以达到某种特定的人性评估标准。
    Abstract The advent of large language models (LLMs) has revolutionized natural language processing, enabling the generation of coherent and contextually relevant text. As LLMs increasingly power conversational agents, the synthesized personality embedded in these models by virtue of their training on large amounts of human-generated data draws attention. Since personality is an important factor determining the effectiveness of communication, we present a comprehensive method for administering validated psychometric tests and quantifying, analyzing, and shaping personality traits exhibited in text generated from widely-used LLMs. We find that: 1) personality simulated in the outputs of some LLMs (under specific prompting configurations) is reliable and valid; 2) evidence of reliability and validity of LLM-simulated personality is stronger for larger and instruction fine-tuned models; and 3) personality in LLM outputs can be shaped along desired dimensions to mimic specific personality profiles. We also discuss potential applications and ethical implications of our measurement and shaping framework, especially regarding responsible use of LLMs.
    摘要 大语言模型(LLM)的出现革新了自然语言处理,使其能够生成连贯且符合上下文的文本。随着LLM越来越多地驱动对话智能体,这些模型因在海量人类生成数据上训练而“合成”出的人格特质引起了关注。由于人格是影响沟通效果的重要因素,我们提出了一套完整的方法,用于对广泛使用的LLM所生成文本实施经过效度验证的心理测量测试,并对其中表现出的人格特质进行量化、分析与塑造。我们发现:1)在特定提示配置下,部分LLM输出中模拟的人格是可靠且有效的;2)对于更大规模且经过指令微调的模型,LLM模拟人格的可靠性与有效性更强;3)可以沿期望的维度塑造LLM输出中的人格,使其模仿特定的人格画像。我们还讨论了这一测量与塑造框架的潜在应用和伦理影响,特别是关于LLM的负责任使用。

The Integer Linear Programming Inference Cookbook

  • paper_url: http://arxiv.org/abs/2307.00171
  • repo_url: None
  • paper_authors: Vivek Srikumar, Dan Roth
  • for: 这篇论文旨在帮助读者将自然语言处理问题转换为整数线性 програм的实例。
  • methods: 本文使用了多种方法,包括带有约束的整数线性Program、约束分解和约束优化。
  • results: 本文提供了两个实践例子,用于说明如何使用这些方法解决各种自然语言处理问题。
    Abstract Over the years, integer linear programs have been employed to model inference in many natural language processing problems. This survey is meant to guide the reader through the process of framing a new inference problem as an instance of an integer linear program and is structured as a collection of recipes. At the end, we will see two worked examples to illustrate the use of these recipes.
    摘要 多年来,整数线性规划一直被用于建模许多自然语言处理问题中的推理。本综述旨在引导读者将新的推理问题表述为整数线性规划的实例,全文以一组“配方”的形式组织。最后,我们通过两个完整的算例来说明这些配方的用法。
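In the spirit of the cookbook, a small worked recipe: choose exactly one label per token while forbidding an invalid label transition, written as a 0-1 integer linear program. The scores are made up, and the PuLP modeler is an assumption about tooling rather than something the survey prescribes:

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary, value

tokens = ["John", "lives", "in", "Paris"]
labels = ["PER", "LOC", "O"]
# Hypothetical local scores from some upstream classifier.
score = {("John", "PER"): 2.0, ("John", "LOC"): 0.1, ("John", "O"): 0.5,
         ("lives", "PER"): 0.0, ("lives", "LOC"): 0.0, ("lives", "O"): 1.5,
         ("in", "PER"): 0.0, ("in", "LOC"): 0.2, ("in", "O"): 1.2,
         ("Paris", "PER"): 0.3, ("Paris", "LOC"): 2.2, ("Paris", "O"): 0.4}

prob = LpProblem("sequence_labeling", LpMaximize)
x = {(t, l): LpVariable(f"x_{i}_{l}", cat=LpBinary)
     for i, t in enumerate(tokens) for l in labels}

prob += lpSum(score[t, l] * x[t, l] for t in tokens for l in labels)   # objective
for t in tokens:                                                       # exactly one label per token
    prob += lpSum(x[t, l] for l in labels) == 1
# Example structural constraint: a PER token may not directly follow a LOC token.
for prev, cur in zip(tokens, tokens[1:]):
    prob += x[prev, "LOC"] + x[cur, "PER"] <= 1

prob.solve()
print([(t, l) for t in tokens for l in labels if value(x[t, l]) == 1])
```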

VoxWatch: An open-set speaker recognition benchmark on VoxCeleb

  • paper_url: http://arxiv.org/abs/2307.00169
  • repo_url: None
  • paper_authors: Raghuveer Peri, Seyed Omid Sadjadi, Daniel Garcia-Romero
  • for: 这篇论文是关于开放集成人识别(OSI)的研究,具体来说是研究如何在识别人员时避免 False Alarm 问题。
  • methods: 这篇论文使用了三种强大的神经网络系统来实现 speaker detection 任务,并使用 VoxCeleb 数据集来构建了首个公共benchmark。
  • results: 研究结果显示,通过 score calibration 和 score fusion 两种方法可以大幅提高 OSI 性能,而原则上采用的 adaptive score normalization 并不一定能够提高表现。
    Abstract Despite its broad practical applications such as in fraud prevention, open-set speaker identification (OSI) has received less attention in the speaker recognition community compared to speaker verification (SV). OSI deals with determining if a test speech sample belongs to a speaker from a set of pre-enrolled individuals (in-set) or if it is from an out-of-set speaker. In addition to the typical challenges associated with speech variability, OSI is prone to the "false-alarm problem"; as the size of the in-set speaker population (a.k.a watchlist) grows, the out-of-set scores become larger, leading to increased false alarm rates. This is in particular challenging for applications in financial institutions and border security where the watchlist size is typically of the order of several thousand speakers. Therefore, it is important to systematically quantify the false-alarm problem, and develop techniques that alleviate the impact of watchlist size on detection performance. Prior studies on this problem are sparse, and lack a common benchmark for systematic evaluations. In this paper, we present the first public benchmark for OSI, developed using the VoxCeleb dataset. We quantify the effect of the watchlist size and speech duration on the watchlist-based speaker detection task using three strong neural network based systems. In contrast to the findings from prior research, we show that the commonly adopted adaptive score normalization is not guaranteed to improve the performance for this task. On the other hand, we show that score calibration and score fusion, two other commonly used techniques in SV, result in significant improvements in OSI performance.
    摘要 尽管开放集话者识别(OSI)在防止欺诈等方面有广泛的实际应用,但与话者验证(SV)相比,它在话者识别领域受到的关注较少。OSI的任务是判断一段测试语音是否来自一组预先注册的话者(in-set),还是来自集合之外的话者(out-of-set)。除了语音变化带来的常见挑战外,OSI还面临"虚警问题"(false-alarm problem):随着预先注册的话者人数(即watchlist)的增长,集合外样本的得分也会变大,导致虚警率上升。这对于watchlist规模通常达数千人的金融机构和边境安全等应用尤其具有挑战性。因此,需要系统地量化虚警问题,并开发能缓解watchlist规模对检测性能影响的技术。以往关于该问题的研究较少,且缺乏统一的基准用于系统评测。本文基于VoxCeleb数据集构建了首个公开的OSI基准,并使用三个强大的神经网络系统量化了watchlist规模和语音时长对基于watchlist的话者检测任务的影响。与以往研究的结论不同,我们发现常用的自适应得分归一化并不能保证提升该任务的性能;而SV中常用的得分校准与得分融合则能显著提升OSI性能。
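
As a concrete illustration of the score calibration and score fusion ingredients described above, the following is a minimal sketch of watchlist-based open-set scoring. It assumes cosine-scored speaker embeddings, a logistic calibrator fit on a development set, and fusion by averaging calibrated scores; the function names, threshold, and fusion rule are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical sketch of watchlist-based open-set speaker detection with
# per-system score calibration and score fusion.
import numpy as np
from sklearn.linear_model import LogisticRegression

def watchlist_scores(test_emb, enrolled_embs):
    """Cosine similarity of one test embedding against each enrolled speaker."""
    return enrolled_embs @ test_emb / (
        np.linalg.norm(enrolled_embs, axis=1) * np.linalg.norm(test_emb) + 1e-8)

def fit_calibrator(dev_scores, dev_labels):
    """Fit a simple logistic calibration on raw scores from a development set."""
    calib = LogisticRegression().fit(dev_scores.reshape(-1, 1), dev_labels)
    return lambda s: calib.predict_proba(np.asarray(s).reshape(-1, 1))[:, 1]

def detect(test_emb, enrolled_embs_per_system, calibrators, threshold=0.5):
    """Fuse calibrated scores from several systems; return speaker index or None."""
    fused = np.zeros(enrolled_embs_per_system[0].shape[0])
    for embs, cal in zip(enrolled_embs_per_system, calibrators):
        fused += cal(watchlist_scores(test_emb, embs))
    fused /= len(calibrators)
    best = int(np.argmax(fused))
    return best if fused[best] >= threshold else None  # None = out-of-set
```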

Counterfactual Collaborative Reasoning

  • paper_url: http://arxiv.org/abs/2307.00165
  • repo_url: None
  • paper_authors: Jianchao Ji, Zelong Li, Shuyuan Xu, Max Xiong, Juntao Tan, Yingqiang Ge, Hao Wang, Yongfeng Zhang
  • for: 提高机器学习模型的准确率和可解释性
  • methods: 结合Counterfactual Collaborative Reasoning (CCR)和神经网络逻辑理解,提高机器学习模型的性能和可读性
  • results: 对三个实际数据集进行实验,CCR模型在准确率和可解释性方面都超过了非增强模型和隐式增强模型,同时也提高了模型的可读性。
    Abstract Causal reasoning and logical reasoning are two important types of reasoning abilities for human intelligence. However, their relationship has not been extensively explored under machine intelligence context. In this paper, we explore how the two reasoning abilities can be jointly modeled to enhance both accuracy and explainability of machine learning models. More specifically, by integrating two important types of reasoning ability -- counterfactual reasoning and (neural) logical reasoning -- we propose Counterfactual Collaborative Reasoning (CCR), which conducts counterfactual logic reasoning to improve the performance. In particular, we use recommender system as an example to show how CCR alleviate data scarcity, improve accuracy and enhance transparency. Technically, we leverage counterfactual reasoning to generate "difficult" counterfactual training examples for data augmentation, which -- together with the original training examples -- can enhance the model performance. Since the augmented data is model irrelevant, they can be used to enhance any model, enabling the wide applicability of the technique. Besides, most of the existing data augmentation methods focus on "implicit data augmentation" over users' implicit feedback, while our framework conducts "explicit data augmentation" over users explicit feedback based on counterfactual logic reasoning. Experiments on three real-world datasets show that CCR achieves better performance than non-augmented models and implicitly augmented models, and also improves model transparency by generating counterfactual explanations.
    摘要 人工智能中的 causal 理解和逻辑理解是两种重要的理解能力。然而,这两种理解能力在机器智能上的关系尚未得到广泛探讨。在这篇论文中,我们探讨了如何将这两种理解能力联合起来,以提高机器学习模型的准确性和可解释性。具体来说,我们提出了Counterfactual Collaborative Reasoning(CCR),它通过对假设逻辑进行counterfactual逻辑理解来提高性能。例如,我们使用了推荐系统作为示例,显示了如何CCR可以缓解数据稀缺、提高准确性和提高透明度。技术上,我们利用了counterfactual逻辑来生成“difficult”的假设反例用于数据增强,这些反例,与原始训练例子一起,可以提高模型性能。由于这些增强数据不依赖于模型,因此可以用于提高任何模型,使得这种技术具有广泛的可应用性。此外,大多数现有的数据增强方法都是基于用户的偏好进行“隐式数据增强”,而我们的框架则是基于counterfactual逻辑进行“显式数据增强”。实验结果表明,CCR在三个实际 datasets 上表现出比非增强模型和隐式增强模型更好的性能,并且也提高了模型的透明度。
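
The explicit counterfactual augmentation idea can be illustrated with a toy sketch: for a training interaction, search for a minimal change to the item features that flips the model's predicted feedback, and add that "difficult" counterfactual example to the training set. The linear scorer, feature space, and single-feature-flip search below are placeholder assumptions, not the CCR formulation.

```python
# Toy sketch of explicit counterfactual data augmentation for a recommender.
import numpy as np

def predict_like(model_w, item_feats):
    return float(item_feats @ model_w > 0)          # 1 = like, 0 = dislike

def counterfactual(model_w, item_feats, label):
    for j in range(len(item_feats)):                # try single-feature flips
        cand = item_feats.copy()
        cand[j] = -cand[j]
        if predict_like(model_w, cand) != label:    # "difficult" example found
            return cand, 1 - label
    return None

w = np.array([0.8, -0.2, 0.5])
x, y = np.array([1.0, 1.0, 1.0]), 1
augmented = [(x, y)]
cf = counterfactual(w, x, y)
if cf is not None:
    augmented.append(cf)                            # model-agnostic extra data
print(augmented)
```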

FFPDG: Fast, Fair and Private Data Generation

  • paper_url: http://arxiv.org/abs/2307.00161
  • repo_url: None
  • paper_authors: Weijie Xu, Jinjin Zhao, Francis Iannacci, Bo Wang
  • for: 本研究的目的是设计一种快速、公正、灵活和隐私的数据生成方法,以满足现实世界应用场景中的数据需求。
  • methods: 我们采用的方法是基于GAN的方法,但是我们对GAN进行了一些修改以确保数据生成的公正性和隐私性。我们还使用了一些新的技术来提高数据生成的效率和质量。
  • results: 我们的实验结果表明,模型在使用我们提出的数据生成方法后可以在真实应用场景中进行良好的推理。此外,我们还发现了一些有趣的应用场景,其中包括隐私保护和数据分布彩色等。
    Abstract Generative modeling has been used frequently in synthetic data generation. Fairness and privacy are two big concerns for synthetic data. Although Recent GAN [\cite{goodfellow2014generative}] based methods show good results in preserving privacy, the generated data may be more biased. At the same time, these methods require high computation resources. In this work, we design a fast, fair, flexible and private data generation method. We show the effectiveness of our method theoretically and empirically. We show that models trained on data generated by the proposed method can perform well (in inference stage) on real application scenarios.
    摘要 生成模型在假数据生成中广泛使用。假数据中的公平和隐私是两大关注点。虽然最近的GAN基于方法可以保持隐私,但生成的数据可能更加偏向。同时,这些方法需要高度的计算资源。在这个工作中,我们设计了快速、公平、灵活和隐私的数据生成方法。我们通过理论和实验证明了我们的方法的效果。我们还证明了基于我们的方法训练的模型在真实应用场景中的推理阶段可以表现出色。

Stitched ViTs are Flexible Vision Backbones

  • paper_url: http://arxiv.org/abs/2307.00154
  • repo_url: https://github.com/ziplab/sn-netv2
  • paper_authors: Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang
  • for: 该研究旨在提出一种新的模型粘合框架,可以在运行时进行多种性能-效率质量的负担重量调整。
  • methods: 该研究使用了缝合 neural network 框架,并提出了一种两向缝合方案,以扩大缝合空间。此外,还提出了一种基于 FLOPs 分布的资源限制采样策略。
  • results: 实验结果表明,SN-Netv2 可以作为一种灵活的视觉底层模型,在 ImageNet-1K、ADE20K、COCO-Stuff-10K、NYUv2 和 COCO-2017 上表现出色,并且在训练效率和适应性方面具有明显的优势。
    Abstract Large pretrained plain vision Transformers (ViTs) have been the workhorse for many downstream tasks. However, existing works utilizing off-the-shelf ViTs are inefficient in terms of training and deployment, because adopting ViTs with individual sizes requires separate training and is restricted by fixed performance-efficiency trade-offs. In this paper, we are inspired by stitchable neural networks, which is a new framework that cheaply produces a single model that covers rich subnetworks by stitching pretrained model families, supporting diverse performance-efficiency trade-offs at runtime. Building upon this foundation, we introduce SN-Netv2, a systematically improved model stitching framework to facilitate downstream task adaptation. Specifically, we first propose a Two-way stitching scheme to enlarge the stitching space. We then design a resource-constrained sampling strategy that takes into account the underlying FLOPs distributions in the space for improved sampling. Finally, we observe that learning stitching layers is a low-rank update, which plays an essential role on downstream tasks to stabilize training and ensure a good Pareto frontier. With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K, NYUv2 and COCO-2017, SN-Netv2 demonstrates strong ability to serve as a flexible vision backbone, achieving great advantages in both training efficiency and adaptation. Code will be released at https://github.com/ziplab/SN-Netv2.
    摘要 大型预训练的普通视觉Transformer(ViT)已成为许多下游任务的主力模型。然而,现有直接使用现成ViT的做法在训练和部署上都不够高效:采用不同规模的ViT需要分别训练,且受限于固定的性能-效率权衡。本文受可拼接神经网络(stitchable neural networks)的启发——该框架通过拼接预训练模型家族,以低成本得到一个覆盖丰富子网络的单一模型,从而在运行时支持多样的性能-效率权衡。在此基础上,我们提出SN-Netv2,一个系统性改进的模型拼接框架,以便下游任务的适配。具体而言,我们首先提出双向拼接方案以扩大拼接空间;然后设计了考虑空间中FLOPs分布的资源受限采样策略以改进采样;最后,我们发现学习拼接层是一种低秩更新,它在下游任务中对稳定训练和保证良好的Pareto前沿起到关键作用。通过在ImageNet-1K、ADE20K、COCO-Stuff-10K、NYUv2和COCO-2017上的大量实验,SN-Netv2展现了作为灵活视觉骨干网络的强大能力,在训练效率和适配性方面均具有明显优势。代码将在 https://github.com/ziplab/SN-Netv2 上发布。
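
To make the stitching idea concrete, here is a minimal sketch of a stitching layer that maps activations from layer i of a small pretrained ViT into the input space of layer j of a larger ViT, using a low-rank parameterization in the spirit of the paper's low-rank-update observation. The module names, rank, and block interfaces are assumptions, not the SN-Netv2 implementation.

```python
# Minimal sketch of a stitching layer between two pretrained ViT families,
# assuming token sequences of shape (batch, tokens, dim).
import torch
import torch.nn as nn

class StitchingLayer(nn.Module):
    def __init__(self, dim_small: int, dim_large: int, rank: int = 16):
        super().__init__()
        # Factorized (low-rank) projection from the small model's feature space
        # into the large model's feature space.
        self.down = nn.Linear(dim_small, rank, bias=False)
        self.up = nn.Linear(rank, dim_large, bias=True)

    def forward(self, tokens_small: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(tokens_small))

def stitched_forward(small_blocks, large_blocks, stitch, x_small, i, j):
    """Run the first i blocks of the small ViT, project, then finish in the large ViT."""
    h = x_small
    for blk in small_blocks[:i]:
        h = blk(h)
    h = stitch(h)
    for blk in large_blocks[j:]:
        h = blk(h)
    return h
```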

Large Language Models (GPT) for automating feedback on programming assignments

  • paper_url: http://arxiv.org/abs/2307.00150
  • repo_url: None
  • paper_authors: Maciej Pankiewicz, Ryan S. Baker
  • for: This paper aims to improve the automated feedback process for programming assignments by using OpenAI’s GPT-3.5 model to generate personalized hints for students.
  • methods: The authors use GPT-3.5 to generate personalized hints for students solving programming assignments on an automated assessment platform.
  • results: The experimental group (with GPT hints enabled) performed better in terms of successful submissions and took less time to solve assignments, but there was a potential over-reliance on GPT-generated feedback.
  • for: 这篇论文目的是提高自动化反馈过程中的程序作业答案,使用OpenAI的GPT-3.5模型生成个性化提示 для学生。
  • methods: 作者使用GPT-3.5来生成个性化提示,用于学生在自动评测平台上解决程序作业。
  • results: 实验组(含GPT提示)的学生 Correctness rate提高,并需要更少的时间解决任务,但可能存在GPT生成反馈过依的问题。
    Abstract Addressing the challenge of generating personalized feedback for programming assignments is demanding due to several factors, like the complexity of code syntax or different ways to correctly solve a task. In this experimental study, we automated the process of feedback generation by employing OpenAI's GPT-3.5 model to generate personalized hints for students solving programming assignments on an automated assessment platform. Students rated the usefulness of GPT-generated hints positively. The experimental group (with GPT hints enabled) relied less on the platform's regular feedback but performed better in terms of percentage of successful submissions across consecutive attempts for tasks, where GPT hints were enabled. For tasks where the GPT feedback was made unavailable, the experimental group needed significantly less time to solve assignments. Furthermore, when GPT hints were unavailable, students in the experimental condition were initially less likely to solve the assignment correctly. This suggests potential over-reliance on GPT-generated feedback. However, students in the experimental condition were able to correct reasonably rapidly, reaching the same percentage correct after seven submission attempts. The availability of GPT hints did not significantly impact students' affective state.
    摘要 由于代码语法的复杂性以及同一任务存在多种正确解法等因素,为编程作业生成个性化反馈是一项颇具挑战的工作。在这项实验研究中,我们使用OpenAI的GPT-3.5模型,在自动评测平台上为解答编程作业的学生自动生成个性化提示。学生对GPT生成提示的有用性给予了积极评价。实验组(启用GPT提示)对平台常规反馈的依赖有所下降,但在启用GPT提示的任务上,连续多次提交的成功率更高;在GPT反馈不可用的任务上,实验组解题所需时间显著更短。此外,当GPT提示不可用时,实验组学生最初正确完成作业的概率较低,这提示学生可能过度依赖GPT生成的反馈;不过他们能够较快地纠正,在七次提交尝试后达到与对照组相同的正确率。GPT提示的可用性对学生的情感状态没有显著影响。
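
A hint-generation call of the kind used in this study can be sketched as follows. The prompt wording, model choice, and platform integration are illustrative assumptions; the paper does not publish its exact prompt.

```python
# Illustrative sketch of generating a personalized hint with GPT-3.5 for a
# failing programming submission via the OpenAI chat completions endpoint.
import os
import requests

def generate_hint(task_description: str, student_code: str, failed_test: str) -> str:
    prompt = (
        "A student is solving the following programming task:\n"
        f"{task_description}\n\n"
        f"Their current submission:\n{student_code}\n\n"
        f"A failing test case:\n{failed_test}\n\n"
        "Give a short, personalized hint that points the student toward the bug "
        "without revealing the full solution."
    )
    resp = requests.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": "gpt-3.5-turbo",
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        },
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```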

A Personalized Household Assistive Robot that Learns and Creates New Breakfast Options through Human-Robot Interaction

  • paper_url: http://arxiv.org/abs/2307.00114
  • repo_url: None
  • paper_authors: Ali Ayub, Chrystopher L. Nehaniv, Kerstin Dautenhahn
  • for: 这篇论文是为了开发一个能够学习用户喜好的家庭助手机器人而设计的认知架构。
  • methods: 该架构使用了当前最佳的感知学算法、计算机科学中的认知模型记忆和学习计算、用于选择和放置家庭中的物品任务计划器、用于与用户交互的图形用户界面(GUI)以及一种基于学习的新 breakfast 选项生成方法。
  • results: 实验结果表明,该架构能够从用户处学习个性化的早餐选项,并使用这些知识生成机器人从未学习过的新早餐选项。
    Abstract For robots to assist users with household tasks, they must first learn about the tasks from the users. Further, performing the same task every day, in the same way, can become boring for the robot's user(s), therefore, assistive robots must find creative ways to perform tasks in the household. In this paper, we present a cognitive architecture for a household assistive robot that can learn personalized breakfast options from its users and then use the learned knowledge to set up a table for breakfast. The architecture can also use the learned knowledge to create new breakfast options over a longer period of time. The proposed cognitive architecture combines state-of-the-art perceptual learning algorithms, computational implementation of cognitive models of memory encoding and learning, a task planner for picking and placing objects in the household, a graphical user interface (GUI) to interact with the user and a novel approach for creating new breakfast options using the learned knowledge. The architecture is integrated with the Fetch mobile manipulator robot and validated, as a proof-of-concept system evaluation in a large indoor environment with multiple kitchen objects. Experimental results demonstrate the effectiveness of our architecture to learn personalized breakfast options from the user and generate new breakfast options never learned by the robot.
    摘要 机器人若要协助用户完成家务任务,必须首先向用户学习这些任务;而每天以同样的方式完成同样的任务也会让用户感到乏味,因此辅助机器人必须寻找有创意的方式来完成家务。本文提出了一种家庭辅助机器人的认知架构,它能够从用户处学习个性化的早餐选项,并利用所学知识为早餐布置餐桌,还能在较长的时间内利用这些知识创造新的早餐选项。所提出的认知架构结合了最先进的感知学习算法、记忆编码与学习认知模型的计算实现、用于在家庭环境中抓取和放置物品的任务规划器、与用户交互的图形用户界面(GUI),以及一种利用所学知识创造新早餐选项的新方法。该架构与Fetch移动机械臂机器人集成,并在包含多种厨房物品的大型室内环境中进行了概念验证式的系统评估。实验结果表明,该架构能够从用户处学习个性化的早餐选项,并生成机器人从未学习过的新早餐选项。

Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education

  • paper_url: http://arxiv.org/abs/2307.00112
  • repo_url: None
  • paper_authors: Prabin Sharma, Kisan Thapa, Dikshya Thapa, Prastab Dhakal, Mala Deep Upadhaya, Santosh Adhikari, Salik Ram Khanal
  • for: The paper aims to evaluate the reliability of ChatGPT in answering complex medical and clinical questions.
  • methods: The study uses Harvard University gross anatomy and the United States Medical Licensing Examination (USMLE) questionnaire to assess the accuracy of ChatGPT-generated answers. The results are evaluated using a 2-way ANOVA and posthoc analysis.
  • results: The study finds that ChatGPT-generated answers are more context-oriented and represent a better model for deductive reasoning than regular Google search results. ChatGPT obtained 58.8% on logical questions and 60% on ethical questions, which is approaching the passing range for logical questions and has crossed the threshold for ethical questions.
  • for: 这项研究的目的是评估ChatGPT是否可靠地回答复杂的医学和临床问题。
  • methods: 这项研究使用哈佛大学解剖学和美国医学资格考试(USMLE)问卷来评估ChatGPT生成的答案的准确性。结果使用2重分析和后续分析进行评估。
  • results: 研究发现ChatGPT生成的答案更加具有上下文特征,代表了更好的演绎推理模型,比普通的Google搜索结果更佳。ChatGPT在逻辑问题上取得58.8%,在伦理问题上取得60%,已经接近逻辑问题的通过线,并超过了伦理问题的阈值。
    Abstract Artificial intelligence is gaining traction in more ways than ever before. The popularity of language models and AI-based businesses has soared since ChatGPT was made available to the general public via OpenAI. It is becoming increasingly common for people to use ChatGPT both professionally and personally. Considering the widespread use of ChatGPT and the reliance people place on it, this study determined how reliable ChatGPT can be for answering complex medical and clinical questions. Harvard University gross anatomy along with the United States Medical Licensing Examination (USMLE) questionnaire were used to accomplish the objective. The paper evaluated the obtained results using a 2-way ANOVA and posthoc analysis. Both showed systematic covariation between format and prompt. Furthermore, the physician adjudicators independently rated the outcome's accuracy, concordance, and insight. As a result of the analysis, ChatGPT-generated answers were found to be more context-oriented and represented a better model for deductive reasoning than regular Google search results. Furthermore, ChatGPT obtained 58.8% on logical questions and 60% on ethical questions. This means that the ChatGPT is approaching the passing range for logical questions and has crossed the threshold for ethical questions. The paper believes ChatGPT and other language learning models can be invaluable tools for e-learners; however, the study suggests that there is still room to improve their accuracy. In order to improve ChatGPT's performance in the future, further research is needed to better understand how it can answer different types of questions.
    摘要 人工智能正以前所未有的方式得到广泛应用。自OpenAI向公众开放ChatGPT以来,语言模型和基于AI的业务的受欢迎程度大幅上升,越来越多的人在工作和生活中使用ChatGPT。鉴于ChatGPT的广泛使用以及人们对它的依赖,本研究评估了ChatGPT回答复杂医学和临床问题的可靠性。研究使用了哈佛大学大体解剖学内容以及美国执业医师资格考试(USMLE)题目,并采用双因素方差分析(2-way ANOVA)和事后分析对结果进行评估,两者均显示出题型与提示之间的系统性协变;此外,由医师评审员独立评定了答案的准确性、一致性和洞察力。分析结果表明,ChatGPT生成的答案更加面向上下文,在演绎推理方面优于普通的Google搜索结果。ChatGPT在逻辑类问题上取得58.8%,在伦理类问题上取得60%,即接近逻辑类问题的及格范围,并已跨过伦理类问题的阈值。本文认为ChatGPT等语言学习模型可以成为电子学习者的宝贵工具,但研究也表明其准确性仍有提升空间;为了在未来提升ChatGPT的表现,还需要进一步研究它如何回答不同类型的问题。
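
The 2-way ANOVA used to test for systematic covariation between question format and prompt can be sketched with statsmodels as follows; the column names and toy data are placeholders for the study's actual score table.

```python
# Sketch of a two-factor (format x prompt) ANOVA on answer accuracy scores.
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Toy data: one accuracy score per (question format, prompt style) observation.
df = pd.DataFrame({
    "score":  [0.60, 0.55, 0.70, 0.62, 0.58, 0.66, 0.72, 0.60],
    "fmt":    ["mcq", "mcq", "open", "open", "mcq", "mcq", "open", "open"],
    "prompt": ["plain", "role", "plain", "role", "plain", "role", "plain", "role"],
})

model = ols("score ~ C(fmt) * C(prompt)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))  # Type-II two-way ANOVA table
```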

Ticket-BERT: Labeling Incident Management Tickets with Language Models

  • paper_url: http://arxiv.org/abs/2307.00108
  • repo_url: None
  • paper_authors: Zhexiong Liu, Cris Benge, Siduo Jiang
  • for: 这个论文的目的是为了提高缓解问题的效率,使用简单 yet robust的语言模型来标签问题票据。
  • methods: 这个论文使用的方法是使用提出的 ticket datasets 来训练 Ticket-BERT 语言模型。
  • results: 实验表明 Ticket-BERT 比基线和现有文本分类器更高效,并且在 Microsoft IcM 系统上部署后,通过活动学习循环和几个标注来快速适应新收集的问题票据。
    Abstract An essential aspect of prioritizing incident tickets for resolution is efficiently labeling tickets with fine-grained categories. However, ticket data is often complex and poses several unique challenges for modern machine learning methods: (1) tickets are created and updated either by machines with pre-defined algorithms or by engineers with domain expertise that share different protocols, (2) tickets receive frequent revisions that update ticket status by modifying all or parts of ticket descriptions, and (3) ticket labeling is time-sensitive and requires knowledge updates and new labels per the rapid software and hardware improvement lifecycle. To handle these issues, we introduce Ticket- BERT which trains a simple yet robust language model for labeling tickets using our proposed ticket datasets. Experiments demonstrate the superiority of Ticket-BERT over baselines and state-of-the-art text classifiers on Azure Cognitive Services. We further encapsulate Ticket-BERT with an active learning cycle and deploy it on the Microsoft IcM system, which enables the model to quickly finetune on newly-collected tickets with a few annotations.
    摘要 一个重要的决策因素是将事件票据分类为细化的类别,但事件数据往往复杂, pose 多个现代机器学习方法的挑战:(1)票据由机器或域专家生成或修改,(2)票据经常得到修订,修改票据状态,(3)票据标签是时间敏感的,需要持续更新和新增标签,以适应软件和硬件升级循环。为解决这些问题,我们介绍了票据BERT,一种训练简单 yet robust的语言模型,用于标记票据。实验表明,票据BERT在 Azure Cognitive Services 上比基准和当前文本分类器更高效。我们还将 Ticket-BERT 包装在活动学习循环中,并在 Microsoft IcM 系统上部署,这使得模型可以快速适应新收集的票据,只需要几个标注。
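
A fine-tuning step for a ticket classifier of this kind can be sketched with the Hugging Face transformers library; the label set, batch construction, and the surrounding active-learning loop are simplified assumptions rather than the deployed Ticket-BERT pipeline.

```python
# Minimal sketch of fine-tuning a BERT classifier on incident-ticket text.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

labels = ["network", "storage", "auth", "other"]          # placeholder categories
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(labels))
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(ticket_texts, ticket_labels):
    batch = tok(ticket_texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=torch.tensor(ticket_labels))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

# Active-learning style update: fine-tune on a few newly annotated tickets.
loss = train_step(["VM unreachable after patch rollout"], [0])
```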

Obscured Wildfire Flame Detection By Temporal Analysis of Smoke Patterns Captured by Unmanned Aerial Systems

  • paper_url: http://arxiv.org/abs/2307.00104
  • repo_url: None
  • paper_authors: Uma Meleti, Abolfazl Razi
  • for: 本研究文章目的是探讨使用RGB摄像头设备的无人机实时检测遮盖的野火(火焰被树木、烟雾、云层等自然障碍物遮盖)的挑战。
  • methods: 我们提出了一种新的方法,即基于时间分析烟雾模式的semantic segmentation。我们的方法使用卷积神经网络架构,包括预训练的CNN编码器和3D卷积来解码。我们还使用顺序堆叠特征来利用时间变化。
  • results: 我们的方法可以准确地检测遮盖野火,达到了85.88%的Dice分数,同时实现了92.47%的高精度和90.67%的分类精度。与其他方法相比,我们的方法在视觉上表现出色,并在视频级别的火灾分类中达到了100%的准确率。
    Abstract This research paper addresses the challenge of detecting obscured wildfires (when the fire flames are covered by trees, smoke, clouds, and other natural barriers) in real-time using drones equipped only with RGB cameras. We propose a novel methodology that employs semantic segmentation based on the temporal analysis of smoke patterns in video sequences. Our approach utilizes an encoder-decoder architecture based on deep convolutional neural network architecture with a pre-trained CNN encoder and 3D convolutions for decoding while using sequential stacking of features to exploit temporal variations. The predicted fire locations can assist drones in effectively combating forest fires and pinpoint fire retardant chemical drop on exact flame locations. We applied our method to a curated dataset derived from the FLAME2 dataset that includes RGB video along with IR video to determine the ground truth. Our proposed method has a unique property of detecting obscured fire and achieves a Dice score of 85.88%, while achieving a high precision of 92.47% and classification accuracy of 90.67% on test data showing promising results when inspected visually. Indeed, our method outperforms other methods by a significant margin in terms of video-level fire classification as we obtained about 100% accuracy using MobileNet+CBAM as the encoder backbone.
    摘要 本文研究如何使用仅配备RGB摄像头的无人机实时检测被遮挡的野火(火焰被树木、烟雾、云层等自然障碍物遮挡)。我们提出了一种新方法,基于视频序列中烟雾模式的时间分析进行语义分割。该方法采用基于深度卷积神经网络的编码器-解码器架构,编码器使用预训练的CNN,解码器使用3D卷积,并通过顺序堆叠特征来利用时间变化。预测出的火点位置可以帮助无人机有效扑灭森林火灾,并将阻燃剂精确投放到火焰位置。我们在源自FLAME2数据集(包含RGB视频以及用于确定真值的红外视频)的精选数据集上应用了该方法。该方法具有检测被遮挡火焰的独特能力,在测试数据上取得了85.88%的Dice分数、92.47%的精确率和90.67%的分类准确率,视觉检查结果同样令人满意。此外,在视频级火灾分类上,使用MobileNet+CBAM作为编码器骨干时我们获得了约100%的准确率,显著优于其他方法。
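
The described architecture — a pretrained 2D CNN encoding each frame, features stacked along time, and 3D convolutions decoding a smoke/fire mask — can be roughly sketched in PyTorch. The backbone choice, channel sizes, and upsampling scheme below are assumptions, not the paper's exact model.

```python
# Rough sketch of a temporal smoke-pattern segmenter over short RGB clips.
import torch
import torch.nn as nn
import torchvision

class TemporalSmokeSegmenter(nn.Module):
    def __init__(self, num_classes: int = 1):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="DEFAULT")
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # (B,512,h',w')
        self.decoder = nn.Sequential(
            nn.Conv3d(512, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(64, num_classes, kernel_size=1),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, T, 3, H, W) sequence of RGB frames
        b, t, c, h, w = clip.shape
        feats = self.encoder(clip.reshape(b * t, c, h, w))
        feats = feats.reshape(b, t, 512, feats.shape[-2], feats.shape[-1])
        feats = feats.permute(0, 2, 1, 3, 4)              # (B, 512, T, h', w')
        logits = self.decoder(feats)                      # (B, 1, T, h', w')
        # Upsample back to the input resolution for a per-frame mask.
        return nn.functional.interpolate(
            logits, size=(t, h, w), mode="trilinear", align_corners=False)

masks = TemporalSmokeSegmenter()(torch.randn(1, 4, 3, 224, 224))
```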

Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models

  • paper_url: http://arxiv.org/abs/2307.00101
  • repo_url: None
  • paper_authors: Harnoor Dhingra, Preetiha Jayashanker, Sayali Moghe, Emma Strubell
  • for: 本研究旨在探讨大语言模型(LLMs)如何生成描述不同性取向人群的文本,并分析文本中各个群体之间的偏见。
  • methods: 本研究采用了对比研究的方法,通过对LLMs生成的文本进行分析,发现文本中各个群体之间存在明显的偏见。此外,研究还使用了SHAP分析方法,通过在文本中添加链接思维提示来减少偏见。
  • results: 研究发现,LLMs生成的文本中对LGBTQIA+社群存在明显的偏见,而使用SHAP分析方法可以减少这种偏见。这表明,在这种设定下,采用后续方法可以有效地减少LLMs生成的偏见。
    Abstract Large Language Models (LLMs) are trained primarily on minimally processed web text, which exhibits the same wide range of social biases held by the humans who created that content. Consequently, text generated by LLMs can inadvertently perpetuate stereotypes towards marginalized groups, like the LGBTQIA+ community. In this paper, we perform a comparative study of how LLMs generate text describing people with different sexual identities. Analyzing bias in the text generated by an LLM using regard score shows measurable bias against queer people. We then show that a post-hoc method based on chain-of-thought prompting using SHAP analysis can increase the regard of the sentence, representing a promising approach towards debiasing the output of LLMs in this setting.
    摘要 大语言模型(LLM)主要在只经最少处理的网络文本上进行训练,这些文本带有其创作者所持有的各种社会偏见。因此,由LLM生成的文本可能在无意间延续针对边缘群体(如LGBTQIA+社区)的刻板印象。在这篇论文中,我们对LLM如何生成描述不同性取向人群的文本进行了比较研究。我们使用regard score来测量文本中对不同性取向人群的偏见,并发现了LLM生成文本中对Queer人群的可测量偏见。然后,我们展示了一种基于链式思维提示并结合SHAP分析的事后方法,可以提高句子的regard分数,这代表了在该设定下为LLM输出去偏的一个有前景的方向。

Transformers in Healthcare: A Survey

  • paper_url: http://arxiv.org/abs/2307.00067
  • repo_url: None
  • paper_authors: Subhash Nerella, Sabyasachi Bandyopadhyay, Jiaqing Zhang, Miguel Contreras, Scott Siegel, Aysegul Bumin, Brandon Silva, Jessica Sena, Benjamin Shickel, Azra Bihorac, Kia Khezeli, Parisa Rashidi
  • for: This paper provides an overview of how the Transformers neural network architecture has been adopted in healthcare to analyze various forms of data, including medical imaging, Electronic Health Records (EHR), social media, physiological signals, and biomolecular sequences.
  • methods: The paper discusses the use of Transformer models in healthcare applications, including clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis.
  • results: The paper highlights the benefits and limitations of using Transformers in healthcare, including computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
    Abstract With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformers neural network architecture is rapidly changing many applications. Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of data, including medical imaging, structured and unstructured Electronic Health Records (EHR), social media, physiological signals, and biomolecular sequences. Those models could help in clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. We identified relevant studies using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
    摘要 随着人工智能(AI)日益渗透到包括医疗在内的社会各个方面,Transformer神经网络架构的采用正在迅速改变许多应用。Transformer是一种最初为解决通用自然语言处理(NLP)任务而开发的深度学习架构,随后被应用到包括医疗在内的许多领域。在这篇综述中,我们概述了该架构如何被用于分析各种形式的数据,包括医学影像、结构化与非结构化电子健康记录(EHR)、社交媒体、生理信号和生物分子序列;这些模型可用于临床诊断、报告生成、数据重建以及药物/蛋白质合成。我们按照系统综述与荟萃分析首选报告条目(PRISMA)指南筛选了相关研究,并讨论了在医疗领域使用Transformer的优势与局限,以及计算成本、模型可解释性、公平性、与人类价值观的一致性、伦理影响和环境影响等问题。

Qualitative Prediction of Multi-Agent Spatial Interactions

  • paper_url: http://arxiv.org/abs/2307.00065
  • repo_url: None
  • paper_authors: Sariah Mghames, Luca Castri, Marc Hanheide, Nicola Bellotto
  • for: 本研究旨在理解和预测 dense scene 中多个机器人之间的互动,以满足在餐厅、仓库和医院等场景中部署服务机器人的需求。
  • methods: 本文提出了三种新的方法来建模和预测 dense scene 中多个机器人之间的互动,包括使用直观的Qualitative Trajectory Calculus (QTC)表示。这些方法考虑静态和动态上下文,并使用输入和时间注意力机制,并在中长期时间均衡测试。
  • results: 实验结果表明,使用 purely data-driven 网络进行运动预测,并对其进行 QTC 互动预测的输出处理,通常超过其他两种方法的性能。这三种方法还在不同的人类场景中进行了总体化评估。
    Abstract Deploying service robots in our daily life, whether in restaurants, warehouses or hospitals, calls for the need to reason on the interactions happening in dense and dynamic scenes. In this paper, we present and benchmark three new approaches to model and predict multi-agent interactions in dense scenes, including the use of an intuitive qualitative representation. The proposed solutions take into account static and dynamic context to predict individual interactions. They exploit an input- and a temporal-attention mechanism, and are tested on medium and long-term time horizons. The first two approaches integrate different relations from the so-called Qualitative Trajectory Calculus (QTC) within a state-of-the-art deep neural network to create a symbol-driven neural architecture for predicting spatial interactions. The third approach implements a purely data-driven network for motion prediction, the output of which is post-processed to predict QTC spatial interactions. Experimental results on a popular robot dataset of challenging crowded scenarios show that the purely data-driven prediction approach generally outperforms the other two. The three approaches were further evaluated on a different but related human scenarios to assess their generalisation capability.
    摘要 deploying 服务机器人在我们日常生活中,无论在餐厅、仓库或医院,需要对快速发展的 dense 和动态场景进行理解和预测。本文提出了三种新的方法来模型和预测多个机器人之间的互动,包括使用直观的Qualitative Trajectory Calculus(QTC)来创建一个符号驱动的深度学习架构。这三种方法都考虑了静止和动态上下文,并使用输入和时间注意力机制。它们在中期和长期时间 horizon 上测试。第一种方法将不同的QTC关系 integrate 到现有的深度学习网络中,以创建一个符号驱动的深度学习架构。第二种方法使用一种纯数据驱动的网络进行运动预测,并对输出进行后处理以预测QTC的空间互动。实验结果表明,纯数据驱动预测方法在一个流行的机器人数据集上通常表现出色,并且在不同但相关的人类场景中进行评估表明了泛化能力。
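
To make the qualitative representation concrete, the following is a toy illustration of the kind of relation used in the Qualitative Trajectory Calculus (QTC): for two agents, each symbol records whether an agent is moving towards (-), away from (+), or is stable (0) with respect to the other. The threshold and the exact QTC variant are assumptions for illustration only.

```python
# Toy QTC-style relation from consecutive 2D positions of two agents.
import numpy as np

def qtc_symbol(prev_pos, curr_pos, other_pos, eps=1e-3):
    """-1 if the agent moved towards `other_pos`, +1 if away, 0 if stable."""
    d_prev = np.linalg.norm(prev_pos - other_pos)
    d_curr = np.linalg.norm(curr_pos - other_pos)
    if d_curr < d_prev - eps:
        return -1
    if d_curr > d_prev + eps:
        return +1
    return 0

def qtc_b_state(k_prev, k_curr, l_prev, l_curr):
    """QTC-B style pair: (k w.r.t. l, l w.r.t. k) for one time step."""
    return (qtc_symbol(k_prev, k_curr, np.asarray(l_prev)),
            qtc_symbol(l_prev, l_curr, np.asarray(k_prev)))

state = qtc_b_state(np.array([0., 0.]), np.array([0.5, 0.]),
                    np.array([2., 0.]), np.array([2., 0.]))
print(state)  # (-1, 0): agent k approaches l, l is stationary w.r.t. k
```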

Breaking the Metric Voting Distortion Barrier

  • paper_url: http://arxiv.org/abs/2306.17838
  • repo_url: None
  • paper_authors: Moses Charikar, Prasanna Ramakrishnan, Kangning Wang, Hongxun Wu
  • for: 这 paper 的目的是设计一个投票规则,使得选举的候选人的平均距离选民最小。
  • methods: 这 paper 使用了一些新的投票规则,包括 Maximal Lotteries 和一些 hybrids 规则,以及一些抽样技术来实现更好的选举结果。
  • results: 这 paper 实现了一个新的投票规则,可以 garantuee 选举结果的偏差在 $2.753$ 以下。
    Abstract We consider the following well studied problem of metric distortion in social choice. Suppose we have an election with $n$ voters and $m$ candidates who lie in a shared metric space. We would like to design a voting rule that chooses a candidate whose average distance to the voters is small. However, instead of having direct access to the distances in the metric space, each voter gives us a ranked list of the candidates in order of distance. Can we design a rule that regardless of the election instance and underlying metric space, chooses a candidate whose cost differs from the true optimum by only a small factor (known as the distortion)? A long line of work culminated in finding deterministic voting rules with metric distortion $3$, which is the best possible for deterministic rules and many other classes of voting rules. However, without any restrictions, there is still a significant gap in our understanding: Even though the best lower bound is substantially lower at $2.112$, the best upper bound is still $3$, which is attained even by simple rules such as Random Dictatorship. Finding a rule that guarantees distortion $3 - \varepsilon$ for some constant $\varepsilon $ has been a major challenge in computational social choice. In this work, we give a rule that guarantees distortion less than $2.753$. To do so we study a handful of voting rules that are new to the problem. One is Maximal Lotteries, a rule based on the Nash equilibrium of a natural zero-sum game which dates back to the 60's. The others are novel rules that can be thought of as hybrids of Random Dictatorship and the Copeland rule. Though none of these rules can beat distortion $3$ alone, a careful randomization between Maximal Lotteries and any of the novel rules can.
    摘要 我们考虑了一个已经广泛研究过的社会选择问题,即度量扭曲问题。假设我们有一个选举有 $n$ 名选民和 $m$ 名候选人,这些候选人 lying in a shared metric space。我们想设计一个投票规则,使得选举出的候选人的平均距离选民小。然而,每名选民不直接给我们提供度量空间中的距离,而是给我们一个排序列表,其中每个候选人的排名顺序与其度量距离有关。我们可以设计一个规则,使得 regardless of the election instance and underlying metric space,选举出的候选人的成本与真实优质差不多(known as distortion)?一个长时间的工作最终导致了确定性投票规则的度量扭曲为3,这是确定性规则的最佳可能性,以及许多其他类型的投票规则。然而,没有任何限制,我们对于这个问题还是处于不够理解的阶段:尽管最佳下限比较低,大约为2.112,但是最好的上限仍然是3,这是由简单的规则如随机专制所实现。找到一个规则,可以保证度量扭曲小于2.753的常数ε的一个大型挑战在计算社会选择中。在这个工作中,我们提出了一个规则,可以保证度量扭曲小于2.753。为了实现这一点,我们研究了一些新的投票规则,其中之一是最大抽签规则,这是一种基于60年代的自然零Sum游戏的纳什平衡的规则。另外几个规则可以被视为Random Dictatorship和Copeland规则的混合体。虽然这些规则无法独立超过度量扭曲3,但是通过在Maximal Lotteries和这些新规则之间进行精细的随机化,可以实现更好的性能。
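
The notion of metric distortion discussed above can be made concrete with a small worked example: given voter and candidate positions, a candidate's cost is its total distance to the voters, and the distortion of a (possibly randomized) rule on an instance is its expected cost divided by the optimal cost. The instance below is a toy one-dimensional example with Random Dictatorship, chosen purely for illustration.

```python
# Worked toy example of metric distortion under Random Dictatorship.
import numpy as np

voters = np.array([[0.0], [0.0], [1.0]])      # three voters on a line
candidates = np.array([[0.0], [1.0]])         # two candidates

dist = np.linalg.norm(voters[:, None, :] - candidates[None, :, :], axis=-1)
cost = dist.sum(axis=0)                        # total distance per candidate
opt = cost.min()                               # optimal candidate's cost

# Random Dictatorship: each voter is the dictator with probability 1/3 and
# elects their closest candidate.
favorites = dist.argmin(axis=1)
expected_cost = cost[favorites].mean()

print(cost, opt, expected_cost / opt)          # distortion = 4/3 on this instance
```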

Resetting the Optimizer in Deep RL: An Empirical Study

  • paper_url: http://arxiv.org/abs/2306.17833
  • repo_url: None
  • paper_authors: Kavosh Asadi, Rasool Fakoor, Shoham Sabach
  • for: 这个论文主要是为了解决深度学习强化学习中的优值函数近似问题。
  • methods: 这个论文使用了现代variants of stochastic gradient descent algorithm,如Adam,来解决优化问题。这些优化器会在每个迭代中维护自己的内部参数,如梯度的首项和二项估计值,并在时间推移中更新这些参数。因此,在前一个迭代的信息会在当前迭代中影响优化器的内部参数。
  • results: 这个研究发现,使用这个简单的重置策略可以减轻这种影响,并使深度RL在Atari benchmark上表现出显著的改善。
    Abstract We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of approximately solving a sequence of optimization problems where the objective function can change per iteration. The common approach to solving the problem is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first and the second moment of the gradient, and update these parameters over time. Therefore, information obtained in previous iterations is being used to solve the optimization problem in the current iteration. We hypothesize that this can contaminate the internal parameters of the employed optimizer in situations where the optimization landscape of the previous iterations is quite different from the current iteration. To hedge against this effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting strategy by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification unleashes the true potential of modern optimizers, and significantly improves the performance of deep RL on the Atari benchmark.
    摘要 我们关注深度强化学习中最优价值函数的近似问题。这是一个迭代过程,需要近似求解一系列优化问题,且目标函数在每次迭代中都可能不同。常见的做法是使用现代的随机梯度下降算法(如Adam)来求解。这些优化器会维护自己的内部参数,例如梯度一阶矩和二阶矩的估计,并随时间更新这些参数,因此前一轮迭代获得的信息会被用于求解当前迭代的优化问题。我们推测,当前一轮迭代的优化地形与当前迭代差异较大时,这会污染优化器的内部参数。为对冲这种影响,一个简单的想法是在开始新一轮迭代时重置优化器的内部参数。我们将多种优化器与Rainbow算法结合进行了实验研究,结果表明这一简单修改释放了现代优化器的真正潜力,并在Atari基准上显著提升了深度强化学习的性能。
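
The resetting strategy itself is a one-line change in most deep RL codebases: re-create the optimizer (wiping Adam's first- and second-moment estimates) whenever a new iteration of the value-fitting problem begins, e.g. at each target-network update. The network, stand-in loss, and update period below are placeholders, not the paper's Rainbow setup.

```python
# Sketch of resetting Adam's internal state at the start of each iteration.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 4))
target_update_period = 2_000

def make_optimizer():
    return torch.optim.Adam(q_net.parameters(), lr=1e-4)

optimizer = make_optimizer()
for step in range(10_000):
    if step % target_update_period == 0:
        # New optimization problem (new targets): reset internal state so
        # moment estimates from the previous iteration do not leak in.
        optimizer = make_optimizer()
    loss = q_net(torch.randn(32, 8)).pow(2).mean()   # stand-in for the TD loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```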

Understanding Unfairness via Training Concept Influence

  • paper_url: http://arxiv.org/abs/2306.17828
  • repo_url: None
  • paper_authors: Yuanshun Yao, Yang Liu
  • for: 本研究旨在帮助实践者更好地理解他们的数据和算法是如何不公正的。这是一个相对未曾受到充分研究的问题。
  • methods: 我们通过对训练数据进行ounterfactual intervening和改变样本来研究这个问题。我们定义了一些概念,如特征(X)、标签(Y)和敏感特征(A),然后使用Influence function来计算每个样本对模型不公正性的影响。
  • results: 我们的框架可以帮助实践者理解观察到的不公正性,并修复训练数据。此外,它还可以应用于检测杂 Labeling、修复表示不均衡的问题,以及检测公正性 targeted poisoning 攻击。
    Abstract Knowing the causes of a model's unfairness helps practitioners better understand their data and algorithms. This is an important yet relatively unexplored task. We look into this problem through the lens of the training data - one of the major sources of unfairness. We ask the following questions: how would a model's fairness performance change if, in its training data, some samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) some features were changed? In other words, we quantify the fairness influence of training samples by counterfactually intervening and changing samples based on predefined concepts, i.e. data attributes such as features (X), labels (Y), or sensitive attributes (A). To calculate a training sample's influence on the model's unfairness w.r.t a concept, we first generate counterfactual samples based on the concept, i.e. the counterfactual versions of the sample if the concept were changed. We then calculate the resulting impact on the unfairness, via influence function, if the counterfactual samples were used in training. Our framework not only helps practitioners understand the observed unfairness and repair their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.
    摘要 知道模型的不公平性的原因可以帮助实践者更好地理解其数据和算法。这是一个重要 yet 相对未经探索的任务。我们通过训练数据的镜像来查看这个问题。我们问的问题是:如果在模型的训练数据中有些样本(1)来自不同的群体(例如,人口学特征),(2)被标注 differently,或(3)某些特征被改变。换句话说,我们量化训练样本对模型的不公平性的影响,通过对概念(例如特征X、标签Y、敏感特征A)进行定义后,对样本进行counterfactual intervening,以计算样本对模型的不公平性的影响。我们的框架不仅帮助实践者理解观察到的不公平性和修复训练数据,还导致了许多其他应用,例如检测杂 labeling、修复不均衡表示、检测公平性-targeted 攻击。

DisCo: Disentangled Control for Referring Human Dance Generation in Real World

  • paper_url: http://arxiv.org/abs/2307.00040
  • repo_url: https://github.com/Wangt-CN/DisCo
  • paper_authors: Tan Wang, Linjie Li, Kevin Lin, Chung-Ching Lin, Zhengyuan Yang, Hanwang Zhang, Zicheng Liu, Lijuan Wang
  • for: This paper focuses on the problem of Referring Human Dance Generation, which aims to generate realistic human dance images and videos with precise target poses and diverse appearances, while also generalizing to unseen subjects, backgrounds, and poses.
  • methods: The proposed approach, DISCO, includes a novel model architecture with disentangled control and an effective human attribute pre-training to improve the faithfulness and compositionality of dance synthesis.
  • results: DISCO is able to generate high-quality human dance images and videos with diverse appearances and flexible motions, as demonstrated through extensive qualitative and quantitative results.
    Abstract Generative AI has made significant strides in computer vision, particularly in image/video synthesis conditioned on text descriptions. Despite the advancements, it remains challenging especially in the generation of human-centric content such as dance synthesis. Existing dance synthesis methods struggle with the gap between synthesized content and real-world dance scenarios. In this paper, we define a new problem setting: Referring Human Dance Generation, which focuses on real-world dance scenarios with three important properties: (i) Faithfulness: the synthesis should retain the appearance of both human subject foreground and background from the reference image, and precisely follow the target pose; (ii) Generalizability: the model should generalize to unseen human subjects, backgrounds, and poses; (iii) Compositionality: it should allow for composition of seen/unseen subjects, backgrounds, and poses from different sources. To address these challenges, we introduce a novel approach, DISCO, which includes a novel model architecture with disentangled control to improve the faithfulness and compositionality of dance synthesis, and an effective human attribute pre-training for better generalizability to unseen humans. Extensive qualitative and quantitative results demonstrate that DISCO can generate high-quality human dance images and videos with diverse appearances and flexible motions. Code, demo, video and visualization are available at: https://disco-dance.github.io/.
    摘要 “生成AI在计算机视觉领域做出了重要的进步,特别是在文本描述条件下的图像/视频生成。尽管有了进步,但是在人类中心的内容生成方面仍然存在挑战,如舞蹈生成。现有的舞蹈生成方法难以满足现实世界舞蹈场景中的差异。在这篇论文中,我们定义了一个新的问题设定:参照人类舞蹈生成(Referring Human Dance Generation),其特征包括:(i)忠诚度:生成的内容保留人体前景和背景的真实表现,并准确跟踪目标姿势;(ii)普适性:模型能够通过不同的人体、背景和姿势进行普适化;(iii)组合性:允许将已见/未见的人体、背景和姿势进行组合。为了解决这些挑战,我们提出了一种新的方法 DISCO,其包括一种新的模型架构,以提高舞蹈生成的忠诚度和组合性,以及一种有效的人体特征预训练,以提高对未见人体的普适性。我们的EXTENSIVE质量和量值结果表明,DISCO可以生成高质量的人类舞蹈图像和视频,具有多样的外表和灵活的动作。代码、示例、视频和可见化在:https://disco-dance.github.io/。”

Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2306.17817
  • repo_url: None
  • paper_authors: Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki
  • for: This paper proposes a manipulation policy called Act3D, which is designed to improve the accuracy and efficiency of robot manipulation tasks.
  • methods: Act3D uses a Transformer architecture to cast 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, and uses relative spatial attention to select the best feature point for end-effector pose prediction.
  • results: Act3D achieves a new state-of-the-art in RLbench, an established manipulation benchmark, with 10% absolute improvement over the previous SOTA 2D multi-view policy and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy.
  • for: 本研究提出了一个名为Act3D的抓取策略,旨在提高机器人抓取任务的精度和效率。
  • methods: Act3D使用Transformer架构,将6DoF关键位预测转换为3D探测,并使用相对的空间注意力选择最佳的特征点 для抓取器的姿势预测。
  • results: Act3D在RLbench中 achieve了新的州际优秀成绩,与前一代2D多视点策略相比提高了10%的绝对成绩,并且与前一代3D策略相比提高了22%的绝对成绩,并且仅需3x的计算量。
    Abstract 3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, typically demanding high-resolution 3D perceptual grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing 3D inductive biases. In this paper, we propose Act3D, a manipulation policy Transformer that casts 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, iteratively samples 3D point grids in free space in a coarse-to-fine manner, featurizes them using relative spatial attention to the physical feature cloud, and selects the best feature point for end-effector pose prediction. Act3D sets a new state-of-the-art in RLbench, an established manipulation benchmark. Our model achieves 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLbench tasks and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy. In thorough ablations, we show the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions. Code and videos are available at our project site: https://act3d.github.io/.
    摘要 三维感知表示非常适合机器人操作,因为它能够自然地表达遮挡并简化空间推理。许多操作任务需要末端执行器位姿预测具有很高的空间精度,这通常要求高分辨率的三维感知网格,而处理这样的网格计算代价很高,因此大多数操作策略直接在二维空间中运行,放弃了三维归纳偏置。本文提出Act3D,一种操作策略Transformer,它将六自由度关键位姿(keypose)预测转化为带有自适应空间计算的三维检测问题。它以从一个或多个相机视角反投影得到的三维特征点云为输入,以由粗到细的方式在自由空间中迭代采样三维点网格,利用相对空间注意力将其与物理特征点云进行特征化,并选择最优特征点用于末端执行器位姿预测。Act3D在成熟的操作基准RLbench上创造了新的最优成绩:在74个RLbench任务上比此前最优的二维多视角策略绝对提升10%,并在计算量减少到三分之一的情况下比此前最优的三维策略绝对提升22%。在细致的消融实验中,我们展示了相对空间注意力、大规模视觉-语言预训练的二维骨干网络以及由粗到细注意力间权重共享的重要性。代码和视频可在项目网站获取:https://act3d.github.io/。

Comparing Reinforcement Learning and Human Learning using the Game of Hidden Rules

  • paper_url: http://arxiv.org/abs/2306.17766
  • repo_url: None
  • paper_authors: Eric Pulick, Vladimir Menkov, Yonatan Mintz, Paul Kantor, Vicki Bier
  • for: 本研究旨在 investigate the impact of task structure on human learning (HL) and reinforcement learning (RL) performance.
  • methods: 研究使用了一个 especialized learning environment 来rigorously study the effect of task structure on HL and RL.
  • results: 研究发现,人类和RL算法在不同任务结构下表现有很大差异。
    Abstract Reliable real-world deployment of reinforcement learning (RL) methods requires a nuanced understanding of their strengths and weaknesses and how they compare to those of humans. Human-machine systems are becoming more prevalent and the design of these systems relies on a task-oriented understanding of both human learning (HL) and RL. Thus, an important line of research is characterizing how the structure of a learning task affects learning performance. While increasingly complex benchmark environments have led to improved RL capabilities, such environments are difficult to use for the dedicated study of task structure. To address this challenge we present a learning environment built to support rigorous study of the impact of task structure on HL and RL. We demonstrate the environment's utility for such study through example experiments in task structure that show performance differences between humans and RL algorithms.
    摘要 RL 技术的可靠实际应用需要深刻理解它们的优劣点以及与人类学习方法的比较。人机系统日益普遍,这些系统的设计依赖于对人类学习(HL)和 RL 的任务导向理解。因此,研究任务结构对学习表现的影响是一项重要的研究方向。虽然 RL 算法的能力在日益复杂的基准环境中得到了提高,但这些环境难以用于专门研究任务结构的问题。为解决这个挑战,我们提供了一个支持严谨研究任务结构对 HL 和 RL 表现影响的学习环境。我们通过一些关于任务结构的示例实验表明了这个环境的实用性,这些实验显示了人类与 RL 算法之间的性能差异。

cs.CL - 2023-07-01

Improving Text Matching in E-Commerce Search with A Rationalizable, Intervenable and Fast Entity-Based Relevance Model

  • paper_url: http://arxiv.org/abs/2307.00370
  • repo_url: None
  • paper_authors: Jiong Cai, Yong Jiang, Yue Zhang, Chengyue Jiang, Ke Yu, Jianhui Ji, Rong Xiao, Haihong Tang, Tao Wang, Zhongqiang Huang, Pengjun Xie, Fei Huang, Kewei Tu
  • for: 该论文主要目标是提高电商搜索系统中的搜索效果,尤其是预测用户查询的项目是否相关。
  • methods: 该论文提出了一种基于实体的相关性模型(EBRM),通过将查询-项目(QI)相关性问题拆分成多个查询-实体(QE)相关性问题,然后使用软逻辑形式进行汇聚,以提高准确率和搜索效率。
  • results: 该论文在电商网站上的实验结果表明,EBRM可以获得了可观的改善,同时具有计算效率的优势。
    Abstract Discovering the intended items of user queries from a massive repository of items is one of the main goals of an e-commerce search system. Relevance prediction is essential to the search system since it helps improve performance. When online serving a relevance model, the model is required to perform fast and accurate inference. Currently, the widely used models such as Bi-encoder and Cross-encoder have their limitations in accuracy or inference speed respectively. In this work, we propose a novel model called the Entity-Based Relevance Model (EBRM). We identify the entities contained in an item and decompose the QI (query-item) relevance problem into multiple QE (query-entity) relevance problems; we then aggregate their results to form the QI prediction using a soft logic formulation. The decomposition allows us to use a Cross-encoder QE relevance module for high accuracy as well as cache QE predictions for fast online inference. Utilizing soft logic makes the prediction procedure interpretable and intervenable. We also show that pretraining the QE module with auto-generated QE data from user logs can further improve the overall performance. The proposed method is evaluated on labeled data from e-commerce websites. Empirical results show that it achieves promising improvements with computation efficiency.
    摘要 发现用户查询的目标项从大量的项目库中是电商搜索系统的主要目标之一。准确预测 relevance 是搜索系统的关键因素,它可以提高系统的性能。现在,广泛使用的模型,如Bi-encoder 和 Cross-encoder,它们在准确性和推理速度之间存在限制。在这种工作中,我们提出了一种新的模型,即基于实体的准确性模型(EBRM)。我们Identify 在项目中含有的实体,并将 QI(查询项)准确性问题分解为多个 QE(查询实体)准确性问题;然后,我们将其结果聚合以形成 QI 预测,使用软逻辑表述。这种分解允许我们使用 Cross-encoder QE 准确性模块以获得高准确性,同时缓存 QE 预测以便在线推理。使用软逻辑使预测过程可见和可操作。此外,我们还表明了在用户日志中自动生成的 QE 数据进行预处理可以进一步提高总性能。我们的方法在电商网站上的标注数据上进行了实验,实验结果表明,它可以获得了可观的改进,同时具有计算效率。
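
The entity-based decomposition can be illustrated with a small sketch: query-item relevance is aggregated from cached query-entity (QE) scores using a soft-logic rule. The noisy-OR style aggregation, the placeholder scorer, and the cache layout below are assumptions; the paper's exact soft-logic formulation may differ.

```python
# Illustrative sketch of EBRM-style aggregation over cached QE scores.
from functools import lru_cache

@lru_cache(maxsize=100_000)
def qe_score(query: str, entity: str) -> float:
    """Stand-in for a Cross-encoder QE relevance model; cached for fast serving."""
    return 0.9 if entity in query else 0.1          # placeholder scorer

def qi_relevance(query: str, item_entities: tuple[str, ...]) -> float:
    # Soft OR over the item's entities: relevant if at least one entity matches.
    prob_none_relevant = 1.0
    for ent in item_entities:
        prob_none_relevant *= 1.0 - qe_score(query, ent)
    return 1.0 - prob_none_relevant

print(qi_relevance("red running shoes", ("running shoes", "red", "nike")))
```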

BatGPT: A Bidirectional Autoregessive Talker from Generative Pre-trained Transformer

  • paper_url: http://arxiv.org/abs/2307.00360
  • repo_url: None
  • paper_authors: Zuchao Li, Shitou Zhang, Hai Zhao, Yifei Yang, Dongjie Yang
  • for: batGPT是一种大规模语言模型,用于生成自然流畅的文本回应不同类型的输入,包括文本提示、图像和音频。
  • methods: 模型采用双向autoregressive架构,可以高效地捕捉自然语言中的复杂依赖关系,使其在语言生成、对话系统和问答等任务中表现出色。此外,双向autoregressive模型不仅从左到右运行,还从右到左运行,有效地减少固定内存效应和模型幻想。
  • results: 通过提出novel parameter expansion方法,可以利用 smaller模型的预训练和人工智能和人类反馈的强化学习,提高模型的对齐性性能。总的来说,这些方法有效地提高了batGPT的效果,并可以在各种自然语言应用中使用。
    Abstract BatGPT is a large-scale language model designed and trained jointly by Wuhan University and Shanghai Jiao Tong University. It is capable of generating highly natural and fluent text in response to various types of input, including text prompts, images, and audio. In the modeling level, we employ a bidirectional autoregressive architecture that allows the model to efficiently capture the complex dependencies of natural language, making it highly effective in tasks such as language generation, dialog systems, and question answering. Moreover, the bidirectional autoregressive modeling not only operates from left to right but also from right to left, effectively reducing fixed memory effects and alleviating model hallucinations. In the training aspect, we propose a novel parameter expansion method for leveraging the pre-training of smaller models and employ reinforcement learning from both AI and human feedback, aimed at improving the model's alignment performance. Overall, these approaches significantly improve the effectiveness of BatGPT, and the model can be utilized for a wide range of natural language applications.
    摘要 batgpt是一种大规模语言模型,由武汉大学和上海交通大学共同设计和训练。它可以生成高度自然和流畅的文本响应不同类型的输入,包括文本提示、图像和音频。在模型层次上,我们采用了双向 autoregressive 架构,使模型可以有效地捕捉自然语言中的复杂依赖关系,从而在语言生成、对话系统和问答等任务中表现出色。此外,双向 autoregressive 模型不仅从左到右还从右到左运行,有效地减少固定内存效应和解决模型幻想现象。在训练方面,我们提出了一种新的参数扩展方法,利用小型模型的预训练和人工智能和人类反馈的强化学习,以提高模型的对齐性。总的来说,这些方法有效地提高了 batgpt 的效果,该模型可以应用于各种自然语言应用。

Improving Multitask Retrieval by Promoting Task Specialization

  • paper_url: http://arxiv.org/abs/2307.00342
  • repo_url: https://github.com/wenzhengzhang/taco
  • paper_authors: Wenzheng Zhang, Chenyan Xiong, Karl Stratos, Arnold Overwijk
  • for: 这个论文目的是提高多任务检索的性能,并且比用单个任务特定的检索方法更高效。
  • methods: 这个论文使用了一种新的多任务学习方法,该方法可以使 Parameters 在不同的任务上特化。此外,文章还使用了一种适应学习方法,以便每个参数都可以在特定任务上特化。
  • results: 根据 KILT 测试 benchmark,这个多任务检索方法可以高效地 Retrieval 多个任务上的相关文献。并且,文章的分析表明,这个方法实际上学习了更加任务特化的参数,比单个任务检索方法更高效。
    Abstract In multitask retrieval, a single retriever is trained to retrieve relevant contexts for multiple tasks. Despite its practical appeal, naive multitask retrieval lags behind task-specific retrieval in which a separate retriever is trained for each task. We show that it is possible to train a multitask retriever that outperforms task-specific retrievers by promoting task specialization. The main ingredients are: (1) a better choice of pretrained model (one that is explicitly optimized for multitasking) along with compatible prompting, and (2) a novel adaptive learning method that encourages each parameter to specialize in a particular task. The resulting multitask retriever is highly performant on the KILT benchmark. Upon analysis, we find that the model indeed learns parameters that are more task-specialized compared to naive multitasking without prompting or adaptive learning.
    摘要 在多任务检索中,单个检索器被训练来检索多个任务的相关上下文。尽管这有实践的吸引力,但是直接的多任务检索 lag behind 每个任务特定的检索器。我们示示可以通过促进任务专业化来训练一个高性能的多任务检索器。主要的成分是:(1)更好的预训练模型(一个explicitly optimized for multitasking),以及与其兼容的提示,和(2)一种新型的 adaptive learning 方法,该方法使每个参数特化在特定的任务中。结果的多任务检索器在 KILT benchmark 上表现出色。经分析发现,模型实际上学习了更特化于任务的参数,相比无提示或adaptive learning的多任务检索无法达到这种水平。

Single Sequence Prediction over Reasoning Graphs for Multi-hop QA

  • paper_url: http://arxiv.org/abs/2307.00335
  • repo_url: None
  • paper_authors: Gowtham Ramesh, Makesh Sreedhar, Junjie Hu
  • for: 这个论文的目的是提高多步问答(QA)模型的准确率和可解性。
  • methods: 这个论文使用了一种基于本地逻辑图的单序预测方法(\model),通过在每个问题上的关键实体之间建立图 estructure来提高模型的准确率和可解性。
  • results: 实验结果显示,这个方法可以在HotpotQA数据集上提高答案匹配/F1分数和理由路径的准确性,并在Musique数据集上达到了当前最佳数据。
    Abstract Recent generative approaches for multi-hop question answering (QA) utilize the fusion-in-decoder method~\cite{izacard-grave-2021-leveraging} to generate a single sequence output which includes both a final answer and a reasoning path taken to arrive at that answer, such as passage titles and key facts from those passages. While such models can lead to better interpretability and high quantitative scores, they often have difficulty accurately identifying the passages corresponding to key entities in the context, resulting in incorrect passage hops and a lack of faithfulness in the reasoning path. To address this, we propose a single-sequence prediction method over a local reasoning graph (\model)\footnote{Code/Models will be released at \url{https://github.com/gowtham1997/SeqGraph} that integrates a graph structure connecting key entities in each context passage to relevant subsequent passages for each question. We use a graph neural network to encode this graph structure and fuse the resulting representations into the entity representations of the model. Our experiments show significant improvements in answer exact-match/F1 scores and faithfulness of grounding in the reasoning path on the HotpotQA dataset and achieve state-of-the-art numbers on the Musique dataset with only up to a 4\% increase in model parameters.
    摘要 最近的生成方法 для多步问答(QA)使用融合在decoder中的方法~\cite{izacard-grave-2021-leveraging}来生成一个单个序列输出,该输出包括最终答案以及用于到达该答案的思维路径,例如段落标题和关键事实。这些模型可能会导致更好的解释性和高量级分数,但它们经常在正确地标识问题中的关键段落上遇到困难,从而导致错误的段落跳跃和思维路径的不准确。为解决这个问题,我们提出了基于本地逻辑图(\model)的单序列预测方法,该方法使用逻辑图结构连接每个问题上的关键实体与其相关的后续段落。我们使用图神经网络来编码这个逻辑图结构,并将其与模型中的实体表示进行融合。我们的实验表明,使用我们的方法可以在HotpotQA数据集上提高答案匹配分/F1分数和思维路径的固有性。此外,我们在Musique数据集上达到了状态之最的数据,只需要增加模型参数4\%。

Let Me Teach You: Pedagogical Foundations of Feedback for Language Models

  • paper_url: http://arxiv.org/abs/2307.00279
  • repo_url: None
  • paper_authors: Beatriz Borges, Niket Tandon, Tanja Käser, Antoine Bosselut
  • for: 本研究旨在提供一种Feedback Framework(FELT),用于对Large Language Models(LLMs)进行人工定制。
  • methods: 本研究使用了教学学科中已有的Feedback模型,将其应用于NLF领域,并提出了一个Feedback内容分类法。
  • results: 研究发现,不同类型的反馈对LLMs的修订生成有不同的影响,并提出了一些新的可能性 дляNLF研究。
    Abstract Natural Language Feedback (NLF) is an increasingly popular avenue to align Large Language Models (LLMs) to human preferences. Despite the richness and diversity of the information it can convey, NLF is often hand-designed and arbitrary. In a different world, research in pedagogy has long established several effective feedback models. In this opinion piece, we compile ideas from pedagogy to introduce FELT, a feedback framework for LLMs that outlines the various characteristics of the feedback space, and a feedback content taxonomy based on these variables. Our taxonomy offers both a general mapping of the feedback space, as well as pedagogy-established discrete categories, allowing us to empirically demonstrate the impact of different feedback types on revised generations. In addition to streamlining existing NLF designs, FELT also brings out new, unexplored directions for research in NLF. We make our taxonomy available to the community, providing guides and examples for mapping our categorizations to future resources.
    摘要 自然语言反馈(NLF)是现在吸引着越来越多的研究者的一个热门领域,用于将大语言模型(LLM)与人类的偏好相对应。尽管NLF可以传递各种多样化的信息,但是它们frequently hand-designed和arbitrary。在另一个世界,教学研究已经长期确立了多种有效的反馈模型。在这篇观点文章中,我们从教学研究中综合提出了FELT,一个反馈框架 для LLM,其中包括反馈空间的多种特征和基于这些变量的反馈内容分类法。我们的分类法不仅提供了反馈空间的总体地图,还提供了教学研究确立的明确分类,使我们能够实证不同类型的反馈对修改后的生成的影响。除了使 existed NLF设计更加流畅,FELT还探索了未曾被研究的NLF方向。我们将我们的分类法公布给社区,并提供了将我们的分类映射到未来资源的指南和示例。

Discovering Patterns of Definitions and Methods from Scientific Documents

  • paper_url: http://arxiv.org/abs/2307.01216
  • repo_url: None
  • paper_authors: Yutian Sun, Hai Zhuge
  • for: 这个论文主要是为了提出一种方法来自动抽取科学文献中的定义和方法。
  • methods: 该方法基于自然语言处理技术,包括定义和方法的分别抽取、完整性验证等步骤。
  • results: 实验表明,该方法可以准确地抽取科学文献中的定义和方法,并且可以在不同的应用场景中进行修改或扩展。
    Abstract The difficulties of automatic extraction of definitions and methods from scientific documents lie in two aspects: (1) the complexity and diversity of natural language texts, which requests an analysis method to support the discovery of pattern; and, (2) a complete definition or method represented by a scientific paper is usually distributed within text, therefore an effective approach should not only extract single sentence definitions and methods but also integrate the sentences to obtain a complete definition or method. This paper proposes an analysis method for discovering patterns of definition and method and uses the method to discover patterns of definition and method. Completeness of the patterns at the semantic level is guaranteed by a complete set of semantic relations that identify definitions and methods respectively. The completeness of the patterns at the syntactic and lexical levels is guaranteed by syntactic and lexical constraints. Experiments on the self-built dataset and two public definition datasets show that the discovered patterns are effective. The patterns can be used to extract definitions and methods from scientific documents and can be tailored or extended to suit other applications.
    摘要 科学文献中定义和方法的自动提取具有两个方面的挑战:一是自然语言文本的复杂性和多样性,需要一种分析方法来支持发现模式;二是科学论文中的定义和方法通常分散在文本中,因此一种有效的方法不仅需要提取单句定义和方法,还需要将它们集成起来,以获得完整的定义和方法。本文提出了一种分析方法,用于发现定义和方法的模式,并使用该方法在自己建立的数据集和两个公共定义数据集上进行实验,实验结果表明,发现的模式具有完整性。这些模式可以用来从科学文献中提取定义和方法,并可以根据需要进行修改或扩展。

How far is Language Model from 100% Few-shot Named Entity Recognition in Medical Domain

  • paper_url: http://arxiv.org/abs/2307.00186
  • repo_url: https://github.com/toneli/rt-retrieving-and-thinking
  • paper_authors: Mingchen Li, Rui Zhang
  • for: 这paper的目的是对医疗领域中LMs的性能进行全面的研究,以及探讨如何使用LMs来提高NER表现。
  • methods: 这paper使用了16种NER模型,从2018年到2023年进行了广泛的实验,并结合了一种简单有效的方法 called \textsc{RT} (Retrieving and Thinking),以提高NER表现。
  • results: 实验结果表明,LMs在医疗领域中的少数例NER任务中表现更好,但仍然存在一些挑战,如误认、模板预测等。\textsc{RT}方法在两个开源医疗benchmark数据集上显著超过了强开放基线。
    Abstract Recent advancements in language models (LMs) have led to the emergence of powerful models such as Small LMs (e.g., T5) and Large LMs (e.g., GPT-4). These models have demonstrated exceptional capabilities across a wide range of tasks, such as name entity recognition (NER) in the general domain. (We define SLMs as pre-trained models with fewer parameters compared to models like GPT-3/3.5/4, such as T5, BERT, and others.) Nevertheless, their efficacy in the medical section remains uncertain and the performance of medical NER always needs high accuracy because of the particularity of the field. This paper aims to provide a thorough investigation to compare the performance of LMs in medical few-shot NER and answer How far is LMs from 100\% Few-shot NER in Medical Domain, and moreover to explore an effective entity recognizer to help improve the NER performance. Based on our extensive experiments conducted on 16 NER models spanning from 2018 to 2023, our findings clearly indicate that LLMs outperform SLMs in few-shot medical NER tasks, given the presence of suitable examples and appropriate logical frameworks. Despite the overall superiority of LLMs in few-shot medical NER tasks, it is important to note that they still encounter some challenges, such as misidentification, wrong template prediction, etc. Building on previous findings, we introduce a simple and effective method called \textsc{RT} (Retrieving and Thinking), which serves as retrievers, finding relevant examples, and as thinkers, employing a step-by-step reasoning process. Experimental results show that our proposed \textsc{RT} framework significantly outperforms the strong open baselines on the two open medical benchmark datasets
    摘要 最近的语言模型(LM)的进步已导致小型LM(例如T5)和大型LM(例如GPT-4)的出现。这些模型在各种任务上表现出色,如通用领域中的名实体识别(NER)。(我们定义SLMs为预训练模型,比如GPT-3/3.5/4,T5、BERT等。)然而,它们在医疗领域的表现仍然存在uncertainty,因为医疗领域的特殊性。本文旨在对LMs在医疗领域的少量NER任务进行全面的调查,以确定LMs在这些任务中的表现有多好,并且探讨一种有效的实体识别器,以提高NER表现。根据我们在2018年至2023年间进行的广泛实验,我们的发现显示,LMs在医疗领域的少量NER任务中表现出色,尤其当给出合适的示例和适当的逻辑框架时。尽管LMs在这些任务中的总表现优于SLMs,但它们仍然面临一些挑战,如误认、错误的模板预测等。基于之前的发现,我们提出了一种简单有效的方法,称为\textsc{RT}(检索和思考),它可以作为检索器,找到相关的示例,并作为思考者,采用步骤式的思考过程。实验结果显示,我们的提议的\textsc{RT}框架在两个开源医疗benchmark数据集上得到了显著的改进。
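
A retrieve-then-think prompt of the kind used by RT can be sketched as follows: nearest labeled examples are retrieved by embedding similarity and placed in a prompt that asks the model to reason step by step before listing entities. The placeholder embedding function, entity types, and prompt wording are assumptions, not the exact RT implementation.

```python
# Simplified retrieve-and-think prompt construction for few-shot medical NER.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder sentence embedding (swap in a real encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

def retrieve(query: str, pool: list[tuple[str, str]], k: int = 3):
    q = embed(query)
    sims = [float(q @ embed(text)) for text, _ in pool]
    return [pool[i] for i in np.argsort(sims)[::-1][:k]]

def build_prompt(sentence: str, pool: list[tuple[str, str]]) -> str:
    demos = "\n\n".join(
        f"Sentence: {t}\nEntities: {e}" for t, e in retrieve(sentence, pool))
    return (
        "Extract all disease and drug entities.\n\n"
        f"{demos}\n\n"
        f"Sentence: {sentence}\n"
        "Let's think step by step, then list the entities."
    )

pool = [("Aspirin relieved the patient's headache.", "drug: Aspirin; disease: headache")]
print(build_prompt("Metformin was prescribed for type 2 diabetes.", pool))
```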

Still No Lie Detector for Language Models: Probing Empirical and Conceptual Roadblocks

  • paper_url: http://arxiv.org/abs/2307.00175
  • repo_url: https://github.com/balevinstein/probes
  • paper_authors: B. A. Levinstein, Daniel A. Herrmann
  • for: 这篇论文探讨了大语言模型(LLMs)是否具有信念,以及如果它们具有信念,如何测量它们。
  • methods: 论文评估了两种现有的方法,一种是由Azaria和Mitchell(2023)提出的,另一种是由Burns等人(2022)提出的。
  • results: 论文给出的实验结果表明,这两种方法即使在非常基础的情形下也无法泛化;并进一步论证,即便LLMs确实具有信念,出于概念层面的原因,这些方法也不太可能成功。因此,目前仍没有针对LLMs的可靠"测谎器"。
    Abstract We consider the questions of whether or not large language models (LLMs) have beliefs, and, if they do, how we might measure them. First, we evaluate two existing approaches, one due to Azaria and Mitchell (2023) and the other to Burns et al. (2022). We provide empirical results that show that these methods fail to generalize in very basic ways. We then argue that, even if LLMs have beliefs, these methods are unlikely to be successful for conceptual reasons. Thus, there is still no lie-detector for LLMs. After describing our empirical results we take a step back and consider whether or not we should expect LLMs to have something like beliefs in the first place. We consider some recent arguments aiming to show that LLMs cannot have beliefs. We show that these arguments are misguided. We provide a more productive framing of questions surrounding the status of beliefs in LLMs, and highlight the empirical nature of the problem. We conclude by suggesting some concrete paths for future work.
    摘要 我们考虑大语言模型(LLM)是否具有信念,以及若有,应如何测量的问题。首先,我们评估了两种现有方法,一种由Azaria和Mitchell(2023)提出,另一种由Burns等人(2022)提出。实验结果表明,这些方法在非常基础的情形下就无法泛化。我们进而论证,即使LLM具有信念,出于概念层面的原因,这些方法也不太可能成功。因此,目前仍没有针对LLM的"测谎器"。在描述实验结果之后,我们退一步思考LLM是否应当被认为具有类似信念的东西。我们考察了一些近期声称LLM不可能具有信念的论证,并指出这些论证是有误导性的。我们为围绕LLM信念地位的问题提供了一个更有成效的框架,并强调这一问题的经验性质。最后,我们提出了若干具体的后续研究方向。
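The probing methods evaluated above amount to training a simple classifier on a model's hidden activations. As a rough, hypothetical sketch of that setup (not the authors' code, with synthetic activations and labels standing in for real LM features), a linear probe could be fit as follows:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: rows are hidden activations of an LM for labeled statements,
# y marks whether each statement is true (1) or false (0). In practice these
# activations would be extracted from a specific layer of the model.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 768))           # hypothetical hidden-state features
y = rng.integers(0, 2, size=2000)          # hypothetical truth labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

probe = LogisticRegression(max_iter=1000)  # a simple linear "lie detector" probe
probe.fit(X_tr, y_tr)

# The paper's point is that such probes often fail to transfer: a probe trained
# on one distribution of statements should also be evaluated on held-out
# distributions (e.g., negated statements) to test generalization.
print("in-distribution accuracy:", probe.score(X_te, y_te))
```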

What do self-supervised speech models know about words?

  • paper_url: http://arxiv.org/abs/2307.00162
  • repo_url: None
  • paper_authors: Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu
  • for: 该研究旨在探究不同的自监督语音模型(S3M)如何编码语言信息,以及这些模型是否学习到了与词相关的单元。
  • methods: 研究分析了三种S3M模型(wav2vec2、HuBERT和WavLM),并使用典型相关分析(CCA)衡量各层的词段表示与词级语言属性之间的相似度。
  • results: 研究发现,词级语言信息在模型的中间层最为丰富,而发音等较低层次的信息在HuBERT和WavLM的较高层中也有保留;各模型在下游任务上的表现呈现出类似的层级趋势,并且词身份信息集中在每个词段的中部附近。
    Abstract Many self-supervised speech models (S3Ms) have been introduced over the last few years, producing performance and data efficiency improvements for a variety of speech tasks. Evidence is emerging that different S3Ms encode linguistic information in different layers, and also that some S3Ms appear to learn phone-like sub-word units. However, the extent to which these models capture larger linguistic units, such as words, and where word-related information is encoded, remains unclear. In this study, we conduct several analyses of word segment representations extracted from different layers of three S3Ms: wav2vec2, HuBERT, and WavLM. We employ canonical correlation analysis (CCA), a lightweight analysis tool, to measure the similarity between these representations and word-level linguistic properties. We find that the maximal word-level linguistic content tends to be found in intermediate model layers, while some lower-level information like pronunciation is also retained in higher layers of HuBERT and WavLM. Syntactic and semantic word attributes have similar layer-wise behavior. We also find that, for all of the models tested, word identity information is concentrated near the center of each word segment. We then test the layer-wise performance of the same models, when used directly with no additional learned parameters, on several tasks: acoustic word discrimination, word segmentation, and semantic sentence similarity. We find similar layer-wise trends in performance, and furthermore, find that when using the best-performing layer of HuBERT or WavLM, it is possible to achieve performance on word segmentation and sentence similarity that rivals more complex existing approaches.
    摘要 近年来出现了许多自监督语音模型(S3M),它们为多种语音任务带来了性能和数据效率上的提升。已有证据表明,不同的S3M在不同层中编码语言信息,并且一些S3M似乎学习到了类似音素的子词单元。然而,这些模型在多大程度上捕捉了更大的语言单元(例如词),以及与词相关的信息编码在何处,仍不清楚。在本研究中,我们对wav2vec2、HuBERT和WavLM三种S3M不同层提取的词段表示进行了多项分析,并使用轻量级的典型相关分析(CCA)衡量这些表示与词级语言属性之间的相似度。我们发现,词级语言信息通常在模型的中间层最为丰富,而发音等较低层次的信息在HuBERT和WavLM的较高层中也有保留;句法和语义属性呈现类似的层级行为。我们还发现,对于所有被测试的模型,词身份信息都集中在每个词段的中部附近。随后,我们在声学词判别、词切分和语义句子相似度等任务上直接使用这些模型(不引入额外学习参数)测试各层表现,观察到类似的层级趋势;并且使用HuBERT或WavLM表现最好的那一层,即可在词切分和句子相似度任务上取得与更复杂的现有方法相当的结果。
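The layer-wise analysis described above relies on canonical correlation analysis between word-segment representations and word-level properties. A minimal sketch of that measurement, using scikit-learn's CCA with random placeholder matrices instead of real wav2vec2/HuBERT/WavLM features, might look like this:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

# Synthetic stand-ins: word_reprs are S3M word-segment representations from one
# layer (e.g., mean-pooled frames), word_props are word-level linguistic
# property vectors (e.g., pronunciation or semantic features) for the same words.
rng = np.random.default_rng(0)
n_words, d_model, d_props = 2000, 256, 50
word_reprs = rng.normal(size=(n_words, d_model))
word_props = rng.normal(size=(n_words, d_props))

cca = CCA(n_components=10, max_iter=1000)
cca.fit(word_reprs, word_props)
U, V = cca.transform(word_reprs, word_props)

# Mean canonical correlation is one way to score how much word-level
# information a given layer encodes; repeating this per layer gives the
# layer-wise trends discussed in the abstract.
corrs = [np.corrcoef(U[:, k], V[:, k])[0, 1] for k in range(U.shape[1])]
print("mean canonical correlation:", float(np.mean(corrs)))
```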

SMILE: Evaluation and Domain Adaptation for Social Media Language Understanding

  • paper_url: http://arxiv.org/abs/2307.00135
  • repo_url: None
  • paper_authors: Vasilisa Bashlovkina, Riley Matthews, Zhaobin Kuang, Simon Baumgartner, Michael Bendersky
  • for: 这 paper 的目的是研究基于 transformer 语言模型 (LMs) 理解社交媒体语言。
  • methods: 这 paper 使用了一种新的 Social MedIa Language Evaluation (SMILE) 标准,以评估 LM 在社交媒体上的表现。
  • results: 研究发现,基于 social media 和标准语言的混合预训练可以提高 LM 的表现,比最佳相似的替代方案提高了4.2个 SMILE 分数。
    Abstract We study the ability of transformer-based language models (LMs) to understand social media language. Social media (SM) language is distinct from standard written language, yet existing benchmarks fall short of capturing LM performance in this socially, economically, and politically important domain. We quantify the degree to which social media language differs from conventional language and conclude that the difference is significant both in terms of token distribution and rate of linguistic shift. Next, we introduce a new benchmark for Social MedIa Language Evaluation (SMILE) that covers four SM platforms and eleven tasks. Finally, we show that learning a tokenizer and pretraining on a mix of social media and conventional language yields an LM that outperforms the best similar-sized alternative by 4.2 points on the overall SMILE score.
    摘要 我们研究基于Transformer的语言模型(LM)理解社交媒体语言的能力。社交媒体语言与标准书面语言有明显差异,但现有基准无法充分衡量LM在这一具有社会、经济和政治重要性领域中的表现。我们量化了社交媒体语言与常规语言之间的差异,发现二者在词元分布和语言演变速度上都差异显著。随后,我们提出了一个新的社交媒体语言评估基准SMILE,覆盖四个社交媒体平台和十一个任务。最后,我们表明:学习新的分词器并在社交媒体与常规语言的混合语料上进行预训练,得到的LM在SMILE总分上比规模相近的最佳替代模型高出4.2分。

iMETRE: Incorporating Markers of Entity Types for Relation Extraction

  • paper_url: http://arxiv.org/abs/2307.00132
  • repo_url: None
  • paper_authors: N Harsha Vardhan, Manav Chaudhary
  • for: 本文是关于 sentence-level 关系抽象(RE)在金融数据集 REFinD 中进行研究的论文。
  • methods: 本文使用了类型化实体标记(typed entity marker)表示,并在该数据集上微调多种模型,在验证集上达到了69.65%的F1分数。
  • results: 本文在验证集上实现了69.65%的 F1 分数,并讨论了多种方法和可能的限制。
    Abstract Sentence-level relation extraction (RE) aims to identify the relationship between 2 entities given a contextual sentence. While there have been many attempts to solve this problem, the current solutions have a lot of room to improve. In this paper, we approach the task of relationship extraction in the financial dataset REFinD. Our approach incorporates typed entity markers representations and various models finetuned on the dataset, which has allowed us to achieve an F1 score of 69.65% on the validation set. Through this paper, we discuss various approaches and possible limitations.
    摘要 句子级关系抽取(RE)的目标是在给定上下文句子中识别两个实体之间的关系。尽管已有许多工作尝试解决这一问题,现有方案仍有很大改进空间。在本文中,我们针对金融数据集REFinD开展关系抽取研究。我们的方法使用了类型化实体标记表示以及在该数据集上微调的多种模型,在验证集上取得了69.65%的F1分数。文中还讨论了多种方法及其可能的局限性。
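The typed entity marker representation mentioned above boils down to wrapping the two candidate entities in type-aware tags before encoding the sentence. The exact tag format used for REFinD is an assumption here; this is only an illustrative sketch:

```python
def add_typed_markers(tokens, e1_span, e2_span, e1_type, e2_type):
    """Insert typed entity markers around two entity spans (inclusive token index ranges).

    The marker format varies between papers; the tags used here are
    illustrative, not necessarily the ones used in this work.
    """
    (s1, t1), (s2, t2) = e1_span, e2_span
    out = []
    for i, tok in enumerate(tokens):
        if i == s1:
            out.append(f"[E1:{e1_type}]")
        if i == s2:
            out.append(f"[E2:{e2_type}]")
        out.append(tok)
        if i == t1:
            out.append(f"[/E1:{e1_type}]")
        if i == t2:
            out.append(f"[/E2:{e2_type}]")
    return " ".join(out)


tokens = "Acme Corp appointed Jane Doe as chief executive".split()
print(add_typed_markers(tokens, (0, 1), (3, 4), "ORG", "PERSON"))
# [E1:ORG] Acme Corp [/E1:ORG] appointed [E2:PERSON] Jane Doe [/E2:PERSON] as chief executive
```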

Information Extraction in Domain and Generic Documents: Findings from Heuristic-based and Data-driven Approaches

  • paper_url: http://arxiv.org/abs/2307.00130
  • repo_url: None
  • paper_authors: Shiyu Yuan, Carlo Lipizzi
  • for: 本研究旨在investigate文本处理领域中information extraction(IE)任务中document genre和length的影响。
  • methods: 本研究采用了两种主流实现方法:heuristic-based searching和data-driven learning。
  • results: 研究发现,没有单一方法在两项任务中全面占优:在命名实体识别上,数据驱动方法的准确率高于符号方法,尤其是在短文本上;在语义角色标注上,基于启发式搜索的方法以及融合句法表示的数据驱动模型优于仅利用语义信息的纯数据驱动方法;此外,同一方法下不同语义角色的准确率也各不相同。
    Abstract Information extraction (IE) plays a very important role in natural language processing (NLP) and is fundamental to many NLP applications that extract structured information from unstructured text data. Heuristic-based searching and data-driven learning are the two mainstream implementation approaches. However, not much attention has been paid to the influence of document genre and length on IE tasks. To fill the gap, in this study, we investigated the accuracy and generalization abilities of heuristic-based searching and data-driven learning on two IE tasks, named entity recognition (NER) and semantic role labeling (SRL), over domain-specific and generic documents of different lengths. We posited two hypotheses: first, short documents may yield better accuracy results compared to long documents; second, generic documents may exhibit superior extraction outcomes relative to domain-dependent documents due to training document genre limitations. Our findings reveal that no single method demonstrated overwhelming performance in both tasks. For named entity extraction, data-driven approaches outperformed symbolic methods in terms of accuracy, particularly in short texts. In the case of semantic role extraction, we observed that the heuristic-based searching method and the data-driven model with syntax representation surpassed the performance of the pure data-driven approach that only considers semantic information. Additionally, we discovered that different semantic roles exhibited varying accuracy levels with the same method. This study offers valuable insights for downstream text mining tasks, such as NER and SRL, when addressing various document features and genres.
    摘要 信息抽取(IE)在自然语言处理(NLP)中扮演着非常重要的角色,是许多从非结构化文本中抽取结构化信息的NLP应用的基础。基于启发式的搜索和数据驱动学习是两种主流实现方式。然而,文档体裁和长度对IE任务的影响一直较少受到关注。为填补这一空白,本研究考察了这两类方法在命名实体识别(NER)和语义角色标注(SRL)两项IE任务上、针对不同长度的领域特定文档和通用文档的准确率与泛化能力。我们提出了两个假设:其一,短文档可能比长文档取得更好的准确率;其二,受训练文档体裁限制,通用文档的抽取效果可能优于领域相关文档。结果表明,没有任何单一方法在两项任务中全面占优。在命名实体抽取上,数据驱动方法的准确率高于符号方法,尤其是在短文本中;在语义角色抽取上,基于启发式搜索的方法和融合句法表示的数据驱动模型优于仅考虑语义信息的纯数据驱动方法。此外,同一方法下不同语义角色的准确率也各不相同。本研究为NER、SRL等下游文本挖掘任务在面对不同文档特征和体裁时提供了有价值的参考。

Meta-training with Demonstration Retrieval for Efficient Few-shot Learning

  • paper_url: http://arxiv.org/abs/2307.00119
  • repo_url: None
  • paper_authors: Aaron Mueller, Kanika Narang, Lambert Mathias, Qifan Wang, Hamed Firooz
  • for: 用于提升较小的语言模型在少样本任务上的泛化能力和效率。
  • methods: 使用示例检索来提供更多的相似示例,以增强模型的学习和泛化能力。
  • results: 在问答、自然语言推理和文本分类等任务(包括SQuAD、QNLI和TREC)上优于多种参数高效和检索增强的少样本方法,并且可在单个GPU上快速完成元训练和微调。
    Abstract Large language models show impressive results on few-shot NLP tasks. However, these models are memory and computation-intensive. Meta-training allows one to leverage smaller models for few-shot generalization in a domain-general and task-agnostic manner; however, these methods alone results in models that may not have sufficient parameterization or knowledge to adapt quickly to a large variety of tasks. To overcome this issue, we propose meta-training with demonstration retrieval, where we use a dense passage retriever to retrieve semantically similar labeled demonstrations to each example for more varied supervision. By separating external knowledge from model parameters, we can use meta-training to train parameter-efficient models that generalize well on a larger variety of tasks. We construct a meta-training set from UnifiedQA and CrossFit, and propose a demonstration bank based on UnifiedQA tasks. To our knowledge, our work is the first to combine retrieval with meta-training, to use DPR models to retrieve demonstrations, and to leverage demonstrations from many tasks simultaneously, rather than randomly sampling demonstrations from the training set of the target task. Our approach outperforms a variety of targeted parameter-efficient and retrieval-augmented few-shot methods on QA, NLI, and text classification tasks (including SQuAD, QNLI, and TREC). Our approach can be meta-trained and fine-tuned quickly on a single GPU.
    摘要 大型语言模型在少样本NLP任务中表现出色,但它们对内存和计算资源的需求很高。元训练可以让较小的模型以领域通用、任务无关的方式进行少样本泛化,但仅靠这些方法训练出的模型可能缺乏足够的参数量或知识来快速适应多种任务。为解决这一问题,我们提出带示例检索的元训练:使用稠密段落检索器(DPR)为每个样本检索语义相似的带标注示例,以提供更多样化的监督。通过将外部知识与模型参数分离,元训练可以得到参数高效且能在更多任务上良好泛化的模型。我们基于UnifiedQA和CrossFit构建元训练集,并提出一个基于UnifiedQA任务的示例库。据我们所知,这是首个将检索与元训练相结合、使用DPR模型检索示例、并同时利用多个任务示例(而非仅从目标任务训练集中随机采样)的工作。我们的方法在问答、自然语言推理和文本分类任务(包括SQuAD、QNLI和TREC)上优于多种参数高效和检索增强的少样本方法,并且可以在单个GPU上快速完成元训练和微调。
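The retrieval step described above can be approximated, for illustration only, with a simple lexical retriever in place of a trained DPR model; the demonstration bank and prompt format below are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A toy demonstration bank: (input, label) pairs drawn from many tasks.
bank = [
    ("Is the sky blue on a clear day?", "yes"),
    ("Translate 'bonjour' to English.", "hello"),
    ("Does 2 + 2 equal 5?", "no"),
    ("What is the capital of France?", "Paris"),
]

query = "Does water boil at 100 degrees Celsius at sea level?"

# Stand-in retriever: TF-IDF + cosine similarity instead of a trained DPR model.
vec = TfidfVectorizer().fit([q for q, _ in bank] + [query])
sims = cosine_similarity(vec.transform([query]),
                         vec.transform([q for q, _ in bank]))[0]
top_k = sims.argsort()[::-1][:2]

# Assemble the retrieved demonstrations in front of the query, the way a
# retrieval-augmented few-shot prompt would be built for the reader model.
demos = "\n".join(f"Q: {bank[i][0]}\nA: {bank[i][1]}" for i in top_k)
prompt = f"{demos}\nQ: {query}\nA:"
print(prompt)
```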

SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs

  • paper_url: http://arxiv.org/abs/2306.17842
  • repo_url: https://github.com/google-research/magvit
  • paper_authors: Lijun Yu, Yong Cheng, Zhiruo Wang, Vivek Kumar, Wolfgang Macherey, Yanping Huang, David A. Ross, Irfan Essa, Yonatan Bisk, Ming-Hsuan Yang, Kevin Murphy, Alexander G. Hauptmann, Lu Jiang
  • for: 该论文旨在启用冻结的自然语言处理模型(LLM)进行多modal任务,包括图像理解和生成任务。
  • methods: 该论文提出了Semantic Pyramid AutoEncoder(SPAE),用于将图像转化为可读性强的lexical token(或词),这些token捕捉了图像的semantic meaning和细节信息,可以让LLM进行多modal任务。
  • results: 该论文通过在冻结的PaLM 2和GPT 3.5上进行上下文学习实验,验证了SPAE方法可以使LLM在图像理解和生成任务中达到最先进水平,在相同设定下比先前方法提高25%以上。
    Abstract In this work, we introduce Semantic Pyramid AutoEncoder (SPAE) for enabling frozen LLMs to perform both understanding and generation tasks involving non-linguistic modalities such as images or videos. SPAE converts between raw pixels and interpretable lexical tokens (or words) extracted from the LLM's vocabulary. The resulting tokens capture both the semantic meaning and the fine-grained details needed for visual reconstruction, effectively translating the visual content into a language comprehensible to the LLM, and empowering it to perform a wide array of multimodal tasks. Our approach is validated through in-context learning experiments with frozen PaLM 2 and GPT 3.5 on a diverse set of image understanding and generation tasks. Our method marks the first successful attempt to enable a frozen LLM to generate image content while surpassing state-of-the-art performance in image understanding tasks, under the same setting, by over 25%.
    摘要 在这项工作中,我们介绍了语义金字塔自编码器(SPAE),用于让冻结的LLM能够执行涉及图像或视频等非语言模态的理解和生成任务。SPAE在原始像素与LLM词表中可解释的词元(或词)之间进行转换。由此得到的词元既捕捉语义含义,也保留视觉重建所需的细节信息,从而将视觉内容翻译成LLM可以理解的语言,并赋予其执行多种多模态任务的能力。我们通过在冻结的PaLM 2和GPT 3.5上进行上下文学习实验,在多种图像理解和生成任务上验证了该方法。我们的方法首次成功地使冻结的LLM生成图像内容,同时在相同设定下的图像理解任务上以超过25%的优势超越现有最佳水平。

Statler: State-Maintaining Language Models for Embodied Reasoning

  • paper_url: http://arxiv.org/abs/2306.17840
  • repo_url: https://github.com/statler-lm/statler-lm.github.io
  • paper_authors: Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, Matthew R. Walter
  • for: 提高 LLM 在长时间规划中的能力,使 robot 能够更好地完成复杂的任务。
  • methods: 使用两个特定的 LLM 实例,一个读取世界模型,一个写入世界模型,以维护世界状态的表示。
  • results: 在三个模拟桌面操作任务和一个真实机器人任务中,超越了现有基于LLM的机器人推理方法。
    Abstract Large language models (LLMs) provide a promising tool that enable robots to perform complex robot reasoning tasks. However, the limited context window of contemporary LLMs makes reasoning over long time horizons difficult. Embodied tasks such as those that one might expect a household robot to perform typically require that the planner consider information acquired a long time ago (e.g., properties of the many objects that the robot previously encountered in the environment). Attempts to capture the world state using an LLM's implicit internal representation is complicated by the paucity of task- and environment-relevant information available in a robot's action history, while methods that rely on the ability to convey information via the prompt to the LLM are subject to its limited context window. In this paper, we propose Statler, a framework that endows LLMs with an explicit representation of the world state as a form of ``memory'' that is maintained over time. Integral to Statler is its use of two instances of general LLMs -- a world-model reader and a world-model writer -- that interface with and maintain the world state. By providing access to this world state ``memory'', Statler improves the ability of existing LLMs to reason over longer time horizons without the constraint of context length. We evaluate the effectiveness of our approach on three simulated table-top manipulation domains and a real robot domain, and show that it improves the state-of-the-art in LLM-based robot reasoning. Project website: https://statler-lm.github.io/
    摘要 大型语言模型(LLM)为机器人执行复杂推理任务提供了一种有前景的工具。然而,当前LLM的上下文窗口有限,使得在较长时间跨度上进行推理十分困难。家用机器人等具身任务通常要求规划器考虑很久之前获得的信息(例如机器人先前在环境中遇到的众多物体的属性)。由于机器人的动作历史中与任务和环境相关的信息稀少,利用LLM的隐式内部表示来刻画世界状态十分困难;而依赖提示向LLM传递信息的方法又受限于其上下文窗口长度。在本文中,我们提出Statler框架,它为LLM提供一种随时间维护的、作为"记忆"的显式世界状态表示。Statler的核心是使用两个通用LLM实例,即世界模型读取器和世界模型写入器,来与世界状态交互并维护它。通过提供对这一世界状态"记忆"的访问,Statler提升了现有LLM在不受上下文长度限制的情况下进行长时程推理的能力。我们在三个模拟桌面操作任务和一个真实机器人任务上评估了该方法的有效性,结果表明其超越了现有基于LLM的机器人推理方法。项目网站:https://statler-lm.github.io/
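A very rough sketch of the reader/writer pattern Statler describes is shown below. `call_llm`, the state schema, and the prompts are all placeholders, not the framework's actual interface:

```python
import json

def call_llm(prompt: str) -> str:
    # Placeholder for whatever LLM client is available.
    raise NotImplementedError("plug in your LLM client here")

world_state = {"objects": {"red_block": "on table", "blue_block": "on red_block"}}

def world_model_reader(instruction: str, state: dict) -> str:
    # The reader conditions on the current state to produce the next robot action.
    return call_llm(
        "Current world state:\n" + json.dumps(state) +
        f"\nInstruction: {instruction}\nWrite the next robot action:"
    )

def world_model_writer(action: str, state: dict) -> dict:
    # The writer updates the state so later steps can reason over a long horizon
    # without re-reading the whole interaction history.
    updated = call_llm(
        "Current world state:\n" + json.dumps(state) +
        f"\nAction just executed: {action}\nReturn the updated state as JSON:"
    )
    return json.loads(updated)

# One step of the loop: read -> act -> write.
# action = world_model_reader("put the blue block on the table", world_state)
# world_state = world_model_writer(action, world_state)
```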

Meta-Reasoning: Semantics-Symbol Deconstruction For Large Language Models

  • paper_url: http://arxiv.org/abs/2306.17820
  • repo_url: None
  • paper_authors: Yiming Wang, Zhuosheng Zhang, Rui Wang
  • for: 提高大语言模型(LLM)的理解能力
  • methods: 使用自然语言中的符号来简化LLM的学习过程
  • results: 在符号推理任务(如7步Tracking Shuffled Objects)中,GPT-3(text-davinci-002)只需一个Meta-Reasoning示例即可达到99%以上的准确率,超过目前采用标准思维链提示的所有LLM。
    Abstract Symbolization methods in large language models (LLMs) have been shown effective to improve LLMs' reasoning ability. However, most of these approaches hinge on mapping natural languages to formal languages (e.g., Python, SQL) that are more syntactically complete and free of ambiguity. Although effective, they depart from the natural language itself and deviate from the habits of human thinking, and instead cater more to the execution mindset of computers. In contrast, we hope to simplify natural language by starting from the concept of symbols in linguistics itself, so that LLMs can learn the common formulation and general solution of reasoning problems wrapped in different natural semantics. From this consideration, we propose \textbf{Meta-Reasoning}, which allows LLMs to automatically accomplish semantic-symbol deconstruction, i.e., semantic resolution, to maximally reduce different questions of certain reasoning tasks to similar natural language representation, thus gaining the ability to learn by analogy and facilitating data-efficient in-context learning. Our experiments show that the Meta-Reasoning paradigm saliently enhances LLMs' reasoning performance with fewer demonstrations. They can learn not only reasoning chains but also general solutions to certain types of tasks. In particular, for symbolic reasoning tasks, such as 7-step Tracking Shuffled Objects, GPT-3 (text-davinci-002) achieves over 99% accuracy with only one Meta-Reasoning demonstration, outperforming all current LLMs with the standard chain-of-thought prompting.
    摘要 大语言模型(LLM)中的符号化方法已被证明可以提升LLM的推理能力。然而,这类方法大多依赖于将自然语言映射到语法更完备、无歧义的形式语言(如Python、SQL)。这虽然有效,却偏离了自然语言本身和人类的思维习惯,更多地迎合计算机的执行方式。相反,我们希望从语言学中的符号概念出发来简化自然语言,使LLM能够学习包裹在不同自然语义之下的推理问题的共同表述和通用解法。基于这一考虑,我们提出Meta-Reasoning,让LLM自动完成语义-符号解构(即语义解析),将某类推理任务的不同问题最大程度地归约为相似的自然语言表示,从而获得类比学习的能力,并促进数据高效的上下文学习。实验表明,Meta-Reasoning范式能在更少示例的情况下显著提升LLM的推理性能;模型不仅能学习推理链,还能学习某些类型任务的通用解法。特别地,在符号推理任务(如7步Tracking Shuffled Objects)中,GPT-3(text-davinci-002)只需一个Meta-Reasoning示例即可达到99%以上的准确率,超过目前采用标准思维链提示的所有LLM。

A Massive Scale Semantic Similarity Dataset of Historical English

  • paper_url: http://arxiv.org/abs/2306.17810
  • repo_url: None
  • paper_authors: Emily Silcock, Melissa Dell
  • for: 本研究使用新闻稿来构建一个大规模的 semantic similarity 数据集,覆盖70年时间段从1920年到1989年,包含约400万个正面 semantic similarity 对。
  • methods: 本研究利用文档版式和语言理解将文章与其标题关联起来,并使用深度神经网络方法在存在大量噪声和删节的情况下检测哪些文章来自同一底层来源。
  • results: 研究得到了一个公共可用的 HEADLINES 数据集,覆盖了一段很长的时间段和大量的正面 semantic similarity 对,可以用于许多任务,如研究 semantic change 的发展和变化。
    Abstract A diversity of tasks use language models trained on semantic similarity data. While there are a variety of datasets that capture semantic similarity, they are either constructed from modern web data or are relatively small datasets created in the past decade by human annotators. This study utilizes a novel source, newly digitized articles from off-copyright, local U.S. newspapers, to assemble a massive-scale semantic similarity dataset spanning 70 years from 1920 to 1989 and containing nearly 400M positive semantic similarity pairs. Historically, around half of articles in U.S. local newspapers came from newswires like the Associated Press. While local papers reproduced articles from the newswire, they wrote their own headlines, which form abstractive summaries of the associated articles. We associate articles and their headlines by exploiting document layouts and language understanding. We then use deep neural methods to detect which articles are from the same underlying source, in the presence of substantial noise and abridgement. The headlines of reproduced articles form positive semantic similarity pairs. The resulting publicly available HEADLINES dataset is significantly larger than most existing semantic similarity datasets and covers a much longer span of time. It will facilitate the application of contrastively trained semantic similarity models to a variety of tasks, including the study of semantic change across space and time.
    摘要 许多任务都会用到在语义相似度数据上训练的语言模型。尽管已有多种捕捉语义相似度的数据集,它们要么基于现代网络数据构建,要么是近十年由人工标注者创建的规模较小的数据集。本研究利用一种新的数据来源,即新近数字化的、不受版权限制的美国地方报纸文章,构建了一个覆盖1920年至1989年共70年、包含近4亿个正向语义相似度对的大规模数据集。历史上,美国地方报纸约有一半的文章来自美联社等通讯社。地方报纸在转载通讯社文章的同时,会撰写自己的标题,这些标题构成了对应文章的摘要式概括。我们利用文档版式和语言理解将文章与其标题关联起来,再使用深度神经网络方法在存在大量噪声和删节的情况下检测哪些文章来自同一底层来源。被转载文章的标题即构成正向语义相似度对。由此得到的公开可用的HEADLINES数据集显著大于大多数现有语义相似度数据集,并覆盖更长的时间跨度。它将有助于把对比训练的语义相似度模型应用到多种任务中,包括研究语义在空间和时间上的变化。

Stay on topic with Classifier-Free Guidance

  • paper_url: http://arxiv.org/abs/2306.17806
  • repo_url: https://github.com/Vermeille/cfg-llm
  • paper_authors: Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan Sasanka Ammanamanchi, Stella Biderman
  • for: 这个论文的目的是探讨Classifier-Free Guidance(CFG)在文本到图像生成中作为轻量级技术,以提高提示遵循性。
  • methods: 这篇论文使用CFG作为推理时间技术,并在不同任务中(如问答、逻辑、代码生成和machine translation)进行了广泛的应用,并取得了LAMBADA中LLaMA-7B模型的SOTA成绩,超过PaLM-540B。
  • results: 这篇论文的结果表明,CFG可以提升Pythia、GPT-2和LLaMA系列模型的性能,效果相当于将模型参数量翻倍;它可以与思维链、自一致性等其他推理时方法叠加,在困难任务上带来进一步提升;此外,CFG还能在形式驱动和内容驱动的复杂提示中提高助手回复的忠实度和连贯性。
    Abstract Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks: Q\&A, reasoning, code generation, and machine translation, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements equivalent to a model with twice the parameter-count; (3) can stack alongside other inference-time methods like Chain-of-Thought and Self-Consistency, yielding further improvements in difficult tasks; (4) can be used to increase the faithfulness and coherence of assistants in challenging form-driven and content-driven prompts: in a human evaluation we show a 75\% preference for GPT4All using CFG over baseline.
    摘要 无分类器引导(CFG)最近作为一种轻量级技术出现在文本到图像生成中,用于促使生成结果更贴合提示。在这项工作中,我们证明CFG可以作为推理时技术广泛用于纯语言建模。我们发现CFG:(1)在问答、推理、代码生成和机器翻译等一系列任务上提升了Pythia、GPT-2和LLaMA系列模型的表现,并使LLaMA-7B在LAMBADA上超过PaLM-540B,达到最先进水平;(2)带来相当于参数量翻倍的提升;(3)可以与思维链、自一致性等其他推理时方法叠加,在困难任务上进一步提升表现;(4)可用于提高助手在形式驱动和内容驱动的复杂提示下回复的忠实度和连贯性:在人工评测中,使用CFG的GPT4All相对基线获得了75%的偏好率。
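The core of CFG at inference time is a simple combination of conditional and unconditional next-token logits. The sketch below shows the standard CFG formula; how the "unconditional" context is constructed is an application-specific choice not specified here:

```python
import torch

def cfg_logits(cond_logits: torch.Tensor, uncond_logits: torch.Tensor,
               guidance_scale: float) -> torch.Tensor:
    """Classifier-free guidance applied to next-token logits.

    cond_logits come from the model run with the full prompt, uncond_logits
    from a run without (or with a truncated) prompt. guidance_scale = 1.0
    recovers ordinary sampling; larger values push generations to follow the
    prompt more closely. This mirrors the generic CFG formula; details such as
    what to use as the "unconditional" context are per-application choices.
    """
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)

# Toy example with a vocabulary of 5 tokens.
cond = torch.tensor([2.0, 0.5, -1.0, 0.0, 1.0])
uncond = torch.tensor([1.0, 0.8, -0.5, 0.2, 0.9])
probs = torch.softmax(cfg_logits(cond, uncond, guidance_scale=1.5), dim=-1)
print(probs)
```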

Voting-based Multimodal Automatic Deception Detection

  • paper_url: http://arxiv.org/abs/2307.07516
  • repo_url: None
  • paper_authors: Lana Touma, Mohammad Al Horani, Manar Tailouni, Anas Dahabiah, Khloud Al Jallad
  • for: 本研究旨在提出一种投票方法来自动检测谎言,并使用机器学习和深度学习技术对视频、音频和文本特征进行检测。
  • methods: 本研究使用了三种模型来实现投票方法,包括一个用于从图像中检测谎言的CNN模型,一个用于从音频中检测谎言的SVM模型,以及一个用于从文本中检测谎言的Word2Vec-SVM模型。
  • results: 本研究的实验结果显示,提出的投票方法在两个 datasets 上均达到了州际先进水平,图像、音频和文本特征的检测精度分别为97%、96%和92%。
    Abstract Automatic Deception Detection has been a hot research topic for a long time, using machine learning and deep learning to automatically detect deception, brings new light to this old field. In this paper, we proposed a voting-based method for automatic deception detection from videos using audio, visual and lexical features. Experiments were done on two datasets, the Real-life trial dataset by Michigan University and the Miami University deception detection dataset. Video samples were split into frames of images, audio, and manuscripts. Our Voting-based Multimodal proposed solution consists of three models. The first model is CNN for detecting deception from images, the second model is Support Vector Machine (SVM) on Mel spectrograms for detecting deception from audio and the third model is Word2Vec on Support Vector Machine (SVM) for detecting deception from manuscripts. Our proposed solution outperforms state of the art. Best results achieved on images, audio and text were 97%, 96%, 92% respectively on Real-Life Trial Dataset, and 97%, 82%, 73% on video, audio and text respectively on Miami University Deception Detection.
    摘要 自动欺骗检测长期以来都是热门研究课题,利用机器学习和深度学习自动检测欺骗为这一传统领域带来了新的思路。本文提出了一种基于投票的多模态自动欺骗检测方法,利用视频中的图像、音频和词汇特征进行检测。实验在密歇根大学的Real-life Trial数据集和迈阿密大学欺骗检测数据集上进行,视频样本被拆分为图像帧、音频和文本。我们提出的基于投票的多模态方案包含三个模型:第一个模型使用卷积神经网络(CNN)从图像中检测欺骗,第二个模型使用支持向量机(SVM)基于Mel频谱图从音频中检测欺骗,第三个模型使用Word2Vec加SVM从文本中检测欺骗。该方案优于现有最佳方法:在Real-life Trial数据集上,图像、音频和文本的最佳结果分别为97%、96%和92%;在迈阿密大学欺骗检测数据集上,视频、音频和文本的结果分别为97%、82%和73%。
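The abstract does not spell out the exact voting rule, so the sketch below assumes plain majority voting over the three per-modality decisions:

```python
from collections import Counter

def majority_vote(image_pred: int, audio_pred: int, text_pred: int) -> int:
    """Combine the per-modality decisions (1 = deceptive, 0 = truthful) by
    simple majority voting. This is only the plainest possible variant of the
    voting idea described above, not necessarily the paper's exact rule."""
    votes = Counter([image_pred, audio_pred, text_pred])
    return votes.most_common(1)[0][0]

# e.g. CNN-on-frames says deceptive, SVM-on-audio says truthful, text says deceptive
print(majority_vote(1, 0, 1))  # -> 1
```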

Towards Improving the Performance of Pre-Trained Speech Models for Low-Resource Languages Through Lateral Inhibition

  • paper_url: http://arxiv.org/abs/2306.17792
  • repo_url: None
  • paper_authors: Andrei-Marius Avram, Răzvan-Alexandru Smădu, Vasile Păiş, Dumitru-Clementin Cercel, Radu Ion, Dan Tufiş
  • for: 提高 speech 模型的表现,特别是在 low-resource 语言上。
  • methods: 将微调用的全连接层替换为侧向抑制(lateral inhibition)层,该层的设计灵感来自生物学过程。
  • results: 在 Romanian 语言上测试, average improvement 12.5% word error rate (WER)。此外,在 Romanian Speech Corpus 和 Robin Technical Acquisition Corpus 上都达到了 state-of-the-art result,WER 分别为 1.78% 和 29.64%。
    Abstract With the rise of bidirectional encoder representations from Transformer models in natural language processing, the speech community has adopted some of their development methodologies. Therefore, the Wav2Vec models were introduced to reduce the data required to obtain state-of-the-art results. This work leverages this knowledge and improves the performance of the pre-trained speech models by simply replacing the fine-tuning dense layer with a lateral inhibition layer inspired by the biological process. Our experiments on Romanian, a low-resource language, show an average improvement of 12.5% word error rate (WER) using the lateral inhibition layer. In addition, we obtain state-of-the-art results on both the Romanian Speech Corpus and the Robin Technical Acquisition Corpus with 1.78% WER and 29.64% WER, respectively.
    摘要 随着Transformer双向编码器表示在自然语言处理领域的兴起,语音社区也采纳了其中的一些开发方法。Wav2Vec模型因此被提出,用以减少取得最先进结果所需的数据量。本工作在此基础上,仅将微调用的全连接层替换为受生物学过程启发的侧向抑制(lateral inhibition)层,就提升了预训练语音模型的性能。我们在低资源语言罗马尼亚语上的实验表明,使用侧向抑制层可使词错误率(WER)平均改善12.5%。此外,我们在Romanian Speech Corpus和Robin Technical Acquisition Corpus上分别取得了1.78%和29.64%的WER,均为最先进结果。

Should you marginalize over possible tokenizations?

  • paper_url: http://arxiv.org/abs/2306.17757
  • repo_url: https://github.com/naver/marginalization
  • paper_authors: Nadezhda Chirkova, Germán Kruszewski, Jos Rozen, Marc Dymetman
  • for: 本研究探讨了autoregressive语言模型(LMs)是否正确地计算字符串的概率。
  • methods: 作者提出了一种基于重要性采样的算法来估计对所有分词方式边缘化后的字符串概率,并与默认做法进行比较。
  • results: 研究发现,大多数情况下,忽略 marginalization 的做法并不会导致显著的损失(log-likelihood gap 在0.5%之内),但在具有长复杂单词的数据上,差异变得更明显。
    Abstract Autoregressive language models (LMs) map token sequences to probabilities. The usual practice for computing the probability of any character string (e.g. English sentences) is to first transform it into a sequence of tokens that is scored by the model. However, there are exponentially many token sequences that represent any given string. To truly compute the probability of a string one should marginalize over all tokenizations, which is typically intractable. Here, we analyze whether the practice of ignoring the marginalization is justified. To this end, we devise an importance-sampling-based algorithm that allows us to compute estimates of the marginal probabilities and compare them to the default procedure in a range of state-of-the-art models and datasets. Our results show that the gap in log-likelihood is no larger than 0.5% in most cases, but that it becomes more pronounced for data with long complex words.
    摘要 自回归语言模型(LM)将词元序列映射为概率。计算任意字符串(例如英文句子)概率的常见做法,是先将其转换为一个由模型打分的词元序列。然而,表示同一字符串的词元序列有指数多个;要真正计算字符串的概率,应当对所有分词方式进行边缘化,而这通常是不可行的。在本文中,我们分析忽略这种边缘化的做法是否合理。为此,我们设计了一种基于重要性采样的算法来估计边缘概率,并在一系列最先进的模型和数据集上将其与默认做法进行比较。结果表明,在大多数情况下对数似然的差距不超过0.5%,但在包含较长复杂词汇的数据上差距会更加明显。
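A toy version of the importance-sampling estimator can be written with a hand-made token vocabulary and a unigram stand-in for the language model (real models are autoregressive, so this only illustrates the estimation idea, not the paper's implementation):

```python
import math, random

# Toy token vocabulary with unigram "LM" probabilities.
vocab_logp = {"un": math.log(0.2), "like": math.log(0.3), "ly": math.log(0.2),
              "unlike": math.log(0.1), "likely": math.log(0.15)}

def seq_logp(tokens):
    return sum(vocab_logp[t] for t in tokens)

def sample_tokenization(text, rng):
    """Proposal q: repeatedly pick, uniformly at random, one vocabulary token
    that matches the remaining prefix. Returns the tokens and log q."""
    tokens, logq, rest = [], 0.0, text
    while rest:
        options = [t for t in vocab_logp if rest.startswith(t)]
        choice = rng.choice(options)
        logq += math.log(1.0 / len(options))
        tokens.append(choice)
        rest = rest[len(choice):]
    return tokens, logq

def marginal_estimate(text, n_samples=10000, seed=0):
    """Importance-sampling estimate of P(text) = sum over tokenizations of p(tokens)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        tokens, logq = sample_tokenization(text, rng)
        total += math.exp(seq_logp(tokens) - logq)
    return total / n_samples

# The default practice scores only one canonical tokenization; the marginal is
# at least as large because it sums over every valid segmentation.
print("single tokenization:", math.exp(seq_logp(["unlike", "ly"])))
print("marginal estimate  :", marginal_estimate("unlikely"))
```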

cs.LG - 2023-07-01

Minimizing Energy Consumption of Deep Learning Models by Energy-Aware Training

  • paper_url: http://arxiv.org/abs/2307.00368
  • repo_url: None
  • paper_authors: Dario Lazzaro, Antonio Emanuele Cinà, Maura Pintor, Ambra Demontis, Battista Biggio, Fabio Roli, Marcello Pelillo
  • for: 降低深度学习模型的能耗
  • methods: 在模型训练中使用基于梯度的算法,将$\ell_0$范数的可微近似作为训练损失上的稀疏惩罚项,以提升模型的能效。
  • results: 通过在三个数据集和两种深度神经网络上的实验分析,我们证明了能耗感知训练算法EAT能够训练出在分类性能与能效之间取得更好平衡的网络。
    Abstract Deep learning models undergo a significant increase in the number of parameters they possess, leading to the execution of a larger number of operations during inference. This expansion significantly contributes to higher energy consumption and prediction latency. In this work, we propose EAT, a gradient-based algorithm that aims to reduce energy consumption during model training. To this end, we leverage a differentiable approximation of the $\ell_0$ norm, and use it as a sparse penalty over the training loss. Through our experimental analysis conducted on three datasets and two deep neural networks, we demonstrate that our energy-aware training algorithm EAT is able to train networks with a better trade-off between classification performance and energy efficiency.
    摘要 深度学习模型的参数数量显著增长,导致推理时需要执行更多运算,从而显著增加能耗和预测延迟。在这项工作中,我们提出EAT,一种旨在降低模型训练期间能耗的基于梯度的算法。为此,我们利用$\ell_0$范数的可微近似,并将其作为训练损失上的稀疏惩罚项。通过在三个数据集和两种深度神经网络上的实验分析,我们证明了能耗感知训练算法EAT能够训练出在分类性能与能效之间取得更好平衡的网络。
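The abstract describes adding a differentiable approximation of the $\ell_0$ norm as a sparsity penalty on the training loss. The surrogate below (a tanh-based smooth count) is an assumption chosen for illustration; the paper's exact approximation may differ:

```python
import torch
import torch.nn as nn

def l0_surrogate(model: nn.Module, alpha: float = 10.0) -> torch.Tensor:
    """Smooth stand-in for ||w||_0: sum(tanh(alpha * |w|)) over all parameters.
    This is one common differentiable approximation, assumed here for illustration."""
    total = torch.tensor(0.0)
    for p in model.parameters():
        total = total + torch.tanh(alpha * p.abs()).sum()
    return total

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
lam = 1e-4  # strength of the sparsity/energy penalty (hyperparameter)

x, y = torch.randn(128, 20), torch.randint(0, 2, (128,))
for step in range(100):
    opt.zero_grad()
    # Task loss plus the sparsity term that discourages energy-hungry dense weights.
    loss = loss_fn(model(x), y) + lam * l0_surrogate(model)
    loss.backward()
    opt.step()
```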

Understanding recent deep-learning techniques for identifying collective variables of molecular dynamics

  • paper_url: http://arxiv.org/abs/2307.00365
  • repo_url: None
  • paper_authors: Wei Zhang, Christof Schütte
  • for: 本研究探讨了利用深度学习技术识别高维亚稳态分子系统集合变量(CV)的两类方法。
  • methods: 这两类方法分别是计算与底层动力学相关的无穷小生成元或传输算子的主特征函数,以及通过最小化重建误差来学习自编码器。
  • results: 我们简要概述了这两类方法的数学基础,并在示例问题上对它们进行了比较性的数值研究。
    Abstract High-dimensional metastable molecular system can often be characterised by a few features of the system, i.e. collective variables (CVs). Thanks to the rapid advance in the area of machine learning and deep learning, various deep learning-based CV identification techniques have been developed in recent years, allowing accurate modelling and efficient simulation of complex molecular systems. In this paper, we look at two different categories of deep learning-based approaches for finding CVs, either by computing leading eigenfunctions of infinitesimal generator or transfer operator associated to the underlying dynamics, or by learning an autoencoder via minimisation of reconstruction error. We present a concise overview of the mathematics behind these two approaches and conduct a comparative numerical study of these two approaches on illustrative examples.
    摘要 高维亚稳态分子系统往往可以用系统的少数几个特征来刻画,即集合变量(CV)。得益于机器学习和深度学习领域的快速发展,近年来出现了多种基于深度学习的CV识别技术,使得对复杂分子系统的精确建模和高效模拟成为可能。在本文中,我们考察两类基于深度学习的CV识别方法:一类是计算与底层动力学相关的无穷小生成元或传输算子的主特征函数,另一类是通过最小化重建误差来学习自编码器。我们简要介绍了这两类方法背后的数学原理,并在示例问题上对它们进行了比较性的数值研究。
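The autoencoder route for collective-variable discovery can be illustrated with a small bottleneck network trained on reconstruction error; the dimensions and data below are placeholders:

```python
import torch
import torch.nn as nn

d_in, d_cv = 30, 2   # configuration dimension and number of collective variables

encoder = nn.Sequential(nn.Linear(d_in, 64), nn.Tanh(), nn.Linear(64, d_cv))
decoder = nn.Sequential(nn.Linear(d_cv, 64), nn.Tanh(), nn.Linear(64, d_in))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

configs = torch.randn(1024, d_in)   # stand-in for sampled molecular configurations
for epoch in range(200):
    opt.zero_grad()
    cv = encoder(configs)                              # low-dimensional CVs
    loss = ((decoder(cv) - configs) ** 2).mean()       # reconstruction error
    loss.backward()
    opt.step()

# After training, encoder(x) gives the learned CVs. The other family of methods
# discussed above instead approximates leading eigenfunctions of the
# infinitesimal generator or transfer operator, which this sketch does not cover.
```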

The future of human-centric eXplainable Artificial Intelligence (XAI) is not post-hoc explanations

  • paper_url: http://arxiv.org/abs/2307.00364
  • repo_url: None
  • paper_authors: Vinitra Swamy, Jibril Frej, Tanja Käser
  • for: 本文呼吁从对黑盒模型进行事后解释,转向设计本身可解释的神经网络架构,以解决现有解释器的局限性。
  • methods: 本文提出了两种可解释性设计方案:用于可解释条件计算的自适应路由,以及用于迭代式模型学习的诊断基准。
  • results: 本文认为,以人为中心的XAI的未来既不在于解释黑盒,也不在于退回传统的可解释模型,而在于本身即可解释的神经网络。
    Abstract Explainable Artificial Intelligence (XAI) plays a crucial role in enabling human understanding and trust in deep learning systems, often defined as determining which features are most important to a model's prediction. As models get larger, more ubiquitous, and pervasive in aspects of daily life, explainability is necessary to avoid or minimize adverse effects of model mistakes. Unfortunately, current approaches in human-centric XAI (e.g. predictive tasks in healthcare, education, or personalized ads) tend to rely on a single explainer. This is a particularly concerning trend when considering that recent work has identified systematic disagreement in explainability methods when applied to the same points and underlying black-box models. In this paper, we therefore present a call for action to address the limitations of current state-of-the-art explainers. We propose to shift from post-hoc explainability to designing interpretable neural network architectures; moving away from approximation techniques in human-centric and high impact applications. We identify five needs of human-centric XAI (real-time, accurate, actionable, human-interpretable, and consistent) and propose two schemes for interpretable-by-design neural network workflows (adaptive routing for interpretable conditional computation and diagnostic benchmarks for iterative model learning). We postulate that the future of human-centric XAI is neither in explaining black-boxes nor in reverting to traditional, interpretable models, but in neural networks that are intrinsically interpretable.
    摘要 可解释人工智能(XAI)对于人类理解并信任深度学习系统至关重要,通常被定义为确定哪些特征对模型预测最为重要。随着模型变得更大、更普遍并渗透到日常生活的各个方面,可解释性成为避免或减轻模型错误不良影响的必要条件。然而,当前以人为中心的XAI方法(例如医疗、教育或个性化广告中的预测任务)往往依赖单一的解释器。考虑到近期研究发现,不同的可解释性方法应用于相同样本和相同黑盒模型时会产生系统性的不一致,这一趋势尤其令人担忧。因此,本文呼吁正视当前最先进解释器的局限性。我们主张从事后可解释性转向设计本身可解释的神经网络架构,在以人为中心、影响重大的应用中摆脱近似技术。我们归纳了以人为中心XAI的五项需求(实时、准确、可操作、人类可理解、一致),并提出两种可解释性设计的神经网络工作流程方案:用于可解释条件计算的自适应路由,以及用于迭代式模型学习的诊断基准。我们认为,以人为中心XAI的未来既不在于解释黑盒,也不在于退回传统的可解释模型,而在于本身即可解释的神经网络。

A Comparative Study of Machine Learning Algorithms for Anomaly Detection in Industrial Environments: Performance and Environmental Impact

  • paper_url: http://arxiv.org/abs/2307.00361
  • repo_url: None
  • paper_authors: Álvaro Huertas-García, Carlos Martí-González, Rubén García Maezo, Alejandro Echeverría Rey
  • for: 本研究旨在开发一种可持续的人工智能(AI)和机器学习(ML)模型,用于避免异常检测中的高计算需求和相关的环境影响。
  • methods: 本研究采用了多种机器学习算法和不同的多层感知器(MLP)配置,并且对这些模型进行了仔细的评估。
  • results: 研究发现,决策树和随机森林等传统机器学习算法兼具稳健的性能与效率,而经过优化的MLP配置虽然性能更高,但资源消耗也相应增加。
    Abstract In the context of Industry 4.0, the use of artificial intelligence (AI) and machine learning for anomaly detection is being hampered by high computational requirements and associated environmental effects. This study seeks to address the demands of high-performance machine learning models with environmental sustainability, contributing to the emerging discourse on 'Green AI.' An extensive variety of machine learning algorithms, coupled with various Multilayer Perceptron (MLP) configurations, were meticulously evaluated. Our investigation encapsulated a comprehensive suite of evaluation metrics, comprising Accuracy, Area Under the Curve (AUC), Recall, Precision, F1 Score, Kappa Statistic, Matthews Correlation Coefficient (MCC), and F1 Macro. Simultaneously, the environmental footprint of these models was gauged through considerations of time duration, CO2 equivalent, and energy consumption during the training, cross-validation, and inference phases. Traditional machine learning algorithms, such as Decision Trees and Random Forests, demonstrate robust efficiency and performance. However, superior outcomes were obtained with optimised MLP configurations, albeit with a commensurate increase in resource consumption. The study incorporated a multi-objective optimisation approach, invoking Pareto optimality principles, to highlight the trade-offs between a model's performance and its environmental impact. The insights derived underscore the imperative of striking a balance between model performance, complexity, and environmental implications, thus offering valuable directions for future work in the development of environmentally conscious machine learning models for industrial applications.
    摘要 在工业4.0背景下,将人工智能(AI)和机器学习用于异常检测受到高计算需求及其环境影响的制约。本研究旨在兼顾高性能机器学习模型与环境可持续性,为新兴的"绿色AI"讨论做出贡献。我们对多种机器学习算法以及不同的多层感知器(MLP)配置进行了细致评估。评估指标包括准确率、曲线下面积(AUC)、召回率、精确率、F1分数、Kappa统计量、Matthews相关系数(MCC)和宏平均F1;同时,通过训练、交叉验证和推理阶段的时长、二氧化碳当量和能耗来衡量这些模型的环境足迹。决策树和随机森林等传统机器学习算法表现出稳健的效率与性能,而经过优化的MLP配置虽然取得了更优的结果,但资源消耗也相应增加。研究还引入基于帕累托最优原则的多目标优化方法,以突出模型性能与环境影响之间的权衡。这些结论强调了在模型性能、复杂度与环境影响之间取得平衡的必要性,为未来面向工业应用、兼顾环境的机器学习模型开发提供了有价值的方向。

When Synthetic Data Met Regulation

  • paper_url: http://arxiv.org/abs/2307.00359
  • repo_url: None
  • paper_authors: Georgi Ganev
  • for: 本研究论证,差分隐私生成模型所生成的合成数据可以被充分匿名化,因此可视为匿名数据并符合相关法规。
  • methods: 本研究讨论使用差分隐私生成模型(differentially private generative models)来生成合成数据。
  • results: 论文指出,差分隐私生成模型生成的合成数据能够提供足够的隐私保护,从而满足匿名化和合规方面的要求。
    Abstract In this paper, we argue that synthetic data produced by Differentially Private generative models can be sufficiently anonymized and, therefore, anonymous data and regulatory compliant.
    摘要 在这篇论文中,我们认为使用差分隐私生成模型生成的合成数据可以具备隐私和法规符合性。

Fedward: Flexible Federated Backdoor Defense Framework with Non-IID Data

  • paper_url: http://arxiv.org/abs/2307.00356
  • repo_url: None
  • paper_authors: Zekai Chen, Fuyi Wang, Zhiwei Zheng, Ximeng Liu, Yujie Lin
  • for: 防止 Federated Backdoor Attack (FBA) 在 Federated Learning (FL) 中,保护敏感本地数据的隐私。
  • methods: 引入灵活的联邦后门防御框架Fedward,将FBA分解为多种攻击,并分别用AmGrad和AutoOPTICS等方法加以应对;同时,Fedward采用自适应裁剪方法,以良性组中的样本数作为边界约束,从而在Non-IID场景下保持性能。
  • results: 对三个 benchmark 数据集进行了实验评估,与 state-of-the-art 研究进行了比较。结果表明,Fedward 能够有效防止 FBA,提高 clustering 防御方法的性能,并在 Non-IID 场景下保持最佳性能。特别是,Fedward 在 MNIST、FMNIST 和 CIFAR10 上的平均 FBA 成功率为 96.98%、90.74% 和 89.8%。
    Abstract Federated learning (FL) enables multiple clients to collaboratively train deep learning models while considering sensitive local datasets' privacy. However, adversaries can manipulate datasets and upload models by injecting triggers for federated backdoor attacks (FBA). Existing defense strategies against FBA consider specific and limited attacker models, and a sufficient amount of noise to be injected only mitigates rather than eliminates FBA. To address these deficiencies, we introduce a Flexible Federated Backdoor Defense Framework (Fedward) to ensure the elimination of adversarial backdoors. We decompose FBA into various attacks, and design amplified magnitude sparsification (AmGrad) and adaptive OPTICS clustering (AutoOPTICS) to address each attack. Meanwhile, Fedward uses the adaptive clipping method by regarding the number of samples in the benign group as constraints on the boundary. This ensures that Fedward can maintain the performance for the Non-IID scenario. We conduct experimental evaluations over three benchmark datasets and thoroughly compare them to state-of-the-art studies. The results demonstrate the promising defense performance from Fedward, moderately improved by 33% $\sim$ 75 in clustering defense methods, and 96.98%, 90.74%, and 89.8% for Non-IID to the utmost extent for the average FBA success rate over MNIST, FMNIST, and CIFAR10, respectively.
    摘要 联邦学习(FL)允许多个客户端在顾及本地敏感数据隐私的前提下共同训练深度学习模型。然而,攻击者可以操纵数据集并上传注入了触发器的模型,实施联邦后门攻击(FBA)。现有针对FBA的防御策略只考虑特定且有限的攻击者模型,而且注入足量噪声也只能缓解而非消除FBA。为弥补这些不足,我们提出一个灵活的联邦后门防御框架Fedward,以确保消除对抗性后门。我们将FBA分解为多种攻击,并设计增强幅度稀疏化(AmGrad)和自适应OPTICS聚类(AutoOPTICS)分别加以应对。同时,Fedward采用自适应裁剪方法,以良性组中的样本数作为边界约束,确保在Non-IID场景下保持性能。我们在三个基准数据集上进行了实验评估,并与最先进的研究进行了全面比较。结果表明Fedward具有良好的防御性能,将聚类类防御方法的效果提升约33%至75%;在Non-IID场景下,对MNIST、FMNIST和CIFAR10的平均FBA成功率指标上分别达到96.98%、90.74%和89.8%。

Sparse-Input Neural Network using Group Concave Regularization

  • paper_url: http://arxiv.org/abs/2307.00344
  • repo_url: https://github.com/r08in/gcrnn
  • paper_authors: Bin Luo, Susan Halabi
  • for: 本文研究了神经网络中的特征选择问题,尤其是在高维设置下,where the number of variables exceeds the available sample size in modeling.
  • methods: 我们提出了一种基于组凹惩罚正则化的稀疏输入神经网络框架,对每个输入节点所有出向连接权重的$l_2$范数施加合适的凹惩罚,从而得到只使用原始变量一小部分子集的神经网络。
  • results: 大量模拟实验和真实数据示例表明,所提出的估计器在特征选择以及连续、二元和生存时间(time-to-event)结果的预测上都具有令人满意的有限样本表现。
    Abstract Simultaneous feature selection and non-linear function estimation are challenging, especially in high-dimensional settings where the number of variables exceeds the available sample size in modeling. In this article, we investigate the problem of feature selection in neural networks. Although the group LASSO has been utilized to select variables for learning with neural networks, it tends to select unimportant variables into the model to compensate for its over-shrinkage. To overcome this limitation, we propose a framework of sparse-input neural networks using group concave regularization for feature selection in both low-dimensional and high-dimensional settings. The main idea is to apply a proper concave penalty to the $l_2$ norm of weights from all outgoing connections of each input node, and thus obtain a neural net that only uses a small subset of the original variables. In addition, we develop an effective algorithm based on backward path-wise optimization to yield stable solution paths, in order to tackle the challenge of complex optimization landscapes. Our extensive simulation studies and real data examples demonstrate satisfactory finite sample performances of the proposed estimator, in feature selection and prediction for modeling continuous, binary, and time-to-event outcomes.
    摘要 同时进行特征选择和非线性函数估计颇具挑战性,尤其是在变量数量超过可用样本量的高维建模场景中。本文研究神经网络中的特征选择问题。虽然组LASSO已被用于神经网络学习中的变量选择,但由于其过度收缩,它往往会把不重要的变量也选入模型作为补偿。为克服这一局限,我们提出一种基于组凹惩罚正则化的稀疏输入神经网络框架,可在低维和高维设置下进行特征选择。其主要思路是对每个输入节点所有出向连接权重的$l_2$范数施加合适的凹惩罚,从而得到只使用原始变量一小部分子集的神经网络。此外,我们开发了一种基于后向逐路径优化的有效算法来得到稳定的解路径,以应对复杂优化地形带来的挑战。大量模拟研究和真实数据示例表明,所提出的估计器在特征选择以及连续、二元和生存时间结果的预测建模上都具有令人满意的有限样本表现。
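As an illustration of a group concave penalty on input groups, the sketch below applies the MCP penalty to the $l_2$ norm of each input node's outgoing weights. Plain gradient descent, used here for brevity, does not produce exact zeros; the paper develops a backward path-wise algorithm for that purpose, which this sketch does not reproduce:

```python
import torch
import torch.nn as nn

def mcp(t: torch.Tensor, lam: float, gamma: float) -> torch.Tensor:
    """Minimax concave penalty (MCP) applied elementwise to non-negative t.
    MCP is one member of the group concave family the paper's framework covers."""
    return torch.where(t <= gamma * lam,
                       lam * t - t ** 2 / (2 * gamma),
                       torch.full_like(t, gamma * lam ** 2 / 2))

net = nn.Sequential(nn.Linear(50, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
x, y = torch.randn(256, 50), torch.randn(256, 1)

for step in range(200):
    opt.zero_grad()
    mse = ((net(x) - y) ** 2).mean()
    # Group = all outgoing weights of one input node = one column of the first layer.
    group_norms = net[0].weight.norm(dim=0)          # shape: (n_inputs,)
    penalty = mcp(group_norms, lam=0.05, gamma=3.0).sum()
    (mse + penalty).backward()
    opt.step()

# Columns whose norm has been shrunk toward zero correspond to candidate
# dropped inputs (exact sparsity requires the path-wise optimizer).
print((net[0].weight.norm(dim=0) < 1e-2).sum().item(), "near-zero input groups")
```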

Recursive Algorithmic Reasoning

  • paper_url: http://arxiv.org/abs/2307.00337
  • repo_url: https://github.com/DJayalath/gnn-call-stack
  • paper_authors: Dulhan Jayalath, Jonas Jürß, Petar Veličković
  • for: 本文旨在探讨深度学习中对不同数据进行泛化的问题,并提出了一种基于图神经网络(GNN)的方法来解决这个问题。
  • methods: 本文提出了一种为GNN增加栈结构的方法,以及一种捕捉中间算法轨迹的方法,以提高网络与递归算法的对齐程度。
  • results: 本文的实验表明,与先前方法相比,该方法可以在处理更大的输入图时提高泛化性能。
    Abstract Learning models that execute algorithms can enable us to address a key problem in deep learning: generalizing to out-of-distribution data. However, neural networks are currently unable to execute recursive algorithms because they do not have arbitrarily large memory to store and recall state. To address this, we (1) propose a way to augment graph neural networks (GNNs) with a stack, and (2) develop an approach for capturing intermediate algorithm trajectories that improves algorithmic alignment with recursive algorithms over previous methods. The stack allows the network to learn to store and recall a portion of the state of the network at a particular time, analogous to the action of a call stack in a recursive algorithm. This augmentation permits the network to reason recursively. We empirically demonstrate that our proposals significantly improve generalization to larger input graphs over prior work on depth-first search (DFS).
    摘要 能够执行算法的学习模型可以帮助我们解决深度学习中的一个关键问题:对分布外数据的泛化。然而,神经网络目前无法执行递归算法,因为它们没有任意大的内存来存储和调用状态。为此,我们(1)提出了一种为图神经网络(GNN)增加栈的方法,并(2)开发了一种捕捉中间算法轨迹的方法,相比以往方法提升了与递归算法的对齐程度。栈使网络能够学习在特定时刻存储并调用网络状态的一部分,类似于递归算法中调用栈的作用。这一扩展使网络能够进行递归推理。我们的实验表明,相比以往关于深度优先搜索(DFS)的工作,所提方法显著提升了对更大输入图的泛化能力。

Single Sequence Prediction over Reasoning Graphs for Multi-hop QA

  • paper_url: http://arxiv.org/abs/2307.00335
  • repo_url: None
  • paper_authors: Gowtham Ramesh, Makesh Sreedhar, Junjie Hu
  • for: 这篇论文旨在提出一种基于局部推理图的单序列预测方法,以提高多跳问答(QA)模型的准确率和推理路径的忠实性。
  • methods: 该方法构建一个局部推理图,将每个上下文段落中的关键实体与相关的后续段落连接起来,并用图神经网络对其进行编码,再将得到的表示融合到模型的实体表示中。
  • results: 在HotpotQA和Musique两个数据集上的实验表明,该方法显著提升了答案的精确匹配/F1分数以及推理路径的忠实性,并且仅增加至多4%的模型参数就在Musique数据集上达到了最先进水平。
    Abstract Recent generative approaches for multi-hop question answering (QA) utilize the fusion-in-decoder method (Izacard and Grave, 2021) to generate a single sequence output which includes both a final answer and a reasoning path taken to arrive at that answer, such as passage titles and key facts from those passages. While such models can lead to better interpretability and high quantitative scores, they often have difficulty accurately identifying the passages corresponding to key entities in the context, resulting in incorrect passage hops and a lack of faithfulness in the reasoning path. To address this, we propose a single-sequence prediction method over a local reasoning graph (code and models to be released at https://github.com/gowtham1997/SeqGraph) that integrates a graph structure connecting key entities in each context passage to relevant subsequent passages for each question. We use a graph neural network to encode this graph structure and fuse the resulting representations into the entity representations of the model. Our experiments show significant improvements in answer exact-match/F1 scores and faithfulness of grounding in the reasoning path on the HotpotQA dataset and achieve state-of-the-art numbers on the Musique dataset with only up to a 4% increase in model parameters.
    摘要 近期针对多跳问答(QA)的生成式方法采用融合解码器(fusion-in-decoder)方式,生成同时包含最终答案和推理路径(例如段落标题及其中关键事实)的单一序列输出。虽然这类模型可以带来更好的可解释性和较高的量化得分,但它们常常难以准确识别与上下文中关键实体对应的段落,导致错误的段落跳转和推理路径缺乏忠实性。为此,我们提出一种基于局部推理图的单序列预测方法,引入一种将每个上下文段落中的关键实体与该问题相关的后续段落连接起来的图结构。我们使用图神经网络编码该图结构,并将得到的表示融合到模型的实体表示中。实验表明,该方法在HotpotQA数据集上显著提升了答案精确匹配/F1分数以及推理路径的忠实性,并且仅增加至多4%的模型参数便在Musique数据集上取得了最先进的结果。

Variation-aware Vision Transformer Quantization

  • paper_url: http://arxiv.org/abs/2307.00331
  • repo_url: https://github.com/huangowen/vvtq
  • paper_authors: Xijie Huang, Zhiqiang Shen, Kwang-Ting Cheng
  • for: 这篇论文旨在提出一种基于知识蒸馏、考虑参数波动(variation)的视觉Transformer量化方法,以提高量化的稳定性和效率,并减少量化感知训练中的振荡。
  • methods: 论文分析并比较了ViT与CNN各模块对量化的敏感性及参数波动行为,并提出了一种基于知识蒸馏的variation感知量化方法,包括多裁剪知识蒸馏方案、与模块相关的量化方案以及variation感知正则项。
  • results: 在ImageNet-1K上,2比特的Swin-T模型取得了77.66%的Top-1准确率,比此前最佳的量化模型高出3.35%。
    Abstract Despite the remarkable performance of Vision Transformers (ViTs) in various visual tasks, the expanding computation and model size of ViTs have increased the demand for improved efficiency during training and inference. To address the heavy computation and parameter drawbacks, quantization is frequently studied in the community as a representative model compression technique and has seen extensive use on CNNs. However, due to the unique properties of CNNs and ViTs, the quantization applications on ViTs are still limited and underexplored. In this paper, we identify the difficulty of ViT quantization on its unique variation behaviors, which differ from traditional CNN architectures. The variations indicate the magnitude of the parameter fluctuations and can also measure outlier conditions. Moreover, the variation behaviors reflect the various sensitivities to the quantization of each module. The quantization sensitivity analysis and comparison of ViTs with CNNs help us locate the underlying differences in variations. We also find that the variations in ViTs cause training oscillations, bringing instability during quantization-aware training (QAT). Correspondingly, we solve the variation problem with an efficient knowledge-distillation-based variation-aware quantization method. The multi-crop knowledge distillation scheme can accelerate and stabilize the training and alleviate the variation's influence during QAT. We also proposed a module-dependent quantization scheme and a variation-aware regularization term to suppress the oscillation of weights. On ImageNet-1K, we obtain a 77.66% Top-1 accuracy on the extremely low-bit scenario of 2-bit Swin-T, outperforming the previous state-of-the-art quantized model by 3.35%.
    摘要 尽管视觉转换器(ViT)在视觉任务中表现出色,但是它们的计算量和模型大小的增加使得训练和推理中的效率提升变得更加重要。为了解决计算量和参数占用的问题,量化被广泛研究并在 convolutional neural networks(CNN) 中广泛应用。然而,由于ViT的特殊性,量化在ViT上的应用仍然受限和未得到充分利用。在这篇论文中,我们发现ViT量化的困难在它的特殊变化行为上,这与传统的CNN架构不同。这些变化表示参数的振荡范围和可以测量异常情况。此外,这些变化行为反映每个模块的参数敏感度。我们通过对ViT和CNN进行量化敏感度分析和比较,了解它们之间的差异。我们还发现,在量化训练中,ViT中的变化会导致训练振荡,从而导致训练不稳定。为了解决这个问题,我们提出了一种高效的知识传递基于变化感知量化方法。我们采用多幅知识传递方案,以加速和稳定训练,并减轻变化的影响。此外,我们还提出了模块dependent的量化方案和变化感知 regularization 项,以抑制量化训练中的振荡。在 ImageNet-1K 上,我们在2位法(2-bit)的 Swin-T 上获得了77.66% 的 Top-1 准确率,比前一个状态的量化模型提高了3.35%。

FedCP: Separating Feature Information for Personalized Federated Learning via Conditional Policy

  • paper_url: http://arxiv.org/abs/2307.01217
  • repo_url: https://github.com/tsingz0/fedcp
  • paper_authors: Jianqing Zhang, Yang Hua, Hao Wang, Tao Song, Zhengui Xue, Ruhui Ma, Haibing Guan
  • for: 针对关于隐私保护、合作学习和处理客户端数据统计不均衡的问题,提出了个性化联合学习(pFL)方法。
  • methods: 我们提出了一种名为 Federated Conditional Policy(FedCP)的方法,它根据每个样本的特征进行 conditinal policy 生成,以分类 global information 和个性化信息。然后,它将 global head 和个性化 head 分别处理这些特征。
  • results: 实验结果显示,FedCP 在计算机视觉和自然语言处理领域的十一种现有方法中,具有最高的性能,高于最佳化的方法平均提高6.69%。此外,FedCP 在部分客户端意外drop out的情况下仍保持优良性能。
    Abstract Recently, personalized federated learning (pFL) has attracted increasing attention in privacy protection, collaborative learning, and tackling statistical heterogeneity among clients, e.g., hospitals, mobile smartphones, etc. Most existing pFL methods focus on exploiting the global information and personalized information in the client-level model parameters while neglecting that data is the source of these two kinds of information. To address this, we propose the Federated Conditional Policy (FedCP) method, which generates a conditional policy for each sample to separate the global information and personalized information in its features and then processes them by a global head and a personalized head, respectively. FedCP is more fine-grained to consider personalization in a sample-specific manner than existing pFL methods. Extensive experiments in computer vision and natural language processing domains show that FedCP outperforms eleven state-of-the-art methods by up to 6.69%. Furthermore, FedCP maintains its superiority when some clients accidentally drop out, which frequently happens in mobile settings. Our code is public at https://github.com/TsingZ0/FedCP.
    摘要 最近,个性化联合学习(pFL)已引起越来越多的关注,以保护隐私、合作学习和客户端数据的统计差异等方面。现有的大多数pFL方法都是利用客户端模型参数中的全球信息和个性化信息来增强模型性能的,而忽略了数据的来源。为了解决这个问题,我们提议了基于联合策略的个性化联合学习方法(FedCP),该方法可以为每个样本生成一个条件策略,以分离样本中的全球信息和个性化信息,然后由全球头和个性化头进行处理。FedCP比现有的pFL方法更加细化,可以在样本具体的方式上考虑个性化。在计算机视觉和自然语言处理领域进行了广泛的实验,我们发现FedCP可以与11种现有方法进行比较,其性能提高至多6.69%。此外,FedCP在手机设备上意外退出时仍然保持优势,这经常发生在手机设备上。我们的代码可以在https://github.com/TsingZ0/FedCP上获取。
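A loose sketch of the idea of a sample-wise conditional policy separating global and personalized feature information is given below; the actual FedCP architecture and the way its policy is generated differ from this simplified gate:

```python
import torch
import torch.nn as nn

class ConditionalPolicyHead(nn.Module):
    """Illustrative only: a per-sample policy splits a feature vector into a
    'global' part and a 'personalized' part, processed by two separate heads.
    In a federated setting the global head would be shared across clients and
    the personalized head kept client-local."""

    def __init__(self, d_feat: int, n_classes: int):
        super().__init__()
        self.policy = nn.Sequential(nn.Linear(d_feat, d_feat), nn.Sigmoid())
        self.global_head = nn.Linear(d_feat, n_classes)     # shared across clients
        self.personal_head = nn.Linear(d_feat, n_classes)   # kept client-local

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        gate = self.policy(feat)                  # per-sample, per-dimension policy
        global_part = gate * feat
        personal_part = (1.0 - gate) * feat
        return self.global_head(global_part) + self.personal_head(personal_part)

head = ConditionalPolicyHead(d_feat=128, n_classes=10)
logits = head(torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```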

DeepMediX: A Deep Learning-Driven Resource-Efficient Medical Diagnosis Across the Spectrum

  • paper_url: http://arxiv.org/abs/2307.00324
  • repo_url: None
  • paper_authors: Kishore Babu Nampalle, Pradeep Singh, Uppala Vivek Narayan, Balasubramanian Raman
  • for: 这篇论文旨在提出一个高精度且资源高效的医学影像诊断模型,以应对医学影像诊断中的计算效率挑战。
  • methods: 论文在MobileNetV2架构基础上构建DeepMediX模型,用于脑部MRI扫描和皮肤癌影像的分类,并在二分类和多分类皮肤癌数据集上均取得了优越的性能;其设计还融入了联邦学习,以在不泄露数据隐私的前提下进行协作学习。
  • results: 在包括皮肤病研究数据集ISIC2018在内的标准数据集上经过严格测试,DeepMediX展现了出色的诊断能力,在几乎所有任务上与现有模型持平,部分任务上甚至更优。
    Abstract In the rapidly evolving landscape of medical imaging diagnostics, achieving high accuracy while preserving computational efficiency remains a formidable challenge. This work presents \texttt{DeepMediX}, a groundbreaking, resource-efficient model that significantly addresses this challenge. Built on top of the MobileNetV2 architecture, DeepMediX excels in classifying brain MRI scans and skin cancer images, with superior performance demonstrated on both binary and multiclass skin cancer datasets. It provides a solution to labor-intensive manual processes, the need for large datasets, and complexities related to image properties. DeepMediX's design also includes the concept of Federated Learning, enabling a collaborative learning approach without compromising data privacy. This approach allows diverse healthcare institutions to benefit from shared learning experiences without the necessity of direct data access, enhancing the model's predictive power while preserving the privacy and integrity of sensitive patient data. Its low computational footprint makes DeepMediX suitable for deployment on handheld devices, offering potential for real-time diagnostic support. Through rigorous testing on standard datasets, including the ISIC2018 for dermatological research, DeepMediX demonstrates exceptional diagnostic capabilities, matching the performance of existing models on almost all tasks and even outperforming them in some cases. The findings of this study underline significant implications for the development and deployment of AI-based tools in medical imaging and their integration into point-of-care settings. The source code and models generated would be released at https://github.com/kishorebabun/DeepMediX.
    摘要 在快速发展的医学影像诊断领域,在保证高精度的同时保持计算效率仍是一项艰巨挑战。本工作提出DeepMediX,一个突破性的、资源高效的模型,可在很大程度上应对这一挑战。DeepMediX构建于MobileNetV2架构之上,在脑部MRI扫描和皮肤癌图像分类中表现出色,在二分类和多分类皮肤癌数据集上均展现出优越性能。它为劳动密集的人工流程、对大规模数据集的依赖以及图像属性相关的复杂性提供了解决方案。DeepMediX的设计还包含联邦学习的概念,在不损害数据隐私的情况下实现协作学习:不同医疗机构无需直接共享数据即可从共同的学习经验中受益,在提升模型预测能力的同时保护敏感患者数据的隐私与完整性。其较低的计算开销使DeepMediX适合部署在手持设备上,具备提供实时诊断支持的潜力。在包括皮肤病研究数据集ISIC2018在内的标准数据集上经过严格测试,DeepMediX展现出卓越的诊断能力,在几乎所有任务上与现有模型持平,部分任务上甚至更优。本研究的结论对医学影像中基于AI的工具的开发、部署及其在床旁场景中的集成具有重要意义。源代码和模型将发布于 https://github.com/kishorebabun/DeepMediX 。
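Building a MobileNetV2-based classifier of the kind DeepMediX starts from is straightforward with torchvision; the head replacement below is illustrative and omits the paper's training recipe and federated-learning components:

```python
import torch
import torch.nn as nn
from torchvision import models

def build_classifier(n_classes: int) -> nn.Module:
    # weights=None builds an untrained backbone; pretrained weights can be
    # passed instead when available (API name varies with torchvision version).
    backbone = models.mobilenet_v2(weights=None)
    in_feats = backbone.classifier[1].in_features
    backbone.classifier[1] = nn.Linear(in_feats, n_classes)  # swap the head
    return backbone

model = build_classifier(n_classes=7)      # e.g., a multi-class skin-lesion setup
dummy = torch.randn(1, 3, 224, 224)        # one RGB image
print(model(dummy).shape)                  # torch.Size([1, 7])
```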

SHARCS: Shared Concept Space for Explainable Multimodal Learning

  • paper_url: http://arxiv.org/abs/2307.00316
  • repo_url: https://github.com/gabriele-dominici/SHARCS
  • paper_authors: Gabriele Dominici, Pietro Barbiero, Lucie Charlotte Magister, Pietro Liò, Nikola Simidjievski
  • for: 本研究旨在提供一种可解释的多模态学习方法,以解决复杂的实际世界问题,其中各个数据模式通常不够精准地解决给定的模型任务。
  • methods: 本研究使用了SHARCS(分享概念空间)方法,这是一种基于概念的新的多模态学习方法,可以学习和映射不同的各种多样化数据模式到一个共同的概念空间中,从而实现协同的概念映射。
  • results: 研究表明,SHARCS方法可以带来内在可解释的任务预测结果,同时也提高了下游预测性能。此外,SHARCS方法还可以在实际重要的场景中运行,如缺失模式的检索和跨模式解释。
    Abstract Multimodal learning is an essential paradigm for addressing complex real-world problems, where individual data modalities are typically insufficient to accurately solve a given modelling task. While various deep learning approaches have successfully addressed these challenges, their reasoning process is often opaque; limiting the capabilities for a principled explainable cross-modal analysis and any domain-expert intervention. In this paper, we introduce SHARCS (SHARed Concept Space) -- a novel concept-based approach for explainable multimodal learning. SHARCS learns and maps interpretable concepts from different heterogeneous modalities into a single unified concept-manifold, which leads to an intuitive projection of semantically similar cross-modal concepts. We demonstrate that such an approach can lead to inherently explainable task predictions while also improving downstream predictive performance. Moreover, we show that SHARCS can operate and significantly outperform other approaches in practically significant scenarios, such as retrieval of missing modalities and cross-modal explanations. Our approach is model-agnostic and easily applicable to different types (and number) of modalities, thus advancing the development of effective, interpretable, and trustworthy multimodal approaches.
    摘要 多模态学习是解决复杂现实问题的重要范式,因为单一数据模态通常不足以准确完成给定的建模任务。虽然各种深度学习方法已成功应对这些挑战,但它们的推理过程往往不透明,限制了有原则的可解释跨模态分析以及领域专家的介入。在本文中,我们提出SHARCS(共享概念空间),一种新的基于概念的可解释多模态学习方法。SHARCS从不同的异构模态中学习可解释的概念,并将其映射到统一的概念流形中,从而得到语义相似的跨模态概念的直观投影。我们证明,这种方法既能带来本质上可解释的任务预测,也能提升下游预测性能。此外,我们还表明SHARCS可以在具有实际意义的场景(如缺失模态的检索和跨模态解释)中运行并显著优于其他方法。我们的方法与模型无关,且易于应用于不同类型和数量的模态,从而推动了有效、可解释、可信的多模态方法的发展。

Gradients Look Alike: Sensitivity is Often Overestimated in DP-SGD

  • paper_url: http://arxiv.org/abs/2307.00310
  • repo_url: None
  • paper_authors: Anvith Thudi, Hengrui Jia, Casey Meehan, Ilia Shumailov, Nicolas Papernot
  • for: 本文研究了 differentially private stochastic gradient descent(DP-SGD)的隐私分析,以及在训练常用 benchmark 数据集上的模型获得的隐私泄露。
  • methods: 本文提出了一种新的 DP-SGD 隐私分析方法,基于模型更新的分布来考虑点的相似性。此外,本文还提出了一种新的组合定理,用于有效地使用这种新的每步分析来评估整个训练过程的隐私保护。
  • results: 本文的评估结果显示,这种新的 DP-SGD 隐私分析方法可以正确地显示 DP-SGD 在许多数据点上具有更好的隐私保护。具体来说,正确分类的点会获得更好的隐私保证。
    Abstract Differentially private stochastic gradient descent (DP-SGD) is the canonical algorithm for private deep learning. While it is known that its privacy analysis is tight in the worst-case, several empirical results suggest that when training on common benchmark datasets, the models obtained leak significantly less privacy for many datapoints. In this paper, we develop a new analysis for DP-SGD that captures the intuition that points with similar neighbors in the dataset enjoy better privacy than outliers. Formally, this is done by modifying the per-step privacy analysis of DP-SGD to introduce a dependence on the distribution of model updates computed from a training dataset. We further develop a new composition theorem to effectively use this new per-step analysis to reason about an entire training run. Put all together, our evaluation shows that this novel DP-SGD analysis allows us to now formally show that DP-SGD leaks significantly less privacy for many datapoints. In particular, we observe that correctly classified points obtain better privacy guarantees than misclassified points.
    摘要 差分隐私随机梯度下降(DP-SGD)是私有深度学习的标准算法。虽然已知其隐私分析在最坏情况下是紧的,但多项实验结果表明,在常见基准数据集上训练得到的模型,对许多数据点泄露的隐私要少得多。本文提出了一种新的 DP-SGD 分析,刻画了这样一种直觉:在数据集中拥有相似邻居的数据点比离群点享有更好的隐私。形式上,这是通过修改 DP-SGD 的每步隐私分析,使其依赖于由训练数据集计算出的模型更新分布来实现的。我们进一步提出了一个新的组合定理,以便有效利用这种新的每步分析来推理整个训练过程。综合来看,我们的评估表明,这一新的 DP-SGD 分析使我们能够正式证明 DP-SGD 对许多数据点泄露的隐私显著更少;特别地,被正确分类的点比被错误分类的点获得更好的隐私保证。
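
For context, the snippet below is a standard DP-SGD update (per-example gradient clipping plus Gaussian noise), i.e. the mechanism whose per-step privacy analysis the paper refines. The paper's new data-dependent accounting and composition theorem are not reproduced; hyperparameters are illustrative.

```python
# Sketch of a single DP-SGD step: per-example clipping + Gaussian noise.
# The paper's refined, data-dependent privacy accounting is NOT shown here.
import torch

def dp_sgd_step(model, loss_fn, xb, yb, lr=0.1, clip_norm=1.0, noise_mult=1.0):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    batch = xb.shape[0]
    for i in range(batch):                                   # per-example gradients
        loss = loss_fn(model(xb[i:i + 1]), yb[i:i + 1])
        grads = torch.autograd.grad(loss, params)
        total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
        for s, g in zip(summed, grads):
            s += g * scale                                   # clip to norm <= clip_norm
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * (noise_mult * clip_norm)
            p -= lr * (s + noise) / batch                    # noisy averaged update
```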

Adversarial Attacks and Defenses on 3D Point Cloud Classification: A Survey

  • paper_url: http://arxiv.org/abs/2307.00309
  • repo_url: None
  • paper_authors: Hanieh Naderi, Ivan V. Bajić
  • for: 本文概述了针对点云分类任务的 adversarial attack 和防御技术的当前进展。
  • methods: 本文首先介绍了针对点云分类任务的 adversarial attack 的原理和特点,然后总结了过去几年的 adversarial example 生成方法。此外,它还分类了防御策略为输入转换、数据优化和深度模型修改。
  • results: 本文结束时提出了一些未来研究方向和挑战。
    Abstract Deep learning has successfully solved a wide range of tasks in 2D vision as a dominant AI technique. Recently, deep learning on 3D point clouds is becoming increasingly popular for addressing various tasks in this field. Despite remarkable achievements, deep learning algorithms are vulnerable to adversarial attacks. These attacks are imperceptible to the human eye but can easily fool deep neural networks in the testing and deployment stage. To encourage future research, this survey summarizes the current progress on adversarial attack and defense techniques on point cloud classification. This paper first introduces the principles and characteristics of adversarial attacks and summarizes and analyzes the adversarial example generation methods in recent years. Besides, it classifies defense strategies as input transformation, data optimization, and deep model modification. Finally, it presents several challenging issues and future research directions in this domain.
    摘要 深度学习作为主导性的人工智能技术,已成功解决了2D视觉中的大量任务。近年来,基于3D点云的深度学习在该领域的各类任务中也日益流行。尽管成绩显著,深度学习算法仍然容易受到对抗攻击:这些攻击对人眼几乎不可察觉,却能在测试和部署阶段轻易欺骗深度神经网络。为促进后续研究,本综述总结了点云分类上对抗攻击与防御技术的最新进展。文章首先介绍对抗攻击的原理与特点,归纳并分析了近年来的对抗样本生成方法;随后将防御策略分为输入变换、数据优化和深度模型修改三类;最后给出了该领域的若干挑战性问题和未来研究方向。

SyMFM6D: Symmetry-aware Multi-directional Fusion for Multi-View 6D Object Pose Estimation

  • paper_url: http://arxiv.org/abs/2307.00306
  • repo_url: https://github.com/boschresearch/symfm6d
  • paper_authors: Fabian Duffhauss, Sebastian Koch, Hanna Ziesche, Ngo Anh Vien, Gerhard Neumann
  • for: 提高自动化系统与环境的交互安全性,需要检测对象和计算其6D姿态。
  • methods: 我们提出了一种新的对称意识多视角6D姿态估计器(SyMFM6D),使用深度多向合并网络将RGB-D帧集合多个视角进行有效融合,并同时预测场景中所有对象的预定关键点。基于关键点和实例semantic分割,我们高效计算6D姿态。
  • results: 我们的SyMFM6D网络在单视图和多视图6D姿态估计中都达到了现状顶峰性,并且我们提出了一种新的对称意识训练方法,以解决对称对象的歧义问题。此外,我们还证明了我们的方法对于不正确的摄像头准确和动态摄像头设置具有Robust性。
    Abstract Detecting objects and estimating their 6D poses is essential for automated systems to interact safely with the environment. Most 6D pose estimators, however, rely on a single camera frame and suffer from occlusions and ambiguities due to object symmetries. We overcome this issue by presenting a novel symmetry-aware multi-view 6D pose estimator called SyMFM6D. Our approach efficiently fuses the RGB-D frames from multiple perspectives in a deep multi-directional fusion network and predicts predefined keypoints for all objects in the scene simultaneously. Based on the keypoints and an instance semantic segmentation, we efficiently compute the 6D poses by least-squares fitting. To address the ambiguity issues for symmetric objects, we propose a novel training procedure for symmetry-aware keypoint detection including a new objective function. Our SyMFM6D network significantly outperforms the state-of-the-art in both single-view and multi-view 6D pose estimation. We furthermore show the effectiveness of our symmetry-aware training procedure and demonstrate that our approach is robust towards inaccurate camera calibration and dynamic camera setups.
    摘要 检测对象和估计其6D姿态是自动化系统与环境交互的关键。大多数6D姿态估计器,然而,依赖单个相机框架,受到 occlusion 和对象相似性的影响。我们解决这个问题,通过提出一种新的对称意识多视图6D姿态估计器,称为 SyMFM6D。我们的方法能够有效地将多个视角的 RGB-D 帧在深度多向量融合网络中进行集成,并同时预测场景中所有对象的预定的关键点。基于关键点和实例Semantic分割,我们高效地计算出6D姿态。为了解决对称对象的含糊问题,我们提出了一种新的训练程序,包括一个新的目标函数。我们的 SyMFM6D 网络在单视图和多视图6D姿态估计中具有显著的优势,并且我们进一步证明了我们的对称意识训练程序的有效性。此外,我们还证明了我们的方法对不准确的相机准备和动态相机设置是Robust。
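
The abstract computes 6D poses from predicted keypoints by least-squares fitting; the sketch below shows a standard closed-form (Kabsch/SVD) solution for that final step, assuming keypoint correspondences are already given. The multi-view fusion network and the symmetry-aware training are not reproduced.

```python
# Sketch: recover a 6D pose (R, t) from corresponding 3D keypoints by
# least-squares (Kabsch algorithm). In the paper's pipeline the keypoints are
# predicted by a network; here they are assumed given.
import numpy as np

def fit_pose(model_kps: np.ndarray, pred_kps: np.ndarray):
    """model_kps, pred_kps: (N, 3) arrays of corresponding 3D points.
    Returns R (3x3) and t (3,) minimizing ||R @ model + t - pred||^2."""
    mu_m = model_kps.mean(axis=0)
    mu_p = pred_kps.mean(axis=0)
    H = (model_kps - mu_m).T @ (pred_kps - mu_p)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t

# Toy check: a known rotation/translation is recovered exactly.
rng = np.random.default_rng(0)
pts = rng.normal(size=(8, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.5])
R_est, t_est = fit_pose(pts, pts @ R_true.T + t_true)
assert np.allclose(R_est, R_true) and np.allclose(t_est, t_true)
```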

Applied Bayesian Structural Health Monitoring: inclinometer data anomaly detection and forecasting

  • paper_url: http://arxiv.org/abs/2307.00305
  • repo_url: None
  • paper_authors: David K. E. Green, Adam Jaspan
  • for: 这个论文旨在应用某种抽象技术来处理实际情况下的倾斜仪数据,以实现异常检测和预测。
  • methods: 这篇论文使用了 Bayesian 技术来处理倾斜仪数据,包括异常检测和预测。
  • results: 研究人员成功应用了这些技术来处理大量实际数据,并获得了有用的结果。这些技术可以广泛应用于工程 UQ 和结构健康监测(SHM)领域。
    Abstract Inclinometer probes are devices that can be used to measure deformations within earthwork slopes. This paper demonstrates a novel application of Bayesian techniques to real-world inclinometer data, providing both anomaly detection and forecasting. Specifically, this paper details an analysis of data collected from inclinometer data across the entire UK rail network. Practitioners have effectively two goals when processing monitoring data. The first is to identify any anomalous or dangerous movements, and the second is to predict potential future adverse scenarios by forecasting. In this paper we apply Uncertainty Quantification (UQ) techniques by implementing a Bayesian approach to anomaly detection and forecasting for inclinometer data. Subsequently, both costs and risks may be minimised by quantifying and evaluating the appropriate uncertainties. This framework may then act as an enabler for enhanced decision making and risk analysis. We show that inclinometer data can be described by a latent autocorrelated Markov process derived from measurements. This can be used as the transition model of a non-linear Bayesian filter. This allows for the prediction of system states. This learnt latent model also allows for the detection of anomalies: observations that are far from their expected value may be considered to have `high surprisal', that is they have a high information content relative to the model encoding represented by the learnt latent model. We successfully apply the forecasting and anomaly detection techniques to a large real-world data set in a computationally efficient manner. Although this paper studies inclinometers in particular, the techniques are broadly applicable to all areas of engineering UQ and Structural Health Monitoring (SHM).
    摘要 倾斜仪是一种可用于测量土方边坡内部变形的探测设备。本文展示了将贝叶斯技术应用于真实倾斜仪数据的一种新方式,可同时实现异常检测和预测。具体而言,本文分析了覆盖整个英国铁路网的倾斜仪监测数据。实践人员在处理监测数据时通常有两个目标:一是识别任何异常或危险的位移,二是通过预测来预判潜在的不利情景。本文通过实现贝叶斯方法,对倾斜仪数据进行异常检测与预测,从而应用不确定性量化(UQ)技术。通过对相应不确定性的量化与评估,可以同时降低成本与风险;该框架因此可以作为增强决策与风险分析的使能工具。我们发现倾斜仪数据可以由一个从测量中导出的潜在自相关马尔可夫过程来描述,并将其用作非线性贝叶斯滤波器的状态转移模型,从而实现对系统状态的预测。学习得到的潜在模型同时也支持异常检测:远离其期望值的观测可视为具有"高信息量(high surprisal)",即相对于该潜在模型所表示的编码而言,它们携带了大量信息。我们以较高的计算效率将预测与异常检测技术成功应用于一个大规模的真实数据集。尽管本文以倾斜仪为研究对象,这些技术可广泛应用于工程不确定性量化与结构健康监测(SHM)的各个领域。
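
A minimal sketch of the core idea, under simplifying assumptions: a scalar latent AR(1) state tracked by a Kalman filter, with anomalies flagged by high one-step-ahead surprisal (negative log predictive likelihood). The paper's full Bayesian SHM pipeline and forecasting machinery are not reproduced; all parameters below are illustrative.

```python
# Track inclinometer-like readings with a latent AR(1) state via a Kalman
# filter, and flag observations whose one-step-ahead surprisal is high.
# Parameters (phi, q, r) and the injected anomaly are illustrative only.
import numpy as np

def filter_with_surprisal(y, phi=0.95, q=0.01, r=0.05):
    m, P = 0.0, 1.0                        # posterior mean / variance of the state
    surprisals = []
    for obs in y:
        m_pred, P_pred = phi * m, phi**2 * P + q          # predict
        S = P_pred + r                                    # predictive variance of obs
        surprisals.append(0.5 * (np.log(2 * np.pi * S) + (obs - m_pred) ** 2 / S))
        K = P_pred / S                                    # Kalman gain, then update
        m = m_pred + K * (obs - m_pred)
        P = (1 - K) * P_pred
    return np.array(surprisals)

rng = np.random.default_rng(1)
signal = np.cumsum(rng.normal(0, 0.05, 300))              # slow drift
signal[200] += 1.5                                         # injected anomaly
s = filter_with_surprisal(signal)
print("most surprising index:", int(np.argmax(s)))         # expected: ~200
```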

Accelerated primal-dual methods with enlarged step sizes and operator learning for nonsmooth optimal control problems

  • paper_url: http://arxiv.org/abs/2307.00296
  • repo_url: None
  • paper_authors: Yongcun Song, Xiaoming Yuan, Hangrui Yue
  • for: 该论文关注的问题是非满射最优控制问题,具有部分微分方程约束,这类问题具有非满射目标函数和高维度的计算问题。
  • methods: 该论文提出了一种基于 primal-dual 方法的解决方案,可以在每个迭代中只需解决两个微分方程。另外, authors 还提出了两种加速方法:一种是通过增大步长来加速 primal-dual 方法,另一种是通过学习操作符来加速 solve PDE 的计算。
  • results: 该论文通过验证了加速 primal-dual 方法的有效性,并且通过构建深度神经网络模型来缩减 solve PDE 的计算成本。该方法可以快速、精度高地解决各种 PDE 问题,并且可扩展到不同类型的 PDE 问题。
    Abstract We consider a general class of nonsmooth optimal control problems with partial differential equation (PDE) constraints, which are very challenging due to its nonsmooth objective functionals and the resulting high-dimensional and ill-conditioned systems after discretization. We focus on the application of a primal-dual method, with which different types of variables can be treated individually and thus its main computation at each iteration only requires solving two PDEs. Our target is to accelerate the primal-dual method with either larger step sizes or operator learning techniques. For the accelerated primal-dual method with larger step sizes, its convergence can be still proved rigorously while it numerically accelerates the original primal-dual method in a simple and universal way. For the operator learning acceleration, we construct deep neural network surrogate models for the involved PDEs. Once a neural operator is learned, solving a PDE requires only a forward pass of the neural network, and the computational cost is thus substantially reduced. The accelerated primal-dual method with operator learning is mesh-free, numerically efficient, and scalable to different types of PDEs. The acceleration effectiveness of these two techniques is promisingly validated by some preliminary numerical results.
    摘要 我们考虑一类一般的非光滑最优控制问题,其带有偏微分方程(PDE)约束。由于目标泛函非光滑,且离散化后得到的系统维数高、条件数差,这类问题极具挑战性。我们关注原始-对偶方法的应用:不同类型的变量可以被分别处理,因此每次迭代的主要计算只需求解两个 PDE。我们的目标是通过更大的步长或算子学习技术来加速原始-对偶方法。对于采用更大步长的加速原始-对偶方法,其收敛性仍可严格证明,并且能够以简单而通用的方式在数值上加速原始方法。对于算子学习加速,我们为所涉及的 PDE 构建了深度神经网络替代模型;一旦学习到神经算子,求解一个 PDE 只需一次神经网络前向计算,从而显著降低计算成本。带算子学习的加速原始-对偶方法无需网格、数值高效,并可推广到不同类型的 PDE。初步数值结果很好地验证了这两种加速技术的有效性。
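
To show the split structure of a primal-dual iteration (two cheap sub-steps per iteration, one per variable block), the sketch below runs a generic Chambolle-Pock-type method on a finite-dimensional toy problem, min_x 0.5||Ax-b||^2 + lam*||x||_1. It is not the paper's PDE-constrained solver, and neither the enlarged-step-size analysis nor the operator-learning surrogate is reproduced.

```python
# Generic Chambolle-Pock-style primal-dual iteration on a toy lasso problem,
# illustrating the two-block structure of primal-dual methods. This is NOT the
# paper's PDE-constrained solver or its accelerated variants.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def primal_dual_lasso(A, b, lam, iters=500):
    m, n = A.shape
    L = np.linalg.norm(A, 2)                  # spectral norm of A
    tau = sigma = 0.9 / L                     # ensures tau * sigma * L^2 < 1
    x = np.zeros(n); x_bar = x.copy(); y = np.zeros(m)
    for _ in range(iters):
        # dual step: prox of sigma*g* with g(z) = 0.5*||z - b||^2
        y = (y + sigma * (A @ x_bar - b)) / (1.0 + sigma)
        # primal step: prox of tau*f with f(x) = lam*||x||_1
        x_new = soft_threshold(x - tau * (A.T @ y), tau * lam)
        x_bar = 2 * x_new - x                 # over-relaxation (theta = 1)
        x = x_new
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.normal(size=40)
x_hat = primal_dual_lasso(A, b, lam=0.1)
print("coefficients above 1e-3:", int(np.sum(np.abs(x_hat) > 1e-3)))
```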

AutoST: Training-free Neural Architecture Search for Spiking Transformers

  • paper_url: http://arxiv.org/abs/2307.00293
  • repo_url: None
  • paper_authors: Ziqing Wang, Qidong Zhao, Jinku Cui, Xu Liu, Dongkuan Xu
  • for: 这个论文的目的是为了快速找到高性能且能耗低的神经网络架构(Spiking Transformer)。
  • methods: 这篇论文使用了一种名为AutoST的无需训练(training-free)的NAS方法,以快速找到高性能的Spiking Transformer架构。这种方法利用浮点运算次数(FLOPs)作为性能指标,而不是依赖传统的基于训练的方法,这使得它能够更好地刻画Spiking Neural Networks(SNNs)的性能特点。此外,这种方法还利用了初始化时的激活模式来估算Spiking Transformer的能耗。
  • results: 这篇论文的实验结果表明,AutoST模型比现有的手动或自动设计的SNN架构性能更高,而且能够降低能耗。
    Abstract Spiking Transformers have gained considerable attention because they achieve both the energy efficiency of Spiking Neural Networks (SNNs) and the high capacity of Transformers. However, the existing Spiking Transformer architectures, derived from ANNs, exhibit a notable architectural gap, resulting in suboptimal performance compared to their ANN counterparts. Traditional approaches to discovering optimal architectures primarily rely on either manual procedures, which are time-consuming, or Neural Architecture Search (NAS) methods, which are usually expensive in terms of memory footprints and computation time. To address these limitations, we introduce AutoST, a training-free NAS method for Spiking Transformers, to rapidly identify high-performance and energy-efficient Spiking Transformer architectures. Unlike existing training-free NAS methods, which struggle with the non-differentiability and high sparsity inherent in SNNs, we propose to utilize Floating-Point Operations (FLOPs) as a performance metric, which is independent of model computations and training dynamics, leading to a stronger correlation with performance. Moreover, to enable the search for energy-efficient architectures, we leverage activation patterns during initialization to estimate the energy consumption of Spiking Transformers. Our extensive experiments show that AutoST models outperform state-of-the-art manually or automatically designed SNN architectures on static and neuromorphic datasets, while significantly reducing energy consumption.
    摘要 脉冲 Transformer(Spiking Transformer)近年来受到了广泛关注,因为它们可以同时兼具脉冲神经网络(SNN)的能量效率和 Transformer 的高容量。然而,现有的脉冲 Transformer 架构源自人工神经网络(ANN),存在明显的架构差距,导致其性能落后于对应的 ANN 模型。传统的最优架构搜索方法要么依赖耗时的人工流程,要么采用神经架构搜索(NAS)方法,而后者通常需要大量的内存占用和计算时间。为了解决这些限制,我们提出 AutoST,一种无需训练的 NAS 方法,可以快速找到高性能且能耗低的脉冲 Transformer 架构。与现有的无需训练 NAS 方法不同,我们使用浮点运算次数(FLOPs)作为性能指标,它与模型计算和训练动态无关,从而与性能具有更强的相关性。此外,为了搜索能效的架构,我们在初始化时利用激活模式来估计脉冲 Transformer 的能耗。大量实验表明,AutoST 模型在静态和神经形态(neuromorphic)数据集上均优于当前最先进的手动或自动设计的 SNN 架构,同时显著降低了能耗。
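
In the spirit of the abstract's training-free search, the toy below ranks candidate architectures by an analytically computed FLOPs proxy instead of training them. The search space, the FLOPs accounting, and the budget constraint are illustrative assumptions; AutoST's actual scoring and energy estimation are not reproduced.

```python
# Toy training-free search loop: rank candidate transformer configurations by
# an analytic FLOPs proxy under a resource budget, with no training at all.
# The search space and the accounting below are illustrative assumptions.
import itertools

def transformer_block_flops(dim, mlp_ratio, seq_len):
    attn = 4 * seq_len * dim * dim + 2 * seq_len * seq_len * dim  # projections + attention
    mlp = 2 * seq_len * dim * (dim * mlp_ratio)                   # two linear layers
    return attn + mlp

def search(seq_len=196, flops_budget=2e9):
    best = None
    for depth, dim, ratio in itertools.product([4, 8, 12], [128, 256, 384], [2, 4]):
        flops = depth * transformer_block_flops(dim, ratio, seq_len)
        if flops > flops_budget:
            continue                       # respect the resource constraint
        score = flops                      # proxy score used instead of trained accuracy
        if best is None or score > best[0]:
            best = (score, dict(depth=depth, dim=dim, mlp_ratio=ratio))
    return best

print(search())
```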

All-in-SAM: from Weak Annotation to Pixel-wise Nuclei Segmentation with Prompt-based Finetuning

  • paper_url: http://arxiv.org/abs/2307.00290
  • repo_url: None
  • paper_authors: Can Cui, Ruining Deng, Quan Liu, Tianyuan Yao, Shunxing Bao, Lucas W. Remedios, Yucheng Tang, Yuankai Huo
  • for: 这 paper 的目的是提出一种不需要在推理阶段提供手动提示的全自动分割模型(all-in-SAM),用于从标注生成到模型精度调整的整个人工智能开发工程中。
  • methods: 这 paper 使用了 SAM 模型,首先使用 SAM 生成点或 bounding box 等弱提示来生成像素级别的标注,然后使用这些标注来精度调整 SAM 分割模型。
  • results: 实验结果显示,提posed 管道在 Monuseg 数据集上的核体分割任务中超过了现有的 SOTA 方法,而使用弱和少的标注进行 SAM 精度调整也可以达到与使用强像素级别标注数据的性能。
    Abstract The Segment Anything Model (SAM) is a recently proposed prompt-based segmentation model in a generic zero-shot segmentation approach. With the zero-shot segmentation capacity, SAM achieved impressive flexibility and precision on various segmentation tasks. However, the current pipeline requires manual prompts during the inference stage, which is still resource intensive for biomedical image segmentation. In this paper, instead of using prompts during the inference stage, we introduce a pipeline that utilizes the SAM, called all-in-SAM, through the entire AI development workflow (from annotation generation to model finetuning) without requiring manual prompts during the inference stage. Specifically, SAM is first employed to generate pixel-level annotations from weak prompts (e.g., points, bounding box). Then, the pixel-level annotations are used to finetune the SAM segmentation model rather than training from scratch. Our experimental results reveal two key findings: 1) the proposed pipeline surpasses the state-of-the-art (SOTA) methods in a nuclei segmentation task on the public Monuseg dataset, and 2) the utilization of weak and few annotations for SAM finetuning achieves competitive performance compared to using strong pixel-wise annotated data.
    摘要 Segment Anything Model (SAM) 是一种最近提出的批量基于描述符的分割模型,可以在无预期分割任务中实现出色的灵活性和精度。然而,现有的管道仍然需要在推理阶段手动提供描述符,这对生物医学图像分割来说仍然是资源浪费。在本文中,我们提出一种不使用推理阶段手动提供描述符的管道,通过整个人工智能开发工程(从描述符生成到模型微调)使用 SAM。具体来说,SAM 首先使用弱描述符(例如点、 bounding box)生成像素级别的描述符,然后使用这些描述符微调 SAM 分割模型而不需要从scratch 重新训练。我们的实验结果显示了两个关键发现:1)我们的管道超过了目前最佳方法(SOTA)在公共 Monuseg 数据集上的核体 segmentation 任务中的性能,2)使用弱和少量的描述符进行 SAM 微调可以与使用强像素级别的 annotated 数据实现竞争性的性能。

CMA-ES for Post Hoc Ensembling in AutoML: A Great Success and Salvageable Failure

  • paper_url: http://arxiv.org/abs/2307.00286
  • repo_url: https://github.com/LennartPurucker/CMA-ES-PostHocEnsemblingAutoML
  • paper_authors: Lennart Purucker, Joeran Beel
  • For: The paper aims to analyze the performance of covariance matrix adaptation evolution strategy (CMA-ES) and greedy ensemble selection (GES) in automated machine learning (AutoML) systems, and to explore methods to stop CMA-ES from overfitting for ROC AUC.* Methods: The paper compares the performance of CMA-ES and GES on 71 classification datasets from the AutoML benchmark for AutoGluon, and proposes a method to normalize the weights produced by CMA-ES to avoid overfitting and improve performance for ROC AUC.* Results: The paper finds that CMA-ES overfits drastically and is outperformed by GES for the metric ROC AUC, but does not overfit and outperforms GES for the metric balanced accuracy. The proposed method to normalize the weights produced by CMA-ES improves its performance for ROC AUC and makes it perform better than or similar to GES.
    Abstract Many state-of-the-art automated machine learning (AutoML) systems use greedy ensemble selection (GES) by Caruana et al. (2004) to ensemble models found during model selection post hoc. Thereby, boosting predictive performance and likely following Auto-Sklearn 1's insight that alternatives, like stacking or gradient-free numerical optimization, overfit. Overfitting in Auto-Sklearn 1 is much more likely than in other AutoML systems because it uses only low-quality validation data for post hoc ensembling. Therefore, we were motivated to analyze whether Auto-Sklearn 1's insight holds true for systems with higher-quality validation data. Consequently, we compared the performance of covariance matrix adaptation evolution strategy (CMA-ES), state-of-the-art gradient-free numerical optimization, to GES on the 71 classification datasets from the AutoML benchmark for AutoGluon. We found that Auto-Sklearn's insight depends on the chosen metric. For the metric ROC AUC, CMA-ES overfits drastically and is outperformed by GES -- statistically significantly for multi-class classification. For the metric balanced accuracy, CMA-ES does not overfit and outperforms GES significantly. Motivated by the successful application of CMA-ES for balanced accuracy, we explored methods to stop CMA-ES from overfitting for ROC AUC. We propose a method to normalize the weights produced by CMA-ES, inspired by GES, that avoids overfitting for CMA-ES and makes CMA-ES perform better than or similar to GES for ROC AUC.
    摘要 许多现代自动机器学习(AutoML)系统使用Caruana et al. (2004)提出的滥贪集成选择(GES)来ensemble模型在模型选择后期。这有助于提高预测性能,并可能是Auto-Sklearn 1中提出的一种准则,即使用低质量验证数据进行后期集成会导致过拟合。因此,我们被激励分析Auto-Sklearn 1中的准则是否适用于高质量验证数据。我们在AutoML benchmark中使用71个分类数据集进行比较,发现Auto-Sklearn的准则与选择的度量相关。对于ROC AUC度量,CMA-ES会过度拟合,而GES在多类分类问题上 statistically significant 地超过CMA-ES。对于balanced accuracy度量,CMA-ES不会过度拟合,并在多类分类问题上 statistically significant 地超过GES。受到成功应用CMA-ES的启示,我们探讨了如何使CMA-ES在ROC AUC度量上避免过度拟合,并提出了一种normalize CMA-ES生成的权重的方法,以避免CMA-ES过度拟合并使其在ROC AUC度量上表现更好或类似于GES。
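
A sketch of the post hoc ensembling setup, with the normalization idea from the abstract stood in by a simple clip-and-renormalize onto the simplex; the paper's exact GES-inspired normalization may differ. The objective shown is balanced accuracy; a CMA-ES optimizer (e.g. the `cma` package) would minimize it over the raw weight vector.

```python
# Post hoc ensembling from stored validation predictions: optimize model
# weights, then normalize them onto the simplex before use. The specific
# normalization below (clip negatives, renormalize) is an illustrative
# stand-in for the GES-inspired scheme described in the abstract.
import numpy as np

def normalize_weights(w):
    w = np.maximum(w, 0.0)                 # discard negative weights
    s = w.sum()
    return w / s if s > 0 else np.full_like(w, 1.0 / len(w))

def ensemble_proba(weights, val_probas):
    # val_probas: (n_models, n_samples, n_classes) validation predictions
    w = normalize_weights(np.asarray(weights, dtype=float))
    return np.tensordot(w, val_probas, axes=1)    # weighted average of probabilities

def neg_balanced_accuracy(weights, val_probas, y_true):
    proba = ensemble_proba(weights, val_probas)
    y_pred = proba.argmax(axis=1)
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return -float(np.mean(recalls))        # objective for a minimizer such as CMA-ES
```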

Assembled-OpenML: Creating Efficient Benchmarks for Ensembles in AutoML with OpenML

  • paper_url: http://arxiv.org/abs/2307.00285
  • repo_url: https://github.com/isg-siegen/assembled
  • paper_authors: Lennart Purucker, Joeran Beel
  • for: 这篇论文的目的是提出一种名为Assembled-OpenML的Python工具,用于 comparing 不同的集成技术,以选择适合自动机器学习(AutoML)框架的技术。
  • methods: 这篇论文使用了OpenML的数据集和预测数据来构建集成的meta-dataset,并使用这些数据来比较不同的集成技术。
  • results: 在这篇论文中,作者使用了Assembled-OpenML工具来对一组集成技术进行比较,并在31个数据集上收集了1523个基本模型的预测数据。 获取所有基本模型的预测数据使用Assembled-OpenML工具只需要大约1小时,而在最复杂的数据集上只需要训练和评估一个基本模型需要大约37分钟。
    Abstract Automated Machine Learning (AutoML) frameworks regularly use ensembles. Developers need to compare different ensemble techniques to select appropriate techniques for an AutoML framework from the many potential techniques. So far, the comparison of ensemble techniques is often computationally expensive, because many base models must be trained and evaluated one or multiple times. Therefore, we present Assembled-OpenML. Assembled-OpenML is a Python tool, which builds meta-datasets for ensembles using OpenML. A meta-dataset, called Metatask, consists of the data of an OpenML task, the task's dataset, and prediction data from model evaluations for the task. We can make the comparison of ensemble techniques computationally cheaper by using the predictions stored in a metatask instead of training and evaluating base models. To introduce Assembled-OpenML, we describe the first version of our tool. Moreover, we present an example of using Assembled-OpenML to compare a set of ensemble techniques. For this example comparison, we built a benchmark using Assembled-OpenML and implemented ensemble techniques expecting predictions instead of base models as input. In our example comparison, we gathered the prediction data of $1523$ base models for $31$ datasets. Obtaining the prediction data for all base models using Assembled-OpenML took ${\sim} 1$ hour in total. In comparison, obtaining the prediction data by training and evaluating just one base model on the most computationally expensive dataset took ${\sim} 37$ minutes.
    摘要 自动化机器学习(AutoML)框架经常使用集成技术。开发者需要比较不同的集成技术,以便从众多候选技术中为 AutoML 框架选择合适的方案。目前,集成技术之间的比较通常计算代价很高,因为许多基本模型必须被训练并评估一次或多次。因此,我们提出了 Assembled-OpenML。Assembled-OpenML 是一个 Python 工具,可以利用 OpenML 为集成构建元数据集(meta-dataset)。一个称为 Metatask 的元数据集包含一个 OpenML 任务的数据、该任务的数据集,以及针对该任务的模型评估所产生的预测数据。通过使用存储在 Metatask 中的预测数据而非重新训练和评估基本模型,我们可以显著降低集成技术比较的计算成本。为了介绍 Assembled-OpenML,我们描述了该工具的首个版本。此外,我们还给出了使用 Assembled-OpenML 比较一组集成技术的示例:我们用 Assembled-OpenML 构建了一个基准(benchmark),并实现了以预测数据(而非基本模型)作为输入的集成技术。在该示例比较中,我们收集了 $31$ 个数据集上共 $1523$ 个基本模型的预测数据。使用 Assembled-OpenML 获取所有基本模型的预测数据总共约需 ${\sim} 1$ 小时;相比之下,仅在计算开销最大的数据集上训练并评估一个基本模型就需要约 ${\sim} 37$ 分钟。

SysNoise: Exploring and Benchmarking Training-Deployment System Inconsistency

  • paper_url: http://arxiv.org/abs/2307.00280
  • repo_url: None
  • paper_authors: Yan Wang, Yuhang Li, Ruihao Gong, Aishan Liu, Yanfei Wang, Jian Hu, Yongqiang Yao, Yunchen Zhang, Tianzi Xiao, Fengwei Yu, Xianglong Liu
  • for: 这个论文旨在探讨深度学习模型在不同系统实现中的Robustness问题。
  • methods: 作者们首次引入了一种通常被忽略的噪声——SysNoise,它会在训练和部署过程中出现,并对深度学习模型的Robustness产生影响。作者们分类了SysNoise为三类,并建立了一个总面测试来评估20多种模型在不同任务中对SysNoise的抗性。
  • results: 实验结果表明,SysNoise会对不同任务中的模型抗性产生影响,而常见的数据增强和对抗训练等方法对其影响有限。这些结果开启了一个新的研究领域,并希望这项工作能够吸引更多关注深度学习部署系统中模型性能的问题。研究者们已经在https://modeltc.github.io/systemnoise_web上公开了benchmark和框架。
    Abstract Extensive studies have shown that deep learning models are vulnerable to adversarial and natural noises, yet little is known about model robustness on noises caused by different system implementations. In this paper, we for the first time introduce SysNoise, a frequently occurred but often overlooked noise in the deep learning training-deployment cycle. In particular, SysNoise happens when the source training system switches to a disparate target system in deployments, where various tiny system mismatch adds up to a non-negligible difference. We first identify and classify SysNoise into three categories based on the inference stage; we then build a holistic benchmark to quantitatively measure the impact of SysNoise on 20+ models, comprehending image classification, object detection, instance segmentation and natural language processing tasks. Our extensive experiments revealed that SysNoise could bring certain impacts on model robustness across different tasks and common mitigations like data augmentation and adversarial training show limited effects on it. Together, our findings open a new research topic and we hope this work will raise research attention to deep learning deployment systems accounting for model performance. We have open-sourced the benchmark and framework at https://modeltc.github.io/systemnoise_web.
    摘要 大量研究表明深度学习模型对对抗噪声和自然噪声十分敏感,然而对于不同系统实现所引入的噪声,人们的了解还很有限。在这篇论文中,我们首次引入了系统噪声(SysNoise),这是一种在深度学习训练-部署流程中频繁出现却常被忽视的噪声。具体来说,SysNoise 发生在源训练系统切换到不同的目标部署系统时,各种微小的系统差异累积起来形成不可忽略的差别。我们首先根据推理阶段将 SysNoise 分为三类,随后建立了一个覆盖图像分类、目标检测、实例分割和自然语言处理等任务的整体基准,以定量衡量 SysNoise 对 20 多个模型的影响。大量实验表明,SysNoise 会在不同任务上对模型鲁棒性产生一定影响,而数据增强和对抗训练等常见手段对其作用有限。总的来说,我们的发现开启了一个新的研究方向,我们希望这项工作能够引起人们对深度学习部署系统中模型性能的关注。我们已在 https://modeltc.github.io/systemnoise_web 上开源了基准和框架。

Common Knowledge Learning for Generating Transferable Adversarial Examples

  • paper_url: http://arxiv.org/abs/2307.00274
  • repo_url: None
  • paper_authors: Ruijie Yang, Yuanfang Guo, Junfu Wang, Jiantao Zhou, Yunhong Wang
  • for: 本文研究了一种黑盒攻击方法,即将恶意例子生成于一个卷积网络模型(源模型),然后使用这些恶意例子攻击一个未知的目标模型。现有方法往往在不同类型的DNN结构(例如ResNet-18和Swin Transformer)之间存在不满意的攻击跨性。
  • methods: 我们提出了一种共同知识学习(CKL)框架,用于学习更好的网络参数,以生成更好的攻击 transferred。具体来说,我们构建了多教师框架,其中知识从不同的教师模型中提取到一个学生网络中。我们还增加了约束条件,以降低模型特定的特征和提高攻击 transferred。
  • results: 我们的提议可以明显提高攻击 transferred。广泛的实验表明,我们的CKL框架可以更好地利用现有的DNN模型,并提高攻击 transferred的效果。
    Abstract This paper focuses on an important type of black-box attacks, i.e., transfer-based adversarial attacks, where the adversary generates adversarial examples by a substitute (source) model and utilize them to attack an unseen target model, without knowing its information. Existing methods tend to give unsatisfactory adversarial transferability when the source and target models are from different types of DNN architectures (e.g. ResNet-18 and Swin Transformer). In this paper, we observe that the above phenomenon is induced by the output inconsistency problem. To alleviate this problem while effectively utilizing the existing DNN models, we propose a common knowledge learning (CKL) framework to learn better network weights to generate adversarial examples with better transferability, under fixed network architectures. Specifically, to reduce the model-specific features and obtain better output distributions, we construct a multi-teacher framework, where the knowledge is distilled from different teacher architectures into one student network. By considering that the gradient of input is usually utilized to generated adversarial examples, we impose constraints on the gradients between the student and teacher models, to further alleviate the output inconsistency problem and enhance the adversarial transferability. Extensive experiments demonstrate that our proposed work can significantly improve the adversarial transferability.
    摘要 这篇论文关注一类重要的黑盒攻击,即基于迁移的对抗攻击:攻击者利用一个替代(源)模型生成对抗样本,并在不了解目标模型信息的情况下用这些样本攻击未知的目标模型。当源模型与目标模型来自不同类型的 DNN 架构(例如 ResNet-18 与 Swin Transformer)时,现有方法的对抗迁移性往往不尽如人意。在本文中,我们观察到上述现象是由输出不一致问题引起的。为了在有效利用现有 DNN 模型的同时缓解该问题,我们提出了一种共同知识学习(CKL)框架,在固定网络架构下学习更好的网络权重,从而生成迁移性更强的对抗样本。具体而言,为了削弱模型特有的特征并获得更好的输出分布,我们构建了一个多教师框架,将来自不同教师架构的知识蒸馏到一个学生网络中。考虑到生成对抗样本通常会用到输入的梯度,我们对学生模型与教师模型之间的梯度施加约束,以进一步缓解输出不一致问题并增强对抗迁移性。大量实验表明,我们的方法能够显著提升对抗迁移性。
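
A schematic of the two ingredients named in the abstract: multi-teacher distillation into one student plus a penalty aligning the student's input gradients with the teachers'. The loss weights, temperature, and the exact form of the gradient term are illustrative assumptions, not the paper's formulation.

```python
# Schematic multi-teacher distillation loss with an input-gradient alignment
# term. Teachers are assumed to be ordinary differentiable networks; the loss
# weights (alpha, beta) and temperature T are illustrative.
import torch
import torch.nn.functional as F

def ckl_loss(student, teachers, x, y, alpha=1.0, beta=0.1, T=4.0):
    x = x.clone().requires_grad_(True)
    s_logits = student(x)
    ce = F.cross_entropy(s_logits, y)
    # Gradient of the student's output w.r.t. the input (kept differentiable).
    s_grad = torch.autograd.grad(s_logits.sum(), x, create_graph=True)[0]

    kd, grad_pen = 0.0, 0.0
    for teacher in teachers:
        t_logits = teacher(x)
        t_grad = torch.autograd.grad(t_logits.sum(), x)[0]     # teacher input gradient
        kd = kd + F.kl_div(F.log_softmax(s_logits / T, dim=1),
                           F.softmax(t_logits.detach() / T, dim=1),
                           reduction="batchmean") * T * T
        grad_pen = grad_pen + F.mse_loss(s_grad, t_grad)        # align input gradients
    n = len(teachers)
    return ce + alpha * kd / n + beta * grad_pen / n
```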

Hiding in Plain Sight: Differential Privacy Noise Exploitation for Evasion-resilient Localized Poisoning Attacks in Multiagent Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.00268
  • repo_url: None
  • paper_authors: Md Tamjid Hossain, Hung La
  • for: 这个论文是为了探讨 differential privacy (DP) 在 cooperative multiagent reinforcement learning (CMARL) 中保护代理人的隐私问题而写的。
  • methods: 这篇论文使用了 differential privacy 机制,并提出了一种基于这些机制的本地损害攻击(PeLPA),以便在知识共享时防止对代理人的攻击。
  • results: 研究发现,在不同的环境中,PeLPA 攻击可以增加 CMARL 模型的平均步长,并且可以减少 CMARL 模型的优化奖励和速度。在一个中等规模的环境中,PeLPA 攻击可以导致 CMARL 模型的平均步长增加50.69%和64.41%。
    Abstract Lately, differential privacy (DP) has been introduced in cooperative multiagent reinforcement learning (CMARL) to safeguard the agents' privacy against adversarial inference during knowledge sharing. Nevertheless, we argue that the noise introduced by DP mechanisms may inadvertently give rise to a novel poisoning threat, specifically in the context of private knowledge sharing during CMARL, which remains unexplored in the literature. To address this shortcoming, we present an adaptive, privacy-exploiting, and evasion-resilient localized poisoning attack (PeLPA) that capitalizes on the inherent DP-noise to circumvent anomaly detection systems and hinder the optimal convergence of the CMARL model. We rigorously evaluate our proposed PeLPA attack in diverse environments, encompassing both non-adversarial and multiple-adversarial contexts. Our findings reveal that, in a medium-scale environment, the PeLPA attack with attacker ratios of 20% and 40% can lead to an increase in average steps to goal by 50.69% and 64.41%, respectively. Furthermore, under similar conditions, PeLPA can result in a 1.4x and 1.6x computational time increase in optimal reward attainment and a 1.18x and 1.38x slower convergence for attacker ratios of 20% and 40%, respectively.
    摘要 近些时间,演变式隐私(DP)在合作多代理游戏学习(CMARL)中被引入,以保护代理的隐私免受敌意推理中的攻击。然而,我们认为DP机制引入的噪声可能会不知不觉地导致一种新的毒素威胁,具体来说是在CMARL中私人知识共享时期,这一点在文献中尚未得到探讨。为解决这一缺点,我们提出了一种适应、隐私滥用和逃避抗击的本地化毒素攻击(PeLPA),利用DP噪声来绕过异常检测系统,阻碍CMARL模型的优化征化。我们仔细测试了我们提出的PeLPA攻击在多种环境中,包括不良环境和多个敌对环境。我们的结果表明,在中型环境下,PeLPA攻击的20%和40%攻击者比率可以导致平均步骤数增加50.69%和64.41%,分别。此外,在同样的条件下,PeLPA攻击可以导致优化奖励获得的计算时间增加1.4倍和1.6倍,以及优化征化 slower convergence的1.18倍和1.38倍。

An ML approach to resolution of singularities

  • paper_url: http://arxiv.org/abs/2307.00252
  • repo_url: None
  • paper_authors: Gergely Bérczi, Honglu Fan, Mingcong Zeng
    for: 这个论文主要关注的是如何使用机器学习算法来解决 polynomial equations 中的精度问题。methods: 该论文使用了 reinforcement learning agents来找到最佳的解决方案。results: 在某些领域,论文中的模型在total number of polynomial additions performed方面表现出了提高,这提供了一个Proof-of-concept,表明现代机器学习技术在符号计算中的性能可以进行改进。
    Abstract The solution set of a system of polynomial equations typically contains ill-behaved, singular points. Resolution is a fundamental process in geometry in which we replace singular points with smooth points, while keeping the rest of the solution set unchanged. Resolutions are not unique: the usual way to describe them involves repeatedly performing a fundamental operation known as "blowing-up", and the complexity of the resolution highly depends on certain choices. The process can be translated into various versions of a 2-player game, the so-called Hironaka game, and a winning strategy for the first player provides a solution to the resolution problem. In this paper we introduce a new approach to the Hironaka game that uses reinforcement learning agents to find optimal resolutions of singularities. In certain domains, the trained model outperforms state-of-the-art selection heuristics in total number of polynomial additions performed, which provides a proof-of-concept that recent developments in machine learning have the potential to improve performance of algorithms in symbolic computation.
    摘要 多项式方程组的解集通常包含性质不佳的奇异点。奇点消解(resolution)是几何中的一个基本过程:在保持解集其余部分不变的前提下,用光滑点替换这些奇异点。消解并不唯一,通常通过反复执行一种称为"爆破(blowing-up)"的基本操作来描述,而消解的复杂度在很大程度上取决于其中的某些选择。这一过程可以转化为多种版本的双人博弈,即所谓的 Hironaka 博弈,第一名玩家的必胜策略就给出了奇点消解问题的一个解。在这篇论文中,我们提出了一种新的方法,利用强化学习智能体来寻找最优的奇点消解。在某些问题域中,训练得到的模型在所执行的多项式加法总次数上优于当前最先进的选择启发式,这提供了一个概念验证,表明机器学习的最新进展有潜力提升符号计算中算法的性能。

Safe Screening for Unbalanced Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.00247
  • repo_url: None
  • paper_authors: Xun Su, Zhongxi Fang, Hiroyuki Kasai
  • for: 本研究旨在加速优化非平衡优Transport(UOT)问题的优化过程,通过积极地标识并消除稀疏解的零元素。
  • methods: 本研究使用Safe Screening技术,并提出了一种新的近似投影、椭球安全区建构和两平面放松方法,以提高屏选效率而无需增加算法复杂度。
  • results: 研究表明,通过应用Safe Screening技术,可以有效地加速UOT问题的优化过程,而无需改变算法的复杂度。
    Abstract This paper introduces a framework that utilizes the Safe Screening technique to accelerate the optimization process of the Unbalanced Optimal Transport (UOT) problem by proactively identifying and eliminating zero elements in the sparse solutions. We demonstrate the feasibility of applying Safe Screening to the UOT problem with $\ell_2$-penalty and KL-penalty by conducting an analysis of the solution's bounds and considering the local strong convexity of the dual problem. Considering the specific structural characteristics of the UOT in comparison to general Lasso problems on the index matrix, we specifically propose a novel approximate projection, an elliptical safe region construction, and a two-hyperplane relaxation method. These enhancements significantly improve the screening efficiency for the UOT's without altering the algorithm's complexity.
    摘要 本文提出了一个利用 Safe Screening 技术加速非平衡最优传输(UOT)问题求解的框架,通过在优化过程中主动识别并剔除稀疏解中的零元素来实现加速。我们通过分析解的界并考虑对偶问题的局部强凸性,证明了将 Safe Screening 应用于带 $\ell_2$ 罚项和 KL 罚项的 UOT 问题是可行的。考虑到 UOT 相对于一般的索引矩阵上 Lasso 问题所具有的特殊结构,我们进一步提出了一种新的近似投影、椭球安全区域构造以及双超平面松弛方法。这些改进在不改变算法复杂度的情况下显著提升了对 UOT 的筛选效率。

On a Relation Between the Rate-Distortion Function and Optimal Transport

  • paper_url: http://arxiv.org/abs/2307.00246
  • repo_url: None
  • paper_authors: Eric Lei, Hamed Hassani, Shirin Saeedi Bidokhti
  • for: 这篇论文探讨了Rate-Distortion和最佳运输(OT)理论之间的关系,尤其是将一种基于极限Entropic OT距离的函数与Rate-Distortion函数相等。
  • methods: 论文使用了数值验证这个结果,以及之前已知的Monge和Kantorovich问题与最佳整数化器相关的结果。
  • results: 论文将Rate-Distortion和整数化器的解决方案统一到了一起,使用它们的最佳运输解决方案。
    Abstract We discuss a relationship between rate-distortion and optimal transport (OT) theory, even though they seem to be unrelated at first glance. In particular, we show that a function defined via an extremal entropic OT distance is equivalent to the rate-distortion function. We numerically verify this result as well as previous results that connect the Monge and Kantorovich problems to optimal scalar quantization. Thus, we unify solving scalar quantization and rate-distortion functions in an alternative fashion by using their respective optimal transport solvers.
    摘要 我们讨论了率复运输(rate-distortion)和最佳运输(optimal transport,OT)理论之间的关系,虽然他们在初见之下似乎无关。具体来说,我们证明了一个基于极大熵运输距离的函数与率复运输函数相等。我们也 numerically 验证了这个结果,以及之前已知的联结卢曼和乔托罗维茨问题与优秀单元调变器。因此,我们将解决单元调变和率复运输函数的问题集成到一个alternative的方式,使用它们的各自的最佳运输解决方案。
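
For reference, the two objects being related are stated below in their standard textbook forms; the paper's precise extremal entropic-OT functional and the equivalence statement are not reproduced here.

```latex
% Background definitions (standard forms; not the paper's exact functional).
% Rate-distortion function of a source X under distortion measure d:
\[
  R(D) \;=\; \min_{P_{\hat X \mid X}\,:\; \mathbb{E}[d(X,\hat X)] \le D} \; I(X;\hat X).
\]
% Entropy-regularized optimal transport between distributions mu and nu:
\[
  \mathrm{OT}_{\varepsilon}(\mu,\nu) \;=\; \inf_{\pi \in \Pi(\mu,\nu)}
  \int c(x,y)\, \mathrm{d}\pi(x,y) \;+\; \varepsilon\, D_{\mathrm{KL}}(\pi \,\|\, \mu \otimes \nu),
\]
% where Pi(mu, nu) is the set of couplings with marginals mu and nu.
```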

Unified Transfer Learning Models for High-Dimensional Linear Regression

  • paper_url: http://arxiv.org/abs/2307.00238
  • repo_url: None
  • paper_authors: Shuo Shuo Liu
  • for: 这 paper 是为了解决现代数据分析中的转移学习问题,特别是当target数据稀缺但source数据充沛时。
  • methods: 这 paper 提出了一种可解释的转移学习模型,称为UTrans,可以探测目标数据中可转移变量和源数据。 authors 还提出了一种基于假设检测的源检测算法,以排除不可转移数据。
  • results: 在多个实验中,UTrans 的估计和预测误差比既有的方法低很多,同时保持可解释性。 authors 最后应用了这种算法到美国 между代移动性数据,并与经典机器学习算法进行比较。
    Abstract Transfer learning plays a key role in modern data analysis when: (1) the target data are scarce but the source data are sufficient; (2) the distributions of the source and target data are heterogeneous. This paper develops an interpretable unified transfer learning model, termed as UTrans, which can detect both transferable variables and source data. More specifically, we establish the estimation error bounds and prove that our bounds are lower than those with target data only. Besides, we propose a source detection algorithm based on hypothesis testing to exclude the nontransferable data. We evaluate and compare UTrans to the existing algorithms in multiple experiments. It is shown that UTrans attains much lower estimation and prediction errors than the existing methods, while preserving interpretability. We finally apply it to the US intergenerational mobility data and compare our proposed algorithms to the classical machine learning algorithms.
    摘要 现代数据分析中,转移学习具有重要作用,特别是当目标数据scarce但来源数据充足时。本文提出了可解释的一种统一转移学习模型,称为UTrans,可以探测目标数据中可以转移的变量以及来源数据。更 specifically,我们建立了估计误差 bound,并证明我们的 bound 比目标数据只有的低。此外,我们提出了一种来源检测算法基于假设测试,以排除不可转移的数据。我们在多个实验中评估和比较UTrans与现有方法,显示UTrans可以获得远低于现有方法的估计和预测误差,同时保持可解释性。最后,我们应用其到美国 между代移动性数据中,并与经典机器学习算法进行比较。

Hierarchical Federated Learning Incentivization for Gas Usage Estimation

  • paper_url: http://arxiv.org/abs/2307.00233
  • repo_url: None
  • paper_authors: Has Sun, Xiaoli Tang, Chengyi Yang, Zhenpeng Yu, Xiuli Wang, Qijie Ding, Zengxiang Li, Han Yu
  • for: The paper is written for the efficient functioning of gas distribution networks and saving operational costs by accurately estimating gas usage.
  • methods: The paper proposes a Hierarchical FL Incentive Mechanism for Gas Usage Estimation (HI-GAS) that uses federated learning (FL) to enable local data processing on each participant, such as gas companies and heating stations, while maintaining privacy and incentivizing active participation.
  • results: The proposed mechanism is testbedded in the ENN Group, one of the leading players in the natural gas and green energy industry, and extensive experiments validate the effectiveness of the proposed mechanism in improving gas usage estimation performance.
    Abstract Accurately estimating gas usage is essential for the efficient functioning of gas distribution networks and saving operational costs. Traditional methods rely on centralized data processing, which poses privacy risks. Federated learning (FL) offers a solution to this problem by enabling local data processing on each participant, such as gas companies and heating stations. However, local training and communication overhead may discourage gas companies and heating stations from actively participating in the FL training process. To address this challenge, we propose a Hierarchical FL Incentive Mechanism for Gas Usage Estimation (HI-GAS), which has been testbedded in the ENN Group, one of the leading players in the natural gas and green energy industry. It is designed to support horizontal FL among gas companies, and vertical FL among each gas company and heating station within a hierarchical FL ecosystem, rewarding participants based on their contributions to FL. In addition, a hierarchical FL model aggregation approach is also proposed to improve the gas usage estimation performance by aggregating models at different levels of the hierarchy. The incentive scheme employs a multi-dimensional contribution-aware reward distribution function that combines the evaluation of data quality and model contribution to incentivize both gas companies and heating stations within their jurisdiction while maintaining fairness. Results of extensive experiments validate the effectiveness of the proposed mechanism.
    摘要 必须准确估算燃气使用量以确保燃气供应网络的有效运行和降低运营成本。传统方法依赖中央数据处理,却可能涉及到隐私风险。联邦学习(FL)提供了一种解决方案,允许每个参与者(如燃气公司和温水站)进行本地数据处理。然而,本地训练和通信开销可能会抑制燃气公司和温水站参与FL训练过程。为解决这个挑战,我们提出了一种层次FL奖励机制 для燃气使用量估算(HI-GAS),在ENN集团(一家领先的自然气和绿色能源企业)的测试环境中进行了测试。它支持水平FL among燃气公司,并在每个燃气公司和温水站之间层次FL,为参与者基于他们对FL的贡献而颁发奖励。此外,一种层次FL模型聚合方法也被提出,以提高燃气使用量估算性能。奖励机制采用多维度贡献意识分布函数,旨在奖励燃气公司和温水站在其辖区内的贡献,同时保持公平。实验结果证明了提案的效果。

Forward-Forward Algorithm for Hyperspectral Image Classification: A Preliminary Study

  • paper_url: http://arxiv.org/abs/2307.00231
  • repo_url: None
  • paper_authors: Sidike Paheding, Abel A. Reyes-Angulo
  • for: 本研究探讨了使用forward-forward算法(FFA)进行干涉谱图像分类。
  • methods: 本研究使用了传统的反向卷积算法(back-propagation)和FFA两种方法进行比较分析。
  • results: 初步结果表明FFA可能具有更好的性能和更好的可扩展性,而且可以避免反向卷积算法的一些局限性。
    Abstract The back-propagation algorithm has long been the de-facto standard in optimizing weights and biases in neural networks, particularly in cutting-edge deep learning models. Its widespread adoption in fields like natural language processing, computer vision, and remote sensing has revolutionized automation in various tasks. The popularity of back-propagation stems from its ability to achieve outstanding performance in tasks such as classification, detection, and segmentation. Nevertheless, back-propagation is not without its limitations, encompassing sensitivity to initial conditions, vanishing gradients, overfitting, and computational complexity. The recent introduction of a forward-forward algorithm (FFA), which computes local goodness functions to optimize network parameters, alleviates the dependence on substantial computational resources and the constant need for architectural scaling. This study investigates the application of FFA for hyperspectral image classification. Experimental results and comparative analysis are provided with the use of the traditional back-propagation algorithm. Preliminary results show the potential behind FFA and its promises.
    摘要 长期以来,反向传播算法一直是优化神经网络权重和偏置的事实标准,尤其是在最前沿的深度学习模型中。它在自然语言处理、计算机视觉和遥感等领域的广泛应用,使各类任务的自动化发生了革命性变化。反向传播的流行源于它在分类、检测和分割等任务中能够取得出色的性能。然而,反向传播也并非没有局限,包括对初始条件敏感、梯度消失、过拟合以及计算复杂度高等问题。最近提出的前向-前向算法(FFA)通过计算局部优良度(goodness)函数来优化网络参数,减轻了对大量计算资源的依赖以及不断扩大网络架构的需求。本研究探讨了将 FFA 应用于高光谱图像分类。文中给出了实验结果,并与传统的反向传播算法进行了比较分析;初步结果展示了 FFA 的潜力与前景。
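
The sketch below shows forward-forward-style local training for one layer in its commonly used form: goodness is the sum of squared activations, pushed above a threshold for positive data and below it for negative data. The hyperspectral-specific setup of the paper is not reproduced; the threshold and loss form are the usual illustrative choices.

```python
# Forward-forward-style local training for one fully connected layer:
# goodness = sum of squared activations; positive samples are pushed above a
# threshold and negative samples below it. Hyperparameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.linear.parameters(), lr=lr)

    def forward(self, x):
        # Normalize the input so only its direction carries information forward.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return F.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)   # goodness of positive data
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)   # goodness of negative data
        # Logistic loss pushing g_pos above and g_neg below the threshold.
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach so the next layer trains only on this layer's outputs.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach(), loss.item()
```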

InferTurbo: A Scalable System for Boosting Full-graph Inference of Graph Neural Network over Huge Graphs

  • paper_url: http://arxiv.org/abs/2307.00228
  • repo_url: None
  • paper_authors: Dalong Zhang, Xianzheng Song, Zhiyang Hu, Yang Li, Miao Tao, Binbin Hu, Lin Wang, Zhiqiang Zhang, Jun Zhou
  • for: 提高工业场景中GNNS的推理效率和可扩展性。
  • methods: 提出一个可扩展的系统名为InferTurbo,通过采用GAS(集合应用散列)Schema来解决扩展性、可靠性和可扩展性等问题。
  • results: 实验结果表明,InferTurbo可以快速和高效地完成GNNS的推理任务,并且在图中含有一些核心节点时能够更好地均衡负载。系统可以在2小时内完成一个图中百亿个边的GNNS推理任务,并且与传统推理管道相比有显著的性能提升。
    Abstract GNN inference is a non-trivial task, especially in industrial scenarios with giant graphs, given three main challenges, i.e., scalability tailored for full-graph inference on huge graphs, inconsistency caused by stochastic acceleration strategies (e.g., sampling), and the serious redundant computation issue. To address the above challenges, we propose a scalable system named InferTurbo to boost the GNN inference tasks in industrial scenarios. Inspired by the philosophy of ``think-like-a-vertex", a GAS-like (Gather-Apply-Scatter) schema is proposed to describe the computation paradigm and data flow of GNN inference. The computation of GNNs is expressed in an iteration manner, in which a vertex would gather messages via in-edges and update its state information by forwarding an associated layer of GNNs with those messages and then send the updated information to other vertexes via out-edges. Following the schema, the proposed InferTurbo can be built with alternative backends (e.g., batch processing system or graph computing system). Moreover, InferTurbo introduces several strategies like shadow-nodes and partial-gather to handle nodes with large degrees for better load balancing. With InferTurbo, GNN inference can be hierarchically conducted over the full graph without sampling and redundant computation. Experimental results demonstrate that our system is robust and efficient for inference tasks over graphs containing some hub nodes with many adjacent edges. Meanwhile, the system gains a remarkable performance compared with the traditional inference pipeline, and it can finish a GNN inference task over a graph with tens of billions of nodes and hundreds of billions of edges within 2 hours.
    摘要 原文:GNN推理是一个非常复杂的任务,特别在工业场景中面临巨大图的情况下,因为存在三个主要挑战:一是扩展性,二是随机加速策略(如采样)引起的不一致性,三是严重的重复计算问题。为了解决这些挑战,我们提出了一个可扩展的系统名为InferTurbo,用于加速工业场景中的GNN推理任务。根据“思考如Vertex”的哲学,我们提出了一种GAS(聚合应用散发)模式来描述GNN推理的计算范式和数据流程。GNN的计算是在迭代方式下进行,每个顶点都会通过入边收集消息,并将这些消息与GNN层进行相应的更新,然后将更新后的信息发送给其他顶点via出边。按照这种模式,我们提出的InferTurbo可以采用替换性的后端(如批处理系统或图计算系统)。此外,InferTurbo还引入了一些策略,如影子节点和部分聚合,以更好地协调节点的负载均衡。通过InferTurbo,GNN推理可以在全图上进行不需要采样和重复计算。实验结果表明,我们的系统具有良好的稳定性和效率,可以快速完成含有一些核心节点的GNN推理任务。同时,我们的系统与传统推理管道相比,具有显著的性能优势,可以在2小时内完成一个图中百亿个节点、千亿个边的GNN推理任务。
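
A pure-Python schematic of the Gather-Apply-Scatter pattern the abstract describes for layer-wise full-graph inference. InferTurbo's actual backends and load-balancing strategies (shadow nodes, partial gather) are omitted; the example layer is an arbitrary stand-in.

```python
# Schematic Gather-Apply-Scatter (GAS) full-graph inference: each vertex
# gathers messages from its in-edges, applies one layer to update its state,
# then scatters the new state along its out-edges for the next layer.

def gas_inference(nodes, in_edges, out_edges, layers, init_state):
    """nodes: iterable of node ids; in_edges[v] / out_edges[v]: neighbor lists;
    layers: list of callables layer(state, messages) -> new state."""
    state = {v: init_state[v] for v in nodes}
    mailbox = {v: [state[u] for u in in_edges[v]] for v in nodes}   # initial gather
    for layer in layers:                                  # one full-graph pass per layer
        new_state = {v: layer(state[v], mailbox[v]) for v in nodes}  # Apply
        mailbox = {v: [] for v in nodes}
        for u in nodes:                                   # Scatter along out-edges
            for v in out_edges[u]:
                mailbox[v].append(new_state[u])
        state = new_state
    return state

# Example stand-in layer: mix the node state with the mean of incoming messages.
mean_layer = lambda h, msgs: 0.5 * h + 0.5 * (sum(msgs) / len(msgs) if msgs else 0.0)
```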

Causal Structure Learning by Using Intersection of Markov Blankets

  • paper_url: http://arxiv.org/abs/2307.00227
  • repo_url: https://github.com/ronedong/eembi
  • paper_authors: Yiran Dong, Chuanhou Gao
  • for: 本研究提出了一种新的 causal 结构学习算法,即 Endogenous and Exogenous Markov Blankets Intersection (EEMBI),该算法结合了 Bayesian 网络和Structural Causal Models (SCM) 的特点。
  • methods: 本研究使用了 EEMBI 算法,并在其基础上提出了一种扩展版本,即 EEMBI-PC,它将 PC 算法的最后一步纳入 EEMBI 中。
  • results: 研究人员通过使用 EEMBI 和 EEMBI-PC 算法,在不同的数据集上进行了实验,并证明了这些算法的有效性和可靠性。
    Abstract In this paper, we introduce a novel causal structure learning algorithm called Endogenous and Exogenous Markov Blankets Intersection (EEMBI), which combines the properties of Bayesian networks and Structural Causal Models (SCM). Furthermore, we propose an extended version of EEMBI, namely EEMBI-PC, which integrates the last step of the PC algorithm into EEMBI.
    摘要 在本文中,我们介绍了一种新的 causal structure 学习算法,即 Endogenous and Exogenous Markov Blankets Intersection (EEMBI),该算法结合了 bayesian networks 和 Structural Causal Models (SCM) 的特性。此外,我们还提出了 EEMBI 的扩展版本,即 EEMBI-PC,该版本将 PC 算法的最后一步纳入 EEMBI 中。

S-Omninet: Structured Data Enhanced Universal Multimodal Learning Architecture

  • paper_url: http://arxiv.org/abs/2307.00226
  • repo_url: None
  • paper_authors: Ye Xue, Diego Klabjan, Jean Utke
  • for: 本文旨在扩展和改进 Omninet 模型,以便处理多modalitaties 和多任务 simultaneously.
  • methods: 本文提出了三种改进:1) 通过cross-cache attention实现交互 among spatial, temporal, and structured features; 2) 使用 patch embeddings 增强视觉输入的空间表示; 3) 支持структуры数据。
  • results: 对多modalitaties 数据进行评估,提出了一种新的 Structured-data-enhanced Omninet(S-Omninet)模型,并在多个多模态数据集上实现了显著提高。
    Abstract Multimodal multitask learning has attracted an increasing interest in recent years. Singlemodal models have been advancing rapidly and have achieved astonishing results on various tasks across multiple domains. Multimodal learning offers opportunities for further improvements by integrating data from multiple modalities. Many methods are proposed to learn on a specific type of multimodal data, such as vision and language data. A few of them are designed to handle several modalities and tasks at a time. In this work, we extend and improve Omninet, an architecture that is capable of handling multiple modalities and tasks at a time, by introducing cross-cache attention, integrating patch embeddings for vision inputs, and supporting structured data. The proposed Structured-data-enhanced Omninet (S-Omninet) is a universal model that is capable of learning from structured data of various dimensions effectively with unstructured data through cross-cache attention, which enables interactions among spatial, temporal, and structured features. We also enhance spatial representations in a spatial cache with patch embeddings. We evaluate the proposed model on several multimodal datasets and demonstrate a significant improvement over the baseline, Omninet.
    摘要 多模态多任务学习在最近几年内得到了越来越多的关注。单模态模型在不同领域中取得了非常出众的成绩。多模态学习可以更好地提高模型的性能,通过结合多个模式的数据。许多方法是用来学习特定类型的多模态数据,如视觉语言数据。然而,只有一些方法可以同时处理多个模式和任务。在这项工作中,我们对Omninet架构进行扩展和改进,通过引入跨缓存注意力、 integrate patch embeddings для视觉输入和支持结构数据。我们提出的结构化数据增强Omninet(S-Omninet)是一种通用的模型,可以有效地从不同维度的结构数据中学习,并且可以通过跨缓存注意力和缓存中的补充表示来实现空间、时间和结构特征之间的交互。我们还增强缓存中的空间表示,通过将patch embeddings integrate into spatial cache。我们在多个多模态数据集上评估了提议的模型,并证明了与基eline Omninet相比,S-Omninet具有显著的提升。

Re-Think and Re-Design Graph Neural Networks in Spaces of Continuous Graph Diffusion Functionals

  • paper_url: http://arxiv.org/abs/2307.00222
  • repo_url: None
  • paper_authors: Tingting Dan, Jiaqi Ding, Ziquan Wei, Shahar Z Kovalsky, Minjeong Kim, Won Hwa Kim, Guorong Wu
  • for: This paper proposes a new framework for graph neural networks (GNNs) that addresses the limitation of locality in existing GNN models and improves their ability to capture long-range dependencies and global patterns in graphs.
  • methods: The proposed framework uses a new inductive bias based on variational analysis and maps discrete GNN models to continuous diffusion functionals. It also introduces a selective mechanism to address the trade-off between model depth and over-smoothing, and a novel generative adversarial network (GAN) that predicts spreading flows in graphs.
  • results: The proposed GNN models achieve state-of-the-art (SOTA) performance on popular graph learning benchmarks such as Cora, Citeseer, and Pubmed.
    Abstract Graph neural networks (GNNs) are widely used in domains like social networks and biological systems. However, the locality assumption of GNNs, which limits information exchange to neighboring nodes, hampers their ability to capture long-range dependencies and global patterns in graphs. To address this, we propose a new inductive bias based on variational analysis, drawing inspiration from the Brachistochrone problem. Our framework establishes a mapping between discrete GNN models and continuous diffusion functionals. This enables the design of application-specific objective functions in the continuous domain and the construction of discrete deep models with mathematical guarantees. To tackle over-smoothing in GNNs, we analyze the existing layer-by-layer graph embedding models and identify that they are equivalent to l2-norm integral functionals of graph gradients, which cause over-smoothing. Similar to edge-preserving filters in image denoising, we introduce total variation (TV) to align the graph diffusion pattern with global community topologies. Additionally, we devise a selective mechanism to address the trade-off between model depth and over-smoothing, which can be easily integrated into existing GNNs. Furthermore, we propose a novel generative adversarial network (GAN) that predicts spreading flows in graphs through a neural transport equation. To mitigate vanishing flows, we customize the objective function to minimize transportation within each community while maximizing inter-community flows. Our GNN models achieve state-of-the-art (SOTA) performance on popular graph learning benchmarks such as Cora, Citeseer, and Pubmed.
    摘要 图 neural network (GNN) 在社交网络和生物系统等领域广泛应用。然而,GNN 的本地性假设,即只与邻居交换信息,限制其捕捉长距离依赖和全局模式的能力。为解决这个问题,我们提出了一种新的假设基于变分分析, Drawing inspiration from the Brachistochrone problem。我们的框架将离散 GNN 模型与连续扩散函数联系起来,这使得可以在连续空间中设计应用特定的目标函数和建立具有数学保证的深度模型。为了解决 GNN 中的过滤问题,我们分析了现有的层次 Graph Embedding 模型,发现它们与 L2 范数积函数相等,导致过滤。类似于图边缘滤波器,我们引入全体变量(TV),以确保图像扩散模式与全局社区结构相匹配。此外,我们开发了一种选择性机制,以解决模型深度和过滤之间的负反向关系,这可以轻松地集成到现有的 GNN 中。此外,我们提出了一种基于神经运输方程的图文生成模型,通过最小化运输在每个社区内的交通量,以最大化社区间的运输量来避免消失流量。我们的 GNN 模型在 популяр的图学学习 Benchmark 上达到了状态的最佳性能(SOTA)。
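
For reference, the two graph functionals contrasted in the abstract are shown below in their standard forms; the paper's full variational framework is not reproduced.

```latex
% Quadratic (Dirichlet / l2) energy, whose gradient flow is ordinary graph
% diffusion and tends to over-smooth node features f:
\[
  \mathcal{E}_{2}(f) \;=\; \tfrac{1}{2} \sum_{(i,j)\in E} w_{ij}\,\lVert f_i - f_j \rVert^2 .
\]
% Graph total variation, which (like edge-preserving filters in image
% denoising) penalizes differences linearly and preserves sharp transitions
% between communities:
\[
  \mathcal{E}_{\mathrm{TV}}(f) \;=\; \sum_{(i,j)\in E} w_{ij}\,\lVert f_i - f_j \rVert .
\]
```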

A Constructive Approach to Function Realization by Neural Stochastic Differential Equations

  • paper_url: http://arxiv.org/abs/2307.00215
  • repo_url: None
  • paper_authors: Tanya Veeravalli, Maxim Raginsky
  • for: 本研究目的是研究神经动力系统的函数逼近问题,并提出了一种opposite, constructiveapproach。
  • methods: 本文使用了概率方法和 геометрические方法(Lie-theoretic methods)来 caracterize the classes of functions realized by such systems。
  • results: 研究表明,通过对系统动力学特性进行限制,可以实现不同类型的函数逼近。
    Abstract The problem of function approximation by neural dynamical systems has typically been approached in a top-down manner: Any continuous function can be approximated to an arbitrary accuracy by a sufficiently complex model with a given architecture. This can lead to high-complexity controls which are impractical in applications. In this paper, we take the opposite, constructive approach: We impose various structural restrictions on system dynamics and consequently characterize the class of functions that can be realized by such a system. The systems are implemented as a cascade interconnection of a neural stochastic differential equation (Neural SDE), a deterministic dynamical system, and a readout map. Both probabilistic and geometric (Lie-theoretic) methods are used to characterize the classes of functions realized by such systems.
    摘要 通常,函数近似问题由神经动态系统来解决,通常采取顶部下降方法:任何连续函数都可以在给定的架构下被至高精度地近似。这可能导致具有高复杂度的控制系统,在应用中不实用。在这篇论文中,我们采取了相反的构建方法:我们对系统动力学进行了各种结构限制,并且根据这些限制,描述了神经动态系统可以实现的函数类型。这些系统通过神经随机分布方程(Neural SDE)、恒定动力学系统和读取映射来实现。我们使用概率方法和几何方法(Lie-theoretic)来描述这些系统实现的函数类型。
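
A minimal sketch of the cascade structure described (a neural SDE, then a deterministic system, then a readout map), simulated with Euler-Maruyama. Network sizes, the deterministic dynamics, and the readout are illustrative choices, not the paper's constructions.

```python
# Sketch of the cascade: neural SDE (Euler-Maruyama) -> deterministic linear
# system -> readout map. All sizes and dynamics below are illustrative.
import torch
import torch.nn as nn

class NeuralSDECascade(nn.Module):
    def __init__(self, dim=4, hidden=32, steps=50, dt=0.02):
        super().__init__()
        self.drift = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.diffusion = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(), nn.Linear(hidden, dim))
        self.readout = nn.Linear(dim, 1)
        self.steps, self.dt = steps, dt
        self.A = nn.Parameter(-torch.eye(dim), requires_grad=False)   # stable linear system

    def forward(self, x0):
        x = x0
        for _ in range(self.steps):                  # Euler-Maruyama for the neural SDE
            dW = torch.randn_like(x) * self.dt ** 0.5
            x = x + self.drift(x) * self.dt + self.diffusion(x) * dW
        for _ in range(self.steps):                  # deterministic stage: dz/dt = A z
            x = x + (x @ self.A.T) * self.dt
        return self.readout(x)                       # readout map

model = NeuralSDECascade()
y = model(torch.randn(8, 4))                         # (8, 1) outputs
```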

More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data

  • paper_url: http://arxiv.org/abs/2307.00213
  • repo_url: None
  • paper_authors: Andrew Kean Gao
  • for: 这个研究旨在测试Compact Convolutional Transformers(CCT)是否能够在具有限制数据的医疗图像分类中提供高精度的结果。
  • methods: 本研究使用了CCT,它是一种结合了transformers和卷积层的混合模型,以测试其在有限数据的医疗图像分类中的效果。
  • results: 研究发现,CCT在一个小型的资料集上可以 дости得92.49%的分类精度和0.9935的微 averaged ROC AUC,并在5个迭代后超过80%的验证精度。
    Abstract Transformers are very powerful tools for a variety of tasks across domains, from text generation to image captioning. However, transformers require substantial amounts of training data, which is often a challenge in biomedical settings, where high quality labeled data can be challenging or expensive to obtain. This study investigates the efficacy of Compact Convolutional Transformers (CCT) for robust medical image classification with limited data, addressing a key issue faced by conventional Vision Transformers - their requirement for large datasets. A hybrid of transformers and convolutional layers, CCTs demonstrate high accuracy on modestly sized datasets. We employed a benchmark dataset of peripheral blood cell images of eight distinct cell types, each represented by approximately 2,000 low-resolution (28x28x3 pixel) samples. Despite the dataset size being smaller than those typically used with Vision Transformers, we achieved a commendable classification accuracy of 92.49% and a micro-average ROC AUC of 0.9935. The CCT also learned quickly, exceeding 80% validation accuracy after five epochs. Analysis of per-class precision, recall, F1, and ROC showed that performance was strong across cell types. Our findings underscore the robustness of CCTs, indicating their potential as a solution to data scarcity issues prevalent in biomedical imaging. We substantiate the applicability of CCTs in data-constrained areas and encourage further work on CCTs.
    摘要 transformers是非常强大的工具,可以用于多个领域的任务,从文本生成到图像描述。然而,transformers需要大量的训练数据,而在医学设置中,高质量的标注数据可能困难或者昂贵。这项研究检验了Compact Convolutional Transformers(CCT)在具有限制数据的情况下的稳定性,并解决了传统的视图转换器面临的大数据问题。CCT是一种组合了transformers和卷积层的混合型模型,在中等规模的数据集上达到了高精度。我们使用了8种不同类型的骨髓细胞图像 benchmark数据集,每种类型有约2000个低分辨率(28x28x3像素)的样本。尽管数据集的大小小于传统使用的视图转换器数据集,但我们达到了92.49%的分类精度和0.9935的微平均ROC AUC。CCT也快速学习,在5个epoch后超过80%的验证精度。分析每个类型的精度、回归、F1和ROC指标表明,CCT的性能强大,并且在各个细胞类型中表现良好。我们的发现证明了CCT的可靠性,表明它们在数据缺乏的情况下可以作为解决方案。我们鼓励进一步研究CCT,以推广其应用范围。
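
Below is a sketch of a compact convolutional transformer in the spirit of the study: a small convolutional tokenizer instead of large-patch embedding, a few Transformer encoder layers, and attention-based sequence pooling, sized for the 28x28x3 inputs mentioned in the abstract. It is an illustrative model, not the authors' implementation; positional embeddings are omitted for brevity.

```python
# Illustrative compact convolutional transformer: conv tokenizer + small
# Transformer encoder + sequence pooling, sized for 28x28x3 inputs.
import torch
import torch.nn as nn

class CompactConvTransformer(nn.Module):
    def __init__(self, num_classes=8, dim=128, depth=4, heads=4):
        super().__init__()
        self.tokenizer = nn.Sequential(           # 28x28 -> 14x14 -> 7x7 token grid
            nn.Conv2d(3, dim // 2, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(dim // 2, dim, 3, stride=1, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=2 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.pool_attn = nn.Linear(dim, 1)         # attention-based sequence pooling
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        tokens = self.tokenizer(x).flatten(2).transpose(1, 2)   # (B, 49, dim)
        tokens = self.encoder(tokens)
        weights = torch.softmax(self.pool_attn(tokens), dim=1)  # (B, 49, 1)
        pooled = (weights * tokens).sum(dim=1)                  # weighted token average
        return self.head(pooled)

model = CompactConvTransformer()
logits = model(torch.randn(4, 3, 28, 28))                       # (4, 8)
```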

An Interpretable Constructive Algorithm for Incremental Random Weight Neural Networks and Its Application

  • paper_url: http://arxiv.org/abs/2307.00185
  • repo_url: None
  • paper_authors: Jing Nan, Wei Dai, Guan Yuan, Ping Zhou
  • for: 本文提出了一种可解释性构建算法(ICA),用于解决快速学习的难点,即隐藏参数与剩余误差之间的关系难以解释。
  • methods: 本文提出了一种基于几何关系的可解释性构建算法(ICA),并采用了节点池策略来获得更易于收敛的隐藏参数。此外,本文证明了ICA的通用近似性质。
  • results: 实验结果表明,ICA在六个基准数据集和一个数学模拟数据集上表现出色,其中模型学习速度、模型准确率和模型网络结构都得到了改进。此外,本文还采用了两个实际应用案例来验证ICA在实践中的效果。
    Abstract Incremental random weight neural networks (IRWNNs) have gained attention in view of their easy implementation and fast learning. However, a significant drawback of IRWNNs is that the relationship between the hidden parameters (nodes) and the residual error (model performance) is difficult to interpret. To address the above issue, this article proposes an interpretable constructive algorithm (ICA) with a geometric information constraint. First, based on the geometric relationship between the hidden parameters and the residual error, an interpretable geometric information constraint is proposed to randomly assign the hidden parameters. Meanwhile, a node pool strategy is employed to obtain hidden parameters that are more conducive to convergence from hidden parameters satisfying the proposed constraint. Furthermore, the universal approximation property of the ICA is proved. Finally, a lightweight version of ICA is presented for large-scale data modeling tasks. Experimental results on six benchmark datasets and a numerical simulation dataset demonstrate that the ICA outperforms other constructive algorithms in terms of modeling speed, model accuracy, and model network structure. Besides, two practical industrial application cases are used to validate the effectiveness of ICA in practical applications.
    摘要 incremenetal random weight neural networks (IRWNNs) 已经吸引了关注,因为它的实现容易和学习速度快。然而,IRWNNs 的一个重大缺点是hidden parameters 和 residual error 之间的关系难以被解释。为了解决这个问题,本文提出了一种可解释性建构算法(ICA),具有几何信息约束。首先,根据hidden parameters 和 residual error 的几何关系,提出了一种可解释性的几何信息约束,随机分配hidden parameters。此外,employs 一种node pool策略来从满足提出的约束的hidden parameters中获取更有利于收敛的hidden parameters。其次,证明了ICA的通用适应性。最后,为大规模数据模型任务提出了一种轻量级版本的ICA。实验结果表明,ICA在六个标准 benchmark 数据集和一个数学模拟数据集上的模型速度、模型准确性和模型网络结构方面都超过了其他构造算法。此外,通过两个实际应用案例, validate 了ICA在实际应用中的有效性。
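
To make the constructive idea concrete, the sketch below builds a single-hidden-layer random-weight network node by node: a pool of random candidate nodes is scored against the current residual and the output weights are re-fit by least squares. It is a generic simplification of this family of methods; the scoring rule stands in for, and is not, the paper's geometric constraint.

```python
import numpy as np

def incremental_random_net(X, y, max_nodes=50, pool_size=20, seed=0):
    """Greedy constructive training of a single-hidden-layer random-weight net.
    At each step a pool of random candidate nodes is scored by how well each
    activation aligns with the current residual (a crude stand-in for the
    paper's geometric constraint), and output weights are re-fit by least squares."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = np.empty((n, 0))            # activations of accepted hidden nodes
    params = []                     # (w, b) of accepted nodes
    residual = y.copy()
    for _ in range(max_nodes):
        best_h, best_score, best_wb = None, -np.inf, None
        for _ in range(pool_size):  # node-pool strategy: sample candidates
            w = rng.uniform(-1, 1, size=d)
            b = rng.uniform(-1, 1)
            h = np.tanh(X @ w + b)
            # alignment between the candidate activation and the residual
            score = abs(h @ residual) / (np.linalg.norm(h) + 1e-12)
            if score > best_score:
                best_h, best_score, best_wb = h, score, (w, b)
        H = np.column_stack([H, best_h])
        params.append(best_wb)
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # re-fit output weights
        residual = y - H @ beta
    return params, beta

# toy usage: fit y = sin(x) on random inputs
X = np.random.rand(200, 1) * 6 - 3
y = np.sin(X[:, 0])
params, beta = incremental_random_net(X, y)
```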

Still No Lie Detector for Language Models: Probing Empirical and Conceptual Roadblocks

  • paper_url: http://arxiv.org/abs/2307.00175
  • repo_url: https://github.com/balevinstein/probes
  • paper_authors: B. A. Levinstein, Daniel A. Herrmann
  • for: 本文研究了大语言模型(LLMs)是否具有信念,以及如果它们具有信念,我们如何测量它们。
  • methods: 本文评估了两种现有方法,一种由Azaria和Mitchell(2023)提出,另一种由Burns等人(2022)提出。我们提供了实验结果,表明这两种方法在基本上不能普遍化。
  • results: 本文提供了实验结果,表明现有方法无法检测LLMs的信念。我们还讨论了一些近期的Arguments,认为LLMs无法具有信念。我们表明这些Arguments是误导的,并提供了一种更产生的问题定义,以及未来工作的具体路径。
    Abstract We consider the questions of whether or not large language models (LLMs) have beliefs, and, if they do, how we might measure them. First, we evaluate two existing approaches, one due to Azaria and Mitchell (2023) and the other to Burns et al. (2022). We provide empirical results that show that these methods fail to generalize in very basic ways. We then argue that, even if LLMs have beliefs, these methods are unlikely to be successful for conceptual reasons. Thus, there is still no lie-detector for LLMs. After describing our empirical results we take a step back and consider whether or not we should expect LLMs to have something like beliefs in the first place. We consider some recent arguments aiming to show that LLMs cannot have beliefs. We show that these arguments are misguided. We provide a more productive framing of questions surrounding the status of beliefs in LLMs, and highlight the empirical nature of the problem. We conclude by suggesting some concrete paths for future work.
    摘要 我们考虑了大语言模型(LLM)是否具有信念的问题,以及如果它们具有信念,我们如何测量它们。我们首先评估了Azaria和Mitchell(2023)的方法和Burns等人(2022)的方法,并提供实验结果,表明这些方法在非常基础的情况下无法泛化。我们接着论证,即使LLM具有信念,出于概念上的原因,这些方法也不太可能成功。因此,目前仍没有针对LLM的"谎言检测器"。在描述实验结果之后,我们退一步,讨论LLM是否应当被认为具有类似信念的东西。我们考察了一些近期旨在证明LLM不可能具有信念的论证,并说明这些论证是误导的。我们为围绕LLM信念地位的问题提供了一种更有成效的表述方式,并强调该问题的经验性质。最后,我们为未来工作提出了一些具体方向。

The Integer Linear Programming Inference Cookbook

  • paper_url: http://arxiv.org/abs/2307.00171
  • repo_url: None
  • paper_authors: Vivek Srikumar, Dan Roth
  • for: 这篇论文是为了导导读者在自然语言处理中使用整数线性计划来解决新的推理问题。
  • methods: 这篇论文使用了许多recipes来帮助读者将新的推理问题转化为整数线性计划的实例。
  • results: 文章结束有两个实践例子,用于说明使用这些recipes的过程。
    Abstract Over the years, integer linear programs have been employed to model inference in many natural language processing problems. This survey is meant to guide the reader through the process of framing a new inference problem as an instance of an integer linear program and is structured as a collection of recipes. At the end, we will see two worked examples to illustrate the use of these recipes.
    摘要 多年来,整数线性规划被应用于许多自然语言处理问题的推理建模。本综述旨在引导读者将新的推理问题表述为整数线性规划的实例,并以一系列"配方"(recipes)的形式组织。文章结尾给出两个完整的示例,说明如何使用这些配方。
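
As an illustration of the kind of recipe the survey collects, the sketch below frames a toy sequence-labeling problem as an ILP with PuLP: each token gets exactly one label, and a hard constraint forbids I-ENT unless it follows B-ENT or I-ENT. The scores and label set are made up for the example; this is not one of the paper's worked examples.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

tokens = ["Acme", "Corp", "announced", "earnings"]
labels = ["B-ENT", "I-ENT", "O"]
# made-up local scores score[i][l]: how much the model likes label l at token i
score = {
    0: {"B-ENT": 2.0, "I-ENT": 0.1, "O": 0.2},
    1: {"B-ENT": 0.3, "I-ENT": 1.8, "O": 0.4},
    2: {"B-ENT": 0.1, "I-ENT": 0.2, "O": 1.5},
    3: {"B-ENT": 0.2, "I-ENT": 0.3, "O": 1.2},
}

prob = LpProblem("sequence_labeling", LpMaximize)
x = {(i, l): LpVariable(f"x_{i}_{l.replace('-', '_')}", cat=LpBinary)
     for i in range(len(tokens)) for l in labels}

# objective: total score of the chosen labels
prob += lpSum(score[i][l] * x[i, l] for i in range(len(tokens)) for l in labels)

# each token gets exactly one label
for i in range(len(tokens)):
    prob += lpSum(x[i, l] for l in labels) == 1

# I-ENT may only follow B-ENT or I-ENT (a typical "recipe"-style constraint)
prob += x[0, "I-ENT"] == 0
for i in range(1, len(tokens)):
    prob += x[i, "I-ENT"] <= x[i - 1, "B-ENT"] + x[i - 1, "I-ENT"]

prob.solve()
print([l for i in range(len(tokens)) for l in labels if x[i, l].value() == 1])
```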

VoxWatch: An open-set speaker recognition benchmark on VoxCeleb

  • paper_url: http://arxiv.org/abs/2307.00169
  • repo_url: None
  • paper_authors: Raghuveer Peri, Seyed Omid Sadjadi, Daniel Garcia-Romero
    for:* 这个论文主要关注于开放集成 speaker identification(OSI)问题,具体来说是 determinant whether a test speech sample belongs to a speaker from a set of pre-enrolled individuals(in-set)or if it is from an out-of-set speaker。methods:* 该论文使用了三种强大的神经网络系统来进行测试,并使用了VoxCeleb dataset来建立了首个公共benchmark для OSI。results:* 论文表明,通常采用的adaptive score normalization不一定会提高OSI性能,但是score calibration和score fusion两种通常用于SV的技术在OSI中具有显著的改善作用。
    Abstract Despite its broad practical applications such as in fraud prevention, open-set speaker identification (OSI) has received less attention in the speaker recognition community compared to speaker verification (SV). OSI deals with determining if a test speech sample belongs to a speaker from a set of pre-enrolled individuals (in-set) or if it is from an out-of-set speaker. In addition to the typical challenges associated with speech variability, OSI is prone to the "false-alarm problem"; as the size of the in-set speaker population (a.k.a watchlist) grows, the out-of-set scores become larger, leading to increased false alarm rates. This is in particular challenging for applications in financial institutions and border security where the watchlist size is typically of the order of several thousand speakers. Therefore, it is important to systematically quantify the false-alarm problem, and develop techniques that alleviate the impact of watchlist size on detection performance. Prior studies on this problem are sparse, and lack a common benchmark for systematic evaluations. In this paper, we present the first public benchmark for OSI, developed using the VoxCeleb dataset. We quantify the effect of the watchlist size and speech duration on the watchlist-based speaker detection task using three strong neural network based systems. In contrast to the findings from prior research, we show that the commonly adopted adaptive score normalization is not guaranteed to improve the performance for this task. On the other hand, we show that score calibration and score fusion, two other commonly used techniques in SV, result in significant improvements in OSI performance.
    摘要 尽管开放集说话人识别(OSI)在防止欺诈等实际应用中具有广泛的价值,但与说话人验证(SV)相比,它在说话人识别社区中受到的关注较少。OSI 的任务是判断测试语音样本是否属于一组预先注册的人员(集内)之一,或者来自集外说话人。除了常见的语音变化带来的挑战外,OSI 还面临"虚警问题":随着预注册人员名单(watchlist)规模的增大,集外样本的得分也会变大,导致虚警率上升。这对金融机构和边境安全等应用尤其棘手,因为其名单规模通常达数千人。因此,有必要系统地量化虚警问题,并开发能减轻名单规模对检测性能影响的技术。此前针对该问题的研究很少,也缺乏统一的评测基准。在本文中,我们基于 VoxCeleb 数据集构建了首个公开的 OSI 基准。我们使用三种强大的神经网络系统,量化名单规模和语音时长对基于名单的说话人检测任务的影响。与以往研究的结论不同,我们发现常用的自适应分数归一化(adaptive score normalization)并不一定能提升该任务的性能;相反,分数校准(score calibration)和分数融合(score fusion)这两种在 SV 中常用的技术能显著提升 OSI 性能。

U-Calibration: Forecasting for an Unknown Agent

  • paper_url: http://arxiv.org/abs/2307.00168
  • repo_url: None
  • paper_authors: Robert Kleinberg, Renato Paes Leme, Jon Schneider, Yifeng Teng
  • for: 评估对二元事件的预测:这些预测被理性代理人使用,代理人会根据预测采取行动,但其效用函数对预测者是未知的。
  • methods: 提出一种名为 U-calibration 的新指标来评估预测,它可为所有可能的代理人保证次线性遗憾(sublinear regret)。
  • results: 给出一种实现 $O(\sqrt{T})$ U-calibration 误差的在线算法,并讨论了向多类预测设置的推广。
    Abstract We consider the problem of evaluating forecasts of binary events whose predictions are consumed by rational agents who take an action in response to a prediction, but whose utility is unknown to the forecaster. We show that optimizing forecasts for a single scoring rule (e.g., the Brier score) cannot guarantee low regret for all possible agents. In contrast, forecasts that are well-calibrated guarantee that all agents incur sublinear regret. However, calibration is not a necessary criterion here (it is possible for miscalibrated forecasts to provide good regret guarantees for all possible agents), and calibrated forecasting procedures have provably worse convergence rates than forecasting procedures targeting a single scoring rule. Motivated by this, we present a new metric for evaluating forecasts that we call U-calibration, equal to the maximal regret of the sequence of forecasts when evaluated under any bounded scoring rule. We show that sublinear U-calibration error is a necessary and sufficient condition for all agents to achieve sublinear regret guarantees. We additionally demonstrate how to compute the U-calibration error efficiently and provide an online algorithm that achieves $O(\sqrt{T})$ U-calibration error (on par with optimal rates for optimizing for a single scoring rule, and bypassing lower bounds for the traditionally calibrated learning procedures). Finally, we discuss generalizations to the multiclass prediction setting.
    摘要 我们考虑这样一个问题:如何评价对二元事件的预测,这些预测会被理性代理人使用:代理人根据预测采取行动,但其效用函数对预测者是未知的。我们表明,仅针对单一评分规则(例如 Brier 分数)优化预测,无法保证所有可能的代理人都有低 regret;相反,校准良好的预测可以保证所有代理人的 regret 是次线性的。但校准并非必要条件(未校准的预测也可能为所有代理人提供好的 regret 保证),而且校准的预测程序被证明收敛速度比针对单一评分规则优化的程序更慢。受此启发,我们提出了一个新的预测评估指标 U-calibration,定义为预测序列在任何有界评分规则下的最大 regret。我们证明,次线性的 U-calibration 误差是所有代理人都能获得次线性 regret 保证的充分必要条件。此外,我们说明如何高效计算 U-calibration 误差,并提出一个在线算法,可达到 $O(\sqrt{T})$ 的 U-calibration 误差(与针对单一评分规则优化的最优速率相当,并绕过传统校准学习程序的下界)。最后,我们讨论了向多类预测设定的扩展。
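
One natural way to write down the U-calibration error described in the abstract (our reading, with assumed notation) is as the worst-case regret over a family of bounded scoring rules:

```latex
% Assumed notation: p_1,...,p_T are the forecasts, y_1,...,y_T \in \{0,1\} the outcomes,
% and \mathcal{S} a family of bounded scoring rules S(p, y) treated as losses.
\[
  \mathrm{UCal}_T \;=\; \sup_{S \in \mathcal{S}}
  \left(
     \sum_{t=1}^{T} S(p_t, y_t)
     \;-\; \min_{p^\ast \in [0,1]} \sum_{t=1}^{T} S(p^\ast, y_t)
  \right).
\]
% The paper's guarantee corresponds to \mathrm{UCal}_T = O(\sqrt{T}), i.e. sublinear
% regret simultaneously for every scoring rule in \mathcal{S}.
```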

Counterfactual Collaborative Reasoning

  • paper_url: http://arxiv.org/abs/2307.00165
  • repo_url: None
  • paper_authors: Jianchao Ji, Zelong Li, Shuyuan Xu, Max Xiong, Juntao Tan, Yingqiang Ge, Hao Wang, Yongfeng Zhang
  • for: 提高机器学习模型的准确率和可读性
  • methods: 结合Counterfactual Collaborative Reasoning(CCR)和神经网络逻辑理解,使用推荐系统为例,解决数据稀缺问题,提高模型性能和透明度
  • results: 在三个实际数据集上实现更好的表现,比非加工模型和隐式加工模型更高,同时也提高模型的可读性。
    Abstract Causal reasoning and logical reasoning are two important types of reasoning abilities for human intelligence. However, their relationship has not been extensively explored under machine intelligence context. In this paper, we explore how the two reasoning abilities can be jointly modeled to enhance both accuracy and explainability of machine learning models. More specifically, by integrating two important types of reasoning ability -- counterfactual reasoning and (neural) logical reasoning -- we propose Counterfactual Collaborative Reasoning (CCR), which conducts counterfactual logic reasoning to improve the performance. In particular, we use recommender system as an example to show how CCR alleviate data scarcity, improve accuracy and enhance transparency. Technically, we leverage counterfactual reasoning to generate "difficult" counterfactual training examples for data augmentation, which -- together with the original training examples -- can enhance the model performance. Since the augmented data is model irrelevant, they can be used to enhance any model, enabling the wide applicability of the technique. Besides, most of the existing data augmentation methods focus on "implicit data augmentation" over users' implicit feedback, while our framework conducts "explicit data augmentation" over users explicit feedback based on counterfactual logic reasoning. Experiments on three real-world datasets show that CCR achieves better performance than non-augmented models and implicitly augmented models, and also improves model transparency by generating counterfactual explanations.
    摘要 人工智能中的 causal reasoning 和逻辑理解是两种重要的理解能力。然而,这两种理解能力在机器人智能上的关系尚未得到广泛探讨。在这篇论文中,我们explore了两种理解能力如何结合以提高机器学习模型的准确性和可读性。更 Specifically,我们提出了Counterfactual Collaborative Reasoning(CCR),它通过将counterfactual逻辑理解和神经网络逻辑理解结合起来,以提高表现。具体来说,我们使用推荐系统作为示例,展示了如何CCR可以适应数据稀缺、提高准确性和提高透明度。技术上,我们利用counterfactual逻辑理解生成"difficult" counterfactual训练示例,用于数据加工。这些示例,与原始训练示例一起,可以提高模型性能。由于扩展数据是模型无关的,它们可以用于提高任何模型,因此这种技术具有广泛的可用性。此外,大多数现有的数据加工方法都是基于用户的隐式反馈进行"隐式数据加工",而我们的框架则是基于counterfactual逻辑理解进行"显式数据加工"。实验结果表明,CCR在三个实际数据集上表现更好于未加工模型和隐式加工模型,并且也提高了模型的透明度。

What do self-supervised speech models know about words?

  • paper_url: http://arxiv.org/abs/2307.00162
  • repo_url: None
  • paper_authors: Ankita Pasad, Chung-Ming Chien, Shane Settle, Karen Livescu
  • for: 本研究旨在 investigate 自我supervised speech模型(S3M)中 Encoding 语言信息的层次结构,以及不同模型中 Encoding 词级信息的方式。
  • methods: 本研究使用 canonical correlation analysis (CCA) 来度量不同层次中 Encoding 的词级语言特征,并对不同模型的层次表现进行了比较分析。
  • results: 研究发现,最佳词级语言内容通常存在模型中间层次,而一些较低级别信息,如发音,也保留在 huBERT 和 WavLM 中高层次。同时,研究发现,不同模型中 Encoding 词级信息的层次分布与语言属性具有相似的特征。此外,研究还发现,使用 HuBERT 和 WavLM 的最佳层次,可以直接实现一些任务的优秀表现,例如词汇识别、词语分 segmentation 和 semanticsentence similarity。
    Abstract Many self-supervised speech models (S3Ms) have been introduced over the last few years, producing performance and data efficiency improvements for a variety of speech tasks. Evidence is emerging that different S3Ms encode linguistic information in different layers, and also that some S3Ms appear to learn phone-like sub-word units. However, the extent to which these models capture larger linguistic units, such as words, and where word-related information is encoded, remains unclear. In this study, we conduct several analyses of word segment representations extracted from different layers of three S3Ms: wav2vec2, HuBERT, and WavLM. We employ canonical correlation analysis (CCA), a lightweight analysis tool, to measure the similarity between these representations and word-level linguistic properties. We find that the maximal word-level linguistic content tends to be found in intermediate model layers, while some lower-level information like pronunciation is also retained in higher layers of HuBERT and WavLM. Syntactic and semantic word attributes have similar layer-wise behavior. We also find that, for all of the models tested, word identity information is concentrated near the center of each word segment. We then test the layer-wise performance of the same models, when used directly with no additional learned parameters, on several tasks: acoustic word discrimination, word segmentation, and semantic sentence similarity. We find similar layer-wise trends in performance, and furthermore, find that when using the best-performing layer of HuBERT or WavLM, it is possible to achieve performance on word segmentation and sentence similarity that rivals more complex existing approaches.
    摘要 In this study, we analyze word segment representations extracted from different layers of three S3Ms: wav2vec2, HuBERT, and WavLM. We use canonical correlation analysis (CCA) to measure the similarity between these representations and word-level linguistic properties. Our results show that the most comprehensive word-level linguistic content is found in the intermediate layers of the models, while some lower-level information like pronunciation is also retained in higher layers of HuBERT and WavLM. Additionally, we find that the layer-wise behavior of syntactic and semantic word attributes is similar.We also investigate the layer-wise performance of the models on several tasks: acoustic word discrimination, word segmentation, and semantic sentence similarity. Our findings show that the best-performing layers of HuBERT and WavLM achieve performance on word segmentation and sentence similarity that is comparable to more complex existing approaches. Furthermore, we find that the layer-wise trends in performance are similar across tasks.Overall, our study provides insights into the representation of linguistic information in self-supervised speech models and demonstrates the potential of using these models for speech processing tasks.
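
The layer-wise scores above come from measuring the similarity between pooled word-segment features and word-level properties with canonical correlation analysis. The sketch below shows that measurement step with scikit-learn; the feature extraction from wav2vec2/HuBERT/WavLM (mean-pooling frames inside each word segment) is assumed to have been done already and is represented by placeholder arrays.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_similarity(layer_feats, word_props, n_components=10):
    """Mean canonical correlation between segment representations and
    word-level properties: a simple CCA-based layer score."""
    cca = CCA(n_components=n_components, max_iter=1000)
    Xc, Yc = cca.fit_transform(layer_feats, word_props)
    corrs = [np.corrcoef(Xc[:, k], Yc[:, k])[0, 1] for k in range(n_components)]
    return float(np.mean(corrs))

# placeholders: 2000 word segments, 768-dim pooled features from one layer,
# and a 50-dim vector of word-level properties (e.g. embeddings or attributes)
rng = np.random.default_rng(0)
layer_feats = rng.normal(size=(2000, 768))
word_props = rng.normal(size=(2000, 50))

score_per_layer = {layer: cca_similarity(layer_feats, word_props)
                   for layer in ["layer_6"]}   # repeat for each model layer
print(score_per_layer)
```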

FFPDG: Fast, Fair and Private Data Generation

  • paper_url: http://arxiv.org/abs/2307.00161
  • repo_url: None
  • paper_authors: Weijie Xu, Jinjin Zhao, Francis Iannacci, Bo Wang
  • for: 本研究旨在提出一种快速、公平、灵活和隐私的数据生成方法,以解决现有的生成模型具有偏见和高计算资源需求的问题。
  • methods: 本研究借鉴了近期基于 GAN(Goodfellow et al., 2014)的方法,并提出了一种带约束的数据生成方法,以保证生成数据的公平性和隐私性。
  • results: 本研究通过理论和实验验证了提出的方法的有效性,并证明了模型在实际应用场景中的良好性能。
    Abstract Generative modeling has been used frequently in synthetic data generation. Fairness and privacy are two big concerns for synthetic data. Although recent GAN-based methods (Goodfellow et al., 2014) show good results in preserving privacy, the generated data may be more biased. At the same time, these methods require high computation resources. In this work, we design a fast, fair, flexible and private data generation method. We show the effectiveness of our method theoretically and empirically. We show that models trained on data generated by the proposed method can perform well (in inference stage) on real application scenarios.
    摘要 生成模型已经广泛应用于合成数据生成。公平和隐私是合成数据的两大关注点。虽然最近基于 GAN(Goodfellow et al., 2014)的方法在保护隐私方面表现良好,但生成的数据可能更加有偏,同时这些方法需要大量计算资源。在这项工作中,我们设计了一种快速、公平、灵活且保护隐私的数据生成方法,并从理论和实验两方面证明了其有效性,还表明基于该方法生成的数据训练的模型在真实应用场景中(推理阶段)表现良好。

The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems

  • paper_url: http://arxiv.org/abs/2307.00157
  • repo_url: None
  • paper_authors: Adrian Stando, Mustafa Cavus, Przemysław Biecek
    for: Addressing the challenge of imbalanced data in classification by examining the impact of balancing methods on model behavior.methods: Utilizes Explainable Artificial Intelligence tools such as variable importance, partial dependence profile, and accumulated local effects to compare models trained on datasets before and after balancing.results: Shows significant changes in model behavior due to balancing methods, which can lead to biased models toward a balanced distribution. These findings emphasize the importance of considering the impact of balancing methods on model behavior beyond just performance comparisons.
    Abstract Imbalanced data poses a significant challenge in classification as model performance is affected by insufficient learning from minority classes. Balancing methods are often used to address this problem. However, such techniques can lead to problems such as overfitting or loss of information. This study addresses a more challenging aspect of balancing methods - their impact on model behavior. To capture these changes, Explainable Artificial Intelligence tools are used to compare models trained on datasets before and after balancing. In addition to the variable importance method, this study uses the partial dependence profile and accumulated local effects techniques. Real and simulated datasets are tested, and an open-source Python package edgaro is developed to facilitate this analysis. The results obtained show significant changes in model behavior due to balancing methods, which can lead to biased models toward a balanced distribution. These findings confirm that balancing analysis should go beyond model performance comparisons to achieve higher reliability of machine learning models. Therefore, we propose a new method performance gain plot for informed data balancing strategy to make an optimal selection of balancing method by analyzing the measure of change in model behavior versus performance gain.
    摘要 不均衡数据对分类构成重大挑战,因为模型难以从少数类中充分学习。为解决这一问题,通常会使用平衡方法,然而这些技术可能导致过拟合或信息损失。本研究探讨平衡方法对模型行为的影响:使用可解释人工智能工具,比较在平衡前后的数据集上训练的模型。除了变量重要性方法外,本研究还使用部分依赖剖面(partial dependence profile)和累积局部效应(accumulated local effects)技术。我们在真实和模拟数据集上进行了测试,并开发了开源 Python 包 edgaro 来支持这类分析。结果显示,平衡方法会导致模型行为发生显著变化,可能使模型偏向平衡分布。这些发现证实,平衡分析应当超越单纯的性能比较,以获得更可靠的机器学习模型。因此,我们提出了一种新的"性能增益图",通过分析模型行为变化程度与性能增益之间的关系,为数据平衡策略的选择提供依据。
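
The comparison amounts to training the same learner on raw and oversampled data and inspecting explanation profiles rather than only accuracy. The sketch below (using scikit-learn and imbalanced-learn, not the authors' edgaro package) computes a one-feature partial dependence profile before and after SMOTE so the shift in model behavior can be seen directly.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from imblearn.over_sampling import SMOTE

def partial_dependence_profile(model, X, feature, grid_points=20):
    """Average predicted probability of the positive class while sweeping one
    feature over a grid (a manual partial dependence profile)."""
    grid = np.linspace(X[:, feature].min(), X[:, feature].max(), grid_points)
    profile = []
    for v in grid:
        Xv = X.copy()
        Xv[:, feature] = v
        profile.append(model.predict_proba(Xv)[:, 1].mean())
    return grid, np.array(profile)

# imbalanced toy data (5% positives)
X, y = make_classification(n_samples=4000, n_features=10, weights=[0.95, 0.05],
                           random_state=0)

raw_model = RandomForestClassifier(random_state=0).fit(X, y)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
bal_model = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)

grid, pdp_raw = partial_dependence_profile(raw_model, X, feature=0)
_, pdp_bal = partial_dependence_profile(bal_model, X, feature=0)
# the gap between the two profiles is the kind of behavior change the paper studies
print(np.abs(pdp_raw - pdp_bal).max())
```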

Stitched ViTs are Flexible Vision Backbones

  • paper_url: http://arxiv.org/abs/2307.00154
  • repo_url: https://github.com/ziplab/sn-netv2
  • paper_authors: Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang
  • for: This paper aims to improve the efficiency of training and deployment of large pre-trained vision Transformers (ViTs) for downstream tasks.
  • methods: The authors propose a new framework called SN-Netv2, which stitches pre-trained model families to create a single model that supports diverse performance-efficiency trade-offs at runtime.
  • results: The authors achieve strong adaptation and efficiency in downstream tasks, including ImageNet-1K, ADE20K, COCO-Stuff-10K, NYUv2, and COCO-2017, with extensive experiments demonstrating the effectiveness of SN-Netv2.
    Abstract Large pretrained plain vision Transformers (ViTs) have been the workhorse for many downstream tasks. However, existing works utilizing off-the-shelf ViTs are inefficient in terms of training and deployment, because adopting ViTs with individual sizes requires separate training and is restricted by fixed performance-efficiency trade-offs. In this paper, we are inspired by stitchable neural networks, which is a new framework that cheaply produces a single model that covers rich subnetworks by stitching pretrained model families, supporting diverse performance-efficiency trade-offs at runtime. Building upon this foundation, we introduce SN-Netv2, a systematically improved model stitching framework to facilitate downstream task adaptation. Specifically, we first propose a Two-way stitching scheme to enlarge the stitching space. We then design a resource-constrained sampling strategy that takes into account the underlying FLOPs distributions in the space for improved sampling. Finally, we observe that learning stitching layers is a low-rank update, which plays an essential role on downstream tasks to stabilize training and ensure a good Pareto frontier. With extensive experiments on ImageNet-1K, ADE20K, COCO-Stuff-10K, NYUv2 and COCO-2017, SN-Netv2 demonstrates strong ability to serve as a flexible vision backbone, achieving great advantages in both training efficiency and adaptation. Code will be released at https://github.com/ziplab/SN-Netv2.
    摘要 大型预训练的平面视 transformer(ViT)已成为许多下游任务的工具马。然而,现有的使用批处理的ViT的工作是不高效的,因为采用不同的ViT大小需要单独的训练和固定的性能精度负担。在这篇论文中,我们受到折叠神经网络的启发,这是一种新的框架,可以便宜地生成一个单个模型,覆盖多样性的性能精度负担。基于这个基础,我们引入SN-Netv2,一个系统地改进的模型缝合框架,以便下游任务的适应。我们首先提出了两种缝合方案,以扩大缝合空间。然后,我们设计了考虑下面的FLOPs分布的资源限制 sampling策略。最后,我们发现学习缝合层是一个低级别的更新,在下游任务中很重要,以稳定训练和保证良好的Pareto前沿。通过对ImageNet-1K、ADE20K、COCO-Stuff-10K、NYUv2和COCO-2017进行了广泛的实验,SN-Netv2表现出了强大的灵活视觉基础,在训练效率和适应方面都取得了优异的成绩。代码将在https://github.com/ziplab/SN-Netv2上发布。

Hierarchical Neural Coding for Controllable CAD Model Generation

  • paper_url: http://arxiv.org/abs/2307.00149
  • repo_url: https://github.com/samxuxiang/hnc-cad
  • paper_authors: Xiang Xu, Pradeep Kumar Jayaraman, Joseph G. Lambourne, Karl D. D. Willis, Yasutaka Furukawa
  • for: 本研究开发了一种新型的计算机支持设计(CAD)生成模型,以实现高级设计概念的生成和调整。
  • methods: 本研究使用了一种基于三层嵌入的神经代码树,将设计概念分解为全局部件安排、中度曲线几何和小度特征三层次。另外,使用了一种实验新的vector quantized VAE,将设计变化捕捉为神经代码库。
  • results: 本研究的实验结果显示,这种生成模型在单独生成和增强设计交互等任务上具有优秀的性能,并且可以实现设计调整和完成。代码可以在https://github.com/samxuxiang/hnc-cad中找到。
    Abstract This paper presents a novel generative model for Computer Aided Design (CAD) that 1) represents high-level design concepts of a CAD model as a three-level hierarchical tree of neural codes, from global part arrangement down to local curve geometry; and 2) controls the generation or completion of CAD models by specifying the target design using a code tree. Concretely, a novel variant of a vector quantized VAE with "masked skip connection" extracts design variations as neural codebooks at three levels. Two-stage cascaded auto-regressive transformers learn to generate code trees from incomplete CAD models and then complete CAD models following the intended design. Extensive experiments demonstrate superior performance on conventional tasks such as random generation while enabling novel interaction capabilities on conditional generation tasks. The code is available at https://github.com/samxuxiang/hnc-cad.
    摘要 这篇论文提出了一种新的计算机辅助设计(CAD)生成模型:该模型以三级层次的神经代码树表示高层设计概念,从全局部件布局到局部曲线几何;并通过指定目标设计的代码树来控制 CAD 模型的生成或补全。具体而言,模型使用一种带"masked skip connection"的向量量化 VAE 新变体,在三个层次上将设计变化提取为神经代码库;随后,两阶段级联的自回归 Transformer 学习从不完整的 CAD 模型生成代码树,再按设计意图补全 CAD 模型。大量实验表明,该模型在随机生成等常规任务上表现优异,并在条件生成任务中支持新的交互能力。代码可在 GitHub 获取:https://github.com/samxuxiang/hnc-cad。

Abide by the Law and Follow the Flow: Conservation Laws for Gradient Flows

  • paper_url: http://arxiv.org/abs/2307.00144
  • repo_url: https://github.com/sibyllema/conservation_laws
  • paper_authors: Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
  • for: 这篇论文的目的是为了解释大型机器学习模型的最近成功,具体来说是研究梯度下降动力学的几何性质。
  • methods: 论文使用了梯度流动的方法,包括计算梯度的Jacobian和利达数学algebraic manipulate。
  • results: 论文发现了一些保守的量(conservation laws),这些量是在梯度下降动力学中保持不变的独立量,并且可以用来解释训练过程中模型的良好泛化性。论文还提供了计算这些量的算法,并在一些ReLU网络架构上进行了实验验证。
    Abstract Understanding the geometric properties of gradient descent dynamics is a key ingredient in deciphering the recent success of very large machine learning models. A striking observation is that trained over-parameterized models retain some properties of the optimization initialization. This "implicit bias" is believed to be responsible for some favorable properties of the trained models and could explain their good generalization properties. The purpose of this article is threefold. First, we rigorously expose the definition and basic properties of "conservation laws", which are maximal sets of independent quantities conserved during gradient flows of a given model (e.g. of a ReLU network with a given architecture) with any training data and any loss. Then we explain how to find the exact number of these quantities by performing finite-dimensional algebraic manipulations on the Lie algebra generated by the Jacobian of the model. Finally, we provide algorithms (implemented in SageMath) to: a) compute a family of polynomial laws; b) compute the number of (not necessarily polynomial) conservation laws. We provide showcase examples that we fully work out theoretically. Besides, applying the two algorithms confirms for a number of ReLU network architectures that all known laws are recovered by the algorithm, and that there are no other laws. Such computational tools pave the way to understanding desirable properties of optimization initialization in large machine learning models.
    摘要 The purpose of this article is threefold:1. We rigorously expose the definition and basic properties of "conservation laws", which are maximal sets of independent quantities conserved during gradient flows of a given model (e.g. of a ReLU network with a given architecture) with any training data and any loss.2. We explain how to find the exact number of these quantities by performing finite-dimensional algebraic manipulations on the Lie algebra generated by the Jacobian of the model.3. We provide algorithms (implemented in SageMath) to:a. Compute a family of polynomial laws.b. Compute the number of (not necessarily polynomial) conservation laws.We provide showcase examples that we fully work out theoretically. Besides, applying the two algorithms confirms for a number of ReLU network architectures that all known laws are recovered by the algorithm, and that there are no other laws. Such computational tools pave the way to understanding desirable properties of optimization initialization in large machine learning models.Translated into Simplified Chinese:理解梯度下降动力学的几何性质是大机器学习模型最近成功的关键因素之一。一个 striking observation 是训练过的过参数模型保留了优化初始化的一些性质。这种 "隐式偏见" 被认为是训练模型的一些有利属性的原因,并可能解释它们的泛化性能。这篇文章的目的是三重的:1. 我们严格把定了 "保守定律" 的定义和基本性质,即梯度流动中的一个给定模型(例如 ReLU 网络)与任何训练数据和任何损失的情况下,保留的独立量的最大集。2. 我们解释了如何使用 finite-dimensional 代数推导来计算这些量的确切数量。3. 我们提供了两个算法( implemented in SageMath):a. 计算一家 polynomial 定律。b. 计算不必然 polynomial 定律的数量。我们提供了一些示例,并完全理论上处理这些示例。此外,通过应用这两个算法,我们发现了一些 ReLU 网络架构中的所有知道的法律都是由算法生成的,而没有其他法律。这些计算工具可以帮助我们理解大机器学习模型优化初始化的愉悦性质。Translated into Traditional Chinese:理解梯度下降动力学的几何性质是大机器学习模型最近成功的关键因素之一。一个 striking observation 是训练过的过参数模型保留了优化初始化的一些性质。这种 "隐式偏见" 被认为是训练模型的一些有利属性的原因,并可能解释它们的泛化性能。这篇文章的目的是三重的:1. 我们严格把定了 "保守定律" 的定义和基本性质,即梯度流动中的一个给定模型(例如 ReLU 网络)与任何训练数据和任何损失的情况下,保留的独立量的最大集。2. 我们解释了如何使用 finite-dimensional 代数推导来计算这些量的确切数量。3. 我们提供了两个算法( implemented in SageMath):a. 计算一家 polynomial 定律。b. 计算不必然 polynomial 定律的数量。我们提供了一些示例,并完全理论上处理这些示例。此外,通过应用这两个算法,我们发现了一些 ReLU 网络架构中的所有知道的法律都是由算法生成的,而没有其他法律。这些计算工具可以帮助我们理解大机器学习模型优化初始化的愉悂性质。

BuildingsBench: A Large-Scale Dataset of 900K Buildings and Benchmark for Short-Term Load Forecasting

  • paper_url: http://arxiv.org/abs/2307.00142
  • repo_url: https://github.com/nrel/buildingsbench
  • paper_authors: Patrick Emami, Abhijeet Sahu, Peter Graf
  • for: 短期预测住宅和商业建筑能源消耗,以便在电力系统中进行规划和管理。
  • methods: 使用数据驱动的短期负荷预测(STLF)技术,并对buildingsbench dataset进行评估和比较。
  • results: 研究发现,使用生成的synthetically pretrained模型可以在真实的商业建筑中进行良好的预测,而不需要进行细致的调整。同时,对于大多数目标建筑,进行 fine-tuning可以提高性能。
    Abstract Short-term forecasting of residential and commercial building energy consumption is widely used in power systems and continues to grow in importance. Data-driven short-term load forecasting (STLF), although promising, has suffered from a lack of open, large-scale datasets with high building diversity. This has hindered exploring the pretrain-then-finetune paradigm for STLF. To help address this, we present BuildingsBench, which consists of 1) Buildings-900K, a large-scale dataset of 900K simulated buildings representing the U.S. building stock, and 2) an evaluation platform with over 1,900 real residential and commercial buildings from 7 open datasets. BuildingsBench benchmarks two under-explored tasks: zero-shot STLF, where a pretrained model is evaluated on unseen buildings without fine-tuning, and transfer learning, where a pretrained model is fine-tuned on a target building. The main finding of our benchmark analysis is that synthetically pretrained models generalize surprisingly well to real commercial buildings. An exploration of the effect of increasing dataset size and diversity on zero-shot commercial building performance reveals a power-law with diminishing returns. We also show that fine-tuning pretrained models on real commercial and residential buildings improves performance for a majority of target buildings. We hope that BuildingsBench encourages and facilitates future research on generalizable STLF. All datasets and code can be accessed from \url{https://github.com/NREL/BuildingsBench}.
    摘要 对住宅和商业建筑能耗进行短期预测在电力系统中应用广泛,且重要性不断提升。数据驱动的短期负荷预测(STLF)虽然前景可期,但一直缺乏建筑多样性高的大规模开放数据集,这阻碍了"先预训练、再微调"范式在 STLF 中的探索。为此,我们提出 BuildingsBench,它包括:(1)Buildings-900K,一个包含 90 万栋模拟建筑、代表美国建筑存量的大规模数据集;(2)一个评估平台,汇集了来自 7 个开放数据集的 1,900 余栋真实住宅和商业建筑。BuildingsBench 评测两个此前较少研究的任务:零样本 STLF(预训练模型在未见建筑上直接评估、不做微调)和迁移学习(在目标建筑上微调预训练模型)。我们的主要发现是:仅用合成数据预训练的模型在真实商业建筑上的泛化能力出奇地好。我们进一步考察数据集规模和多样性对零样本商业建筑性能的影响,发现其呈收益递减的幂律关系。此外,在真实商业和住宅建筑上微调预训练模型,可使大多数目标建筑的性能得到提升。我们希望 BuildingsBench 能鼓励并促进对可泛化 STLF 的后续研究。所有数据集和代码见 https://github.com/NREL/BuildingsBench 。

Risk-sensitive Actor-free Policy via Convex Optimization

  • paper_url: http://arxiv.org/abs/2307.00141
  • repo_url: None
  • paper_authors: Ruoqi Zhang, Jens Sjölund
  • for: 这个论文旨在提出一种不考虑安全性的传统强化学习方法,并提出一种基于Conditional Value at Risk(CVaR)的风险敏感目标函数,以优化智能代理人。
  • methods: 该论文提出的方法是基于输入几何函数模型的风险敏感目标函数,该函数 Ensure convexity with respect to actions,使得通过简单的梯度追踪方法可以快速地确定全局优化的行动。
  • results: 实验结果表明,该方法可以有效地维护风险控制。
    Abstract Traditional reinforcement learning methods optimize agents without considering safety, potentially resulting in unintended consequences. In this paper, we propose an optimal actor-free policy that optimizes a risk-sensitive criterion based on the conditional value at risk. The risk-sensitive objective function is modeled using an input-convex neural network ensuring convexity with respect to the actions and enabling the identification of globally optimal actions through simple gradient-following methods. Experimental results demonstrate the efficacy of our approach in maintaining effective risk control.
    摘要 传统的强化学习方法在优化智能体时不考虑安全性,可能导致意外后果。在这篇论文中,我们提出了一种无 actor 的最优策略,基于条件风险值(CVaR)优化一个风险敏感的目标函数。该目标函数用输入凸神经网络建模,保证其关于动作是凸的,从而可以通过简单的梯度跟随方法找到全局最优动作。实验结果表明,我们的方法能够有效地维持风险控制。
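
For reference, the conditional value at risk underlying the objective has the standard Rockafellar-Uryasev variational form (background material, not a contribution of the paper): for a cost $Z$ and level $\alpha \in (0,1)$,

```latex
\[
  \mathrm{CVaR}_{\alpha}(Z)
  \;=\;
  \min_{c \in \mathbb{R}}
  \left\{\, c \;+\; \frac{1}{1-\alpha}\,\mathbb{E}\bigl[(Z - c)_{+}\bigr] \,\right\},
  \qquad (x)_{+} = \max(x, 0),
\]
% i.e. the expected cost over the worst (1-\alpha) fraction of outcomes; making the
% critic convex in the action keeps minimizing this risk-sensitive objective tractable.
```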

Generalization Limits of Graph Neural Networks in Identity Effects Learning

  • paper_url: http://arxiv.org/abs/2307.00134
  • repo_url: https://github.com/aledinve/gnn_identity_effects
  • paper_authors: Giuseppe Alessio D’Inverno, Simone Brugiapaglia, Mirco Ravanelli
  • for: 本研究探讨了图生成模型(GNNs)在复杂图域上进行数据驱动学习的能力。
  • methods: 本研究使用了消息传递机制,并对GNNs的普通化性和基本限制进行了分析。
  • results: 研究发现,使用抽象编码(如一颗一热图)可以使GNNs在识别字符串是否相同的任务中表现不佳,而使用WL测试可以提供正面的存在结果。
    Abstract Graph Neural Networks (GNNs) have emerged as a powerful tool for data-driven learning on various graph domains. They are usually based on a message-passing mechanism and have gained increasing popularity for their intuitive formulation, which is closely linked to the Weisfeiler-Lehman (WL) test for graph isomorphism to which they have been proven equivalent in terms of expressive power. In this work, we establish new generalization properties and fundamental limits of GNNs in the context of learning so-called identity effects, i.e., the task of determining whether an object is composed of two identical components or not. Our study is motivated by the need to understand the capabilities of GNNs when performing simple cognitive tasks, with potential applications in computational linguistics and chemistry. We analyze two case studies: (i) two-letters words, for which we show that GNNs trained via stochastic gradient descent are unable to generalize to unseen letters when utilizing orthogonal encodings like one-hot representations; (ii) dicyclic graphs, i.e., graphs composed of two cycles, for which we present positive existence results leveraging the connection between GNNs and the WL test. Our theoretical analysis is supported by an extensive numerical study.
    摘要 图神经网络(GNN)已成为各类图域上数据驱动学习的强大工具。它们通常基于消息传递机制,凭借直观的形式化而日益流行,并且在表达能力上已被证明等价于图同构的 Weisfeiler-Lehman(WL)测试。在本文中,我们针对学习所谓"同一性效应"(即判断一个对象是否由两个相同的组件构成)这一任务,建立了 GNN 的新的泛化性质和基本限制。这项研究的动机在于理解 GNN 执行简单认知任务的能力,其潜在应用包括计算语言学和化学。我们分析了两个案例:(i)双字母词,我们证明在使用 one-hot 等正交编码时,经随机梯度下降训练的 GNN 无法泛化到未见过的字母;(ii)双环图(由两个环组成的图),我们利用 GNN 与 WL 测试之间的联系给出了正面的存在性结果。我们的理论分析得到了大量数值实验的支持。
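
The two-letter-word finding has a simple flat-model analogue that conveys the intuition: with one-hot (orthogonal) letter encodings, nothing ties unseen letters to seen ones, so the "are the two symbols identical?" rule cannot extend beyond the training vocabulary. The toy sketch below uses a plain MLP as a stand-in for the GNN setting, purely as an illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

letters = list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
onehot = {c: np.eye(len(letters))[i] for i, c in enumerate(letters)}

def encode(a, b):                       # two-letter "word" -> concatenated one-hots
    return np.concatenate([onehot[a], onehot[b]])

rng = np.random.default_rng(0)
train_letters, test_letters = letters[:24], letters[24:]   # hold out Y and Z

# balanced training set: all identical pairs plus equally many random non-identical pairs
pos = [(a, a) for a in train_letters]
neg = [(a, b) for a in train_letters for b in train_letters if a != b]
neg = [neg[i] for i in rng.choice(len(neg), size=len(pos), replace=False)]
pairs = pos + neg
X_tr = np.array([encode(a, b) for a, b in pairs])
y_tr = np.array([int(a == b) for a, b in pairs])

# test pairs use only held-out letters, whose one-hot dims were never active in training
test_pairs = [(a, b) for a in test_letters for b in test_letters]
X_te = np.array([encode(a, b) for a, b in test_pairs])
y_te = np.array([int(a == b) for a, b in test_pairs])

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=3000, random_state=0).fit(X_tr, y_tr)
print("seen-letter accuracy:  ", clf.score(X_tr, y_tr))   # memorized
print("unseen-letter accuracy:", clf.score(X_te, y_te))   # ~ chance level
```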

Machine learning for advancing low-temperature plasma modeling and simulation

  • paper_url: http://arxiv.org/abs/2307.00131
  • repo_url: None
  • paper_authors: Jan Trieschmann, Luca Vialetto, Tobias Gergs
  • for: 本文主要用于介绍低温气体模拟和计算的最新进展,特别是通过机器学习和数据驱动模型的应用。
  • methods: 本文使用了许多现有的机器学习算法和方法,包括逻辑回归、支持向量机和深度学习等,以及数据驱动模型的应用。
  • results: 本文通过对 Literature 中的许多例子进行分析和评论,展示了机器学习和数据驱动模型在低温气体模拟和计算中的广泛应用和潜在advances。
    Abstract Machine learning has had an enormous impact in many scientific disciplines. Also in the field of low-temperature plasma modeling and simulation it has attracted significant interest within the past years. Whereas its application should be carefully assessed in general, many aspects of plasma modeling and simulation have benefited substantially from recent developments within the field of machine learning and data-driven modeling. In this survey, we approach two main objectives: (a) We review the state-of-the-art focusing on approaches to low-temperature plasma modeling and simulation. By dividing our survey into plasma physics, plasma chemistry, plasma-surface interactions, and plasma process control, we aim to extensively discuss relevant examples from literature. (b) We provide a perspective of potential advances to plasma science and technology. We specifically elaborate on advances possibly enabled by adaptation from other scientific disciplines. We argue that not only the known unknowns, but also unknown unknowns may be discovered due to an inherent propensity to spotlight hidden patterns in data.
    摘要 This survey pursues two main objectives: (1) review the state-of-the-art in low-temperature plasma modeling and simulation, focusing on various approaches and their applications in plasma physics, plasma chemistry, plasma-surface interactions, and plasma process control; and (2) provide a perspective on potential advances to plasma science and technology, including those that may be enabled by adapting techniques from other scientific disciplines. We argue that not only the known unknowns, but also unknown unknowns may be discovered due to the inherent ability of machine learning to spotlight hidden patterns in data.

Accelerating Inexact HyperGradient Descent for Bilevel Optimization

  • paper_url: http://arxiv.org/abs/2307.00126
  • repo_url: None
  • paper_authors: Haikuo Yang, Luo Luo, Chris Junchi Li, Michael I. Jordan
  • for: 这个论文是解决一般非凸-强 convex 双层优化问题的方法。
  • methods: 该方法是 \emph{Restarted Accelerated HyperGradient Descent}(\texttt{RAHGD})方法,可以在 $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ 次 oracle 调用内找到一个 $\epsilon$-一阶驻点。
  • results: 该方法的扰动变体可以找到一个 $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\,)\big)$-二阶驻点,并改进了在非凸-强凹极小极大优化问题中寻找二阶驻点的已有复杂度上界,创下新的最优纪录;论文还通过实验验证了理论结果。
    Abstract We present a method for solving general nonconvex-strongly-convex bilevel optimization problems. Our method -- the \emph{Restarted Accelerated HyperGradient Descent} (\texttt{RAHGD}) method -- finds an $\epsilon$-first-order stationary point of the objective with $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ oracle complexity, where $\kappa$ is the condition number of the lower-level objective and $\epsilon$ is the desired accuracy. We also propose a perturbed variant of \texttt{RAHGD} for finding an $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\,)\big)$-second-order stationary point within the same order of oracle complexity. Our results achieve the best-known theoretical guarantees for finding stationary points in bilevel optimization and also improve upon the existing upper complexity bound for finding second-order stationary points in nonconvex-strongly-concave minimax optimization problems, setting a new state-of-the-art benchmark. Empirical studies are conducted to validate the theoretical results in this paper.
    摘要 我们提出了一种求解一般非凸-强凸双层优化问题的方法。我们的方法 -- Restarted Accelerated HyperGradient Descent(\texttt{RAHGD}) -- 能以 $\tilde{\mathcal{O}}(\kappa^{3.25}\epsilon^{-1.75})$ 的 oracle 复杂度找到目标函数的一个 $\epsilon$-一阶驻点,其中 $\kappa$ 是下层目标函数的条件数,$\epsilon$ 是所需精度。我们还提出了 \texttt{RAHGD} 的一个扰动变体,可在同阶的 oracle 复杂度内找到一个 $\big(\epsilon,\mathcal{O}(\kappa^{2.5}\sqrt{\epsilon}\,)\big)$-二阶驻点。我们的结果达到了双层优化中寻找驻点的已知最佳理论保证,并改进了非凸-强凹极小极大优化问题中寻找二阶驻点的已有复杂度上界,创下新的最优纪录。我们还通过实验验证了理论结果。
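
As background for the complexity statements, the "hypergradient" in the method's name refers to the gradient of the bilevel objective $\Phi(x) = f(x, y^*(x))$ with $y^*(x) = \arg\min_y g(x, y)$; for a strongly convex lower level, the implicit function theorem gives the standard expression below (background, not the paper's contribution):

```latex
\[
  \nabla \Phi(x)
  \;=\;
  \nabla_x f\bigl(x, y^{*}(x)\bigr)
  \;-\;
  \nabla^2_{xy} g\bigl(x, y^{*}(x)\bigr)\,
  \bigl[\nabla^2_{yy} g\bigl(x, y^{*}(x)\bigr)\bigr]^{-1}
  \nabla_y f\bigl(x, y^{*}(x)\bigr).
\]
% "Inexact" methods approximate y^*(x) and the inverse-Hessian-vector product,
% which is where the condition number \kappa of the lower-level problem enters the rates.
```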

RObotic MAnipulation Network (ROMAN) $\unicode{x2013}$ Hybrid Hierarchical Learning for Solving Complex Sequential Tasks

  • paper_url: http://arxiv.org/abs/2307.00125
  • repo_url: None
  • paper_authors: Eleftherios Triantafyllidis, Fernando Acero, Zhaocheng Liu, Zhibin Li
  • for: 解决机器人 manipulate 多个复杂任务的长时间 horizon 问题
  • methods: 结合行为做样、学习做样、奖励学习
  • results: 实验结果表明,ROMAN 可以 correctly 生成长序列 manipulate 任务的正确动作,并且 exhibiting 鲁棒性于各种感知噪音。这些结果表明 ROMAN 具有自适应能力和自主维护功能,并且可以应用于各种自主 manipulate 任务。
    Abstract Solving long sequential tasks poses a significant challenge in embodied artificial intelligence. Enabling a robotic system to perform diverse sequential tasks with a broad range of manipulation skills is an active area of research. In this work, we present a Hybrid Hierarchical Learning framework, the Robotic Manipulation Network (ROMAN), to address the challenge of solving multiple complex tasks over long time horizons in robotic manipulation. ROMAN achieves task versatility and robust failure recovery by integrating behavioural cloning, imitation learning, and reinforcement learning. It consists of a central manipulation network that coordinates an ensemble of various neural networks, each specialising in distinct re-combinable sub-tasks to generate their correct in-sequence actions for solving complex long-horizon manipulation tasks. Experimental results show that by orchestrating and activating these specialised manipulation experts, ROMAN generates correct sequential activations for accomplishing long sequences of sophisticated manipulation tasks and achieving adaptive behaviours beyond demonstrations, while exhibiting robustness to various sensory noises. These results demonstrate the significance and versatility of ROMAN's dynamic adaptability featuring autonomous failure recovery capabilities, and highlight its potential for various autonomous manipulation tasks that demand adaptive motor skills.
    摘要 求解长时程的序列任务是具身人工智能面临的一大挑战。让机器人系统以广泛的操作技能完成多样化的序列任务,是一个活跃的研究方向。在这项工作中,我们提出了一种混合层次学习框架,即机器人操作网络(ROMAN),用于解决机器人操作中跨长时间范围的多个复杂任务。ROMAN 通过结合行为克隆、模仿学习和强化学习来实现任务多样性和稳健的故障恢复。它包含一个中央操作网络,负责协调一组各有专长的神经网络;每个网络专门处理不同的、可重新组合的子任务,生成正确的按序动作,从而完成复杂的长时程操作任务。实验结果表明,通过编排和激活这些专门的操作专家,ROMAN 能为长序列的复杂操作任务生成正确的顺序动作,表现出超越示教的自适应行为,并对各种感知噪声保持鲁棒性。这些结果体现了 ROMAN 动态适应性与自主故障恢复能力的重要性与通用性,显示了其在各类需要自适应运动技能的自主操作任务中的潜力。

How Do Human Users Teach a Continual Learning Robot in Repeated Interactions?

  • paper_url: http://arxiv.org/abs/2307.00123
  • repo_url: https://github.com/aliayub7/cl_hri
  • paper_authors: Ali Ayub, Jainish Mehta, Zachary De Francesco, Patrick Holthaus, Kerstin Dautenhahn, Chrystopher L. Nehaniv
  • for: 本研究旨在探讨人类在长期互动中教育可持续学习机器人的方式,以及这些教学方法是否具有个体化的特点。
  • methods: 本研究使用了两种不同的可持续学习模型,在Fetch移动抓取机器人上进行测试。研究采用了详细的质量分析和量化分析方法,探讨参与者的教学风格是否存在差异,以及这些差异对机器人的性能是否有影响。
  • results: 研究发现,参与者的教学风格之间存在显著的差异, indicating the need for personalized adaptation to their distinct teaching styles。此外,研究还发现,虽然专家和非专家用户之间有教学风格的差异,但这种差异不会影响机器人的性能。最后,研究发现,常用的 continual learning 技术测试设置不够,因为实际的用户在教育和教学机器人时会采用多种方法。
    Abstract Continual learning (CL) has emerged as an important avenue of research in recent years, at the intersection of Machine Learning (ML) and Human-Robot Interaction (HRI), to allow robots to continually learn in their environments over long-term interactions with humans. Most research in continual learning, however, has been robot-centered to develop continual learning algorithms that can quickly learn new information on static datasets. In this paper, we take a human-centered approach to continual learning, to understand how humans teach continual learning robots over the long term and if there are variations in their teaching styles. We conducted an in-person study with 40 participants that interacted with a continual learning robot in 200 sessions. In this between-participant study, we used two different CL models deployed on a Fetch mobile manipulator robot. An extensive qualitative and quantitative analysis of the data collected in the study shows that there is significant variation among the teaching styles of individual users indicating the need for personalized adaptation to their distinct teaching styles. The results also show that although there is a difference in the teaching styles between expert and non-expert users, the style does not have an effect on the performance of the continual learning robot. Finally, our analysis shows that the constrained experimental setups that have been widely used to test most continual learning techniques are not adequate, as real users interact with and teach continual learning robots in a variety of ways. Our code is available at https://github.com/aliayub7/cl_hri.

Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

  • paper_url: http://arxiv.org/abs/2307.00117
  • repo_url: https://github.com/rail-berkeley/grif_release
  • paper_authors: Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine
  • for: 这个论文的目的是为了让机器人按照自然语言指令进行操作,但获取大量标注数据(即包含任务的语言指令)是不可能的。因此,本论文提出了一种方法,利用只需要一小amount of语言数据来获得 JOINT image-和目标条件的策略。
  • methods: 本论文使用了图像目标和语言指令之间的匹配来实现这种方法。具体来说,它首先学习一个映射从标注数据中,将语言指令与图像目标之间的对应关系学习到一个 embedding 空间中。然后,它将这个 embedding 空间用于训练一个策略,这个策略可以利用所有的无标注数据进行训练,同时受益于 embedding 的匹配关系,以便通过语言指令来控制策略。
  • results: 本论文的实验结果表明,这种方法可以在实际世界中实现 robust 的任务完成。具体来说,它可以在不同的搬运任务中,在不同的场景中,以及使用不同的语言指令中,完成任务。此外,这种方法还可以在语言指令外的数据上进行扩展。视频和代码可以在https://rail-berkeley.github.io/grif/ 上找到。
    Abstract Our goal is for robots to follow natural language instructions like "put the towel next to the microwave." But getting large amounts of labeled data, i.e. data that contains demonstrations of tasks labeled with the language instruction, is prohibitive. In contrast, obtaining policies that respond to image goals is much easier, because any autonomous trial or demonstration can be labeled in hindsight with its final state as the goal. In this work, we contribute a method that taps into joint image- and goal- conditioned policies with language using only a small amount of language data. Prior work has made progress on this using vision-language models or by jointly training language-goal-conditioned policies, but so far neither method has scaled effectively to real-world robot tasks without significant human annotation. Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to. We then train a policy on this embedding: the policy benefits from all the unlabeled data, but the aligned embedding provides an interface for language to steer the policy. We show instruction following across a variety of manipulation tasks in different scenes, with generalization to language instructions outside of the labeled data. Videos and code for our approach can be found on our website: https://rail-berkeley.github.io/grif/ .
    摘要 我们的目标是让机器人按照自然语言指令进行操作,如“把毯子放在微风炉旁”。但获取大量标注数据,即包含任务的示例并与语言指令相关的标签,是不可能的。相比之下,获取基于图像目标的策略是更加容易的,因为任何自主试验或示例都可以在后看的情况下被标注为其最终状态作为目标。在这项工作中,我们提供一种方法,可以通过只使用少量语言数据来使用图像和目标联合条件下的策略。先前的工作已经在这个领域做出了进步,使用视觉语言模型或同时培养语言目标条件下的策略,但是到目前为止,没有任何方法能够在实际世界中有效地执行机器人任务,不需要人类的批注。我们的方法可以在实际世界中达到稳定性,通过从标注数据中学习一个对应语言和目标之间的嵌入,使用这个嵌入来训练策略。我们的策略可以利用所有没有标注的数据,但是嵌入的对接使得语言可以控制策略。我们在不同的搬运任务中展示了 instrucion 遵循,并且在语言指令外部的数据上进行泛化。视频和代码可以在我们的网站上找到:https://rail-berkeley.github.io/grif/。

Ticket-BERT: Labeling Incident Management Tickets with Language Models

  • paper_url: http://arxiv.org/abs/2307.00108
  • repo_url: None
  • paper_authors: Zhexiong Liu, Cris Benge, Siduo Jiang
  • for: 高效地归类incident ticket для解决
  • methods: 使用我们提出的票据集合并Ticket-BERT模型进行标签
  • results: 对Azure认知服务进行实验,表明Ticket-BERT超过基线和当前文本分类器的性能,并且通过活动学习循环和Microsoft IcM系统的部署,快速适应新收集的票据。
    Abstract An essential aspect of prioritizing incident tickets for resolution is efficiently labeling tickets with fine-grained categories. However, ticket data is often complex and poses several unique challenges for modern machine learning methods: (1) tickets are created and updated either by machines with pre-defined algorithms or by engineers with domain expertise that share different protocols, (2) tickets receive frequent revisions that update ticket status by modifying all or parts of ticket descriptions, and (3) ticket labeling is time-sensitive and requires knowledge updates and new labels per the rapid software and hardware improvement lifecycle. To handle these issues, we introduce Ticket- BERT which trains a simple yet robust language model for labeling tickets using our proposed ticket datasets. Experiments demonstrate the superiority of Ticket-BERT over baselines and state-of-the-art text classifiers on Azure Cognitive Services. We further encapsulate Ticket-BERT with an active learning cycle and deploy it on the Microsoft IcM system, which enables the model to quickly finetune on newly-collected tickets with a few annotations.
    摘要 “优先Handle incident tickets的一个重要方面是精确地分类tickets。然而,tickets的数据往往复杂,带来多个现代机器学习方法的挑战:(1)tickets是由机器或专家工程师透过预先定义的 алгоритми或专家知识所创建的,(2)tickets会频繁地更新,将tickets的状态更新为全部或部分的描述,(3)tickets的分类是时间敏感的,需要不断更新知识和新的标签,以应对软件和硬件的持续改进周期。”“为解决这些问题,我们介绍了Ticket-BERT,一个训练了简单 yet robust的语言模型,用于分类tickets。实验结果显示Ticket-BERT在Azure cognitive Services上比基eline和现有文本分类器更有优势。我们还将Ticket-BERT与活动学习周期结合,并将其部署到Microsoft IcM系统上,这使得模型可以快速地调整新收集的tickets,只需要少量的标注。”

Distance Functions and Normalization Under Stream Scenarios

  • paper_url: http://arxiv.org/abs/2307.00106
  • repo_url: None
  • paper_authors: Eduardo V. L. Barboza, Paulo R. Lisboa de Almeida, Alceu de Souza Britto Jr, Rafael M. O. Cruz
  • for: 本研究旨在 investigate 数据流模型中的数据Normalization问题,以及不同距离函数在数据流中的表现。
  • methods: 本研究使用了 eight 种常见距离函数,包括不归一化、基于首批数据的归一化和基于前一批数据的归一化。
  • results: 结果显示,不归一化和Canberra距离函数组合可以在无知数据流情况下达到良好的表现。
    Abstract Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their minimum/maximum values, and these properties may change over time. We compare the accuracies generated by eight well-known distance functions in data streams without normalization, normalized considering the statistics of the first batch of data received, and considering the previous batch received. We argue that experimental protocols for streams that consider the full stream as normalized are unrealistic and can lead to biased and poor results. Our results indicate that using the original data stream without applying normalization, and the Canberra distance, can be a good combination when no information about the data stream is known beforehand.
    摘要 数据归一化是构建分类系统时的必要环节。在处理数据流时,归一化尤为困难,因为我们可能事先不知道特征的属性(如最小值和最大值),而且这些属性还可能随时间变化。我们比较了 8 种常用距离函数在三种设置下的准确率:不做归一化、按收到的首批数据的统计量归一化、以及按前一批数据归一化。我们认为,把整条数据流视为已归一化的实验协议是不现实的,会导致有偏且较差的结果。我们的结果表明,在事先对数据流一无所知的情况下,不做归一化并配合 Canberra 距离可以是一个不错的组合。
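
For concreteness, two of the configurations compared above are easy to write down: the Canberra distance (available in SciPy) and a min-max normalizer whose statistics are frozen from the first batch of the stream. The sketch below is illustrative; variable names are assumptions.

```python
import numpy as np
from scipy.spatial.distance import canberra   # sum_i |x_i - y_i| / (|x_i| + |y_i|)

class FirstBatchMinMax:
    """Min-max scaler fitted once on the first batch of a stream and reused
    for every later batch (one of the protocols compared in the paper)."""
    def __init__(self):
        self.lo = self.hi = None

    def partial_fit(self, batch):
        if self.lo is None:                     # only the first batch sets the stats
            self.lo = batch.min(axis=0)
            self.hi = batch.max(axis=0)
        return self

    def transform(self, batch):
        return (batch - self.lo) / (self.hi - self.lo + 1e-12)

rng = np.random.default_rng(0)
stream = (rng.normal(loc=t, size=(32, 5)) for t in range(3))   # drifting toy stream

scaler = FirstBatchMinMax()
for batch in stream:
    scaler.partial_fit(batch)
    norm = scaler.transform(batch)
    print("canberra on raw:", canberra(batch[0], batch[1]),
          "| on first-batch-normalized:", canberra(norm[0], norm[1]))
```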

Obscured Wildfire Flame Detection By Temporal Analysis of Smoke Patterns Captured by Unmanned Aerial Systems

  • paper_url: http://arxiv.org/abs/2307.00104
  • repo_url: None
  • paper_authors: Uma Meleti, Abolfazl Razi
  • for: 本研究论文目标是实时检测受遮盲森林火灾(火焰被树木、烟雾、云层等自然障碍物遮盖),使用RGB摄像头 equipped 无人机。
  • methods: 我们提出了一种新的方法,利用semantic segmentation,基于视频序列中烟团特征的时间分析。我们的方法使用卷积神经网络架构,包括预训练的CNNEncoder和3D卷积来解码,同时使用序列堆叠特征来利用时间变化。
  • results: 我们对一个精心制作的数据集进行了测试,该数据集包括RGB视频和IR视频,以确定真实的地面 truth。我们的方法在测试数据上达到了85.88%的Dice分数,同时达到了92.47%的准确率和90.67%的分类率。与其他方法相比,我们的方法在视频级别的火灾分类中表现出色,使用MobileNet+CBAM作为encoder backbone时达到了约100%的准确率。
    Abstract This research paper addresses the challenge of detecting obscured wildfires (when the fire flames are covered by trees, smoke, clouds, and other natural barriers) in real-time using drones equipped only with RGB cameras. We propose a novel methodology that employs semantic segmentation based on the temporal analysis of smoke patterns in video sequences. Our approach utilizes an encoder-decoder architecture based on deep convolutional neural network architecture with a pre-trained CNN encoder and 3D convolutions for decoding while using sequential stacking of features to exploit temporal variations. The predicted fire locations can assist drones in effectively combating forest fires and pinpoint fire retardant chemical drop on exact flame locations. We applied our method to a curated dataset derived from the FLAME2 dataset that includes RGB video along with IR video to determine the ground truth. Our proposed method has a unique property of detecting obscured fire and achieves a Dice score of 85.88%, while achieving a high precision of 92.47% and classification accuracy of 90.67% on test data showing promising results when inspected visually. Indeed, our method outperforms other methods by a significant margin in terms of video-level fire classification as we obtained about 100% accuracy using MobileNet+CBAM as the encoder backbone.

Redeeming Data Science by Decision Modelling

  • paper_url: http://arxiv.org/abs/2307.00088
  • repo_url: None
  • paper_authors: John Mark Agosta, Robert Horton
  • for: 本文提出了一个新的应用研究计划,以防止数据科学领域的范围过于扩大。
  • methods: 本文使用了 bayesian 方法和 AI 技术来构建决策图模型,并提出了 six 个决策质量原则。
  • results: 本文指出,任何成功的应用机器学习模型均需满足这六个决策质量原则。 Plus, the article also shows an example of how to integrate a model’s ROC curve with a utility model using Decision Modelling.
    Abstract With the explosion of applications of Data Science, the field has come loose from its foundations. This article argues for a new program of applied research in areas familiar to researchers in Bayesian methods in AI that are needed to ground the practice of Data Science by borrowing from AI techniques for model formulation that we term ``Decision Modelling.'' This article briefly reviews the formulation process as building a causal graphical model, then discusses the process in terms of six principles that comprise \emph{Decision Quality}, a framework from the popular business literature. We claim that any successful applied ML modelling effort must include these six principles. We explain how Decision Modelling combines a conventional machine learning model with an explicit value model. To give a specific example we show how this is done by integrating a model's ROC curve with a utility model.
    摘要 The article briefly reviews the formulation process as building a causal graphical model, and then discusses the process in terms of the six principles that comprise "Decision Quality," a framework from the popular business literature. The authors claim that any successful applied machine learning modelling effort must include these six principles. They explain how Decision Modelling combines a conventional machine learning model with an explicit value model, and provide a specific example of how this is done by integrating a model's ROC curve with a utility model.

Inter-case Predictive Process Monitoring: A candidate for Quantum Machine Learning?

  • paper_url: http://arxiv.org/abs/2307.00080
  • repo_url: None
  • paper_authors: Stefan Hill, David Fitzek, Patrick Delfmann, Carl Corea
  • for: 本研究旨在提高预测过程实例未来行为的精度,特别是当多个实例交互时。
  • methods: 本研究基于最新的机器学习研究进展,提出了自动预测过程实例的下一个活动、结果或剩余时间的方法。研究涉及提取事件日志数据中有用的特征以及捕捉数据中复杂的模式的问题。
  • results: 研究发现,在真实世界的训练数据上,引入案例间(inter-case)特征可以将预测精度提高超过4%,而量子机器学习模型在部分特征配置下确实具有竞争力。然而,由于量子硬件尚处于早期阶段,本研究还从运行时间、噪声和过拟合风险的角度对这些结果进行了批判性讨论。
    Abstract Regardless of the domain, forecasting the future behaviour of a running process instance is a question of interest for decision makers, especially when multiple instances interact. Fostered by the recent advances in machine learning research, several methods have been proposed to predict the next activity, outcome or remaining time of a process automatically. Still, building a model with high predictive power requires both - intrinsic knowledge of how to extract meaningful features from the event log data and a model that captures complex patterns in data. This work builds upon the recent progress in inter-case Predictive Process Monitoring (PPM) and comprehensively benchmarks the impact of inter-case features on prediction accuracy. Moreover, it includes quantum machine learning models, which are expected to provide an advantage over classical models with a scaling amount of feature dimensions. The evaluation on real-world training data from the BPI challenge shows that the inter-case features provide a significant boost by more than four percent in accuracy and quantum algorithms are indeed competitive in a handful of feature configurations. Yet, as quantum hardware is still in its early stages of development, this paper critically discusses these findings in the light of runtime, noise and the risk to overfit on the training data. Finally, the implementation of an open-source plugin demonstrates the technical feasibility to connect a state-of-the-art workflow engine such as Camunda to an IBM quantum computing cloud service.
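
As a small illustration of what an inter-case feature looks like (as opposed to features computed from a single case in isolation), the pandas sketch below counts how many cases are running at the time of each event in a toy event log. The log schema and the specific feature are illustrative; the paper benchmarks a broader set of inter-case features on the BPI challenge data.

```python
import pandas as pd

# Toy event log; real logs (e.g. the BPI challenge data) carry many more attributes.
log = pd.DataFrame({
    "case_id":   ["A", "A", "B", "B", "C"],
    "activity":  ["start", "end", "start", "end", "start"],
    "timestamp": pd.to_datetime([
        "2023-01-01 09:00", "2023-01-01 12:00",
        "2023-01-01 10:00", "2023-01-01 15:00",
        "2023-01-01 11:00"]),
})

starts = log.groupby("case_id")["timestamp"].min()
ends = log.groupby("case_id")["timestamp"].max()

def open_cases(t):
    """Inter-case feature: how many cases are in progress at time t."""
    return int(((starts <= t) & (ends >= t)).sum())

log["open_cases"] = log["timestamp"].map(open_cases)
print(log)
```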

Dataset balancing can hurt model performance

  • paper_url: http://arxiv.org/abs/2307.00079
  • repo_url: None
  • paper_authors: R. Channing Moore, Daniel P. W. Ellis, Eduardo Fonseca, Shawn Hershey, Aren Jansen, Manoj Plakal
  • for: 研究dataset balancing技术能否提高AudioSet数据集上的分类性能,尤其是对罕见类样本的性能。
  • methods: 使用dataset balancing技术来提高性能。
  • results: 平衡技术可以提高在公开评估集上的性能,但同时会降低在同等条件下采集、未公开的评估集上的性能;其收益是脆弱的,依赖于评估集。此外,没有证据表明平衡能改善罕见类相对常见类的表现。
    Abstract Machine learning from training data with a skewed distribution of examples per class can lead to models that favor performance on common classes at the expense of performance on rare ones. AudioSet has a very wide range of priors over its 527 sound event classes. Classification performance on AudioSet is usually evaluated by a simple average over per-class metrics, meaning that performance on rare classes is equal in importance to the performance on common ones. Several recent papers have used dataset balancing techniques to improve performance on AudioSet. We find, however, that while balancing improves performance on the public AudioSet evaluation data it simultaneously hurts performance on an unpublished evaluation set collected under the same conditions. By varying the degree of balancing, we show that its benefits are fragile and depend on the evaluation set. We also do not find evidence indicating that balancing improves rare class performance relative to common classes. We therefore caution against blind application of balancing, as well as against paying too much attention to small improvements on a public evaluation set.
    摘要
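
For readers who want to reproduce the kind of balancing the paper cautions about, the PyTorch sketch below draws training batches with class-balanced probabilities via WeightedRandomSampler; comparing it against the natural (shuffled) loader on different held-out sets is exactly the sort of experiment whose outcome the paper shows depends on the evaluation set. The toy dataset and class skew are invented for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Toy long-tailed dataset: class 0 is common, class 1 is rare.
labels = torch.tensor([0] * 90 + [1] * 10)
dataset = TensorDataset(torch.randn(100, 8), labels)

# Balanced sampling: draw each example with probability inversely
# proportional to its class frequency, so batches are roughly class-balanced.
class_counts = torch.bincount(labels).float()
weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

balanced_loader = DataLoader(dataset, batch_size=20, sampler=sampler)
natural_loader = DataLoader(dataset, batch_size=20, shuffle=True)

for _, y in balanced_loader:
    print("class-1 fraction in a balanced batch:", y.float().mean().item())
    break
```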

Transformers in Healthcare: A Survey

  • paper_url: http://arxiv.org/abs/2307.00067
  • repo_url: None
  • paper_authors: Subhash Nerella, Sabyasachi Bandyopadhyay, Jiaqing Zhang, Miguel Contreras, Scott Siegel, Aysegul Bumin, Brandon Silva, Jessica Sena, Benjamin Shickel, Azra Bihorac, Kia Khezeli, Parisa Rashidi
  • for: 该论文主要为健康领域的人工智能应用提供了一种概述。
  • methods: 该论文使用Transformers神经网络架构,分析医疗影像、结构化和非结构化电子健康记录(EHR)、社交媒体、生理信号和生物分子序列等多种数据。
  • results: 该论文总结了Transformers神经网络在医疗领域的应用,包括临床诊断、报告生成、数据重建和药物/蛋白质合成;同时讨论了使用Transformers的优点与局限,以及计算成本、模型可解释性、公平性、与人类价值观的对齐、伦理问题和环境影响等方面的问题。
    Abstract With Artificial Intelligence (AI) increasingly permeating various aspects of society, including healthcare, the adoption of the Transformers neural network architecture is rapidly changing many applications. Transformer is a type of deep learning architecture initially developed to solve general-purpose Natural Language Processing (NLP) tasks and has subsequently been adapted in many fields, including healthcare. In this survey paper, we provide an overview of how this architecture has been adopted to analyze various forms of data, including medical imaging, structured and unstructured Electronic Health Records (EHR), social media, physiological signals, and biomolecular sequences. Those models could help in clinical diagnosis, report generation, data reconstruction, and drug/protein synthesis. We identified relevant studies using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We also discuss the benefits and limitations of using transformers in healthcare and examine issues such as computational cost, model interpretability, fairness, alignment with human values, ethical implications, and environmental impact.
    摘要 随着人工智能(AI)逐渐渗透到社会各个方面,包括医疗健康领域,Transformers神经网络架构的采用正在迅速改变许多应用程序。Transformer是一种深度学习架构,最初是用于解决通用自然语言处理(NLP)任务,后来在许多领域中得到了应用,包括医疗健康领域。在这篇综述中,我们提供了Transformers神经网络架构在分析不同类型数据的概述,包括医疗影像、结构化和无结构化电子医疗纪录(EHR)、社交媒体、生理信号和生物分子序列。这些模型可以帮助临床诊断、报告生成、数据重建和药物/蛋白质合成。我们根据Preferred Reporting Items for Systematic Reviews and Meta-Analyses(PRISMA)指南进行了相关的研究选择。我们还讨论了使用Transformers在医疗健康领域的优点和缺点,包括计算成本、模型可解释性、公平性、与人类价值观的对齐、伦理问题和环境影响。

Improving the Transferability of Time Series Forecasting with Decomposition Adaptation

  • paper_url: http://arxiv.org/abs/2307.00066
  • repo_url: None
  • paper_authors: Yan Gao, Yan Wang, Qiang Wang
  • for: 本研究旨在提高多变量时间序预测模型的性能,解决数据稀缺问题。
  • methods: 我们提出了一种新的转移架构——序列分解适应网络(SeDAN),通过对各个领域数据进行适应,提高预测性能。同时,我们还提出了一种新的特征分解方法——隐式对比分解,以分解时间序列数据中的特征。
  • results: 我们在五个benchmark dataset上进行了广泛的实验,结果表明,我们的SeDAN可以更好地传递知识,提高预测性能的稳定性。
    Abstract Due to effective pattern mining and feature representation, neural forecasting models based on deep learning have achieved great progress. The premise of effective learning is to collect sufficient data. However, in time series forecasting, it is difficult to obtain enough data, which limits the performance of neural forecasting models. To alleviate the data scarcity limitation, we design Sequence Decomposition Adaptation Network (SeDAN) which is a novel transfer architecture to improve forecasting performance on the target domain by aligning transferable knowledge from cross-domain datasets. Rethinking the transferability of features in time series data, we propose Implicit Contrastive Decomposition to decompose the original features into components including seasonal and trend features, which are easier to transfer. Then we design the corresponding adaptation methods for decomposed features in different domains. Specifically, for seasonal features, we perform joint distribution adaptation and for trend features, we design an Optimal Local Adaptation. We conduct extensive experiments on five benchmark datasets for multivariate time series forecasting. The results demonstrate the effectiveness of our SeDAN. It can provide more efficient and stable knowledge transfer.
    摘要 得益于有效的模式挖掘和特征表示,基于深度学习的神经预测模型已经取得了很大的进步。但是,在时间序列预测中,收集到的数据往往不够,限制了神经预测模型的性能。为了解决数据缺乏的问题,我们设计了序列分解适应网络(SeDAN),这是一种新的迁移架构,通过对齐跨领域数据集中的可迁移知识,在目标领域提高预测性能。我们重新思考了时间序列数据中特征的可迁移性,并提出了隐式对比分解,将原始特征分解成季节特征和趋势特征两个组分,这两个组分更容易迁移。然后,我们设计了对这两个组分在不同领域进行适应的方法:对季节特征,我们进行了联合分布适应;对趋势特征,我们设计了最优局部适应。我们在五个多变量时间序列预测benchmark数据集上进行了广泛的实验,结果表明,我们的SeDAN非常有效,可以提供更高效和稳定的知识迁移。
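
A fixed moving-average split into trend and seasonal parts, sketched below in NumPy, conveys the kind of decomposition SeDAN transfers across domains; the paper's Implicit Contrastive Decomposition is learned rather than this hand-crafted rule, so the window size and the synthetic series are illustrative only.

```python
import numpy as np

def decompose(x, window=24):
    """Centered moving-average split into a smooth trend and the residual
    'seasonal' component."""
    pad = window // 2
    padded = np.pad(x, (pad, pad), mode="edge")
    trend = np.convolve(padded, np.ones(window) / window, mode="same")[pad:-pad]
    return trend, x - trend

t = np.arange(200)
series = (0.05 * t + np.sin(2 * np.pi * t / 24)
          + 0.1 * np.random.default_rng(0).standard_normal(200))
trend, seasonal = decompose(series)
print(trend[:3].round(2), seasonal[:3].round(2))
```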

The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks

  • paper_url: http://arxiv.org/abs/2306.17844
  • repo_url: None
  • paper_authors: Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas
  • for: 这个论文是用来研究 neural networks 是否可靠地 rediscover 已知的算法来解决相应的任务的。
  • methods: 这个论文使用的方法是使用 modular addition 作为一个示例问题,通过调整模型的 hyperparameter 和初始化来让 neural networks 发现不同的算法解决方案。
  • results: 研究发现,即使是简单的学习问题,neural networks 也可能发现多种不同的算法解决方案,包括已知的 Clock 算法、一种新发现的、不那么直观但可以理解的 Pizza 算法,以及多种更复杂的过程。
    Abstract Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known algorithms for solving those tasks? Several recent studies, on tasks ranging from group arithmetic to in-context linear regression, have suggested that the answer is yes. Using modular addition as a prototypical problem, we show that algorithm discovery in neural networks is sometimes more complex. Small changes to model hyperparameters and initializations can induce the discovery of qualitatively different algorithms from a fixed training set, and even parallel implementations of multiple such algorithms. Some networks trained to perform modular addition implement a familiar Clock algorithm; others implement a previously undescribed, less intuitive, but comprehensible procedure which we term the Pizza algorithm, or a variety of even more complex procedures. Our results show that even simple learning problems can admit a surprising diversity of solutions, motivating the development of new tools for characterizing the behavior of neural networks across their algorithmic phase space.
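
The Clock algorithm mentioned above has a simple closed-form reading: embed each residue as an angle on the unit circle, add angles via trigonometric identities, and read off the best-matching residue. The small NumPy check below verifies that this geometric procedure reproduces modular addition exactly (the modulus and the readout are illustrative choices; trained networks implement this only approximately through their learned embeddings).

```python
import numpy as np

p = 7   # modulus (illustrative)

def embed(a):
    """Each residue is stored as (cos, sin) of its clock angle."""
    theta = 2 * np.pi * a / p
    return np.cos(theta), np.sin(theta)

def clock_add(a, b):
    """Angle addition via trig identities, then a logit for every candidate c."""
    ca, sa = embed(a)
    cb, sb = embed(b)
    cos_sum = ca * cb - sa * sb          # cos(theta_a + theta_b)
    sin_sum = sa * cb + ca * sb          # sin(theta_a + theta_b)
    logits = [cos_sum * embed(c)[0] + sin_sum * embed(c)[1] for c in range(p)]
    return int(np.argmax(logits))        # largest when c = (a + b) mod p

assert all(clock_add(a, b) == (a + b) % p for a in range(p) for b in range(p))
print("Clock procedure reproduces addition mod", p)
```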

Resetting the Optimizer in Deep RL: An Empirical Study

  • paper_url: http://arxiv.org/abs/2306.17833
  • repo_url: None
  • paper_authors: Kavosh Asadi, Rasool Fakoor, Shoham Sabach
  • for: deep reinforcement learning中的优化值函数近似问题
  • methods: 使用随机梯度下降算法的现代变体(如 Adam),并在每次新迭代开始时重置优化器的内部参数
  • results: 这种简单的修改方法可以减轻现代优化器内部参数的污染效应,提高深度RL在Atari benchmark上的表现
    Abstract We focus on the task of approximating the optimal value function in deep reinforcement learning. This iterative process is comprised of approximately solving a sequence of optimization problems where the objective function can change per iteration. The common approach to solving the problem is to employ modern variants of the stochastic gradient descent algorithm such as Adam. These optimizers maintain their own internal parameters such as estimates of the first and the second moment of the gradient, and update these parameters over time. Therefore, information obtained in previous iterations is being used to solve the optimization problem in the current iteration. We hypothesize that this can contaminate the internal parameters of the employed optimizer in situations where the optimization landscape of the previous iterations is quite different from the current iteration. To hedge against this effect, a simple idea is to reset the internal parameters of the optimizer when starting a new iteration. We empirically investigate this resetting strategy by employing various optimizers in conjunction with the Rainbow algorithm. We demonstrate that this simple modification unleashes the true potential of modern optimizers, and significantly improves the performance of deep RL on the Atari benchmark.
    摘要 我们关注深度强化学习中优化值函数近似的问题。这是一个迭代过程,其中每次迭代都可能有不同的优化目标函数。通常使用随机梯度下降算法的现代变体(如 Adam)来求解。这些优化器会维护自己的内部参数,例如梯度一阶矩和二阶矩的估计值,并随时间更新这些参数。因此,前面迭代中获得的信息会影响当前迭代的优化求解。当前后迭代的优化地形差异较大时,这可能会"污染"优化器的内部参数。为此,我们提出了一个简单的想法:在开始新的迭代时重置优化器的内部参数。我们将多种优化器与 Rainbow 算法结合来实验这种重置策略,并证明这一简单修改能够释放现代优化器的真正潜力,大幅提高深度强化学习在 Atari benchmark 上的性能。
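
The reset idea is mechanically simple; in PyTorch one can either re-create the optimizer at the start of each iteration or clear its state so Adam's moment estimates are re-initialized. The sketch below uses a dummy regression loss as a stand-in for fitting each iteration's targets and is not the authors' Rainbow setup.

```python
import torch

model = torch.nn.Linear(4, 2)

for iteration in range(3):
    # A fresh optimizer per iteration discards Adam's first/second moment
    # estimates accumulated on the previous iteration's objective.
    # (Calling optimizer.state.clear() on an existing optimizer has the same effect.)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    for step in range(10):
        # Stand-in loss for regressing toward this iteration's fixed targets.
        loss = model(torch.randn(8, 4)).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
print("done")
```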

Federated Ensemble YOLOv5 - A Better Generalized Object Detection Algorithm

  • paper_url: http://arxiv.org/abs/2306.17829
  • repo_url: None
  • paper_authors: Vinit Hegiste, Tatjana Legler, Martin Ruskowski
  • for: 这篇论文旨在探讨基于联合学习算法的联邦学习(FL)在对象检测中的应用,以提高泛化性能。
  • methods: 该论文使用了基于FED Avg和FED SGD的联邦学习算法,并采用了随机抽样策略不替换。
  • results: 实验结果显示,基于FL的YOLOv5模型在测试集上生成的精度 bounding box 比中央训练方法更高,特别是在测试集中包含两个客户端没有训练集的情况下。这些结果表明,FL可以视为一种ensemble algorithm的融合,类似于Bagging和Boosting技术的融合。因此,FL不仅可以视为一种隐私保护方法,还可以视为一种提高机器学习模型性能的方法。
    Abstract Federated learning (FL) has gained significant traction as a privacy-preserving algorithm, but the underlying resemblance of federated learning algorithms like Federated averaging (FED Avg) or Federated SGD (FED SGD) to ensemble learning algorithms has not been fully explored. The purpose of this paper is to examine the application of FL to object detection as a method to enhance generalizability, and to compare its performance against a centralized training approach for an object detection algorithm. Specifically, we investigate the performance of a YOLOv5 model trained using FL across multiple clients and employ a random sampling strategy without replacement, so each client holds a portion of the same dataset used for centralized training. Our experimental results showcase the superior efficiency of the FL object detector's global model in generating accurate bounding boxes for unseen objects, with the test set being a mixture of objects from two distinct clients not represented in the training dataset. These findings suggest that FL can be viewed from an ensemble algorithm perspective, akin to a synergistic blend of Bagging and Boosting techniques. As a result, FL can be seen not only as a method to enhance privacy, but also as a method to enhance the performance of a machine learning model.
    摘要 federated learning (FL) 已经得到了关于隐私保护的算法的广泛应用,但是背后的 federated learning 算法,如 federated averaging (FED Avg) 或 federated SGD (FED SGD) 与 ensemble learning 算法的相似性仍未得到了充分的探讨。本文的目的是探讨 federated learning 在物品探测中的应用,以增强模型的一致性,并与中央训练方法进行比较。具体来说,我们 investigate YOLOv5 模型使用 federated learning 方法在多个客户端上训练,使用随机抽样无替换的方法,以确保每个客户端都持有相同的测试数据。我们的实验结果显示,使用 federated learning 的 object detection 模型在处理未见过的物品时,具有更高的精度,尤其是在测试集中包含两个不同客户端的物品。这些结果表明, federated learning 可以被视为一种ensemble algorithm的融合,类似于 Bagging 和 Boosting 技术的融合。因此, federated learning 不仅可以作为一种隐私保护方法,而且可以作为一种提高机器学习模型表现的方法。
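
The FedAvg step at the heart of this setup is just a data-weighted average of client weights. The sketch below averages the state dicts of two toy linear models; in the paper the same operation would be applied to YOLOv5 weights, and the client sizes here are invented for illustration.

```python
import copy
import torch

def fed_avg(global_model, client_state_dicts, client_sizes):
    """Data-size-weighted average of client weights (the FedAvg server step)."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_state_dicts[0])
    for key in avg:
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(client_state_dicts, client_sizes))
    global_model.load_state_dict(avg)
    return global_model

# Toy round with two clients holding differently sized data shards.
clients = [torch.nn.Linear(3, 1) for _ in range(2)]
server = torch.nn.Linear(3, 1)
fed_avg(server, [c.state_dict() for c in clients], client_sizes=[100, 300])
print(server.weight)
```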

Understanding Unfairness via Training Concept Influence

  • paper_url: http://arxiv.org/abs/2306.17828
  • repo_url: None
  • paper_authors: Yuanshun Yao, Yang Liu
  • for: 本研究旨在帮助实践者更好地理解他们的数据和算法是如何不公。
  • methods: 本研究使用对训练样本进行counterfactual intervening,以计算这些样本对模型不公性的影响。
  • results: 本研究可以帮助实践者理解训练数据中的不公性,并且可以探索不公性的来源。此外,这种方法还可以检测恶意攻击、探测数据质量问题和修复不公性。
    Abstract Knowing the causes of a model's unfairness helps practitioners better understand their data and algorithms. This is an important yet relatively unexplored task. We look into this problem through the lens of the training data - one of the major sources of unfairness. We ask the following questions: how would a model's fairness performance change if, in its training data, some samples (1) were collected from a different (e.g. demographic) group, (2) were labeled differently, or (3) some features were changed? In other words, we quantify the fairness influence of training samples by counterfactually intervening and changing samples based on predefined concepts, i.e. data attributes such as features (X), labels (Y), or sensitive attributes (A). To calculate a training sample's influence on the model's unfairness w.r.t a concept, we first generate counterfactual samples based on the concept, i.e. the counterfactual versions of the sample if the concept were changed. We then calculate the resulting impact on the unfairness, via influence function, if the counterfactual samples were used in training. Our framework not only helps practitioners understand the observed unfairness and repair their training data, but also leads to many other applications, e.g. detecting mislabeling, fixing imbalanced representations, and detecting fairness-targeted poisoning attacks.
    摘要 了解模型不公平性的原因可以帮助实践者更好地理解其数据和算法。这是一项重要但相对缺乏探索的任务。我们从训练数据(不公平性的主要来源之一)的角度来研究这个问题。我们提出的问题是:如果在训练数据中某些样本(1)来自不同(例如不同人口)群体,(2)被赋予不同的标签,或(3)某些特征被改变,那么模型的公平性表现会如何变化?换言之,我们通过基于预定义概念(即特征X、标签Y或敏感属性A等数据属性)对样本进行反事实干预,来量化训练样本对公平性的影响。为计算某个训练样本对模型在某一概念上不公平性的影响,我们首先基于该概念生成反事实样本,即假设概念被改变后的样本版本;然后通过影响函数(influence function)计算如果在训练中使用这些反事实样本,模型不公平性会产生怎样的变化。我们的框架不仅帮助实践者理解观察到的不公平性并修复其训练数据,还可以用于其他应用,例如检测错误标注、修复不均衡的表示,以及检测针对公平性的投毒攻击。

Scalable tensor methods for nonuniform hypergraphs

  • paper_url: http://arxiv.org/abs/2306.17825
  • repo_url: None
  • paper_authors: Sinan G. Aksoy, Ilya Amburg, Stephen J. Young
  • for: 研究多元图模型中的多方交互方式
  • methods: 使用tensor方法,并开发了tensor times same vector(TTSV)算法以提高复杂度
  • results: 提供了一种新的多元图中心性和凝聚算法,并证明了这些tensor测量可以提供更多的信息,并能够探测高阶结构,而许多现有的矩阵基于方法无法探测出来
    Abstract While multilinear algebra appears natural for studying the multiway interactions modeled by hypergraphs, tensor methods for general hypergraphs have been stymied by theoretical and practical barriers. A recently proposed adjacency tensor is applicable to nonuniform hypergraphs, but is prohibitively costly to form and analyze in practice. We develop tensor times same vector (TTSV) algorithms for this tensor which improve complexity from $O(n^r)$ to a low-degree polynomial in $r$, where $n$ is the number of vertices and $r$ is the maximum hyperedge size. Our algorithms are implicit, avoiding formation of the order $r$ adjacency tensor. We demonstrate the flexibility and utility of our approach in practice by developing tensor-based hypergraph centrality and clustering algorithms. We also show these tensor measures offer complementary information to analogous graph-reduction approaches on data, and are also able to detect higher-order structure that many existing matrix-based approaches provably cannot.

Act3D: Infinite Resolution Action Detection Transformer for Robotic Manipulation

  • paper_url: http://arxiv.org/abs/2306.17817
  • repo_url: None
  • paper_authors: Theophile Gervet, Zhou Xian, Nikolaos Gkanatsios, Katerina Fragkiadaki
  • for: 提高机器人操作精度,使用3D感知表示来实现高精度End-effector pose预测。
  • methods: 使用Transformer模型,将6DoF键动预测转换为3D检测,并使用自适应空间计算来选择最佳特征点。
  • results: 在RLbench操作benchmark上取得新的state-of-the-art:比此前的SOTA 2D多视图策略绝对提高10%,比此前的SOTA 3D策略绝对提高22%,且所需计算量仅为其三分之一。
    Abstract 3D perceptual representations are well suited for robot manipulation as they easily encode occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial precision in end-effector pose prediction, typically demanding high-resolution 3D perceptual grids that are computationally expensive to process. As a result, most manipulation policies operate directly in 2D, foregoing 3D inductive biases. In this paper, we propose Act3D, a manipulation policy Transformer that casts 6-DoF keypose prediction as 3D detection with adaptive spatial computation. It takes as input 3D feature clouds unprojected from one or more camera views, iteratively samples 3D point grids in free space in a coarse-to-fine manner, featurizes them using relative spatial attention to the physical feature cloud, and selects the best feature point for end-effector pose prediction. Act3D sets a new state-of-the-art in RLbench, an established manipulation benchmark. Our model achieves 10% absolute improvement over the previous SOTA 2D multi-view policy on 74 RLbench tasks and 22% absolute improvement with 3x less compute over the previous SOTA 3D policy. In thorough ablations, we show the importance of relative spatial attention, large-scale vision-language pre-trained 2D backbones, and weight tying across coarse-to-fine attentions. Code and videos are available at our project site: https://act3d.github.io/.
    摘要 三维感知表示非常适合机器人操作,因为它容易编码遮挡并简化空间推理。许多操作任务需要高空间精度的末端执行器姿态预测,通常需要高分辨率的3D感知网格,而处理这些网格的计算成本很高。因此,大多数操作策略直接在2D中运行,放弃了3D的归纳偏置。在这篇论文中,我们提出了Act3D,一种操作策略Transformer,将6自由度关键姿态预测视为带自适应空间计算的3D检测。它以一个或多个相机视图反投影得到的3D特征云为输入,以由粗到细的方式在自由空间中迭代采样3D点网格,利用相对空间注意力根据物理特征云对这些点进行特征化,并选择最佳特征点用于末端执行器姿态预测。Act3D在RLbench这一成熟的操作benchmark上创造了新的state-of-the-art:在74个RLbench任务上比此前的SOTA 2D多视图策略绝对提升10%,并以3倍更少的计算量比此前的SOTA 3D策略绝对提升22%。在细致的消融实验中,我们展示了相对空间注意力、大规模视觉-语言预训练2D骨干以及由粗到细注意力间权重共享的重要性。代码和视频可在我们的项目网站获取:https://act3d.github.io/

Bayesian Optimization with Formal Safety Guarantees via Online Conformal Prediction

  • paper_url: http://arxiv.org/abs/2306.17815
  • repo_url: None
  • paper_authors: Yunchuan Zhang, Sangwoo Park, Osvaldo Simeone
  • for: 本文研究带安全约束的黑盒函数优化问题,特别是在每次尝试的解都会得到安全性反馈的场景下。
  • methods: 本文结合贝叶斯优化(BO)和在线保形预测(online conformal prediction, CP),提出了一种名为 SAFE-BOCP 的新方法,该方法在允许任意可控但非零的安全约束违反率的前提下满足安全要求。
  • results: SAFE-BOCP 方法通过在合成数据和真实数据上的实验得到验证,展示了其在带安全约束的黑盒函数优化中的优势和灵活性。
    Abstract Black-box zero-th order optimization is a central primitive for applications in fields as diverse as finance, physics, and engineering. In a common formulation of this problem, a designer sequentially attempts candidate solutions, receiving noisy feedback on the value of each attempt from the system. In this paper, we study scenarios in which feedback is also provided on the safety of the attempted solution, and the optimizer is constrained to limit the number of unsafe solutions that are tried throughout the optimization process. Focusing on methods based on Bayesian optimization (BO), prior art has introduced an optimization scheme -- referred to as SAFEOPT -- that is guaranteed not to select any unsafe solution with a controllable probability over feedback noise as long as strict assumptions on the safety constraint function are met. In this paper, a novel BO-based approach is introduced that satisfies safety requirements irrespective of properties of the constraint function. This strong theoretical guarantee is obtained at the cost of allowing for an arbitrary, controllable but non-zero, rate of violation of the safety constraint. The proposed method, referred to as SAFE-BOCP, builds on online conformal prediction (CP) and is specialized to the cases in which feedback on the safety constraint is either noiseless or noisy. Experimental results on synthetic and real-world data validate the advantages and flexibility of the proposed SAFE-BOCP.
    摘要 黑盒零阶优化是金融、物理和工程等众多领域应用的核心基本问题。在该问题的常见形式中,设计者依次尝试候选解,并从系统获得关于每次尝试取值的带噪反馈。在这篇论文中,我们研究同时提供所尝试解安全性反馈的场景,并要求优化器在整个优化过程中限制不安全解的尝试次数。针对基于贝叶斯优化(BO)的方法,已有工作提出了一种称为SAFEOPT的优化方案:只要安全约束函数满足严格的假设,该方案就能以可控的概率(相对于反馈噪声)保证不选择任何不安全的解。在这篇论文中,我们提出了一种新的基于BO的方法,无论约束函数的性质如何都能满足安全要求。这一强理论保证的代价是允许一个任意、可控但非零的安全约束违反率。我们称该方法为SAFE-BOCP,它基于在线保形预测(CP),并分别针对安全约束反馈无噪和有噪两种情况进行了专门设计。在合成数据和真实数据上的实验结果验证了SAFE-BOCP的优势和灵活性。
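
The online conformal ingredient can be pictured with a generic adaptive calibration loop: after every new observation, nudge the miscoverage level so that the long-run violation rate tracks a target, and recompute the threshold as the corresponding quantile of the scores seen so far. This is the standard adaptive conformal update in isolation, not SAFE-BOCP itself, and the score distribution, learning rate, and target rate below are placeholders.

```python
import numpy as np

def adaptive_threshold(scores, target_rate=0.1, lr=0.05):
    """Generic online conformal-style calibration: keep the long-run violation
    rate near target_rate by adapting the miscoverage level alpha and
    re-reading the threshold as a quantile of the scores seen so far."""
    alpha = target_rate
    violations = []
    q = np.quantile(scores[:20], 1 - alpha)            # warm-start threshold
    for t in range(20, len(scores)):
        violated = float(scores[t] > q)                # 1 if this point is "unsafe"
        violations.append(violated)
        alpha = float(np.clip(alpha + lr * (target_rate - violated), 1e-3, 0.999))
        q = np.quantile(scores[:t + 1], 1 - alpha)     # recalibrate
    return np.mean(violations)

rng = np.random.default_rng(0)
print("long-run violation rate:", adaptive_threshold(rng.normal(size=3000)))
```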

Stay on topic with Classifier-Free Guidance

  • paper_url: http://arxiv.org/abs/2306.17806
  • repo_url: https://github.com/Vermeille/cfg-llm
  • paper_authors: Guillaume Sanchez, Honglu Fan, Alexander Spangher, Elad Levi, Pawan Sasanka Ammanamanchi, Stella Biderman
  • for: 本研究的目的是应用类别器-free guidance(CFG)技术在文本生成中,以便更好地遵循提示。
  • methods: 本研究使用的方法是CFG技术,并在不同的任务上进行了评估,包括问答、理解、代码生成和机器翻译等。
  • results: 研究结果显示,CFG技术可以广泛应用于纯语言模型中,可以提高模型的性能,并且可以与其他推理时间方法相结合使用,以提高模型在困难任务中的表现。此外,CFG技术还可以增加助手的准确性和一致性。
    Abstract Classifier-Free Guidance (CFG) has recently emerged in text-to-image generation as a lightweight technique to encourage prompt-adherence in generations. In this work, we demonstrate that CFG can be used broadly as an inference-time technique in pure language modeling. We show that CFG (1) improves the performance of Pythia, GPT-2 and LLaMA-family models across an array of tasks: Q\&A, reasoning, code generation, and machine translation, achieving SOTA on LAMBADA with LLaMA-7B over PaLM-540B; (2) brings improvements equivalent to a model with twice the parameter-count; (3) can stack alongside other inference-time methods like Chain-of-Thought and Self-Consistency, yielding further improvements in difficult tasks; (4) can be used to increase the faithfulness and coherence of assistants in challenging form-driven and content-driven prompts: in a human evaluation we show a 75\% preference for GPT4All using CFG over baseline.
    摘要 无分类器引导(Classifier-Free Guidance, CFG)最近在文本到图像生成中作为一种鼓励生成遵循提示的轻量级技术出现。在这项工作中,我们证明CFG可以作为推理时技术广泛应用于纯语言建模。我们显示CFG可以(1)提高Pythia、GPT-2和LLaMA家族模型在问答、推理、代码生成和机器翻译等一系列任务中的性能,其中LLaMA-7B在LAMBADA上超过PaLM-540B,达到SOTA;(2)带来相当于参数量翻倍的改进;(3)可以与Chain-of-Thought和Self-Consistency等其他推理时方法叠加使用,在困难任务中带来进一步提升;(4)可以在具有挑战性的形式驱动和内容驱动提示中提高助手的忠实度和连贯性:在人类评估中,使用CFG的GPT4All相比基线获得75%的偏好。
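
The guidance rule itself is one line: blend the conditional and unconditional next-token logits with a strength gamma, recovering plain sampling at gamma = 1. The sketch below uses random tensors in place of real model outputs; how the unconditional pass is formed (for example, dropping the instruction from the prompt) follows the paper's setup and is only hinted at in the comments.

```python
import torch
import torch.nn.functional as F

def cfg_logits(cond_logits, uncond_logits, gamma=1.5):
    """Classifier-free guidance on next-token logits:
    gamma = 1 recovers ordinary conditional sampling."""
    return uncond_logits + gamma * (cond_logits - uncond_logits)

# Toy stand-ins for two forward passes over a 10-token vocabulary.
vocab = 10
cond = torch.randn(1, vocab)     # logits given the full prompt
uncond = torch.randn(1, vocab)   # logits given the prompt with the instruction removed
probs = F.softmax(cfg_logits(cond, uncond, gamma=1.5), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
print(int(next_token))
```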

Voting-based Multimodal Automatic Deception Detection

  • paper_url: http://arxiv.org/abs/2307.07516
  • repo_url: None
  • paper_authors: Lana Touma, Mohammad Al Horani, Manar Tailouni, Anas Dahabiah, Khloud Al Jallad
  • for: automatic deception detection in videos using audio, visual, and lexical features
  • methods: voting-based method combining CNN, SVM on Mel spectrograms, and Word2Vec on SVM
  • results: outperformed state of the art with best results of 97%, 96%, 92% on Real-Life Trial Dataset, and 97%, 82%, 73% on Miami University Deception Detection Dataset.
    Abstract Automatic deception detection has been a hot research topic for a long time; using machine learning and deep learning to automatically detect deception brings new light to this old field. In this paper, we propose a voting-based method for automatic deception detection from videos using audio, visual and lexical features. Experiments were done on two datasets, the Real-life Trial dataset by Michigan University and the Miami University deception detection dataset. Video samples were split into frames of images, audio, and manuscripts. Our proposed voting-based multimodal solution consists of three models: the first model is a CNN for detecting deception from images, the second is a Support Vector Machine (SVM) on Mel spectrograms for detecting deception from audio, and the third is Word2Vec with an SVM for detecting deception from manuscripts. Our proposed solution outperforms the state of the art. The best results achieved on images, audio and text were 97%, 96% and 92% respectively on the Real-Life Trial dataset, and 97%, 82% and 73% on video, audio and text respectively on the Miami University Deception Detection dataset.
    摘要 自动欺骗检测长期以来都是研究热点,利用机器学习和深度学习自动检测欺骗为这一传统领域带来了新的活力。在这篇论文中,我们提出了一种基于投票的方法,利用视频中的音频、视觉和文本特征进行自动欺骗检测。我们的投票方法包括三个模型:一个用于从图像中检测欺骗的卷积神经网络(CNN),一个用于从音频梅尔频谱图中检测欺骗的支持向量机(SVM),以及一个用于从文本中检测欺骗的Word2Vec加SVM模型。我们的方案超越了现有最佳方法。在密歇根大学的真实庭审(Real-Life Trial)数据集上,图像、音频和文本上取得的最好结果分别为97%、96%和92%;在迈阿密大学欺骗检测数据集上,视频、音频和文本上的最好结果分别为97%、82%和73%。
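
The fusion step is plain hard voting over the three per-modality classifiers. The NumPy sketch below shows the rule on made-up binary predictions; the individual models (CNN on frames, SVM on Mel spectrograms, SVM on Word2Vec features) are summarized only as comments.

```python
import numpy as np

def majority_vote(visual_pred, audio_pred, text_pred):
    """Hard-voting fusion of three binary classifiers
    (0 = truthful, 1 = deceptive): the label with at least 2 votes wins."""
    votes = np.stack([visual_pred, audio_pred, text_pred])
    return (votes.sum(axis=0) >= 2).astype(int)

visual = np.array([1, 0, 1, 0])   # e.g. CNN on video frames
audio  = np.array([1, 1, 0, 0])   # e.g. SVM on Mel spectrograms
text   = np.array([0, 1, 1, 0])   # e.g. SVM on Word2Vec features
print(majority_vote(visual, audio, text))   # -> [1 1 1 0]
```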

Hierarchical Bayesian Regression for Multi-Location Sales Transaction Forecasting

  • paper_url: http://arxiv.org/abs/2306.17795
  • repo_url: None
  • paper_authors: John Mark Agosta, Mario Inchiosa
  • for: 预测购物行为,具体来说是在不同的店铺和日期上的购物额。
  • methods: 使用层次架构的 bayesian 模型,通过对各个组合的预测结果进行共享来提高预测的准确性。
  • results: 通过使用 \textsf{stan} 包和具体的销售交易数据,实现了在多个位置和日期上的购物额预测,并且解决了由于数据有限而导致的准确性问题。
    Abstract The features in many prediction models naturally take the form of a hierarchy. The lower levels represent individuals or events. These units group naturally into locations and intervals or other aggregates, often at multiple levels. Levels of groupings may intersect and join, much as relational database tables do. Besides representing the structure of the data, predictive features in hierarchical models can be assigned to their proper levels. Such models lend themselves to hierarchical Bayes solution methods that ``share'' results of inference between groups by generalizing over the case of individual models for each group versus one model that aggregates all groups into one. In this paper we show our work-in-progress applying a hierarchical Bayesian model to forecast purchases throughout the day at store franchises, with groupings over locations and days of the week. We demonstrate using the \textsf{stan} package on individual sales transaction data collected over the course of a year. We show how this solves the dilemma of having limited data and hence modest accuracy for each day and location, while being able to scale to a large number of locations with improved accuracy.
    摘要 许多预测模型中的特征自然形成层次结构。底层表示个体或事件,这些单元又自然地归入地点、时间区间或其他聚合层级,且层级往往有多层。各层分组之间可能相交和连接,类似于关系数据库的表。除了表示数据的结构之外,层次模型中的预测特征还可以被分配到各自合适的层级。这类模型适合采用层次贝叶斯求解方法,通过在"为每组建立独立模型"与"将所有组聚合成单一模型"之间进行泛化,在各组之间共享推断结果。在这篇论文中,我们展示了正在进行的工作:使用层次贝叶斯模型,按地点和星期几分组,预测连锁门店一天内的购买情况。我们使用\textsf{stan}包,对一年间收集的单笔销售交易数据进行建模。我们展示了这一方法如何解决每个地点、每一天数据有限因而精度有限的困境,同时能够扩展到大量地点并提高准确性。
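
The benefit of sharing inference across groups can be seen in a stripped-down normal-normal partial-pooling calculation: stores with few observations are shrunk toward the grand mean, while well-sampled stores keep their own average. The paper fits a full hierarchical model in Stan with location and day-of-week groupings; the empirical-Bayes sketch below, with invented sales numbers and variances, only illustrates the shrinkage mechanism.

```python
import numpy as np

rng = np.random.default_rng(1)

# Daily sales for a few store locations; two stores have very little history.
sales = {"store_A": rng.normal(100, 15, size=50),
         "store_B": rng.normal(120, 15, size=4),
         "store_C": rng.normal(80, 15, size=3)}

sigma2 = 15.0 ** 2                                         # within-store variance
grand_mean = np.mean(np.concatenate(list(sales.values())))
tau2 = np.var([v.mean() for v in sales.values()])          # crude between-store variance

for name, v in sales.items():
    n = len(v)
    # Normal-normal partial pooling: precision-weighted blend of the store's
    # own mean and the grand mean; small n means more shrinkage.
    w = (n / sigma2) / (n / sigma2 + 1.0 / tau2)
    pooled = w * v.mean() + (1 - w) * grand_mean
    print(f"{name}: n={n:2d}  raw={v.mean():6.1f}  pooled={pooled:6.1f}")
```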

Vision Through the Veil: Differential Privacy in Federated Learning for Medical Image Classification

  • paper_url: http://arxiv.org/abs/2306.17794
  • repo_url: None
  • paper_authors: Kishore Babu Nampalle, Pradeep Singh, Uppala Vivek Narayan, Balasubramanian Raman
  • for: 这种研究是为了解决医疗领域深度学习应用中数据集成问题,而这种集成常常会带来重要的隐私问题。
  • methods: 这种研究使用了联邦学习框架,并将分Diff privacy技术integrated into it,以提供更高的隐私保护。
  • results: 研究发现,在保持隐私的同时,可以使用权限证明技术来维护强大的图像分类性能。
    Abstract The proliferation of deep learning applications in healthcare calls for data aggregation across various institutions, a practice often associated with significant privacy concerns. This concern intensifies in medical image analysis, where privacy-preserving mechanisms are paramount due to the data being sensitive in nature. Federated learning, which enables cooperative model training without direct data exchange, presents a promising solution. Nevertheless, the inherent vulnerabilities of federated learning necessitate further privacy safeguards. This study addresses this need by integrating differential privacy, a leading privacy-preserving technique, into a federated learning framework for medical image classification. We introduce a novel differentially private federated learning model and meticulously examine its impacts on privacy preservation and model performance. Our research confirms the existence of a trade-off between model accuracy and privacy settings. However, we demonstrate that strategic calibration of the privacy budget in differential privacy can uphold robust image classification performance while providing substantial privacy protection.
    摘要 深度学习在医疗领域的普及需要不同机构之间的数据集成,这常常与隐私问题相关。尤其在医疗图像分析中,由于数据的敏感性,隐私保护是非常重要的。联邦学习,允许协作模型训练而不需直接数据交换,提供了一个有希望的解决方案。然而,联邦学习的内在漏洞需要进一步的隐私保护措施。本研究在联邦学习框架中集成了差分隐私,一种领先的隐私保护技术,以提高隐私保护和模型性能之间的平衡。我们提出了一种新的差分隐私联邦学习模型,并仔细分析了其对隐私保护和模型性能的影响。我们的研究证明了隐私保护和模型性能之间存在负相关性,但我们还是能够通过差分隐私的调整来保持 Robust 的图像分类性能,同时提供了严格的隐私保护。
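
A common way to combine differential privacy with federated updates is to clip each client's update and add calibrated Gaussian noise before the server averages them. The sketch below shows that mechanism in NumPy; the clipping bound, noise multiplier, and where exactly noise is injected are assumptions for illustration rather than the paper's configuration, which also involves privacy-budget accounting.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=0.8, rng=None):
    """Clip the L2 norm of one client's model update, then add Gaussian
    noise scaled to the clipping bound (Gaussian-mechanism style)."""
    rng = rng or np.random.default_rng()
    scale = min(1.0, clip_norm / (np.linalg.norm(update) + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return update * scale + noise

rng = np.random.default_rng(0)
client_updates = [rng.standard_normal(1000) for _ in range(8)]
private = [privatize_update(u, rng=rng) for u in client_updates]
global_update = np.mean(private, axis=0)   # the server averages noisy updates
print(global_update[:5])
```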

Look, Remember and Reason: Visual Reasoning with Grounded Rationales

  • paper_url: http://arxiv.org/abs/2306.17778
  • repo_url: None
  • paper_authors: Apratim Bhattacharyya, Sunny Panchal, Mingu Lee, Reza Pourreza, Pulkit Madan, Roland Memisevic
  • for: 这篇论文旨在探讨大语言模型在视觉理解任务中的表现,尤其是如何使用人类视觉解决方法来提高模型的表现。
  • methods: 这篇论文提出了一种基于人类视觉解决方法的方法,即“看、记忆、理解”的三步过程,通过在每一步中逐步提取视觉信息,使用低级视觉能力来解决视觉理解问题。
  • results: 研究发现,通过在大语言模型中引入视觉理解的 rationales,可以使模型在多种视觉理解任务中表现竞争力强,包括CLEVR、CATER和ACRE等数据集中的任务。
    Abstract Large language models have recently shown human level performance on a variety of reasoning tasks. However, the ability of these models to perform complex visual reasoning has not been studied in detail yet. A key challenge in many visual reasoning tasks is that the visual information needs to be tightly integrated in the reasoning process. We propose to address this challenge by drawing inspiration from human visual problem solving which depends on a variety of low-level visual capabilities. It can often be cast as the three step-process of ``Look, Remember, Reason'': visual information is incrementally extracted using low-level visual routines in a step-by-step fashion until a final answer is reached. We follow the same paradigm to enable existing large language models, with minimal changes to the architecture, to solve visual reasoning problems. To this end, we introduce rationales over the visual input that allow us to integrate low-level visual capabilities, such as object recognition and tracking, as surrogate tasks. We show competitive performance on diverse visual reasoning tasks from the CLEVR, CATER, and ACRE datasets over state-of-the-art models designed specifically for these tasks.

Practical and Asymptotically Exact Conditional Sampling in Diffusion Models

  • paper_url: http://arxiv.org/abs/2306.17775
  • repo_url: https://github.com/blt2114/twisted_diffusion_sampler
  • paper_authors: Luhuan Wu, Brian L. Trippe, Christian A. Naesseth, David M. Blei, John P. Cunningham
  • for: This paper is written for the task of conditional generation, specifically for molecular design and protein design.
  • methods: The paper proposes a new method called Twisted Diffusion Sampler (TDS), which is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. TDS uses twisting, an SMC technique, to incorporate heuristic approximations without compromising asymptotic exactness.
  • results: The paper shows that TDS provides a computational statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. Additionally, TDS is applied to motif-scaffolding, a core task in protein design, and outperforms the state of the art on benchmark test cases.
    Abstract Diffusion models have been successful on a range of conditional generation tasks including molecular design and text-to-image generation. However, these achievements have primarily depended on task-specific conditional training or error-prone heuristic approximations. Ideally, a conditional generation method should provide exact samples for a broad range of conditional distributions without requiring task-specific training. To this end, we introduce the Twisted Diffusion Sampler, or TDS. TDS is a sequential Monte Carlo (SMC) algorithm that targets the conditional distributions of diffusion models. The main idea is to use twisting, an SMC technique that enjoys good computational efficiency, to incorporate heuristic approximations without compromising asymptotic exactness. We first find in simulation and on MNIST image inpainting and class-conditional generation tasks that TDS provides a computational statistical trade-off, yielding more accurate approximations with many particles but with empirical improvements over heuristics with as few as two particles. We then turn to motif-scaffolding, a core task in protein design, using a TDS extension to Riemannian diffusion models. On benchmark test cases, TDS allows flexible conditioning criteria and often outperforms the state of the art.
    摘要 Diffusion models 已经在一系列条件生成任务上取得了成功,包括分子设计和文本到图像生成。然而,这些成就主要依赖于任务特定的条件训练或容易出错的启发式近似。理想的条件生成方法应该能够在无需任务特定训练的情况下,为广泛的条件分布提供精确的样本。为此,我们提出了扭曲扩散采样器(Twisted Diffusion Sampler, TDS)。TDS 是一种以 diffusion models 的条件分布为目标的顺序蒙特卡洛(SMC)算法。其主要思想是利用扭曲(twisting)这一计算效率较高的 SMC 技术,在不损失渐近精确性的前提下引入启发式近似。我们首先在模拟实验以及 MNIST 图像修补和类条件生成任务上发现,TDS 提供了计算与统计之间的权衡:使用较多粒子时可以得到更精确的近似,而仅用两个粒子时也已在经验上优于启发式方法。随后,我们将 TDS 扩展到黎曼扩散模型,用于蛋白质设计中的核心任务 motif-scaffolding。在基准测试用例上,TDS 支持灵活的条件准则,并常常超越当前最佳方法。

Precision Anti-Cancer Drug Selection via Neural Ranking

  • paper_url: http://arxiv.org/abs/2306.17771
  • repo_url: https://github.com/ninglab/drugranker
  • paper_authors: Vishal Dey, Xia Ning
  • for: 该论文旨在提出一种基于大规模高通量筛选数据的神经网络排序方法,用于快速、准确地为特定癌细胞系选择药物。
  • methods: 该论文利用高通量筛选数据驱动神经网络模型,提出了两种列表式(listwise)神经排序方法:List-One 和 List-All。两种方法都采用列表式排序的思想,但 List-All 关注所有敏感药物,而不仅是最敏感的一种。
  • results: 实验结果表明,与最佳基线方法相比,List-All 在50%的测试细胞系上使 hit@20 最多提升8.6%。此外,分析表明所提方法学习到的潜在空间具有有用的聚类结构,并能捕捉相关的生物学特征;论文的全面实验评估还提供了不同方法之间客观的比较。
    Abstract Personalized cancer treatment requires a thorough understanding of complex interactions between drugs and cancer cell lines in varying genetic and molecular contexts. To address this, high-throughput screening has been used to generate large-scale drug response data, facilitating data-driven computational models. Such models can capture complex drug-cell line interactions across various contexts in a fully data-driven manner. However, accurately prioritizing the most sensitive drugs for each cell line still remains a significant challenge. To address this, we developed neural ranking approaches that leverage large-scale drug response data across multiple cell lines from diverse cancer types. Unlike existing approaches that primarily utilize regression and classification techniques for drug response prediction, we formulated the objective of drug selection and prioritization as a drug ranking problem. In this work, we proposed two neural listwise ranking methods that learn latent representations of drugs and cell lines, and then use those representations to score drugs in each cell line via a learnable scoring function. Specifically, we developed a neural listwise ranking method, List-One, on top of the existing method ListNet. Additionally, we proposed a novel listwise ranking method, List-All, that focuses on all the sensitive drugs instead of the top sensitive drug, unlike List-One. Our results demonstrate that List-All outperforms the best baseline with significant improvements of as much as 8.6% in hit@20 across 50% test cell lines. Furthermore, our analyses suggest that the learned latent spaces from our proposed methods demonstrate informative clustering structures and capture relevant underlying biological features. Moreover, our comprehensive empirical evaluation provides a thorough and objective comparison of the performance of different methods (including our proposed ones).
    摘要 个性化癌症治疗需要深入了解药物与癌细胞系之间在不同遗传和分子背景下的复杂相互作用。为此,高通量筛选技术被用来生成大规模的药物响应数据,从而支撑数据驱动的计算模型;这类模型能够以完全数据驱动的方式刻画不同背景下复杂的药物-细胞系相互作用。然而,准确地为每个细胞系对最敏感的药物进行优先级排序仍然是一个重大挑战。为解决这个问题,我们开发了利用来自多种癌症类型、多个细胞系的大规模药物响应数据的神经排序方法。与主要依赖回归和分类技术进行药物响应预测的现有方法不同,我们将药物选择与优先级排序问题形式化为药物排序问题。在这项工作中,我们提出了两种列表式神经排序方法,它们学习药物和细胞系的潜在表示,再通过可学习的打分函数对每个细胞系中的药物进行打分。具体地,我们在现有的ListNet方法之上提出了List-One方法;此外还提出了一种新的列表式排序方法List-All,它关注所有敏感药物,而不像List-One只关注最敏感的药物。结果显示,List-All优于最佳基线方法,在50%的测试细胞系上使hit@20最多提升8.6%。此外,我们的分析表明,所提方法学习到的潜在空间呈现有意义的聚类结构,并捕捉了相关的底层生物学特征。我们全面的实证评估也对不同方法(包括我们提出的方法)的性能进行了充分而客观的比较。
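
Since List-One builds on ListNet, the listwise ingredient can be shown directly: the ListNet top-one loss is a cross-entropy between the softmax distributions induced by true and predicted scores. In the sketch below, drug responses are negated to act as relevance scores, which is an assumption made for illustration; the paper's List-All objective and the latent drug/cell-line encoders are not shown.

```python
import torch
import torch.nn.functional as F

def listnet_loss(pred_scores, true_scores):
    """ListNet top-one loss: cross-entropy between the softmax distributions
    induced by the ground-truth and the predicted scores of a drug list."""
    p_true = F.softmax(true_scores, dim=-1)
    log_p_pred = F.log_softmax(pred_scores, dim=-1)
    return -(p_true * log_p_pred).sum(dim=-1).mean()

# One cell line with five candidate drugs; smaller response = more sensitive,
# so responses are negated to act as relevance scores (illustrative choice).
drug_response = torch.tensor([[0.2, 0.9, 0.5, 0.1, 0.7]])
relevance = -drug_response
predicted = torch.randn(1, 5, requires_grad=True)
loss = listnet_loss(predicted, relevance)
loss.backward()
print(float(loss))
```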

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

  • paper_url: http://arxiv.org/abs/2306.17759
  • repo_url: None
  • paper_authors: Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy
  • for: 研究深度学习理论中covariance矩阵的代理,探讨Transformers的trainability。
  • methods: 使用修改后的Softmax-based attention模型,并添加skip connections,在无穷深度和宽度的比例限制下进行研究。
  • results: 显示在初始化时,限定分布可以通过一个Stochastic Differential Equation(SDE)来描述,该SDE是depth-to-width比例的。通过修改Transformers的注意机制,控制了scale的演算和噪声,使得网络存在稳定的SDE,从而避免了深度注意模型中的rank degeneracy问题。
    Abstract In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a stochastic differential equation (SDE) indexed by the depth-to-width ratio. To achieve a well-defined stochastic limit, the Transformer's attention mechanism is modified by centering the Softmax output at identity, and scaling the Softmax logits by a width-dependent temperature parameter. We examine the stability of the network through the corresponding SDE, showing how the scale of both the drift and diffusion can be elegantly controlled with the aid of residual connections. The existence of a stable SDE implies that the covariance structure is well-behaved, even for very large depth and width, thus preventing the notorious issues of rank degeneracy in deep attention models. Finally, we show, through simulations, that the SDE provides a surprisingly good description of the corresponding finite-size model. We coin the name shaped Transformer for these architectural modifications.
    摘要 在深度学习理论中,表征(representation)的协方差矩阵常被用作考察网络可训练性的代理指标。受 Transformers 成功的启发,我们研究带有 skip connections 的改进型 Softmax 注意力模型在深度与宽度成比例同时趋于无穷时的极限行为。我们发现,在初始化时,其极限分布可以用一个以深宽比为索引的随机微分方程(SDE)来描述。为了得到良定义的随机极限,我们对 Transformer 的注意力机制做了修改:将 Softmax 输出以恒等映射为中心,并用一个依赖于宽度的温度参数对 Softmax logits 进行缩放。我们通过相应的 SDE 考察网络的稳定性,说明漂移项和扩散项的尺度都可以借助残差连接得到优雅的控制。稳定 SDE 的存在意味着协方差结构即使在深度和宽度都非常大的情况下也表现良好,从而避免了深度注意力模型中著名的秩退化问题。最后,我们通过模拟实验表明,该 SDE 对相应的有限规模模型给出了出乎意料地好的刻画。我们将这些结构上的修改命名为 shaped Transformer。

TD Convergence: An Optimization Perspective

  • paper_url: http://arxiv.org/abs/2306.17750
  • repo_url: None
  • paper_authors: Kavosh Asadi, Shoham Sabach, Yao Liu, Omer Gottesman, Rasool Fakoor
  • for: 本研究探讨了TD学习算法的收敛行为。
  • methods: 我们从优化的视角研究TD算法,指出TD可以被视为一种每次迭代都在改变待最小化目标函数的迭代优化算法。
  • results: 我们在经典反例上考察了TD的发散行为,发现了决定TD算法收敛或发散的两种力量;我们在采用平方损失的线性TD设置下形式化了这两种力量之间的相互作用并证明了收敛性,随后将该优化视角推广到远比线性近似和平方损失更一般的设置,证明了TD的收敛。
    Abstract We study the convergence behavior of the celebrated temporal-difference (TD) learning algorithm. By looking at the algorithm through the lens of optimization, we first argue that TD can be viewed as an iterative optimization algorithm where the function to be minimized changes per iteration. By carefully investigating the divergence displayed by TD on a classical counter example, we identify two forces that determine the convergent or divergent behavior of the algorithm. We next formalize our discovery in the linear TD setting with quadratic loss and prove that convergence of TD hinges on the interplay between these two forces. We extend this optimization perspective to prove convergence of TD in a much broader setting than just linear approximation and squared loss. Our results provide a theoretical explanation for the successful application of TD in reinforcement learning.
    摘要 我们研究著名的时序差分(TD)学习算法的收敛行为。通过优化的视角,我们首先指出TD可以被视为一种每次迭代都在改变待最小化函数的迭代优化算法。通过仔细分析TD在一个经典反例上表现出的发散现象,我们找出了决定该算法收敛或发散的两种力量。随后,我们在使用平方损失的线性TD设置下将这一发现形式化,证明TD的收敛取决于这两种力量之间的相互作用。我们进一步将这一优化视角推广到远比线性近似和平方损失更广泛的设置,证明了TD的收敛性。我们的结果为TD在强化学习中的成功应用提供了理论解释。
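
For reference, the object under study is the classic TD(0) update, in which the regression target itself depends on the current value estimates, which is exactly why the effective objective changes from iteration to iteration. A tabular sketch on a five-state chain:

```python
import numpy as np

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """One TD(0) step: move V(s) toward the bootstrapped target r + gamma*V(s')."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V

V = np.zeros(5)
# Tiny deterministic chain: states 0..4, reward 1 on reaching terminal state 4.
for episode in range(200):
    s = 0
    while s < 4:
        s_next = s + 1
        r = 1.0 if s_next == 4 else 0.0
        V = td0_update(V, s, r, s_next)
        s = s_next
print(np.round(V, 2))
```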

eess.IV - 2023-07-01

CephGPT-4: An Interactive Multimodal Cephalometric Measurement and Diagnostic System with Visual Large Language Model

  • paper_url: http://arxiv.org/abs/2307.07518
  • repo_url: None
  • paper_authors: Lei Ma, Jincong Han, Zhaoxin Wang, Dian Zhang
  • for: 这个研究旨在探索基于多modal cephalometric医疗数据的诊断语言模型。
  • methods: 研究使用多modal cephalometric影像和医生病人对话数据,首先自动分析cephalometric特征点使用U-net,然后生成诊断报告。最后,cephalometric数据和生成的诊断报告分别精确化在Minigpt-4和VisualGLM上。
  • results: 研究结果显示CephGPT-4模型在表现出色,具有潜在创新的应用潜力在颚科领域。
    Abstract Large-scale multimodal language models (LMMs) have achieved remarkable success in general domains. However, the exploration of diagnostic language models based on multimodal cephalometric medical data remains limited. In this paper, we propose a novel multimodal cephalometric analysis and diagnostic dialogue model. Firstly, a multimodal orthodontic medical dataset is constructed, comprising cephalometric images and doctor-patient dialogue data, with automatic analysis of cephalometric landmarks using U-net and generation of diagnostic reports. Then, the cephalometric dataset and generated diagnostic reports are separately fine-tuned on Minigpt-4 and VisualGLM. Results demonstrate that the CephGPT-4 model exhibits excellent performance and has the potential to revolutionize orthodontic measurement and diagnostic applications. These innovations hold revolutionary application potential in the field of orthodontics.
    摘要 大规模多modal语言模型(LMMs)在通用领域已经取得了很大成功。然而,对于基于多modal额外医学数据的诊断语言模型的探索仍然受限。本文提出了一种新的多modal额外医学分析和诊断对话模型。首先,构建了一个多modal额外医学数据集,包括额外成像和医生病人对话数据,并自动分析额外特征点使用U-网和生成诊断报告。然后,额外数据集和生成的诊断报告分别在Minigpt-4和VisualGLM上进行了精细调整。结果表明,CephGPT-4模型在表现出色,有潜力革命化额外测量和诊断应用。这些创新在颌面矫正领域具有革命性应用 potential。

SDRCNN: A single-scale dense residual connected convolutional neural network for pansharpening

  • paper_url: http://arxiv.org/abs/2307.00327
  • repo_url: None
  • paper_authors: Yuan Fang, Yuanzhi Cai, Lei Fan
  • for: 本研究旨在开发一种高效精准的遥感图像全色锐化(pansharpening)方法,以在保留多光谱信息的同时提高图像的空间分辨率。
  • methods: 本研究使用了一种单分支、单尺度的轻量级卷积神经网络,即SDRCNN,来实现全色锐化。SDRCNN使用了一种新型的稠密残差连接结构和卷积块,从而在准确性和效率之间取得了更好的平衡。
  • results: 从目视检查和相应的绝对残差图来看,SDRCNN比其他8种传统方法和5种轻量级深度学习方法更为精准,空间细节模糊和光谱失真最小,处理时间也最短。最后,消融实验验证了SDRCNN各组件的有效性。
    Abstract Pansharpening is a process of fusing a high spatial resolution panchromatic image and a low spatial resolution multispectral image to create a high-resolution multispectral image. A novel single-branch, single-scale lightweight convolutional neural network, named SDRCNN, is developed in this study. By using a novel dense residual connected structure and convolution block, SDRCNN achieved a better trade-off between accuracy and efficiency. The performance of SDRCNN was tested using four datasets from the WorldView-3, WorldView-2 and QuickBird satellites. The compared methods include eight traditional methods (i.e., GS, GSA, PRACS, BDSD, SFIM, GLP-CBD, CDIF and LRTCFPan) and five lightweight deep learning methods (i.e., PNN, PanNet, BayesianNet, DMDNet and FusionNet). Based on a visual inspection of the pansharpened images created and the associated absolute residual maps, SDRCNN exhibited least spatial detail blurring and spectral distortion, amongst all the methods considered. The values of the quantitative evaluation metrics were closest to their ideal values when SDRCNN was used. The processing time of SDRCNN was also the shortest among all methods tested. Finally, the effectiveness of each component in the SDRCNN was demonstrated in ablation experiments. All of these confirmed the superiority of SDRCNN.
    摘要 全色锐化(pansharpening)是一种将高空间分辨率全色图像与低空间分辨率多光谱图像融合、生成高分辨率多光谱图像的过程。本研究提出了一种单分支、单尺度的轻量级卷积神经网络,称为SDRCNN。通过使用一种新型的稠密残差连接结构和卷积块,SDRCNN在精度和效率之间取得了更好的权衡。SDRCNN的性能在来自WorldView-3、WorldView-2和QuickBird卫星的四个数据集上进行了测试,对比方法包括八种传统方法(即GS、GSA、PRACS、BDSD、SFIM、GLP-CBD、CDIF和LRTCFPan)和五种轻量级深度学习方法(即PNN、PanNet、BayesianNet、DMDNet和FusionNet)。从融合结果图像及其绝对残差图的目视检查来看,在所有对比方法中,SDRCNN的空间细节模糊和光谱失真最少;其定量评价指标的数值最接近理想值,处理时间在所有方法中最短。最后,消融实验验证了SDRCNN各组件的有效性。以上结果共同证明了SDRCNN的优越性。

Spatio-Temporal Classification of Lung Ventilation Patterns using 3D EIT Images: A General Approach for Individualized Lung Function Evaluation

  • paper_url: http://arxiv.org/abs/2307.00307
  • repo_url: None
  • paper_authors: Shuzhe Chen, Li Li, Zhichao Lin, Ke Zhang, Ying Gong, Lu Wang, Xu Wu, Maokun Li, Yuanlin Song, Fan Yang, Shenheng Xu
  • for: 这个研究旨在用电気阻抗图像序列来分类肺功能模式。
  • methods: 这个研究使用了变换自适应网络和多重块来压缩3D电阻图像序列,并使用了一个简单的卷积神经网络进行分类。
  • results: 研究结果显示,使用图像序列可以正确地分类肺功能模式,并且准确率和敏感性都很高。
    Abstract The Pulmonary Function Test (PFT) is a widely utilized and rigorous classification test for lung function evaluation, serving as a comprehensive tool for lung diagnosis. Meanwhile, Electrical Impedance Tomography (EIT) is a rapidly advancing clinical technique that visualizes conductivity distribution induced by ventilation. EIT provides additional spatial and temporal information on lung ventilation beyond traditional PFT. However, relying solely on conventional isolated interpretations of PFT results and EIT images overlooks the continuous dynamic aspects of lung ventilation. This study aims to classify lung ventilation patterns by extracting spatial and temporal features from the 3D EIT image series. The study uses a Variational Autoencoder network with a MultiRes block to compress the spatial distribution in a 3D image into a one-dimensional vector. These vectors are then concatenated to create a feature map for the exhibition of temporal features. A simple convolutional neural network is used for classification. Data collected from 137 subjects were finally used for training. The model is first validated by ten-fold and leave-one-out cross-validation. The accuracy and sensitivity of normal ventilation mode are 0.95 and 1.00, and the f1-score is 0.94. Furthermore, we check the reliability and feasibility of the proposed pipeline by testing it on nine newly recruited subjects. Our results show that the pipeline correctly predicts the ventilation mode of 8 out of 9 subjects. The study demonstrates the potential of using image series for lung ventilation mode classification, providing a feasible method for patient prescreening and presenting an alternative form of PFT.
    摘要 肺功能测试(PFT)是一种广泛使用且严格的肺功能评估分类测试,是全面的肺部诊断工具。

AE-RED: A Hyperspectral Unmixing Framework Powered by Deep Autoencoder and Regularization by Denoising

  • paper_url: http://arxiv.org/abs/2307.00269
  • repo_url: None
  • paper_authors: Min Zhao, Jie Chen, Nicolas Dobigeon
  • for: 这篇论文提出了一种 integrate autoencoder 网络和正则化 by denoising (RED) 的 spectral unmixing 框架,以提高混合性能。
  • methods: 提案的框架使用深度 autoencoder 网络来隐式地正则化估计和模型混合机制,并利用 denoiser 带入显式信息。
  • results: 对both synthetic 和实际数据集进行了实验,结果显示提案的框架与当前的混合方法相比,性能更高。
    Abstract Spectral unmixing has been extensively studied with a variety of methods and used in many applications. Recently, data-driven techniques based on deep learning have attracted great attention for spectral unmixing, owing to their ability to automatically learn structural information. In particular, autoencoder-based architectures have been elaborately designed to solve blind unmixing and to model complex nonlinear mixtures. Nevertheless, these methods perform the unmixing task as black boxes and lack interpretability. On the other hand, conventional unmixing methods carefully design regularizers to add explicit information; for instance, plug-and-play (PnP) strategies utilize off-the-shelf denoisers to plug in powerful priors. In this paper, we propose a generic unmixing framework, named AE-RED, that integrates an autoencoder network with regularization by denoising (RED). More specifically, we decompose the unmixing optimization problem into two subproblems. The first is solved with deep autoencoders, which implicitly regularize the estimates and model the mixture mechanism. The second leverages a denoiser to bring in explicit prior information. In this way, both the characteristics of deep autoencoder-based unmixing methods and the priors provided by denoisers are merged into our framework to enhance unmixing performance. Experimental results on both synthetic and real data sets show the superiority of the proposed framework compared with state-of-the-art unmixing approaches.
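To make the splitting concrete, here is a minimal sketch of a RED-style update under stated assumptions: the autoencoder subproblem is mocked by a fixed abundance estimate, and the explicit-prior subproblem is solved by fixed-point iterations that pull the abundance maps toward an off-the-shelf denoiser (a simple Gaussian filter here). The step sizes, denoiser choice, and variable names are illustrative, not the authors' exact AE-RED algorithm.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def denoise(a_maps, sigma=1.0):
    """Band-wise Gaussian smoothing as a stand-in for an off-the-shelf denoiser."""
    return np.stack([gaussian_filter(a_maps[..., r], sigma)
                     for r in range(a_maps.shape[-1])], axis=-1)

def red_step(a_init, a_from_ae, mu=1.0, lam=0.5, n_inner=10):
    """Minimize  mu/2 ||a - a_from_ae||^2 + lam/2 a^T (a - D(a))  via the classic
    RED fixed point  a <- (mu * a_from_ae + lam * D(a)) / (mu + lam)."""
    a = a_init.copy()
    for _ in range(n_inner):
        a = (mu * a_from_ae + lam * denoise(a)) / (mu + lam)
    return a

# Toy data: smooth ground-truth abundance maps (H, W, R) that sum to one per pixel.
rng = np.random.default_rng(0)
H, W, R = 32, 32, 3
a_true = gaussian_filter(rng.random((H, W, R)), sigma=(4, 4, 0))
a_true /= a_true.sum(axis=-1, keepdims=True)
a_ae = a_true + 0.1 * rng.standard_normal((H, W, R))   # stand-in for the autoencoder's estimate

a_red = red_step(a_ae, a_ae)
print("mean abs error before RED:", np.abs(a_ae - a_true).mean())
print("mean abs error after  RED:", np.abs(a_red - a_true).mean())
```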

Deep Angiogram: Trivializing Retinal Vessel Segmentation

  • paper_url: http://arxiv.org/abs/2307.00245
  • repo_url: None
  • paper_authors: Dewei Hu, Xing Yao, Jiacheng Wang, Yuankai K. Tao, Ipek Oguz
  • for: This study aims to propose a deep learning model that can robustly recognize retinal vessels on unseen domains.
  • methods: A contrastive variational autoencoder filters out irrelevant features and synthesizes a latent image containing only the retinal vessels, termed a deep angiogram; segmentation is then accomplished simply by thresholding the deep angiogram.
  • results: The model generates stable angiograms on different target domains, provides excellent vessel visualization, and serves as a non-invasive, safe alternative to fluorescein angiography.
    Abstract Among the research efforts to segment the retinal vasculature from fundus images, deep learning models consistently achieve superior performance. However, this data-driven approach is very sensitive to domain shifts. For fundus images, such data distribution changes can easily be caused by variations in illumination conditions as well as the presence of disease-related features such as hemorrhages and drusen. Since the source domain may not include all possible types of pathological cases, a model that can robustly recognize vessels on unseen domains is desirable but remains elusive, despite many proposed segmentation networks of ever-increasing complexity. In this work, we propose a contrastive variational auto-encoder that can filter out irrelevant features and synthesize a latent image, named deep angiogram, representing only the retinal vessels. Then segmentation can be readily accomplished by thresholding the deep angiogram. The generalizability of the synthetic network is improved by the contrastive loss that makes the model less sensitive to variations of image contrast and noisy features. Compared to baseline deep segmentation networks, our model achieves higher segmentation performance via simple thresholding. Our experiments show that the model can generate stable angiograms on different target domains, providing excellent visualization of vessels and a non-invasive, safe alternative to fluorescein angiography.
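As an illustration of the two-step use described above, the sketch below maps a fundus image to a single-channel synthetic angiogram and obtains the vessel mask by simple thresholding. The toy encoder-decoder, its untrained weights, and the fixed 0.5 threshold are assumptions for demonstration only; they are not the authors' contrastive variational autoencoder.

```python
import torch
import torch.nn as nn

class AngiogramNet(nn.Module):
    """Toy encoder-decoder standing in for the synthesis path of the contrastive VAE."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
                                 nn.ConvTranspose2d(16, 1, 2, stride=2), nn.Sigmoid())

    def forward(self, fundus):                 # fundus: (B, 3, H, W)
        return self.dec(self.enc(fundus))      # deep angiogram in [0, 1]

net = AngiogramNet()
fundus = torch.rand(1, 3, 256, 256)            # placeholder fundus image
angiogram = net(fundus)
vessel_mask = (angiogram > 0.5).float()        # segmentation reduces to a threshold
print(angiogram.shape, vessel_mask.mean().item())
```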

AIGCIQA2023: A Large-scale Image Quality Assessment Database for AI Generated Images: from the Perspectives of Quality, Authenticity and Correspondence

  • paper_url: http://arxiv.org/abs/2307.00211
  • repo_url: https://github.com/wangjiarui153/aigciqa2023
  • paper_authors: Jiarui Wang, Huiyu Duan, Jing Liu, Shi Chen, Xiongkuo Min, Guangtao Zhai
  • for: To gain a better understanding of human visual preferences for AI-generated images.
  • methods: Over 2000 images are generated with 6 state-of-the-art text-to-image generation models, and a well-organized subjective experiment assesses human visual preferences for each image in terms of quality, authenticity and correspondence.
  • results: Based on the resulting large-scale IQA database, AIGCIQA2023, the performance of several state-of-the-art IQA metrics is benchmarked.
    Abstract In this paper, in order to better understand human visual preferences for AI-generated images (AIGIs), we establish a large-scale IQA database for AIGC, named AIGCIQA2023. We first generate over 2000 images based on 6 state-of-the-art text-to-image generation models using 100 prompts. Based on these images, a well-organized subjective experiment is conducted to assess human visual preferences for each image from three perspectives: quality, authenticity and correspondence. Finally, based on this large-scale database, we conduct a benchmark experiment to evaluate the performance of several state-of-the-art IQA metrics on our constructed database.
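A benchmark of IQA metrics on such a database typically reports rank and linear correlations between each metric's predictions and the collected subjective scores. The snippet below sketches that evaluation with SROCC and PLCC over the three annotated perspectives; the metric names and scores are random placeholders, and the paper's exact protocol may differ.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

rng = np.random.default_rng(0)
n_images = 2000
# placeholder mean opinion scores for the three perspectives rated by the subjects
mos = {p: rng.normal(50, 10, n_images) for p in ("quality", "authenticity", "correspondence")}
# placeholder predictions from two hypothetical IQA metrics
metric_scores = {"metric_A": rng.normal(0, 1, n_images),
                 "metric_B": rng.normal(0, 1, n_images)}

for name, pred in metric_scores.items():
    for perspective, scores in mos.items():
        srocc = spearmanr(pred, scores)[0]   # rank correlation
        plcc = pearsonr(pred, scores)[0]     # linear correlation
        print(f"{name} vs {perspective}: SROCC={srocc:.3f}  PLCC={plcc:.3f}")
```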

Unsupervised Coordinate-Based Video Denoising

  • paper_url: http://arxiv.org/abs/2307.00179
  • repo_url: None
  • paper_authors: Mary Damilola Aiyetigbo, Dineshchandar Ravichandran, Reda Chalhoub, Peter Kalivas, Nianyi Li
  • for: This paper proposes a novel unsupervised deep-learning approach for video denoising that mitigates data scarcity issues and is robust to different noise patterns, broadening its applicability.
  • methods: The method comprises three modules: a feature generator that creates feature maps, a Denoise-Net that produces denoised but slightly blurry reference frames, and a Refine-Net that re-introduces high-frequency details; a coordinate-based network greatly simplifies the network structure while preserving high-frequency details in the denoised frames.
  • results: Extensive experiments on simulated and real-captured calcium imaging video sequences show that the method effectively denoises real-world recordings without prior knowledge of noise models or data augmentation during training.
    Abstract In this paper, we introduce a novel unsupervised video denoising deep learning approach that helps mitigate data scarcity issues and shows robustness against different noise patterns, enhancing its broad applicability. Our method comprises three modules: a Feature generator creating feature maps, a Denoise-Net generating denoised but slightly blurry reference frames, and a Refine-Net re-introducing high-frequency details. By leveraging a coordinate-based network, we can greatly simplify the network structure while preserving high-frequency details in the denoised video frames. Extensive experiments on both simulated and real-captured data demonstrate that our method can effectively denoise real-world calcium imaging video sequences without prior knowledge of noise models or data augmentation during training.
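The core coordinate-based idea can be sketched in a few lines: an MLP maps a normalized (t, y, x) coordinate to an intensity and is fit directly to the noisy video, so its smoothness bias yields a denoised reconstruction. This toy version collapses the paper's three modules into a single network and omits positional encoding; the architecture and hyperparameters are assumptions, not the authors' pipeline.

```python
import torch
import torch.nn as nn

video = torch.rand(16, 32, 32)                     # noisy (T, H, W) stand-in sequence

# build the normalized coordinate grid once
t, y, x = torch.meshgrid(*(torch.linspace(0, 1, s) for s in video.shape), indexing="ij")
coords = torch.stack([t, y, x], dim=-1).reshape(-1, 3)
targets = video.reshape(-1, 1)

mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(),
                    nn.Linear(128, 128), nn.ReLU(),
                    nn.Linear(128, 1), nn.Sigmoid())
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for step in range(200):                            # fit the MLP directly to the noisy frames
    idx = torch.randint(0, coords.shape[0], (4096,))
    loss = nn.functional.mse_loss(mlp(coords[idx]), targets[idx])
    opt.zero_grad(); loss.backward(); opt.step()

denoised = mlp(coords).reshape(video.shape).detach()  # smooth reconstruction of the video
print(denoised.shape)
```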

Multiscale Progressive Text Prompt Network for Medical Image Segmentation

  • paper_url: http://arxiv.org/abs/2307.00174
  • repo_url: https://github.com/codehxj/MPTPN-for--Medical-Image-Segmentation
  • paper_authors: Xianjun Han, Qianqian Chen, Zhaoyang Xie, Xuejun Li, Hongyu Yang
  • for: To train models for medical image segmentation and obtain reliable morphological statistics.
  • methods: Progressive text prompts guide the segmentation process in two stages: contrastive learning on natural images first pretrains a powerful prior prompt encoder (PPE), and medical images together with text prompts are then fed into the PPE for the downstream segmentation task.
  • results: Multiscale feature fusion and text prompts improve prediction accuracy, and an UpAttention block refines the predicted results; the model reduces data annotation cost while performing well on both medical and natural images.
    Abstract The accurate segmentation of medical images is a crucial step in obtaining reliable morphological statistics. However, training a deep neural network for this task requires a large amount of labeled data to ensure high-accuracy results. To address this issue, we propose using progressive text prompts as prior knowledge to guide the segmentation process. Our model consists of two stages. In the first stage, we perform contrastive learning on natural images to pretrain a powerful prior prompt encoder (PPE). This PPE leverages text prior prompts to generate multimodality features. In the second stage, medical image and text prior prompts are sent into the PPE inherited from the first stage to achieve the downstream medical image segmentation task. A multiscale feature fusion block (MSFF) combines the features from the PPE to produce multiscale multimodality features. These two progressive features not only bridge the semantic gap but also improve prediction accuracy. Finally, an UpAttention block refines the predicted results by merging the image and text features. This design provides a simple and accurate way to leverage multiscale progressive text prior prompts for medical image segmentation. Compared with using only images, our model achieves high-quality results with low data annotation costs. Moreover, our model not only has excellent reliability and validity on medical images but also performs well on natural images. The experimental results on different image datasets demonstrate that our model is effective and robust for image segmentation.
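To illustrate how a text prompt can steer segmentation, the sketch below fuses a toy text embedding with image features at two scales before decoding a mask, loosely in the spirit of the PPE-plus-MSFF design described above. The embedding-bag text branch, layer sizes, and fusion layout are illustrative assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class ToyPromptSegNet(nn.Module):
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.text = nn.EmbeddingBag(vocab, dim)                  # stands in for the text branch of the PPE
        self.s1 = nn.Conv2d(1, dim, 3, stride=1, padding=1)      # fine-scale image features
        self.s2 = nn.Conv2d(dim, dim, 3, stride=2, padding=1)    # coarse-scale image features
        self.fuse = nn.Conv2d(3 * dim, dim, 1)                   # MSFF-style fusion of scales + text
        self.head = nn.Conv2d(dim, 1, 1)

    def forward(self, image, prompt_ids):                        # image: (B, 1, H, W), prompt_ids: (B, N)
        txt = self.text(prompt_ids)                              # (B, dim) prompt embedding
        f1 = torch.relu(self.s1(image))                          # (B, dim, H, W)
        f2 = torch.relu(self.s2(f1))                             # (B, dim, H/2, W/2)
        f2 = nn.functional.interpolate(f2, size=f1.shape[-2:])   # upsample back to full resolution
        txt_map = txt[:, :, None, None].expand(-1, -1, *f1.shape[-2:])
        fused = self.fuse(torch.cat([f1, f2, txt_map], dim=1))   # merge image and text features
        return torch.sigmoid(self.head(torch.relu(fused)))       # mask probabilities

net = ToyPromptSegNet()
mask = net(torch.rand(2, 1, 64, 64), torch.randint(0, 100, (2, 5)))
print(mask.shape)                                                # torch.Size([2, 1, 64, 64])
```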