cs.LG - 2023-07-02

Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling

  • paper_url: http://arxiv.org/abs/2307.05382
  • repo_url: None
  • paper_authors: Ziyue Li, Yuchen Fang, You Li, Kan Ren, Yansen Wang, Xufang Luo, Juanyong Duan, Congrui Huang, Dongsheng Li, Lili Qiu
  • for: Aims to provide an automated neonatal seizure detection method as an alternative to labor-intensive manual monitoring.
  • methods: Uses the deep learning framework STATENet, featuring dedicated designs at the temporal, spatial, and model levels to better fit the characteristics of neonatal seizures.
  • results: Experiments on a large-scale real-world neonatal EEG dataset show that the framework achieves significantly better seizure detection performance.
    Abstract A timely detection of seizures for newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, it requires great human efforts for real-time monitoring, which calls for automated solutions to neonatal seizure detection. Moreover, the current automated methods focusing on adult epilepsy monitoring often fail due to (i) dynamic seizure onset location in human brains; (ii) different montages on neonates and (iii) huge distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address the exclusive challenges with exquisite designs at the temporal, spatial and model levels. The experiments over the real-world large-scale neonatal EEG dataset illustrate that our framework achieves significantly better seizure detection performance.

IoT-Based Air Quality Monitoring System with Machine Learning for Accurate and Real-time Data Analysis

  • paper_url: http://arxiv.org/abs/2307.00580
  • repo_url: https://github.com/Hemanth-Karnati-HK/AQMS-ML
  • paper_authors: Hemanth Karnati
  • for: addresses the issue of air pollution awareness in urban areas
  • methods: uses two sensors (MQ135 and MQ3) to detect harmful gases and measure air quality in PPM, and employs machine learning analysis on the collected data
  • results: provides real-time data specific to the user’s location, and visualizes the data using a cloud-based web app called ThinkSpeak
    Abstract Air pollution in urban areas has severe consequences for both human health and the environment, predominantly caused by exhaust emissions from vehicles. To address the issue of air pollution awareness, Air Pollution Monitoring systems are used to measure the concentration of gases like CO2, smoke, alcohol, benzene, and NH3 present in the air. However, current mobile applications are unable to provide users with real-time data specific to their location. In this paper, we propose the development of a portable air quality detection device that can be used anywhere. The data collected will be stored and visualized using the cloud-based web app ThinkSpeak. The device utilizes two sensors, MQ135 and MQ3, to detect harmful gases and measure air quality in parts per million (PPM). Additionally, machine learning analysis will be employed on the collected data.
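
A rough end-to-end illustration of the described pipeline is sketched below in Python: a raw MQ-135 ADC reading is converted into an approximate PPM value and pushed to the cloud dashboard. The curve coefficients, the clean-air resistance R0, the read_adc() driver stub, and the API key are placeholders rather than values from the paper, and the upload uses ThingSpeak's public HTTP update endpoint (the platform the abstract refers to as ThinkSpeak).

```python
import time
import requests

# Placeholder calibration values: the power-law coefficients and the clean-air
# resistance R0 must be calibrated per sensor from the MQ-135 datasheet.
A, B = 116.6, -2.77                           # ppm = A * (Rs / R0) ** B
R0, RL, VCC = 10.0, 10.0, 5.0                 # kOhm, kOhm, volts
THINGSPEAK_WRITE_KEY = "YOUR_WRITE_API_KEY"   # hypothetical channel write key

def read_adc():
    # placeholder for the board-specific driver reading the MQ-135 analog pin
    return 400

def mq135_ppm(adc_value, adc_max=1023):
    """Convert a raw ADC reading into an approximate gas concentration in PPM."""
    v_out = VCC * adc_value / adc_max
    rs = (VCC - v_out) * RL / max(v_out, 1e-6)   # sensor resistance via voltage divider
    return A * (rs / R0) ** B

def push_reading(ppm):
    """Upload one reading via ThingSpeak's HTTP update API (field1 = PPM)."""
    requests.get("https://api.thingspeak.com/update",
                 params={"api_key": THINGSPEAK_WRITE_KEY, "field1": round(ppm, 2)},
                 timeout=10)

for _ in range(3):                       # on the real device this loop runs forever
    push_reading(mq135_ppm(read_adc()))
    time.sleep(20)                       # respect the platform's update rate limit
```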

Mode-wise Principal Subspace Pursuit and Matrix Spiked Covariance Model

  • paper_url: http://arxiv.org/abs/2307.00575
  • repo_url: None
  • paper_authors: Runshi Tang, Ming Yuan, Anru R. Zhang
  • for: Proposes a new framework, Mode-wise Principal Subspace Pursuit (MOP-UP), to extract hidden variations in both the row and column dimensions of matrix data.
  • methods: The algorithm consists of two steps, Average Subspace Capture (ASC) and Alternating Projection (AP), designed to capture the row-wise and column-wise dimension-reduced subspaces that contain the most informative features of the data.
  • results: The proposed framework is demonstrated to be effective through experiments on both simulated and real datasets, and the authors also discuss generalizations of their approach to higher-order data.
    Abstract This paper introduces a novel framework called Mode-wise Principal Subspace Pursuit (MOP-UP) to extract hidden variations in both the row and column dimensions for matrix data. To enhance the understanding of the framework, we introduce a class of matrix-variate spiked covariance models that serve as inspiration for the development of the MOP-UP algorithm. The MOP-UP algorithm consists of two steps: Average Subspace Capture (ASC) and Alternating Projection (AP). These steps are specifically designed to capture the row-wise and column-wise dimension-reduced subspaces which contain the most informative features of the data. ASC utilizes a novel average projection operator as initialization and achieves exact recovery in the noiseless setting. We analyze the convergence and non-asymptotic error bounds of MOP-UP, introducing a blockwise matrix eigenvalue perturbation bound that proves the desired bound, where classic perturbation bounds fail. The effectiveness and practical merits of the proposed framework are demonstrated through experiments on both simulated and real datasets. Lastly, we discuss generalizations of our approach to higher-order data.

Conditionally Invariant Representation Learning for Disentangling Cellular Heterogeneity

  • paper_url: http://arxiv.org/abs/2307.00558
  • repo_url: None
  • paper_authors: Hananeh Aliee, Ferdinand Kapl, Soroor Hediyeh-Zadeh, Fabian J. Theis
  • for: Aims to learn representations that are conditionally invariant to unwanted variability or distractors, enforcing independence to disentangle noise and build interpretable models.
  • methods: Leverages domain variability and places distinct conditional priors on latent features to identify both spurious and invariant latent features, enforcing independence so that invariant signals are disentangled from noise and interpretable, causally meaningful models can be built.
  • results: Enables dataset integration in large-scale single-cell genomics and empowers deeper insights into cellular heterogeneity and the identification of disease cell states.
    Abstract This paper presents a novel approach that leverages domain variability to learn representations that are conditionally invariant to unwanted variability or distractors. Our approach identifies both spurious and invariant latent features necessary for achieving accurate reconstruction by placing distinct conditional priors on latent features. The invariant signals are disentangled from noise by enforcing independence which facilitates the construction of an interpretable model with a causal semantic. By exploiting the interplay between data domains and labels, our method simultaneously identifies invariant features and builds invariant predictors. We apply our method to grand biological challenges, such as data integration in single-cell genomics with the aim of capturing biological variations across datasets with many samples, obtained from different conditions or multiple laboratories. Our approach allows for the incorporation of specific biological mechanisms, including gene programs, disease states, or treatment conditions into the data integration process, bridging the gap between the theoretical assumptions and real biological applications. Specifically, the proposed approach helps to disentangle biological signals from data biases that are unrelated to the target task or the causal explanation of interest. Through extensive benchmarking using large-scale human hematopoiesis and human lung cancer data, we validate the superiority of our approach over existing methods and demonstrate that it can empower deeper insights into cellular heterogeneity and the identification of disease cell states.

Partial-label Learning with Mixed Closed-set and Open-set Out-of-candidate Examples

  • paper_url: http://arxiv.org/abs/2307.00553
  • repo_url: None
  • paper_authors: Shuo He, Lei Feng, Guowu Yang
  • for: Addresses a restrictive assumption in partial-label learning (PLL), namely that the true label of each training example must lie in the candidate label set.
  • methods: Considers two types of out-of-candidate (OOC) examples, closed-set and open-set, whose true labels lie inside or outside the known label space. The wooden cross-entropy loss is computed from candidate and non-candidate labels, and the two types of OOC examples are dynamically differentiated with specially designed criteria. Closed-set OOC examples undergo reversed label disambiguation in the non-candidate label set; open-set OOC examples are leveraged for training via a regularization strategy that dynamically assigns random candidate labels. Both types of OOC examples are thus differentiated and exploited for model training.
  • results: Extensive experiments show that the proposed method outperforms state-of-the-art PLL methods.
    Abstract Partial-label learning (PLL) relies on a key assumption that the true label of each training example must be in the candidate label set. This restrictive assumption may be violated in complex real-world scenarios, and thus the true label of some collected examples could be unexpectedly outside the assigned candidate label set. In this paper, we term the examples whose true label is outside the candidate label set OOC (out-of-candidate) examples, and pioneer a new PLL study to learn with OOC examples. We consider two types of OOC examples in reality, i.e., the closed-set/open-set OOC examples whose true label is inside/outside the known label space. To solve this new PLL problem, we first calculate the wooden cross-entropy loss from candidate and non-candidate labels respectively, and dynamically differentiate the two types of OOC examples based on specially designed criteria. Then, for closed-set OOC examples, we conduct reversed label disambiguation in the non-candidate label set; for open-set OOC examples, we leverage them for training by utilizing an effective regularization strategy that dynamically assigns random candidate labels from the candidate label set. In this way, the two types of OOC examples can be differentiated and further leveraged for model training. Extensive experiments demonstrate that our proposed method outperforms state-of-the-art PLL methods.

Adaptive reinforcement learning of multi-agent ethically-aligned behaviours: the QSOM and QDSOM algorithms

  • paper_url: http://arxiv.org/abs/2307.00552
  • repo_url: None
  • paper_authors: Rémy Chaput, Olivier Boissier, Mathieu Guillermin
  • for: Addresses the challenge of keeping AI systems aligned with ethical considerations that change over time, since our society is not fixed and social mores evolve, which makes alignment difficult for deployed systems.
  • methods: Proposes two algorithms, QSOM and QDSOM, that adapt to changes in the environment and especially in the reward function, which encodes the ethical considerations the systems should follow. They associate the well-known Q-Table with (Dynamic) Self-Organizing Maps to handle continuous and multi-dimensional state and action spaces (see the sketch below).
  • results: Evaluated on a use case of multi-agent energy repartition within a small Smart Grid neighborhood, the algorithms demonstrate their ability to adapt and their higher performance compared to baseline Reinforcement Learning algorithms.
    Abstract The numerous deployed Artificial Intelligence systems need to be aligned with our ethical considerations. However, such ethical considerations might change as time passes: our society is not fixed, and our social mores evolve. This makes it difficult for these AI systems; in the Machine Ethics field especially, it has remained an under-studied challenge. In this paper, we present two algorithms, named QSOM and QDSOM, which are able to adapt to changes in the environment, and especially in the reward function, which represents the ethical considerations that we want these systems to be aligned with. They associate the well-known Q-Table to (Dynamic) Self-Organizing Maps to handle the continuous and multi-dimensional state and action spaces. We evaluate them on a use-case of multi-agent energy repartition within a small Smart Grid neighborhood, and prove their ability to adapt, and their higher performance compared to baseline Reinforcement Learning algorithms.
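
A minimal NumPy sketch of the core QSOM idea from the abstract: one Self-Organizing Map (SOM) discretizes the continuous state space, another the continuous action space, and a classic Q-Table is indexed by their best-matching units. The hyperparameters, the epsilon-greedy exploration, and the single-winner SOM update (a full SOM would also pull neighbouring units) are illustrative simplifications, not the paper's exact algorithm, which also covers the dynamic variant QDSOM.

```python
import numpy as np

class QSOMSketch:
    def __init__(self, state_dim, action_dim, n_state_units=16, n_action_units=8,
                 lr_som=0.1, lr_q=0.1, gamma=0.95, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state_units = self.rng.normal(size=(n_state_units, state_dim))
        self.action_units = self.rng.normal(size=(n_action_units, action_dim))
        self.q = np.zeros((n_state_units, n_action_units))   # Q-Table over SOM units
        self.lr_som, self.lr_q, self.gamma = lr_som, lr_q, gamma

    @staticmethod
    def _bmu(units, x):
        """Index of the best-matching unit (closest prototype) for a continuous vector."""
        return int(np.argmin(np.linalg.norm(units - x, axis=1)))

    def act(self, state, epsilon=0.1):
        s = self._bmu(self.state_units, state)
        if self.rng.random() < epsilon:
            a = int(self.rng.integers(len(self.action_units)))
        else:
            a = int(np.argmax(self.q[s]))
        return self.action_units[a], (s, a)       # continuous action + table indices

    def update(self, indices, state, action, reward, next_state):
        s, a = indices
        s_next = self._bmu(self.state_units, next_state)
        td = reward + self.gamma * self.q[s_next].max() - self.q[s, a]
        self.q[s, a] += self.lr_q * td
        # pull the winning prototypes toward the observed state / executed action
        self.state_units[s] += self.lr_som * (state - self.state_units[s])
        self.action_units[a] += self.lr_som * (action - self.action_units[a])
```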

Is Risk-Sensitive Reinforcement Learning Properly Resolved?

  • paper_url: http://arxiv.org/abs/2307.00547
  • repo_url: None
  • paper_authors: Ruiwen Zhou, Minghuan Liu, Kan Ren, Xufang Luo, Weinan Zhang, Dongsheng Li
  • for: Addresses risk-sensitive objectives in distributional reinforcement learning and proposes a new algorithm, Trajectory Q-Learning (TQL), to optimize policies under such objectives.
  • methods: Works within the distributional reinforcement learning framework to learn risk-sensitive objectives characterized by various risk measures, and introduces TQL, which provably converges to the optimal policy and supports a general, practical implementation for learning disparate risk-sensitive policies under different risk measures.
  • results: Experiments verify the learnability of the algorithm and show that it effectively achieves better performance toward risk-sensitive objectives.
    Abstract Due to the nature of risk management in learning applicable policies, risk-sensitive reinforcement learning (RSRL) has been realized as an important direction. RSRL is usually achieved by learning risk-sensitive objectives characterized by various risk measures, under the framework of distributional reinforcement learning. However, it remains unclear if the distributional Bellman operator properly optimizes the RSRL objective in the sense of risk measures. In this paper, we prove that the existing RSRL methods do not achieve unbiased optimization and can not guarantee optimality or even improvements regarding risk measures over accumulated return distributions. To remedy this issue, we further propose a novel algorithm, namely Trajectory Q-Learning (TQL), for RSRL problems with provable convergence to the optimal policy. Based on our new learning architecture, we are free to introduce a general and practical implementation for different risk measures to learn disparate risk-sensitive policies. In the experiments, we verify the learnability of our algorithm and show how our method effectively achieves better performances toward risk-sensitive objectives.

Defending Against Malicious Behaviors in Federated Learning with Blockchain

  • paper_url: http://arxiv.org/abs/2307.00543
  • repo_url: None
  • paper_authors: Nanqing Dong, Zhipeng Wang, Jiahao Sun, Michael Kampffmeyer, Yizhe Wen, Shuoying Zhang, William Knottenbelt, Eric Xing
  • for: Improves the security and reliability of federated learning systems so that multi-institutional data owners (clients) can collaboratively train models more safely.
  • methods: Proposes a secure and reliable federated learning system based on blockchain and distributed ledger technology, incorporating an on-chain peer-to-peer voting mechanism and a reward-and-slash mechanism to detect and deter malicious client behaviors.
  • results: Theoretical and empirical analyses demonstrate the effectiveness of the approach and its robustness against malicious client-side behaviors.
    Abstract In the era of deep learning, federated learning (FL) presents a promising approach that allows multi-institutional data owners, or clients, to collaboratively train machine learning models without compromising data privacy. However, most existing FL approaches rely on a centralized server for global model aggregation, leading to a single point of failure. This makes the system vulnerable to malicious attacks when dealing with dishonest clients. In this work, we address this problem by proposing a secure and reliable FL system based on blockchain and distributed ledger technology. Our system incorporates a peer-to-peer voting mechanism and a reward-and-slash mechanism, which are powered by on-chain smart contracts, to detect and deter malicious behaviors. Both theoretical and empirical analyses are presented to demonstrate the effectiveness of the proposed approach, showing that our framework is robust against malicious client-side behaviors.

Collaborative Policy Learning for Dynamic Scheduling Tasks in Cloud-Edge-Terminal IoT Networks Using Federated Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.00541
  • repo_url: None
  • paper_authors: Do-Yup Kim, Da-Eun Lee, Ji-Wan Kim, Hyun-Suk Lee
  • for: Studies cloud-edge-terminal IoT networks in which edges undertake a range of typical dynamic scheduling tasks.
  • methods: Proposes a collaborative policy learning framework for dynamic scheduling tasks based on federated reinforcement learning. A central policy for each task is collaboratively learned at the cloud server by aggregating local experiences from the edges, sparing edges from learning their own policies from scratch; the framework also adaptively selects the tasks for collaborative learning in each round.
  • results: Simulations show that the framework accelerates policy learning compared to approaches without collaborative policy learning and allows newly arrived edges to adapt to their tasks more easily.
    Abstract In this paper, we examine cloud-edge-terminal IoT networks, where edges undertake a range of typical dynamic scheduling tasks. In these IoT networks, a central policy for each task can be constructed at a cloud server. The central policy can be then used by the edges conducting the task, thereby mitigating the need for them to learn their own policy from scratch. Furthermore, this central policy can be collaboratively learned at the cloud server by aggregating local experiences from the edges, thanks to the hierarchical architecture of the IoT networks. To this end, we propose a novel collaborative policy learning framework for dynamic scheduling tasks using federated reinforcement learning. For effective learning, our framework adaptively selects the tasks for collaborative learning in each round, taking into account the need for fairness among tasks. In addition, as a key enabler of the framework, we propose an edge-agnostic policy structure that enables the aggregation of local policies from different edges. We then provide the convergence analysis of the framework. Through simulations, we demonstrate that our proposed framework significantly outperforms the approaches without collaborative policy learning. Notably, it accelerates the learning speed of the policies and allows newly arrived edges to adapt to their tasks more easily.

Shared Growth of Graph Neural Networks via Free-direction Knowledge Distillation

  • paper_url: http://arxiv.org/abs/2307.00534
  • repo_url: None
  • paper_authors: Kaituo Feng, Yikun Miao, Changsheng Li, Ye Yuan, Guoren Wang
  • for: Improves the performance of Graph Neural Networks (GNNs) via knowledge distillation while avoiding the over-parameterization and over-smoothing issues of training a deep, well-optimized teacher GNN.
  • methods: Proposes FreeKD, the first free-direction knowledge distillation framework for GNNs based on reinforcement learning, which no longer requires a deep teacher. The core idea is to let two shallower GNNs collaboratively learn and exchange knowledge via hierarchical reinforcement learning. Since a typical GNN often performs better on some nodes and worse on others during training, a dynamic, free-direction knowledge transfer strategy is designed with two levels of actions: 1) node-level actions determine the direction of knowledge transfer between the corresponding nodes of the two networks; 2) structure-level actions determine which of the local structures generated by the node-level actions are propagated.
  • results: Extensive experiments on five benchmark datasets show that the approach improves the base GNNs by a large margin and generalizes to various GNN models; FreeKD++ is further proposed to enable free-direction knowledge transfer among multiple shallow GNNs operating on multi-view inputs.
    Abstract Knowledge distillation (KD) has shown to be effective to boost the performance of graph neural networks (GNNs), where the typical objective is to distill knowledge from a deeper teacher GNN into a shallower student GNN. However, it is often quite challenging to train a satisfactory deeper GNN due to the well-known over-parametrized and over-smoothing issues, leading to invalid knowledge transfer in practical applications. In this paper, we propose the first Free-direction Knowledge Distillation framework via reinforcement learning for GNNs, called FreeKD, which is no longer required to provide a deeper well-optimized teacher GNN. Our core idea is to collaboratively learn two shallower GNNs in an effort to exchange knowledge between them via reinforcement learning in a hierarchical way. As we observe that one typical GNN model often exhibits better and worse performances at different nodes during training, we devise a dynamic and free-direction knowledge transfer strategy that involves two levels of actions: 1) node-level action determines the directions of knowledge transfer between the corresponding nodes of two networks; and then 2) structure-level action determines which of the local structures generated by the node-level actions to be propagated. Furthermore, considering the diverse knowledge present in different GNNs when dealing with multi-view inputs, we introduce FreeKD++ as a solution to enable free-direction knowledge transfer among multiple shallow GNNs operating on multi-view inputs. Extensive experiments on five benchmark datasets demonstrate our approaches outperform the base GNNs in a large margin, and shows their efficacy to various GNNs. More surprisingly, our FreeKD has comparable or even better performance than traditional KD algorithms that distill knowledge from a deeper and stronger teacher GNN.

New intelligent defense systems to reduce the risks of Selfish Mining and Double-Spending attacks using Learning Automata

  • paper_url: http://arxiv.org/abs/2307.00529
  • repo_url: None
  • paper_authors: Seyed Ardalan Ghoreishi, Mohammad Reza Meybodi
  • for: Addresses the double-spending and selfish mining attack challenges in blockchain-based digital currencies.
  • methods: Introduces a new attack that combines double-spending and selfish mining, and proposes a machine-learning-based mitigation. Specifically, the learning automaton, a powerful online learning method, is used to develop two models, SDTLA and WVBM, that effectively defend against selfish mining attacks.
  • results: Experiments show that SDTLA raises the profitability threshold of selfish mining up to 47%, while WVBM performs even better and is close to the ideal situation in which each miner's revenue is proportional to their share of hash processing power. Both methods also effectively reduce the risk of double-spending by tuning the $Z$ parameter.
    Abstract In this paper, we address the critical challenges of double-spending and selfish mining attacks in blockchain-based digital currencies. Double-spending is a problem where the same tender is spent multiple times during a digital currency transaction, while selfish mining is an intentional alteration of a blockchain to increase rewards to one miner or a group of miners. We introduce a new attack that combines both these attacks and propose a machine learning-based solution to mitigate the risks associated with them. Specifically, we use the learning automaton, a powerful online learning method, to develop two models, namely the SDTLA and WVBM, which can effectively defend against selfish mining attacks. Our experimental results show that the SDTLA method increases the profitability threshold of selfish mining up to 47$\%$, while the WVBM method performs even better and is very close to the ideal situation where each miner's revenue is proportional to their shared hash processing power. Additionally, we demonstrate that both methods can effectively reduce the risks of double-spending by tuning the $Z$ Parameter. Our findings highlight the potential of SDTLA and WVBM as promising solutions for enhancing the security and efficiency of blockchain networks.

Graph Neural Network based Log Anomaly Detection and Explanation

  • paper_url: http://arxiv.org/abs/2307.00527
  • repo_url: None
  • paper_authors: Zhong Li, Jiayang Shi, Matthijs van Leeuwen
  • for: Improves the accuracy of log anomaly detection for monitoring high-tech systems by modeling event logs as graphs.
  • methods: First converts event logs into attributed, directed, and weighted graphs (see the sketch below), then uses a graph neural network (OCDiGCN) to perform graph-level anomaly detection.
  • results: Experiments on five benchmark datasets show that Logs2Graphs performs at least on par with state-of-the-art log anomaly detection methods on simple datasets and largely outperforms them on complicated datasets.
    Abstract Event logs are widely used to record the status of high-tech systems, making log anomaly detection important for monitoring those systems. Most existing log anomaly detection methods take a log event count matrix or log event sequences as input, exploiting quantitative and/or sequential relationships between log events to detect anomalies. Unfortunately, only considering quantitative or sequential relationships may result in many false positives and/or false negatives. To alleviate this problem, we propose a graph-based method for unsupervised log anomaly detection, dubbed Logs2Graphs, which first converts event logs into attributed, directed, and weighted graphs, and then leverages graph neural networks to perform graph-level anomaly detection. Specifically, we introduce One-Class Digraph Inception Convolutional Networks, abbreviated as OCDiGCN, a novel graph neural network model for detecting graph-level anomalies in a collection of attributed, directed, and weighted graphs. By coupling the graph representation and anomaly detection steps, OCDiGCN can learn a representation that is especially suited for anomaly detection, resulting in a high detection accuracy. Importantly, for each identified anomaly, we additionally provide a small subset of nodes that play a crucial role in OCDiGCN's prediction as explanations, which can offer valuable cues for subsequent root cause diagnosis. Experiments on five benchmark datasets show that Logs2Graphs performs at least on par state-of-the-art log anomaly detection methods on simple datasets while largely outperforming state-of-the-art log anomaly detection methods on complicated datasets.
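
One plausible realization of the first step, turning a window of parsed log events into an attributed, directed, weighted graph with networkx, is sketched below. The node attribution (a given per-template embedding plus an occurrence count) and the windowing are assumptions made for illustration, not necessarily the paper's exact construction.

```python
import networkx as nx

def log_window_to_graph(event_sequence, event_embedding):
    """Nodes = log event templates, edge weight = how often one event directly
    follows another, node attributes = template embedding + occurrence count."""
    g = nx.DiGraph()
    for ev in event_sequence:
        if ev not in g:
            g.add_node(ev, x=event_embedding[ev], count=0)
        g.nodes[ev]["count"] += 1
    for a, b in zip(event_sequence, event_sequence[1:]):
        if g.has_edge(a, b):
            g[a][b]["weight"] += 1
        else:
            g.add_edge(a, b, weight=1)
    return g

# hypothetical usage: template ids from one parsed log window, toy 1-d "embeddings"
window = ["E1", "E2", "E1", "E3", "E2"]
embeddings = {e: [float(hash(e) % 7)] for e in set(window)}
g = log_window_to_graph(window, embeddings)
print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
```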

TensorGPT: Efficient Compression of the Embedding Layer in LLMs based on the Tensor-Train Decomposition

  • paper_url: http://arxiv.org/abs/2307.00526
  • repo_url: None
  • paper_authors: Mingxue Xu, Yao Lei Xu, Danilo P. Mandic
  • for: Reduces the parameter count and storage requirements of Large Language Models (LLMs) by compressing the embedding layer.
  • methods: Uses the Tensor-Train Decomposition (TTD) to treat each token embedding as a Matrix Product State (MPS) that can be computed efficiently in a distributed manner (see the sketch below).
  • results: Experiments on GPT-2 show that the embedding layer can be compressed by a factor of up to 38.40, and with a compression factor of 3.31 the compressed model even outperforms the original GPT-2.
    Abstract High-dimensional token embeddings underpin Large Language Models (LLMs), as they can capture subtle semantic information and significantly enhance the modelling of complex language patterns. However, the associated high dimensionality also introduces considerable model parameters, and a prohibitively high model storage. To address this issue, this work proposes an approach based on the Tensor-Train Decomposition (TTD), where each token embedding is treated as a Matrix Product State (MPS) that can be efficiently computed in a distributed manner. The experimental results on GPT-2 demonstrate that, through our approach, the embedding layer can be compressed by a factor of up to 38.40 times, and when the compression factor is 3.31 times, even produced a better performance than the original GPT-2 model.
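
To make the embedding-as-MPS idea concrete, here is a minimal NumPy sketch of the standard TT-SVD procedure applied to a single embedding vector. The reshaping factors (8 x 8 x 12 for a 768-dimensional, GPT-2-sized embedding) and the bond rank are illustrative choices, not the paper's settings, and the paper further addresses the full vocabulary and distributed computation.

```python
import numpy as np

def tt_decompose(vector, shape, max_rank):
    """TT-SVD: factor a 1-D embedding, reshaped to `shape`, into Matrix Product
    State cores of shape (rank_left, mode_dim, rank_right) via truncated SVDs."""
    cores, rank_left = [], 1
    unfolding = vector.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(unfolding, full_matrices=False)
        rank_right = min(max_rank, len(s))
        cores.append(u[:, :rank_right].reshape(rank_left, shape[k], rank_right))
        unfolding = (np.diag(s[:rank_right]) @ vt[:rank_right]).reshape(
            rank_right * shape[k + 1], -1)
        rank_left = rank_right
    cores.append(unfolding.reshape(rank_left, shape[-1], 1))
    return cores

emb = np.random.randn(768)                         # stand-in for one token embedding
cores = tt_decompose(emb, shape=(8, 8, 12), max_rank=4)

# contract the MPS cores back together and measure the approximation error
approx = cores[0].reshape(-1, cores[0].shape[-1])
for core in cores[1:]:
    approx = (approx @ core.reshape(core.shape[0], -1)).reshape(-1, core.shape[-1])
approx = approx.reshape(-1)
n_params = sum(c.size for c in cores)
print(f"{emb.size} -> {n_params} parameters, "
      f"relative error {np.linalg.norm(emb - approx) / np.linalg.norm(emb):.3f}")
```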

LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance

  • paper_url: http://arxiv.org/abs/2307.00522
  • repo_url: https://github.com/adham-elarabawy/ledits
  • paper_authors: Linoy Tsaban, Apolinário Passos
  • for: Proposes a lightweight approach for real-image editing driven by text prompts, without optimization or extensions to the model architecture.
  • methods: Combines the Edit Friendly DDPM inversion technique with Semantic Guidance: the real image is inverted into the pretrained model's domain and then edited via text prompts, extending Semantic Guidance to real-image editing.
  • results: Achieves versatile edits, both subtle and extensive, including changes in composition and style, while requiring no optimization nor extensions to the architecture.
    Abstract Recent large-scale text-guided diffusion models provide powerful image-generation capabilities. Currently, a significant effort is given to enable the modification of these images using text only as means to offer intuitive and versatile editing. However, editing proves to be difficult for these generative models due to the inherent nature of editing techniques, which involves preserving certain content from the original image. Conversely, in text-based models, even minor modifications to the text prompt frequently result in an entirely distinct result, making attaining one-shot generation that accurately corresponds to the users intent exceedingly challenging. In addition, to edit a real image using these state-of-the-art tools, one must first invert the image into the pre-trained models domain - adding another factor affecting the edit quality, as well as latency. In this exploratory report, we propose LEDITS - a combined lightweight approach for real-image editing, incorporating the Edit Friendly DDPM inversion technique with Semantic Guidance, thus extending Semantic Guidance to real image editing, while harnessing the editing capabilities of DDPM inversion as well. This approach achieves versatile edits, both subtle and extensive as well as alterations in composition and style, while requiring no optimization nor extensions to the architecture.

DSTCGCN: Learning Dynamic Spatial-Temporal Cross Dependencies for Traffic Forecasting

  • paper_url: http://arxiv.org/abs/2307.00518
  • repo_url: https://github.com/water-wbq/dstcgcn
  • paper_authors: Binqing Wu, Ling Chen
  • for: Proposes DSTCGCN, a dynamic spatial-temporal cross graph convolutional network that learns spatial and temporal dependencies jointly for traffic forecasting in intelligent transportation systems.
  • methods: Uses a fast Fourier transform (FFT) based attentive selector to choose relevant time steps for each time step from time-varying traffic data (see the sketch below), feeding a dynamic cross graph construction module that learns dynamic spatial-temporal cross dependencies.
  • results: Experiments on six real-world datasets show that DSTCGCN achieves state-of-the-art traffic forecasting performance.
    Abstract Traffic forecasting is essential to intelligent transportation systems, which is challenging due to the complicated spatial and temporal dependencies within a road network. Existing works usually learn spatial and temporal dependencies separately, ignoring the dependencies crossing spatial and temporal dimensions. In this paper, we propose DSTCGCN, a dynamic spatial-temporal cross graph convolution network to learn dynamic spatial and temporal dependencies jointly via graphs for traffic forecasting. Specifically, we introduce a fast Fourier transform (FFT) based attentive selector to choose relevant time steps for each time step based on time-varying traffic data. Given the selected time steps, we introduce a dynamic cross graph construction module, consisting of the spatial graph construction, temporal connection graph construction, and fusion modules, to learn dynamic spatial-temporal cross dependencies without pre-defined priors. Extensive experiments on six real-world datasets demonstrate that DSTCGCN achieves the state-of-the-art performance.
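
A rough, non-learned stand-in for the FFT-based step selection is sketched below: each time step gets a spectral signature from the FFT of the traffic window ending at it, and the k most similar other steps are selected for it. The windowing, the cosine scoring, and the example sizes are assumptions for illustration; the paper's selector uses learned attention over time-varying traffic data.

```python
import torch

def fft_step_selector(x, window, k):
    """x: (T, N) tensor of T time steps over N traffic sensors.
    Returns (T, k) indices of the k most spectrally similar steps for each step."""
    pad = torch.cat([x[:1].expand(window - 1, -1), x], dim=0)   # left-pad with first row
    windows = pad.unfold(0, window, 1)                           # (T, N, window)
    spec = torch.fft.rfft(windows, dim=-1).abs().mean(dim=1)     # (T, window//2 + 1)
    spec = spec / (spec.norm(dim=1, keepdim=True) + 1e-8)
    relevance = spec @ spec.T                                     # (T, T) cosine similarity
    relevance.fill_diagonal_(float("-inf"))                       # a step never selects itself
    return relevance.topk(k, dim=1).indices

selected = fft_step_selector(torch.randn(24, 207), window=6, k=3)   # e.g. 207 sensors
print(selected.shape)                                                # torch.Size([24, 3])
```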

SUGAR: Spherical Ultrafast Graph Attention Framework for Cortical Surface Registration

  • paper_url: http://arxiv.org/abs/2307.00511
  • repo_url: None
  • paper_authors: Jianxun Ren, Ning An, Youjia Zhang, Danyang Wang, Zhenyu Sun, Cong Lin, Weigang Cui, Weiwei Wang, Ying Zhou, Wei Zhang, Qingyu Hu, Ping Zhang, Dan Hu, Danhong Wang, Hesheng Liu
  • for: SUGAR is designed to improve cortical surface registration, which is crucial for aligning cortical functional and anatomical features across individuals.
  • methods: SUGAR is a deep-learning framework that incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. It also includes a similarity loss, fold loss, and multiple distortion losses to preserve topology and minimize distortions.
  • results: SUGAR achieves high registration performance and accelerated processing times, making it a promising solution for large-scale neuroimaging studies. It exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods, and processes data much faster than conventional methods.
    Abstract Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based registration algorithms have emerged as a promising solution, significantly improving processing efficiency. Nonetheless, there remains a gap in the development of a learning-based method that exceeds the state-of-the-art conventional methods simultaneously in computational efficiency, registration accuracy, and distortion control, despite the theoretically greater representational capabilities of deep learning approaches. To address the challenge, we present SUGAR, a unified unsupervised deep-learning framework for both rigid and non-rigid registration. SUGAR incorporates a U-Net-based spherical graph attention network and leverages the Euler angle representation for deformation. In addition to the similarity loss, we introduce fold and multiple distortion losses, to preserve topology and minimize various types of distortions. Furthermore, we propose a data augmentation strategy specifically tailored for spherical surface registration, enhancing the registration performance. Through extensive evaluation involving over 10,000 scans from 7 diverse datasets, we showed that our framework exhibits comparable or superior registration performance in accuracy, distortion, and test-retest reliability compared to conventional and learning-based methods. Additionally, SUGAR achieves remarkable sub-second processing times, offering a notable speed-up of approximately 12,000 times in registering 9,000 subjects from the UK Biobank dataset in just 32 minutes. This combination of high registration performance and accelerated processing time may greatly benefit large-scale neuroimaging studies.

HeGeL: A Novel Dataset for Geo-Location from Hebrew Text

  • paper_url: http://arxiv.org/abs/2307.00509
  • repo_url: https://github.com/onlplab/hegel
  • paper_authors: Tzuf Paz-Argaman, Tal Bauman, Itai Mondshine, Itzhak Omer, Sagi Dalyot, Reut Tsarfaty
  • for: Addresses the task of textual geolocation, i.e., retrieving the coordinates of a place from a free-form language description, for Hebrew, a morphologically rich and resource-poor language.
  • methods: Crowdsources 5,649 literal Hebrew place descriptions of various place types in three cities in Israel to build the HeGeL corpus.
  • results: Qualitative and empirical analysis shows that the descriptions exhibit abundant geospatial reasoning and require a novel environmental representation.
    Abstract The task of textual geolocation - retrieving the coordinates of a place based on a free-form language description - calls for not only grounding but also natural language understanding and geospatial reasoning. Even though there are quite a few datasets in English used for geolocation, they are currently based on open-source data (Wikipedia and Twitter), where the location of the described place is mostly implicit, such that the location retrieval resolution is limited. Furthermore, there are no datasets available for addressing the problem of textual geolocation in morphologically rich and resource-poor languages, such as Hebrew. In this paper, we present the Hebrew Geo-Location (HeGeL) corpus, designed to collect literal place descriptions and analyze lingual geospatial reasoning. We crowdsourced 5,649 literal Hebrew place descriptions of various place types in three cities in Israel. Qualitative and empirical analysis show that the data exhibits abundant use of geospatial reasoning and requires a novel environmental representation.

Cloud Ensemble Learning for Fault Diagnosis of Rolling Bearings with Stochastic Configuration Networks

  • paper_url: http://arxiv.org/abs/2307.00507
  • repo_url: None
  • paper_authors: Wei Dai, Jiang Liu, Lanhao Wang
  • for: rolling bearing fault diagnosis in few shot scenarios
  • methods: stochastic configuration network (SCN) based cloud ensemble learning
  • results: accurate fault diagnosis with few training samples
    Abstract Fault diagnosis of rolling bearings is of great significance for post-maintenance in rotating machinery, but it is a challenging work to diagnose faults efficiently with a few samples. Additionally, faults commonly occur with randomness and fuzziness due to the complexity of the external environment and the structure of rolling bearings, hindering effective mining of fault characteristics and eventually restricting accuracy of fault diagnosis. To overcome these problems, stochastic configuration network (SCN) based cloud ensemble learning, called SCN-CEL, is developed in this work. Concretely, a cloud feature extraction method is first developed by using a backward cloud generator of normal cloud model to mine the uncertainty of fault information. Then, a cloud sampling method, which generates enough cloud droplets using bidirectional cloud generator, is proposed to extend the cloud feature samples. Finally, an ensemble model with SCNs is developed to comprehensively characterize the uncertainty of fault information and advance the generalization performance of fault diagnosis machine. Experimental results demonstrate that the proposed method indeed performs favorably for distinguishing fault categories of rolling bearings in the few shot scenarios.

On efficient computation in active inference

  • paper_url: http://arxiv.org/abs/2307.00504
  • repo_url: https://github.com/aswinpaul/dpefe_2023
  • paper_authors: Aswin Paul, Noor Sajid, Lancelot Da Costa, Adeel Razi
  • for: Improves the computational efficiency of active inference and simplifies the specification of an appropriate target distribution for the agent.
  • methods: Proposes two solutions: a novel planning algorithm for finite temporal horizons with drastically lower computational complexity, based on dynamic programming and the Bellman-optimality principle (see the sketch below), and a Z-learning-inspired method for setting an appropriate target distribution for new and existing active inference planning schemes.
  • results: Simulations on standard grid-world tasks demonstrate the effectiveness and applicability of the proposed methods.
    Abstract Despite being recognized as neurobiologically plausible, active inference faces difficulties when employed to simulate intelligent behaviour in complex environments due to its computational cost and the difficulty of specifying an appropriate target distribution for the agent. This paper introduces two solutions that work in concert to address these limitations. First, we present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity. Second, inspired by Z-learning from control theory literature, we simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes. Our first approach leverages the dynamic programming algorithm, known for its computational efficiency, to minimize the cost function used in planning through the Bellman-optimality principle. Accordingly, our algorithm recursively assesses the expected free energy of actions in the reverse temporal order. This improves computational efficiency by orders of magnitude and allows precise model learning and planning, even under uncertain conditions. Our method simplifies the planning process and shows meaningful behaviour even when specifying only the agent's final goal state. The proposed solutions make defining a target distribution from a goal state straightforward compared to the more complicated task of defining a temporally informed target distribution. The effectiveness of these methods is tested and demonstrated through simulations in standard grid-world tasks. These advances create new opportunities for various applications.
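
A toy sketch of the backward, Bellman-style recursion over expected free energy (EFE) that underlies the proposed planner is given below. The per-step EFE is reduced here to a risk-only term derived from a preference (target) distribution, and the transition model and preferences are made-up toy values; the paper's full scheme also includes the ambiguity term and its Z-learning-inspired construction of the target distribution.

```python
import numpy as np

def backward_efe_planning(transition, step_efe, horizon):
    """transition: P[a, s, s'] state-transition model; step_efe: per-step EFE G1[a, s].
    Returns cumulative EFE G[t, a, s], assessed in reverse temporal order."""
    n_actions, n_states = step_efe.shape
    G = np.zeros((horizon, n_actions, n_states))
    G[-1] = step_efe
    for t in range(horizon - 2, -1, -1):
        value_next = G[t + 1].min(axis=0)              # best achievable EFE from each s'
        G[t] = step_efe + transition @ value_next       # expectation under P(s'|s,a)
    return G                                            # act greedily: argmin_a G[0, :, s0]

# toy 3-state, 2-action problem with a preference for ending up in state 2
P = np.zeros((2, 3, 3))
P[0] = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]   # action "stay"
P[1] = [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]]   # action "advance"
preferences = np.array([0.05, 0.05, 0.9])                     # target distribution C(s)
step_efe = P @ (-np.log(preferences))                         # expected risk per (a, s)
G = backward_efe_planning(P, step_efe, horizon=5)
print("best initial action from state 0:", int(G[0, :, 0].argmin()))   # -> 1 ("advance")
```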

Classifying World War II Era Ciphers with Machine Learning

  • paper_url: http://arxiv.org/abs/2307.00501
  • repo_url: None
  • paper_authors: Brooke Dalton, Mark Stamp
  • for: Investigates how accurately machine learning and deep learning techniques can classify selected World War II era ciphers when only ciphertext is available.
  • methods: Uses three classic machine learning models, Support Vector Machines (SVM), $k$-Nearest Neighbors ($k$-NN), and Random Forest (RF), and four deep learning models, Multi-Layer Perceptrons (MLP), Long Short-Term Memory (LSTM), Extreme Learning Machines (ELM), and Convolutional Neural Networks (CNN), trained on histogram, digram, and raw letter-sequence features (see the sketch below).
  • results: Classifies five ciphers (Enigma, M-209, Sigaba, Purple, and Typex) under four scenarios with varying ciphertext lengths. Under the most realistic scenario, with 1000 characters per ciphertext, the ciphers can be distinguished with greater than 97% accuracy. The classic machine learning models perform at least as well as the deep learning models, and ciphers of more similar design are somewhat harder, though not dramatically harder, to distinguish.
    Abstract We determine the accuracy with which machine learning and deep learning techniques can classify selected World War II era ciphers when only ciphertext is available. The specific ciphers considered are Enigma, M-209, Sigaba, Purple, and Typex. We experiment with three classic machine learning models, namely, Support Vector Machines (SVM), $k$-Nearest Neighbors ($k$-NN), and Random Forest (RF). We also experiment with four deep learning neural network-based models: Multi-Layer Perceptrons (MLP), Long Short-Term Memory (LSTM), Extreme Learning Machines (ELM), and Convolutional Neural Networks (CNN). Each model is trained on features consisting of histograms, digrams, and raw ciphertext letter sequences. Furthermore, the classification problem is considered under four distinct scenarios: Fixed plaintext with fixed keys, random plaintext with fixed keys, fixed plaintext with random keys, and random plaintext with random keys. Under the most realistic scenario, given 1000 characters per ciphertext, we are able to distinguish the ciphers with greater than 97% accuracy. In addition, we consider the accuracy of a subset of the learning techniques as a function of the length of the ciphertext messages. Somewhat surprisingly, our classic machine learning models perform at least as well as our deep learning models. We also find that ciphers that are more similar in design are somewhat more challenging to distinguish, but not as difficult as might be expected.
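
A small scikit-learn sketch of the classic-ML side of this setup is shown below: letter-histogram and digram features are computed from ciphertext only and fed to a Random Forest. The `samples` list of (ciphertext, cipher-name) pairs is hypothetical, e.g. 1000-character outputs of Enigma / M-209 / Sigaba / Purple / Typex simulators, and the hyperparameters are not the paper's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

IDX = {c: i for i, c in enumerate("ABCDEFGHIJKLMNOPQRSTUVWXYZ")}

def features(ciphertext):
    """Letter histogram plus digram frequencies for one ciphertext message."""
    hist, digram = np.zeros(26), np.zeros((26, 26))
    for c in ciphertext:
        hist[IDX[c]] += 1
    for a, b in zip(ciphertext, ciphertext[1:]):
        digram[IDX[a], IDX[b]] += 1
    n = max(len(ciphertext), 1)
    return np.concatenate([hist / n, digram.ravel() / max(n - 1, 1)])

# `samples` is a hypothetical list of (ciphertext, cipher_label) pairs, e.g.
# 1000-character messages produced by simulators of the five ciphers.
X = np.stack([features(text) for text, _ in samples])
y = np.array([label for _, label in samples])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```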

Data-Free Quantization via Mixed-Precision Compensation without Fine-Tuning

  • paper_url: http://arxiv.org/abs/2307.00498
  • repo_url: None
  • paper_authors: Jun Chen, Shipeng Bai, Tianxin Huang, Mengmeng Wang, Guanzhong Tian, Yong Liu
  • for: Proposes a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data or fine-tuning process.
  • methods: Assumes that the quantization error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, and minimizes the reconstruction loss of the feature maps to obtain a closed-form solution (see the sketch below).
  • results: Experiments show that DF-MPC achieves higher accuracy for ultra-low precision quantized models than recent methods, without any data or fine-tuning.
    Abstract Neural network quantization is a very promising solution in the field of model compression, but its resulting accuracy highly depends on a training/fine-tuning process and requires the original data. This not only brings heavy computation and time costs but also is not conducive to privacy and sensitive information protection. Therefore, a few recent works are starting to focus on data-free quantization. However, data-free quantization does not perform well while dealing with ultra-low precision quantization. Although researchers utilize generative methods of synthetic data to address this problem partially, data synthesis needs to take a lot of computation and time. In this paper, we propose a data-free mixed-precision compensation (DF-MPC) method to recover the performance of an ultra-low precision quantized model without any data and fine-tuning process. By assuming the quantized error caused by a low-precision quantized layer can be restored via the reconstruction of a high-precision quantized layer, we mathematically formulate the reconstruction loss between the pre-trained full-precision model and its layer-wise mixed-precision quantized model. Based on our formulation, we theoretically deduce the closed-form solution by minimizing the reconstruction loss of the feature maps. Since DF-MPC does not require any original/synthetic data, it is a more efficient method to approximate the full-precision model. Experimentally, our DF-MPC is able to achieve higher accuracy for an ultra-low precision quantized model compared to the recent methods without any data and fine-tuning process.
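
A toy PyTorch sketch of the reconstruction objective behind DF-MPC on a pair of linear layers: one layer is fake-quantized to 2 bits, and the following layer's weights are adjusted so the composed feature maps match the full-precision ones. The paper derives a closed-form solution, keeps the compensating layer itself quantized at a higher precision, and handles real architectures; here a plain gradient solver on random probe inputs (no data involved) only illustrates the loss being minimized.

```python
import torch

def fake_quantize(w, bits):
    """Simulated symmetric uniform quantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    return torch.clamp(torch.round(w / scale), -qmax, qmax) * scale

torch.manual_seed(0)
W1 = torch.randn(64, 128)                 # layer to be quantized to ultra-low precision
W2 = torch.randn(32, 64)                  # following layer, used for compensation
x = torch.randn(256, 128)                 # random probe inputs (data-free)

W1_q = fake_quantize(W1, bits=2)
W2_c = W2.clone().requires_grad_(True)    # compensation weights to be adjusted

target = x @ W1.T @ W2.T                  # full-precision feature maps to reconstruct
opt = torch.optim.Adam([W2_c], lr=1e-2)
for _ in range(500):
    loss = ((x @ W1_q.T @ W2_c.T) - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("feature-map error, uncompensated:", ((x @ W1_q.T @ W2.T) - target).pow(2).mean().item())
print("feature-map error, compensated:  ", loss.item())
```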

Don’t Memorize; Mimic The Past: Federated Class Incremental Learning Without Episodic Memory

  • paper_url: http://arxiv.org/abs/2307.00497
  • repo_url: None
  • paper_authors: Sara Babakniya, Zalan Fabian, Chaoyang He, Mahdi Soltanolkotabi, Salman Avestimehr
  • for: Addresses catastrophic forgetting, the tendency of deep learning models to forget previously learned information when trained on new data, which is especially pronounced in federated learning (FL), where data is decentralized and changes independently for each user.
  • methods: Proposes a federated class incremental learning framework that uses a generative model to synthesize samples from past distributions instead of storing past data on clients (see the sketch below). The generative model is trained on the server with data-free methods at the end of each task, reducing the risk of data leakage.
  • results: Shows significant improvements over existing baselines on the CIFAR-100 dataset.
    Abstract Deep learning models are prone to forgetting information learned in the past when trained on new data. This problem becomes even more pronounced in the context of federated learning (FL), where data is decentralized and subject to independent changes for each user. Continual Learning (CL) studies this so-called \textit{catastrophic forgetting} phenomenon primarily in centralized settings, where the learner has direct access to the complete training dataset. However, applying CL techniques to FL is not straightforward due to privacy concerns and resource limitations. This paper presents a framework for federated class incremental learning that utilizes a generative model to synthesize samples from past distributions instead of storing part of past data. Then, clients can leverage the generative model to mitigate catastrophic forgetting locally. The generative model is trained on the server using data-free methods at the end of each task without requesting data from clients. Therefore, it reduces the risk of data leakage as opposed to training it on the client's private data. We demonstrate significant improvements for the CIFAR-100 dataset compared to existing baselines.
    摘要 深度学习模型容易忘记过去学习的信息,特别在 federated learning(FL)上,数据分散化和每个用户独立变化。 kontinual learning(CL)在中心化Setting中主要研究这种称为“极端忘记”现象,但是在隐私和资源限制的情况下应用CL技术到FL不是直接的。这篇文章介绍了一种基于生成模型的联邦分类逐步学习框架,可以在客户端使用生成模型来避免极端忘记。生成模型在服务器端使用数据free方法在每个任务结束时训练,因此减少了数据泄露的风险,与在客户端私有数据上训练生成模型不同。我们对 CIFAR-100 数据集进行了比较,与现有基eline相比,我们得到了显著的改进。
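A minimal sketch of the client-side use of generative replay is given below: a frozen, server-trained generator supplies synthetic samples of previously seen classes, which are mixed with the client's new-task batch. The generator here is a toy stand-in (the paper trains it on the server with data-free methods), and all module names are illustrative.

```python
# Minimal PyTorch sketch of generative replay inside a federated client update.
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES, Z_DIM = 10, 64

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(Z_DIM + NUM_CLASSES, 256), nn.ReLU(),
                                 nn.Linear(256, 32 * 32 * 3))
    def forward(self, z, y):
        y_onehot = F.one_hot(y, NUM_CLASSES).float()
        return self.net(torch.cat([z, y_onehot], dim=1)).view(-1, 3, 32, 32)

def client_update(model, generator, old_classes, x_new, y_new, lr=1e-3):
    """One local step: mix real new-task data with replayed old-task samples."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    # Synthesize a replay batch for previously seen classes (no stored data).
    y_old = old_classes[torch.randint(len(old_classes), (x_new.size(0),))]
    with torch.no_grad():
        x_old = generator(torch.randn(x_new.size(0), Z_DIM), y_old)
    x = torch.cat([x_new, x_old]); y = torch.cat([y_new, y_old])
    loss = F.cross_entropy(model(x), y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, NUM_CLASSES))
gen = Generator().eval()               # stand-in for the server-trained generator
old = torch.tensor([0, 1, 2, 3, 4])    # classes from earlier tasks
x_new = torch.randn(8, 3, 32, 32)      # current-task batch
y_new = torch.randint(5, 10, (8,))
print(client_update(model, gen, old, x_new, y_new))
```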

STG4Traffic: A Survey and Benchmark of Spatial-Temporal Graph Neural Networks for Traffic Prediction

  • paper_url: http://arxiv.org/abs/2307.00495
  • repo_url: https://github.com/trainingl/stg4traffic
  • paper_authors: Xunlian Luo, Chunjiang Zhu, Detian Zhang, Qing Li
  • for: 这个论文主要是为了提出一种基于图学习的实时交通预测方法,以提高智能城市系统的安全、稳定性和多样性。
  • methods: 这篇论文使用了图学习策略和常见的图卷积网络来模型交通系统的空间时间相关性。
  • results: 研究人员通过设计了一个标准化和可扩展的benchmark，并对两种交通数据集进行了比较性评估，发现这种方法可以提供更高的准确率和更好的可扩展性。
    Abstract Traffic prediction has been an active research topic in the domain of spatial-temporal data mining. Accurate real-time traffic prediction is essential to improve the safety, stability, and versatility of smart city systems, i.e., traffic control and optimal routing. The complex and highly dynamic spatial-temporal dependencies make effective predictions still face many challenges. Recent studies have shown that spatial-temporal graph neural networks exhibit great potential applied to traffic prediction, which combines sequential models with graph convolutional networks to jointly model temporal and spatial correlations. However, a survey study of graph learning, spatial-temporal graph models for traffic, as well as a fair comparison of baseline models are pending and unavoidable issues. In this paper, we first provide a systematic review of graph learning strategies and commonly used graph convolution algorithms. Then we conduct a comprehensive analysis of the strengths and weaknesses of recently proposed spatial-temporal graph network models. Furthermore, we build a study called STG4Traffic using the deep learning framework PyTorch to establish a standardized and scalable benchmark on two types of traffic datasets. We can evaluate their performance by personalizing the model settings with uniform metrics. Finally, we point out some problems in the current study and discuss future directions. Source codes are available at https://github.com/trainingl/STG4Traffic.
    摘要 宽泛研究领域:预测交通流量是智能城市系统中的一个活跃研究话题。实时准确的交通预测可以提高智能城市系统的安全性、稳定性和多样性,如交通控制和优化Routing。距离Complex and highly dynamic spatial-temporal dependencies make effective predictions still face many challenges. Recent studies have shown that spatial-temporal graph neural networks exhibit great potential applied to traffic prediction, which combines sequential models with graph convolutional networks to jointly model temporal and spatial correlations. However, a survey study of graph learning, spatial-temporal graph models for traffic, as well as a fair comparison of baseline models are pending and unavoidable issues.在这篇论文中,我们首先提供了系统性的图学策略和通用的图 convolution 算法的评估。然后,我们进行了广泛的 spatial-temporal graph network 模型的分析,探讨其优劣点。其次,我们使用 PyTorch 深度学习框架建立了一个标准化和可扩展的 benchmark,并在两种交通数据集上进行了研究。通过个性化模型设置,我们可以评估其性能。最后,我们指出了当前研究中的一些问题,并讨论了未来的方向。源代码可以在 https://github.com/trainingl/STG4Traffic 上获取。
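As a concrete reference point for the models surveyed, the sketch below shows a minimal spatial-temporal block of the kind benchmarked in STG4Traffic: a graph convolution over the sensor network followed by a GRU over time. It is an illustrative composition, not code from the benchmark repository.

```python
# Minimal spatial-temporal graph block: graph convolution + GRU (illustrative).
import torch
import torch.nn as nn

class STBlock(nn.Module):
    def __init__(self, num_nodes, in_dim, hid_dim, adj):
        super().__init__()
        # Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2
        a = adj + torch.eye(num_nodes)
        d = a.sum(-1).pow(-0.5)
        self.register_buffer("A_norm", d.unsqueeze(1) * a * d.unsqueeze(0))
        self.theta = nn.Linear(in_dim, hid_dim)      # graph-conv weights
        self.gru = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, 1)             # predict next-step speed/flow

    def forward(self, x):                             # x: (batch, time, nodes, feat)
        b, t, n, f = x.shape
        h = torch.einsum("ij,btjf->btif", self.A_norm, x)   # spatial aggregation
        h = torch.relu(self.theta(h))                        # (b, t, n, hid)
        h = h.permute(0, 2, 1, 3).reshape(b * n, t, -1)      # one sequence per node
        _, last = self.gru(h)                                # temporal modeling
        return self.out(last[-1]).view(b, n)                 # next-step prediction

adj = (torch.rand(20, 20) > 0.8).float()              # toy road graph, 20 sensors
model = STBlock(num_nodes=20, in_dim=2, hid_dim=32, adj=adj)
x = torch.randn(4, 12, 20, 2)                          # 12 past steps, 2 features
print(model(x).shape)                                  # torch.Size([4, 20])
```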

Optimizing protein fitness using Gibbs sampling with Graph-based Smoothing

  • paper_url: http://arxiv.org/abs/2307.00494
  • repo_url: https://github.com/kirjner/ggs
  • paper_authors: Andrew Kirjner, Jason Yim, Raman Samusevich, Tommi Jaakkola, Regina Barzilay, Ila Fiete
  • for: 本研究旨在设计高适应性蛋白质,以满足医学多种领域的需求。
  • methods: 本文提出了 Gibbs sampling with Graph-based Smoothing(GGS)方法,可以有效地探索蛋白质设计空间。GGS方法通过 iteratively 应用 Gibbs 与梯度来提出有利变化,并使用图基的平滑来消除干扰梯度导致的假阳性。
  • results: 在 GFP 和 AAV 设计问题上，本文进行了研究并与简单的消融实验和基线对比，取得了 state-of-the-art 的结果，能够发现与训练集相差最多 8 个突变的高适应性蛋白质。
    Abstract The ability to design novel proteins with higher fitness on a given task would be revolutionary for many fields of medicine. However, brute-force search through the combinatorially large space of sequences is infeasible. Prior methods constrain search to a small mutational radius from a reference sequence, but such heuristics drastically limit the design space. Our work seeks to remove the restriction on mutational distance while enabling efficient exploration. We propose Gibbs sampling with Graph-based Smoothing (GGS) which iteratively applies Gibbs with gradients to propose advantageous mutations using graph-based smoothing to remove noisy gradients that lead to false positives. Our method is state-of-the-art in discovering high-fitness proteins with up to 8 mutations from the training set. We study the GFP and AAV design problems, ablations, and baselines to elucidate the results. Code: https://github.com/kirjner/GGS
    摘要 能够设计新的蛋白质,其适应能力更高,将对医学多个领域带来革命。然而,简单地通过枚举空间的搜索是不可能的。先前的方法受限于小距离的参照序列,但这些决策限制了设计空间。我们的工作是要 removes 这种距离限制,同时允许有效探索。我们提议使用 Gibbs 抽象法和图像缓和(GGS),每步应用 Gibbs 与梯度相结合,提出有利мутации,并使用图像缓和来消除干扰梯度,以避免干扰梯度导致的假阳性。我们的方法目前为止在使用训练集上达到高适应能力蛋白质的设计问题上占据了国际领先地位。我们在 GFP 和 AAV 设计问题、ablations 和基准值进行了研究,以便解释结果。代码可以在 GitHub 上找到:https://github.com/kirjner/GGS。
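The Gibbs-with-gradients step can be sketched as follows: one backward pass through a differentiable fitness predictor scores every single-residue substitution, and a mutation is sampled from the resulting proposal distribution. The surrogate predictor and the omission of the paper's graph-based gradient smoothing are simplifications of this sketch.

```python
# Hedged sketch of a Gibbs-with-gradients mutation proposal (illustrative).
import torch
import torch.nn as nn

VOCAB, LENGTH = 20, 50                                   # amino acids, sequence length
predictor = nn.Sequential(nn.Flatten(), nn.Linear(VOCAB * LENGTH, 64),
                          nn.ReLU(), nn.Linear(64, 1))   # surrogate fitness model

def propose_mutation(x_onehot, temperature=1.0):
    """x_onehot: (LENGTH, VOCAB) one-hot sequence. Returns (position, new_residue)."""
    x = x_onehot.clone().requires_grad_(True)
    fitness = predictor(x.unsqueeze(0)).sum()
    (grad,) = torch.autograd.grad(fitness, x)
    # First-order estimate of the fitness change for every single substitution.
    delta = grad - (grad * x).sum(dim=1, keepdim=True)
    delta = delta.masked_fill(x.bool(), float("-inf"))    # forbid "mutating" to self
    probs = torch.softmax(delta.flatten() / temperature, dim=0)
    idx = torch.multinomial(probs, 1).item()
    return idx // VOCAB, idx % VOCAB

seq = torch.zeros(LENGTH, VOCAB)
seq[torch.arange(LENGTH), torch.randint(VOCAB, (LENGTH,))] = 1.0
pos, aa = propose_mutation(seq)
print(f"proposed mutation: position {pos} -> residue {aa}")
```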

Fourier-Mixed Window Attention: Accelerating Informer for Long Sequence Time-Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.00493
  • repo_url: https://github.com/nhatthanhtran/fwin2023
  • paper_authors: Nhat Thanh Tran, Jack Xin
  • for: 用于加速长序列时间预测 Informer 的快本地全球窗口基于注意力方法。
  • methods: 使用本地窗口注意力,而不是假设查询稀疏性,并使用 fourier 变换块来补做全球token信息。
  • results: 通过在单变量和多变量 datasets 上进行实验,我们表明 FWin 变换器可以提高 Informer 的总预测精度,同时提高其推理速度,比如40-50%。此外,我们还在非线性回归模型中显示了一种学习 FWin 类型注意力的方法可以超过 softmax 全注意力。
    Abstract We study a fast local-global window-based attention method to accelerate Informer for long sequence time-series forecasting. While window attention is local and a considerable computational saving, it lacks the ability to capture global token information which is compensated by a subsequent Fourier transform block. Our method, named FWin, does not rely on query sparsity hypothesis and an empirical approximation underlying the ProbSparse attention of Informer. Through experiments on univariate and multivariate datasets, we show that FWin transformers improve the overall prediction accuracies of Informer while accelerating its inference speeds by 40 to 50 %. We also show in a nonlinear regression model that a learned FWin type attention approaches or even outperforms softmax full attention based on key vectors extracted from an Informer model's full attention layer acting on time series data.
    摘要 我们研究一种快速的本地-全局窗口基于注意方法,以加速Informer进行长序时间序列预测。虽然窗口注意是本地的,可以获得较大的计算时间 saving,但是它缺乏捕捉全局Token信息的能力,这是通过后续的傅里叶变换块补做。我们的方法,名为FWin,不依赖于查询稀缺假设和Informer中的ProbSparse注意力的经验近似。通过对单variate和多variate数据集进行实验,我们显示了FWin变换器可以提高Informer的总预测精度,同时加速其推理速度 by 40% to 50%. 我们还在非线性回归模型中表明,一种学习的FWin类型注意力可以与Softmax全注意力相当或者超过从Informer模型的全注意层中提取的键vector。
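A hedged sketch of the FWin idea appears below: attention is restricted to non-overlapping local windows, and a Fourier transform along the time axis supplies the global token mixing that window attention lacks. Layer sizes and the FNet-style FFT mixing are illustrative choices; the real FWin/Informer implementation differs in its decoder and masking details.

```python
# Illustrative window-attention + Fourier mixing block (not the paper's code).
import torch
import torch.nn as nn

class FWinBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, window=24):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        w = self.window                            # assumes t is divisible by w
        xw = x.reshape(b * (t // w), w, d)         # split into non-overlapping windows
        local, _ = self.attn(xw, xw, xw)           # attention only inside each window
        x = self.norm1(x + local.reshape(b, t, d))
        # Global token mixing via FFT along the time axis (FNet-style).
        global_mix = torch.fft.fft(x, dim=1).real
        x = self.norm2(x + self.ff(global_mix))
        return x

block = FWinBlock()
x = torch.randn(2, 96, 64)                          # 96 time steps (divisible by 24)
print(block(x).shape)                               # torch.Size([2, 96, 64])
```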

Pricing European Options with Google AutoML, TensorFlow, and XGBoost

  • paper_url: http://arxiv.org/abs/2307.00476
  • repo_url: https://github.com/juan-esteban-berger/options_pricing_automl_tensorflow_xgboost
  • paper_authors: Juan Esteban Berger
  • for: 该文章用于比较使用Google Cloud AutoML Regressor、TensorFlow神经网络和XGBoost分类决策树来估算欧洲Option价格。
  • methods: 该文章使用了三种不同的机器学习算法来估算欧洲Option价格,即Google Cloud AutoML Regressor、TensorFlow神经网络和XGBoost分类决策树。
  • results: 三种模型都能够超越布莱克-斯科尔斯（Black-Scholes）模型。具体来说，使用历史数据来估算欧式期权价格尤其有效，特别是使用机器学习算法来学习传统参数模型无法考虑的复杂模式。
    Abstract Researchers have been using Neural Networks and other related machine-learning techniques to price options since the early 1990s. After three decades of improvements in machine learning techniques, computational processing power, cloud computing, and data availability, this paper is able to provide a comparison of using Google Cloud's AutoML Regressor, TensorFlow Neural Networks, and XGBoost Gradient Boosting Decision Trees for pricing European Options. All three types of models were able to outperform the Black Scholes Model in terms of mean absolute error. These results showcase the potential of using historical data from an option's underlying asset for pricing European options, especially when using machine learning algorithms that learn complex patterns that traditional parametric models do not take into account.
    摘要 自 1990 年代初以来，研究人员就开始使用神经网络及相关机器学习技术为期权定价。经过三十年在机器学习方法、计算能力、云计算和数据可用性方面的进步，本文得以比较 Google Cloud AutoML Regressor、TensorFlow 神经网络和 XGBoost 梯度提升决策树在欧式期权定价上的表现。三类模型在平均绝对误差上均优于布莱克-斯科尔斯模型（Black-Scholes Model）。这些结果表明，利用期权标的资产的历史数据为欧式期权定价具有潜力，尤其是当机器学习算法能够学习传统参数模型无法捕捉的复杂模式时。
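The pipeline can be sketched end to end with a toy experiment: price European calls with the Black-Scholes formula, perturb them to stand in for market prices, and fit an XGBoost regressor on the same features. The data here are simulated (the paper uses historical option data), so the numbers only illustrate the workflow; the sketch assumes `xgboost` and `scipy` are installed.

```python
# Toy comparison of an XGBoost pricer against the Black-Scholes benchmark.
import numpy as np
from scipy.stats import norm
from xgboost import XGBRegressor

def black_scholes_call(S, K, T, r, sigma):
    d1 = (np.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * np.sqrt(T))
    d2 = d1 - sigma * np.sqrt(T)
    return S * norm.cdf(d1) - K * np.exp(-r * T) * norm.cdf(d2)

rng = np.random.default_rng(0)
n = 20_000
S = rng.uniform(50, 150, n); K = rng.uniform(50, 150, n)
T = rng.uniform(0.05, 2.0, n); r = rng.uniform(0.0, 0.05, n)
sigma = rng.uniform(0.1, 0.6, n)
X = np.column_stack([S, K, T, r, sigma])
y = black_scholes_call(S, K, T, r, sigma) + rng.normal(0, 0.5, n)  # "market" price

train, test = slice(0, 15_000), slice(15_000, n)
model = XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X[train], y[train])

mae_ml = np.mean(np.abs(model.predict(X[test]) - y[test]))
mae_bs = np.mean(np.abs(black_scholes_call(*X[test].T) - y[test]))
print(f"MAE  XGBoost: {mae_ml:.3f}   Black-Scholes: {mae_bs:.3f}")
```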

Moments, Random Walks, and Limits for Spectrum Approximation

  • paper_url: http://arxiv.org/abs/2307.00474
  • repo_url: None
  • paper_authors: Yujia Jin, Christopher Musco, Aaron Sidford, Apoorv Vikram Singh
  • for: 本文研究一维分布近似问题中的下界问题,具体来说是对于已知多项式级别的一维分布,可以达到伪随机性水平。
  • methods: 本文使用了induced by eigenvalue spectra of carefully constructed graph adjacency matrices的hard instance,以及Cohen-Steiner et al. [KDD 2018]提供的spectral moments approximations using $2^{O(1/\epsilon)}$ random walks initiated at uniformly random nodes in the graph。
  • results: 本文证明了一些分布在[-1,1]上不能够在Wasserstein-1 distance上准确地近似,即使知道所有多项式级别的分布。这个结果与Kong和Valiant [Annals of Statistics, 2017]的Upper bound相符。
    Abstract We study lower bounds for the problem of approximating a one dimensional distribution given (noisy) measurements of its moments. We show that there are distributions on $[-1,1]$ that cannot be approximated to accuracy $\epsilon$ in Wasserstein-1 distance even if we know \emph{all} of their moments to multiplicative accuracy $(1\pm2^{-\Omega(1/\epsilon)})$; this result matches an upper bound of Kong and Valiant [Annals of Statistics, 2017]. To obtain our result, we provide a hard instance involving distributions induced by the eigenvalue spectra of carefully constructed graph adjacency matrices. Efficiently approximating such spectra in Wasserstein-1 distance is a well-studied algorithmic problem, and a recent result of Cohen-Steiner et al. [KDD 2018] gives a method based on accurately approximating spectral moments using $2^{O(1/\epsilon)}$ random walks initiated at uniformly random nodes in the graph. As a strengthening of our main result, we show that improving the dependence on $1/\epsilon$ in this result would require a new algorithmic approach. Specifically, no algorithm can compute an $\epsilon$-accurate approximation to the spectrum of a normalized graph adjacency matrix with constant probability, even when given the transcript of $2^{\Omega(1/\epsilon)}$ random walks of length $2^{\Omega(1/\epsilon)}$ started at random nodes.
    摘要 我们研究一维分布的下界,对于受到杂度的测量的矩形几何。我们证明有在[-1,1]上的分布,不能够在 Wasserstein-1 距离下对准确度 $\epsilon$ 的测量。这个结果与 Kong 和 Valiant 的上界相匹配 [Annals of Statistics, 2017]。我们使用具有精确的数值矩阵的对角线几何来提供一个困难的实例。现有一个由 Cohen-Steiner 等人提出的方法 [KDD 2018],可以使用 $2^{O(1/\epsilon)}$ 步骤的随机步进行精确地测量几何的spectrum。作为我们主要结果的强化,我们证明 improvving 这个结果中的 $1/\epsilon$ 取决因数需要一个新的算法方法。具体来说,无法使用现有的算法,在 givent 矩阵的转换矩阵上 compute 一个 $\epsilon$-精确的测量,即使被给定 $2^{\Omega(1/\epsilon)}$ 步骤的随机步进行测量。
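The random-walk estimator referenced above rests on a simple identity: the k-th spectral moment of the normalized adjacency matrix equals the probability that a k-step random walk started at a uniformly random node returns to its start. The sketch below checks this numerically on a small random graph; it illustrates the estimator of Cohen-Steiner et al. rather than the lower-bound construction of this paper.

```python
# Spectral moments of the normalized adjacency via random-walk return probabilities.
import numpy as np

rng = np.random.default_rng(0)
n = 200
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.triu(A, 1); A = A + A.T                      # random undirected graph
deg = np.maximum(A.sum(1), 1)

# Exact moments from the eigenvalues of D^-1/2 A D^-1/2 (same spectrum as D^-1 A).
A_norm = A / np.sqrt(np.outer(deg, deg))
eigs = np.linalg.eigvalsh(A_norm)
K = 6
exact = [(eigs**k).mean() for k in range(1, K + 1)]

# Monte Carlo estimate from random walks started at uniformly random nodes.
walks, hits = 20_000, np.zeros(K)
neighbors = [np.flatnonzero(A[i]) for i in range(n)]
for _ in range(walks):
    start = cur = rng.integers(n)
    for k in range(K):
        if len(neighbors[cur]) == 0:                # isolated node: walk cannot move
            break
        cur = rng.choice(neighbors[cur])
        hits[k] += (cur == start)
estimate = hits / walks

for k in range(K):
    print(f"moment {k+1}: exact {exact[k]:+.4f}   random-walk {estimate[k]:+.4f}")
```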

Equal Confusion Fairness: Measuring Group-Based Disparities in Automated Decision Systems

  • paper_url: http://arxiv.org/abs/2307.00472
  • repo_url: https://github.com/furkangursoy/equalconfusion
  • paper_authors: Furkan Gursoy, Ioannis A. Kakadiaris
  • for: This paper focuses on evaluating the fairness of automated decision systems, specifically in terms of group fairness.
  • methods: The paper proposes a new equal confusion fairness test and a new confusion parity error metric to measure unfairness, as well as an appropriate post hoc analysis methodology to identify the source of potential unfairness.
  • results: The proposed methods and metrics are demonstrated on the case of COMPAS, an automated decision system used in the US to assess recidivism risks, and show their usefulness in assessing fairness as part of a more extensive accountability assessment.
    Abstract As artificial intelligence plays an increasingly substantial role in decisions affecting humans and society, the accountability of automated decision systems has been receiving increasing attention from researchers and practitioners. Fairness, which is concerned with eliminating unjust treatment and discrimination against individuals or sensitive groups, is a critical aspect of accountability. Yet, for evaluating fairness, there is a plethora of fairness metrics in the literature that employ different perspectives and assumptions that are often incompatible. This work focuses on group fairness. Most group fairness metrics desire a parity between selected statistics computed from confusion matrices belonging to different sensitive groups. Generalizing this intuition, this paper proposes a new equal confusion fairness test to check an automated decision system for fairness and a new confusion parity error to quantify the extent of any unfairness. To further analyze the source of potential unfairness, an appropriate post hoc analysis methodology is also presented. The usefulness of the test, metric, and post hoc analysis is demonstrated via a case study on the controversial case of COMPAS, an automated decision system employed in the US to assist judges with assessing recidivism risks. Overall, the methods and metrics provided here may assess automated decision systems' fairness as part of a more extensive accountability assessment, such as those based on the system accountability benchmark.
    摘要 随着人工智能在影响人类和社会决策中的角色变得越来越重要,决策系统的负责任得到了更多的关注。公平是考虑消除不公正对待和歧视的一个关键方面,它是负责任的一部分。然而,为了评估公平,在文献中有很多不同的公平指标,这些指标常有不同的视角和假设,导致它们之间不兼容。这项工作将关注于群体公平。大多数群体公平指标寻求在不同敏感群体的决策系统中计算的混乱矩阵中实现平衡。总之,这篇论文提出了一种新的平衡公平测试,用于检测自动决策系统的公平,以及一种新的混乱平衡错误来衡量任何不公正。此外,为了分析潜在的不公正来源,我们还提供了一种适用的后续分析方法。这些方法和指标在一个关于 COMPAS 案例的案例研究中被证明了其用于评估自动决策系统的公平。总之,提供的方法和指标可以用于评估自动决策系统的公平,并作为更广泛的负责任评估的一部分,如基于系统负责任指标。
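One plausible instantiation of the idea is sketched below: build a normalized confusion matrix per sensitive group and summarize their disagreement with a single parity score, here the average total-variation distance between groups. The exact statistic and significance test used in the paper may differ; `confusion_parity_error` is an illustrative name.

```python
# Hedged sketch of group-wise confusion matrices and a confusion parity score.
import numpy as np
from itertools import combinations

def group_confusion(y_true, y_pred, group, n_classes=2):
    """Per-group confusion matrices, normalized to sum to 1 within each group."""
    mats = {}
    for g in np.unique(group):
        m = np.zeros((n_classes, n_classes))
        for t, p in zip(y_true[group == g], y_pred[group == g]):
            m[t, p] += 1
        mats[g] = m / m.sum()
    return mats

def confusion_parity_error(mats):
    """Average total-variation distance between the groups' confusion matrices."""
    pairs = list(combinations(mats.values(), 2))
    return np.mean([0.5 * np.abs(a - b).sum() for a, b in pairs])

rng = np.random.default_rng(0)
group = rng.integers(0, 2, 5000)                      # two sensitive groups
y_true = rng.integers(0, 2, 5000)
# A biased classifier: labels are flipped more often for group 1,
# inducing disparate error patterns across groups.
flip = rng.random(5000) < np.where(group == 1, 0.30, 0.10)
y_pred = np.where(flip, 1 - y_true, y_true)

mats = group_confusion(y_true, y_pred, group)
print(f"confusion parity error: {confusion_parity_error(mats):.3f}")
```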

Data-Driven Probabilistic Energy Consumption Estimation for Battery Electric Vehicles with Model Uncertainty

  • paper_url: http://arxiv.org/abs/2307.00469
  • repo_url: None
  • paper_authors: Ayan Maity, Sudeshna Sarkar
  • for: 这个论文是为了提出一种基于概率数据驱动的电动汽车(BEV)行程级能耗估计方法。
  • methods: 该方法使用概率神经网络,并通过 Монテ卡洛分布来确定模型uncertainty。
  • results: 实验结果表明,该方法可以准确地估计电动汽车行程级能耗,并且与其他现有的电动汽车能耗模型相比,具有较高的准确率。
    Abstract This paper presents a novel probabilistic data-driven approach to trip-level energy consumption estimation of battery electric vehicles (BEVs). As there are very few electric vehicle (EV) charging stations, EV trip energy consumption estimation can make EV routing and charging planning easier for drivers. In this research article, we propose a new driver behaviour-centric EV energy consumption estimation model using probabilistic neural networks with model uncertainty. By incorporating model uncertainty into neural networks, we have created an ensemble of neural networks using Monte Carlo approximation. Our method comprehensively considers various vehicle dynamics, driver behaviour and environmental factors to estimate EV energy consumption for a given trip. We propose relative positive acceleration (RPA), average acceleration and average deceleration as driver behaviour factors in EV energy consumption estimation and this paper shows that the use of these driver behaviour features improves the accuracy of the EV energy consumption model significantly. Instead of predicting a single-point estimate for EV trip energy consumption, this proposed method predicts a probability distribution for the EV trip energy consumption. The experimental results of our approach show that our proposed probabilistic neural network with weight uncertainty achieves a mean absolute percentage error of 9.3% and outperforms other existing EV energy consumption models in terms of accuracy.
    摘要 这篇论文提出了一种新的概率数据驱动方法来估算电动汽车（BEV）的行程级能耗。由于电动汽车充电站很少，准确估算行程能耗可以让驾驶员更容易进行路径和充电规划。该方法使用带模型不确定性的概率神经网络：通过蒙特卡洛近似将模型不确定性引入神经网络，从而构建神经网络集成。该方法综合考虑车辆动力学、驾驶行为和环境因素来估算给定行程的能耗，并使用相对正加速度（RPA）、平均加速度和平均减速度作为驾驶行为特征。实验结果表明，该方法的平均绝对百分比误差为 9.3%，在准确性方面优于其他现有的电动汽车能耗模型；该方法预测的是行程能耗的概率分布，而非单点估计。
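A minimal sketch of the probabilistic prediction step is given below using Monte Carlo dropout: keeping dropout active at inference yields an ensemble of trip-energy predictions whose mean and spread form the predictive distribution. The feature list follows the paper's driver-behaviour factors (RPA, average acceleration/deceleration), but the architecture and numbers are illustrative.

```python
# Minimal Monte Carlo dropout sketch for probabilistic trip-energy prediction.
import torch
import torch.nn as nn

FEATURES = ["distance_km", "mean_speed", "rpa", "avg_accel", "avg_decel", "temp_c"]

model = nn.Sequential(nn.Linear(len(FEATURES), 64), nn.ReLU(), nn.Dropout(0.2),
                      nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.2),
                      nn.Linear(64, 1))

def predict_distribution(x, n_samples=200):
    """Keep dropout active at inference to sample an ensemble of predictions."""
    model.train()                       # enables dropout; no gradients are taken
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(0), samples.std(0)

trip = torch.tensor([[12.3, 41.0, 0.18, 0.6, -0.7, 22.0]])   # one trip's features
mean, std = predict_distribution(trip)
print(f"predicted energy: {mean.item():.2f} kWh  +/- {std.item():.2f}")
```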

MissDiff: Training Diffusion Models on Tabular Data with Missing Values

  • paper_url: http://arxiv.org/abs/2307.00467
  • repo_url: None
  • paper_authors: Yidong Ouyang, Liyan Xie, Chongxuan Li, Guang Cheng
  • for: 模型学习从数据中缺失值的数据分布,以便在各种实际应用中处理缺失数据。
  • methods: 提出了一种统一的涉及原理的扩展 diffusion 模型,可以在不同的缺失机制下学习数据分布。
  • results: 与 state-of-the-art diffusion model 比较,在多个实际的 tabular 数据集上达到了较大的性能提升。
    Abstract The diffusion model has shown remarkable performance in modeling data distributions and synthesizing data. However, the vanilla diffusion model requires complete or fully observed data for training. Incomplete data is a common issue in various real-world applications, including healthcare and finance, particularly when dealing with tabular datasets. This work presents a unified and principled diffusion-based framework for learning from data with missing values under various missing mechanisms. We first observe that the widely adopted "impute-then-generate" pipeline may lead to a biased learning objective. Then we propose to mask the regression loss of Denoising Score Matching in the training phase. We prove the proposed method is consistent in learning the score of data distributions, and the proposed training objective serves as an upper bound for the negative likelihood in certain cases. The proposed framework is evaluated on multiple tabular datasets using realistic and efficacious metrics and is demonstrated to outperform state-of-the-art diffusion model on tabular data with "impute-then-generate" pipeline by a large margin.
    摘要 diffusion 模型在数据分布模型和数据合成方面表现出色,但是普通的扩散模型需要完整或完全观察到的数据进行训练。实际应用中,包括医疗和金融等领域,数据中存在缺失数据是一个常见的问题。这项工作提出了一种统一的、原则正的扩散模型基于缺失数据学习框架,用于处理不同的缺失机制。我们首先发现,通过"填充然后生成"管道可能会导致偏向的学习目标。然后,我们提议在训练阶段对杜邦Score匹配的损失进行遮盖。我们证明了我们提议的方法是学习数据分布的分数的一种一致的方法,并且我们提议的训练目标为certain cases中的负概率下界。我们的框架在多个实际的 tabular 数据集上进行了实际和有效的评估,并与state-of-the-art扩散模型在 tabular 数据上"填充然后生成"管道的比较中表现出了大幅度的超越。
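The key training change can be sketched as a masked denoising score-matching loss: noise is added and the score-matching error is accumulated only on observed entries, so missing values are never imputed and never contribute gradients. The noise schedule and network below are toy placeholders, not the paper's exact parameterization.

```python
# Hedged sketch of a masked denoising score-matching objective for tabular data.
import torch
import torch.nn as nn

d = 8                                              # number of (numeric) columns
score_net = nn.Sequential(nn.Linear(d + 1, 128), nn.SiLU(), nn.Linear(128, d))

def masked_dsm_loss(x, obs_mask):
    """x: (batch, d) with arbitrary values wherever obs_mask == 0."""
    t = torch.rand(x.size(0), 1)                   # diffusion time in (0, 1)
    sigma = 0.1 + 4.9 * t                          # toy noise-scale schedule
    noise = torch.randn_like(x)
    x_noisy = torch.where(obs_mask.bool(), x + sigma * noise, torch.zeros_like(x))
    pred = score_net(torch.cat([x_noisy, t], dim=1))
    target = -noise / sigma                        # score of the Gaussian kernel
    per_entry = (pred - target) ** 2 * sigma**2    # standard DSM weighting
    # Mask: only observed entries enter the objective.
    return (per_entry * obs_mask).sum() / obs_mask.sum()

x = torch.randn(32, d)
obs_mask = (torch.rand(32, d) > 0.3).float()       # ~30% of entries missing
loss = masked_dsm_loss(x, obs_mask)
loss.backward()
print(loss.item())
```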

Towards Unbiased Exploration in Partial Label Learning

  • paper_url: http://arxiv.org/abs/2307.00465
  • repo_url: None
  • paper_authors: Zsolt Zombori, Agapi Rissaki, Kristóf Szabó, Wolfgang Gatterbauer, Michael Benedikt
  • for: 这篇论文是用于描述一种基于半标注的学习方法,该方法可以在标准神经网络架构中使用softmax层进行推理,并且可以减轻softmax层中的偏见现象,以便更好地探索多个可能性。
  • methods: 这篇论文使用了一种新的损失函数,该损失函数可以减轻softmax层中的偏见现象,并且可以在标准神经网络架构中进行不偏的探索。
  • results: 论文通过对синтетиче数据、标准半标注benchmark和一个新的规则学习挑战中的数据进行广泛的评估,证明了该损失函数的有效性。
    Abstract We consider learning a probabilistic classifier from partially-labelled supervision (inputs denoted with multiple possibilities) using standard neural architectures with a softmax as the final layer. We identify a bias phenomenon that can arise from the softmax layer in even simple architectures that prevents proper exploration of alternative options, making the dynamics of gradient descent overly sensitive to initialisation. We introduce a novel loss function that allows for unbiased exploration within the space of alternative outputs. We give a theoretical justification for our loss function, and provide an extensive evaluation of its impact on synthetic data, on standard partially labelled benchmarks and on a contributed novel benchmark related to an existing rule learning challenge.
    摘要 我们考虑从受限的指导下学习一个概率分类器,使用标准的神经网络架构,并在最终层使用softmax层。我们发现在简单的架构中,softmax层可能会带来偏袋现象,使得梯度下降的几何对初始化的敏感性。我们介绍了一个新的损失函数,允许不偏的探索多个出力空间中的选项。我们提供了理论上的说明,并对实验数据、标准受限 benchmark和一个新的规则学习挑战中的贡献 benchmark进行了广泛的评估。

FedDefender: Backdoor Attack Defense in Federated Learning

  • paper_url: http://arxiv.org/abs/2307.08672
  • repo_url: https://github.com/warisgill/FedDefender
  • paper_authors: Waris Gill, Ali Anwar, Muhammad Ali Gulzar
  • for: 防止targeted poisoning攻击在 Federated Learning (FL) 中
  • methods: 利用差异测试方法来识别可疑客户端(包含后门)
  • results: 在 MNIST 和 FashionMNIST 数据集上,FedDefender 有效地 mitigates 攻击,从而降低攻击成功率至 10%,无需对全球模型性能产生负面影响。
    Abstract Federated Learning (FL) is a privacy-preserving distributed machine learning technique that enables individual clients (e.g., user participants, edge devices, or organizations) to train a model on their local data in a secure environment and then share the trained model with an aggregator to build a global model collaboratively. In this work, we propose FedDefender, a defense mechanism against targeted poisoning attacks in FL by leveraging differential testing. Our proposed method fingerprints the neuron activations of clients' models on the same input and uses differential testing to identify a potentially malicious client containing a backdoor. We evaluate FedDefender using MNIST and FashionMNIST datasets with 20 and 30 clients, and our results demonstrate that FedDefender effectively mitigates such attacks, reducing the attack success rate (ASR) to 10\% without deteriorating the global model performance.
    摘要 federated 学习(FL)是一种隐私保护的分布式机器学习技术,允许个体客户端(例如用户参与者、边缘设备或组织)在安全环境中使用本地数据进行模型训练,然后将训练好的模型分享给汇总器以建立全球模型。在这项工作中,我们提议了 FedDefender,一种针对攻击型投毒攻击的防御机制,通过对客户端模型的神经元活动进行差异测试来识别可能有恶意后门的客户端。我们使用 MNIST 和 FashionMNIST 数据集,并与 20 和 30 个客户端进行评估,结果表明,FedDefender 有效地防止了这类攻击,攻击成功率降低至 10%,而全球模型性能不受影响。
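The differential-testing idea can be sketched as follows: run every client's updated model on the same probe inputs, treat the hidden-layer activations as a fingerprint, and flag the client whose fingerprint deviates most from the rest. The deviation score used here (distance to the median fingerprint) is an illustrative choice, not necessarily the exact rule in FedDefender.

```python
# Hedged sketch of activation fingerprinting across client models.
import torch
import torch.nn as nn

def make_client_model():
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(),
                         nn.Linear(64, 10))

def fingerprint(model, probe):
    """Neuron activations of the hidden layer on a fixed probe batch."""
    with torch.no_grad():
        h = model[2](model[1](model[0](probe)))    # activations after the ReLU
    return h.flatten()

torch.manual_seed(0)
clients = [make_client_model() for _ in range(5)]
# Simulate a backdoored client by perturbing its weights.
with torch.no_grad():
    clients[3][1].weight.add_(0.5 * torch.randn_like(clients[3][1].weight))

probe = torch.randn(16, 1, 28, 28)                 # shared probe inputs
prints = torch.stack([fingerprint(m, probe) for m in clients])
median = prints.median(dim=0).values
scores = (prints - median).norm(dim=1)             # deviation of each client
print("suspicion scores:", [round(s.item(), 2) for s in scores])
print("flagged client:", scores.argmax().item())
```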

Conformer LLMs – Convolution Augmented Large Language Models

  • paper_url: http://arxiv.org/abs/2307.00461
  • repo_url: None
  • paper_authors: Prateek Verma
  • for: 这个研究旨在将两种受欢迎的神经架构combined,即卷积层和Transformers,以应用于大型语言模型(LLMs)。
  • methods: 这个研究使用了非 causal 的卷积层,并将它们应用于 causal 的训练架构中。transformers decoder 优化了跨多modalities的长距离依赖,并成为现代机器学习的核心进步。
  • results: 这个研究获得了显著的性能提升,并显示了一个可靠的语音架构,可以在 causal 设置中进行集成和适应。
    Abstract This work builds together two popular blocks of neural architecture, namely convolutional layers and Transformers, for large language models (LLMs). Non-causal conformers are used ubiquitously in automatic speech recognition. This work aims to adapt these architectures in a causal setup for training LLMs. Transformers decoders effectively capture long-range dependencies over several modalities and form a core backbone of modern advancements in machine learning. Convolutional architectures have been popular in extracting features in domains such as raw 1-D signals, speech, and images, to name a few. In this paper, by combining local and global dependencies over latent representations using causal convolutional filters and Transformer, we achieve significant gains in performance. This work showcases a robust speech architecture that can be integrated and adapted in a causal setup beyond speech applications for large-scale language modeling.
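A minimal causal conformer-style block is sketched below: a left-padded (causal) depthwise convolution captures local structure and masked self-attention captures long-range dependencies, so nothing attends to future tokens. Dimensions and the exact ordering of sub-layers are illustrative assumptions, not the paper's architecture.

```python
# Illustrative causal convolution + masked self-attention block.
import torch
import torch.nn as nn

class CausalConformerBlock(nn.Module):
    def __init__(self, d_model=128, n_heads=4, kernel_size=5):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(d_model, d_model, kernel_size, groups=d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):                              # x: (batch, seq, d_model)
        # Causal convolution: pad only on the left so position t sees <= t.
        c = torch.nn.functional.pad(x.transpose(1, 2), (self.pad, 0))
        x = self.norm1(x + self.conv(c).transpose(1, 2))
        t = x.size(1)
        causal_mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        a, _ = self.attn(x, x, x, attn_mask=causal_mask)
        return self.norm2(x + a)

block = CausalConformerBlock()
tokens = torch.randn(2, 32, 128)
print(block(tokens).shape)                              # torch.Size([2, 32, 128])
```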

GenRec: Large Language Model for Generative Recommendation

  • paper_url: http://arxiv.org/abs/2307.00457
  • repo_url: https://github.com/rutgerswiselab/genrec
  • paper_authors: Jianchao Ji, Zelong Li, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Juntao Tan, Yongfeng Zhang
  • for: This paper presents a novel approach to recommendation systems using large language models (LLMs) based on text data, which can directly generate the target item to recommend rather than calculating ranking scores for each candidate item.
  • methods: The proposed approach leverages the vast knowledge encoded in large language models to accomplish recommendation tasks, and uses specialized prompts to enhance the ability of the LLM to comprehend recommendation tasks.
  • results: The proposed GenRec approach achieves significantly better results on large datasets, and the experiments show the potential of LLM-based generative recommendation in revolutionizing the domain of recommendation systems.
    Abstract In recent years, large language models (LLM) have emerged as powerful tools for diverse natural language processing tasks. However, their potential for recommender systems under the generative recommendation paradigm remains relatively unexplored. This paper presents an innovative approach to recommendation systems using large language models (LLMs) based on text data. In this paper, we present a novel LLM for generative recommendation (GenRec) that utilizes the expressive power of LLM to directly generate the target item to recommend, rather than calculating ranking scores for each candidate item one by one as in traditional discriminative recommendation. GenRec uses LLM's understanding ability to interpret context, learn user preferences, and generate relevant recommendations. Our proposed approach leverages the vast knowledge encoded in large language models to accomplish recommendation tasks. We first formulate specialized prompts to enhance the ability of LLM to comprehend recommendation tasks. Subsequently, we use these prompts to fine-tune the LLaMA backbone LLM on a dataset of user-item interactions, represented by textual data, to capture user preferences and item characteristics. Our research underscores the potential of LLM-based generative recommendation in revolutionizing the domain of recommendation systems and offers a foundational framework for future explorations in this field. We conduct extensive experiments on benchmark datasets, and the experiments show that our GenRec achieves significantly better results on large datasets.
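The generative-recommendation interface can be sketched as a prompt-construction step: the user's interaction history is verbalized and the model is asked to emit the next item directly. The template below and the `query_llm` placeholder are illustrative; the prompts actually used to fine-tune the LLaMA backbone may differ.

```python
# Hedged sketch of a generative-recommendation prompt.
def build_genrec_prompt(user_history, max_items=10):
    items = ", ".join(user_history[-max_items:])
    return (
        "A user has interacted with the following items, in order: "
        f"{items}.\n"
        "Based on these preferences, recommend the single next item the user "
        "is most likely to enjoy. Answer with the item title only."
    )

def query_llm(prompt):
    # Placeholder: in practice this would call the fine-tuned LLaMA backbone.
    return "<generated item title>"

history = ["The Matrix", "Blade Runner 2049", "Arrival", "Interstellar"]
prompt = build_genrec_prompt(history)
print(prompt)
print("recommended:", query_llm(prompt))
```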

3D-IDS: Doubly Disentangled Dynamic Intrusion Detection

  • paper_url: http://arxiv.org/abs/2307.11079
  • repo_url: None
  • paper_authors: Chenyang Qiu, Yingsheng Geng, Junrui Lu, Kaida Chen, Shitong Zhu, Ya Su, Guoshun Nan, Can Zhang, Junsong Fu, Qimei Cui, Xiaofeng Tao
  • for: 提高网络入侵检测系统(NIDS)的检测精度和可解释性,帮助旁减不良攻击对信息基础设施的威胁。
  • methods: 提出了一种基于两步特征分解和动态图diffusion算法的新方法(3D-IDS),通过自动对复杂的攻击特征进行非参数化优化,生成攻击特征表示,并使用动态图diffusion方法进行空间时间聚合,有效地识别多种攻击,包括未知威胁和已知威胁。
  • results: 实验表明,3D-IDS可以有效地识别多种攻击,包括未知威胁和已知威胁,并且比现有方法更高的检测精度和可解释性。
    Abstract Network-based intrusion detection system (NIDS) monitors network traffic for malicious activities, forming the frontline defense against increasing attacks over information infrastructures. Although promising, our quantitative analysis shows that existing methods perform inconsistently in declaring various unknown attacks (e.g., 9% and 35% F1 respectively for two distinct unknown threats for an SVM-based method) or detecting diverse known attacks (e.g., 31% F1 for the Backdoor and 93% F1 for DDoS by a GCN-based state-of-the-art method), and reveals that the underlying cause is entangled distributions of flow features. This motivates us to propose 3D-IDS, a novel method that aims to tackle the above issues through two-step feature disentanglements and a dynamic graph diffusion scheme. Specifically, we first disentangle traffic features by a non-parameterized optimization based on mutual information, automatically differentiating tens and hundreds of complex features of various attacks. Such differentiated features will be fed into a memory model to generate representations, which are further disentangled to highlight the attack-specific features. Finally, we use a novel graph diffusion method that dynamically fuses the network topology for spatial-temporal aggregation in evolving data streams. By doing so, we can effectively identify various attacks in encrypted traffics, including unknown threats and known ones that are not easily detected. Experiments show the superiority of our 3D-IDS. We also demonstrate that our two-step feature disentanglements benefit the explainability of NIDS.

An Adaptive Optimization Approach to Personalized Financial Incentives in Mobile Behavioral Weight Loss Interventions

  • paper_url: http://arxiv.org/abs/2307.00444
  • repo_url: None
  • paper_authors: Qiaomei Li, Kara L. Gavin, Corrine I. Voils, Yonatan Mintz
  • for: 本研究旨在设计个性化的营养干预,使用直接金钱奖励来鼓励身高减轻,同时保持在研究预算内。
  • methods: 本研究使用机器学习方法预测参与者如何响应不同奖励计划,并在Behavioral 干预中使用这些预测来定制奖励。
  • results: 研究结果表明,个性化奖励设计可以提高营养干预的效果和经济性。
    Abstract Obesity is a critical healthcare issue affecting the United States. The least risky treatments available for obesity are behavioral interventions meant to promote diet and exercise. Often these interventions contain a mobile component that allows interventionists to collect participants level data and provide participants with incentives and goals to promote long term behavioral change. Recently, there has been interest in using direct financial incentives to promote behavior change. However, adherence is challenging in these interventions, as each participant will react differently to different incentive structure and amounts, leading researchers to consider personalized interventions. The key challenge for personalization, is that the clinicians do not know a priori how best to administer incentives to participants, and given finite intervention budgets how to disburse costly resources efficiently. In this paper, we consider this challenge of designing personalized weight loss interventions that use direct financial incentives to motivate weight loss while remaining within a budget. We create a machine learning approach that is able to predict how individuals may react to different incentive schedules within the context of a behavioral intervention. We use this predictive model in an adaptive framework that over the course of the intervention computes what incentives to disburse to participants and remain within the study budget. We provide both theoretical guarantees for our modeling and optimization approaches as well as demonstrate their performance in a simulated weight loss study. Our results highlight the cost efficiency and effectiveness of our personalized intervention design for weight loss.
    摘要 肥胖是美国医疗系统中的一个严重问题。最安全有效的肥胖治疗方法是行为改变方法,包括提倡饮食和运动。这些方法经常包括移动组件,允许 interveners 收集参与者的数据并为参与者提供激励和目标,以促进长期行为变化。在最近,有兴趣使用直接金钱激励来促进行为变化。然而,遵循性困难,因为每个参与者都会不同地对不同的激励结构和金额响应不同。这导致研究人员考虑个性化 intervención。个性化挑战是,临床医生不知道在先知道如何向参与者分配激励,以及如何有效地分配有限的投资资源。在这篇论文中,我们考虑这个个性化肥胖损重优化问题。我们开发了一种机器学习方法,可以预测参与者如何响应不同的激励计划。我们使用这个预测模型,在行为改变方法中进行adaptive框架,在训练期间计算怎样分配激励,以保持在研究预算内。我们提供了理论保证和优化方法的实践表现,并在模拟的肥胖损重研究中证明了我们的个性化 intervención的成本效果。我们的结果表明,我们的个性化 intervención设计可以有效地促进肥胖损重。

One Copy Is All You Need: Resource-Efficient Streaming of Medical Imaging Data at Scale

  • paper_url: http://arxiv.org/abs/2307.00438
  • repo_url: https://github.com/um2ii/openjphpy
  • paper_authors: Pranav Kulkarni, Adway Kanhere, Eliot Siegel, Paul H. Yi, Vishwa S. Parekh
  • for: 这篇论文是为了解决医疗影像数据集大量化问题,并且提高人工智能工具的开发速度。
  • methods: 这篇论文使用了一个开源框架called MIST,实现了进度分辨率的运算过程,允许用户在不同的分辨率下载取医疗影像。
  • results: 这篇论文的结果显示,使用MIST可以将医疗影像集中存储和流式处理的设备不足问题降低>90%,并且维持深度学习应用中的诊断质量。
    Abstract Large-scale medical imaging datasets have accelerated development of artificial intelligence tools for clinical decision support. However, the large size of these datasets is a bottleneck for users with limited storage and bandwidth. Many users may not even require such large datasets as AI models are often trained on lower resolution images. If users could directly download at their desired resolution, storage and bandwidth requirements would significantly decrease. However, it is impossible to anticipate every users' requirements and impractical to store the data at multiple resolutions. What if we could store images at a single resolution but send them at different ones? We propose MIST, an open-source framework to operationalize progressive resolution for streaming medical images at multiple resolutions from a single high-resolution copy. We demonstrate that MIST can dramatically reduce imaging infrastructure inefficiencies for hosting and streaming medical images by >90%, while maintaining diagnostic quality for deep learning applications.
    摘要 大规模医疗影像数据集的扩大已经推动了人工智能工具的临床决策支持发展。然而,这些大规模数据集的大小成为了用户储存和带宽限制的瓶颈。许多用户可能不需要这样大的数据集,因为人工智能模型通常是在更低的分辨率图像上训练的。如果用户可以直接下载他们所需的分辨率,储存和带宽需求将会减少很多。然而,预测每个用户的需求是不可能的,并且存储数据在多个分辨率下是不实用的。我们提出了MIST框架,一个开源的框架,用于实现进行式分辨率的流动医疗影像。我们示示了MIST可以减少医疗影像基础设施的不fficient的使用>90%,而且保持深度学习应用的诊断质量。
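The single-copy, multi-resolution idea can be illustrated with a Laplacian-style pyramid: store a coarse base plus per-level detail residuals once, and serve a requested resolution by transmitting only the layers up to that level. MIST itself builds on HTJ2K/OpenJPH progressive decoding; the numpy sketch below only conveys the principle.

```python
# Conceptual sketch of progressive-resolution serving from one stored copy.
import numpy as np

def downsample(img):                       # 2x2 average pooling
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample(img):                         # nearest-neighbour upsampling
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def encode(img, levels=3):
    """Single stored representation: coarse base + detail residual per level."""
    details = []
    for _ in range(levels):
        low = downsample(img)
        details.append(img - upsample(low))
        img = low
    return img, details[::-1]              # base, details ordered coarse -> fine

def serve(base, details, level):
    """Reconstruct only up to the requested resolution level."""
    img = base
    for d in details[:level]:
        img = upsample(img) + d
    return img

full = np.random.rand(256, 256)            # stand-in for a high-resolution scan
base, details = encode(full, levels=3)
for level in range(4):
    img = serve(base, details, level)
    sent = base.size + sum(d.size for d in details[:level])
    print(f"level {level}: shape {img.shape}, values transmitted: {sent}")
assert np.allclose(serve(base, details, 3), full)   # full copy is recoverable
```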

Data-Driven Design for Metamaterials and Multiscale Systems: A Review

  • paper_url: http://arxiv.org/abs/2307.05506
  • repo_url: None
  • paper_authors: Doksoo Lee, Wei Wayne Chen, Liwei Wang, Yu-Chin Chan, Wei Chen
  • for: 这篇论文旨在探讨数据驱动设计方法在Meta材料设计中的潜力。
  • methods: 该论文使用数据收集、机器学习基于单元细胞设计和数据驱动多尺度优化等方法来实现数据驱动设计。
  • results: 论文提出了一种束缚数据驱动设计的总体方法,并将现有研究分为数据驱动模块,包括数据收集、机器学习基于单元细胞设计和数据驱动多尺度优化等方法。
    Abstract Metamaterials are artificial materials designed to exhibit effective material parameters that go beyond those found in nature. Composed of unit cells with rich designability that are assembled into multiscale systems, they hold great promise for realizing next-generation devices with exceptional, often exotic, functionalities. However, the vast design space and intricate structure-property relationships pose significant challenges in their design. A compelling paradigm that could bring the full potential of metamaterials to fruition is emerging: data-driven design. In this review, we provide a holistic overview of this rapidly evolving field, emphasizing the general methodology instead of specific domains and deployment contexts. We organize existing research into data-driven modules, encompassing data acquisition, machine learning-based unit cell design, and data-driven multiscale optimization. We further categorize the approaches within each module based on shared principles, analyze and compare strengths and applicability, explore connections between different modules, and identify open research questions and opportunities.
    摘要 美特材料是人造材料,旨在实现自然界之外的效果。它们由单元细胞组合而成,单元细胞具有丰富的设计性,可以组成多尺度系统。这些材料具有极高的潜在功能,但是设计困难重大,因为设计空间庞大,结构-性能关系复杂。一种吸引人的思想是数据驱动设计,这种思想在这篇文章中得到了详细的介绍。我们将现有的研究分为三个数据驱动模块:数据收集、机器学习基于单元细胞设计和数据驱动多尺度优化。每个模块都包含不同的方法,我们根据共同原则分类和分析它们。我们还探讨了不同模块之间的连接,并评估了各模块的优劣和适用范围。最后,我们还提出了一些未解决的研究问题和机遇。

Sparsity-aware generalization theory for deep neural networks

  • paper_url: http://arxiv.org/abs/2307.00426
  • repo_url: None
  • paper_authors: Ramchandran Muthukumar, Jeremias Sulam
  • for: 本研究旨在探讨深度人工神经网络的泛化能力,并提出一种新的分析方法来解释这种泛化能力。
  • methods: 本研究使用了深度循环神经网络,并开发了一种基于隐藏层活动的度量方法来衡量模型的泛化能力。
  • results: 研究发现,隐藏层活动的度量可以用于衡量模型的泛化能力,并且可以提供非虚假的下界,即使模型具有较高的参数数量。
    Abstract Deep artificial neural networks achieve surprising generalization abilities that remain poorly understood. In this paper, we present a new approach to analyzing generalization for deep feed-forward ReLU networks that takes advantage of the degree of sparsity that is achieved in the hidden layer activations. By developing a framework that accounts for this reduced effective model size for each input sample, we are able to show fundamental trade-offs between sparsity and generalization. Importantly, our results make no strong assumptions about the degree of sparsity achieved by the model, and it improves over recent norm-based approaches. We illustrate our results numerically, demonstrating non-vacuous bounds when coupled with data-dependent priors in specific settings, even in over-parametrized models.
    摘要 深度人工神经网络实现了奇异的泛化能力,这些能力尚未得到充分理解。在这篇论文中,我们提出了一种新的分析泛化方法,利用隐藏层活动的稀畴程度来考虑。我们开发了一个考虑这个减少的有效模型大小的框架,以便为每个输入样本表示基准。我们可以显示泛化和稀畴之间存在基本的负面关系,这些结果不假设模型达到了哪个水平的稀畴程度。我们的结果超过了最近的 нор-based方法。我们通过数值计算示出了非虚无关的下界,即使在过参数化模型中。

Adaptive Algorithms for Relaxed Pareto Set Identification

  • paper_url: http://arxiv.org/abs/2307.00424
  • repo_url: None
  • paper_authors: Cyrille Kone, Emilie Kaufmann, Laura Richert
  • for: 本文研究了一种多目标多枪支牛津模型中的固定信任性识别Pareto优点集的问题。由于确定精确的Pareto集可能需要很大的样本量,因此研究了一种允许输出一些近似优点枪支的放松。此外,本文还研究了其他放松方法,允许Identify一个相关的Pareto集子集。
  • methods: 本文提出了一种单一的抽样策略,called Adaptive Pareto Exploration,可以与不同的停止规则结合使用,以满足不同的放松。本文还分析了不同组合的抽样复杂度,特别是在寻找最多$k$ Pareto优点枪支时的减少样本复杂度。
  • results: 本文在一个实际应用中展示了Adaptive Pareto Exploration的良好实践性,在考虑多个免疫力标准时选择 Covid-19 疫苗策略的问题上。
    Abstract In this paper we revisit the fixed-confidence identification of the Pareto optimal set in a multi-objective multi-armed bandit model. As the sample complexity to identify the exact Pareto set can be very large, a relaxation allowing to output some additional near-optimal arms has been studied. In this work we also tackle alternative relaxations that allow instead to identify a relevant subset of the Pareto set. Notably, we propose a single sampling strategy, called Adaptive Pareto Exploration, that can be used in conjunction with different stopping rules to take into account different relaxations of the Pareto Set Identification problem. We analyze the sample complexity of these different combinations, quantifying in particular the reduction in sample complexity that occurs when one seeks to identify at most $k$ Pareto optimal arms. We showcase the good practical performance of Adaptive Pareto Exploration on a real-world scenario, in which we adaptively explore several vaccination strategies against Covid-19 in order to find the optimal ones when multiple immunogenicity criteria are taken into account.
    摘要 在这篇论文中,我们重新审视了多目标多机枪猎猎人模型中固定信心的标准化集的预测。由于样本复杂度来确定精确的Pareto集可以非常大,因此有关输出一些附加的近似优致的机枪的放弃措施的研究。在这种工作中,我们也研究了不同的放弃措施,允许在Pareto集标准化问题中确定一个相关的子集。特别是,我们提出了一种单一的采样策略,called Adaptive Pareto Exploration,可以与不同的停止规则结合使用,以便在不同的放弃措施中考虑不同的Pareto集标准化问题。我们分析了不同组合的样本复杂度,特别是在寻找最多$k$ Pareto优致机枪时的样本复杂度减少。我们还展示了Adaptive Pareto Exploration在一个实际场景中的良好实践性,在多种immunogenicity标准下对covid-19疫苗的适应性进行了可控的探索。
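A toy version of fixed-confidence Pareto set identification is sketched below for a two-objective bandit: arms are pulled adaptively and the empirical Pareto set is read off the estimated means. The sampling rule shown (pull the arm with the widest confidence radius) is a crude stand-in for the paper's Adaptive Pareto Exploration indices and its relaxed stopping rules.

```python
# Toy sketch of Pareto set identification in a two-objective bandit.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([[0.9, 0.2], [0.5, 0.5], [0.2, 0.9],   # Pareto-optimal arms
                       [0.4, 0.4], [0.1, 0.1]])              # dominated arms
K, D = true_means.shape
counts = np.ones(K)
sums = true_means + rng.normal(0, 0.3, (K, D))               # one initial pull each

def empirical_pareto(mu):
    """Indices of arms not dominated by any other arm."""
    return [i for i in range(K)
            if not any(np.all(mu[j] >= mu[i]) and np.any(mu[j] > mu[i])
                       for j in range(K) if j != i)]

for t in range(5_000):
    mu = sums / counts[:, None]
    radius = np.sqrt(2 * np.log(t + 2) / counts)             # confidence radii
    # Pull the arm whose estimates are least certain; the paper's rule instead
    # targets arms whose Pareto membership is still ambiguous.
    arm = int(np.argmax(radius))
    sums[arm] += true_means[arm] + rng.normal(0, 0.3, D)
    counts[arm] += 1

print("empirical Pareto set:", empirical_pareto(sums / counts[:, None]))
print("pulls per arm:", counts.astype(int).tolist())
```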

JoinBoost: Grow Trees Over Normalized Data Using Only SQL

  • paper_url: http://arxiv.org/abs/2307.00422
  • repo_url: None
  • paper_authors: Zezhou Huang, Rathijit Sen, Jiaxiang Liu, Eugene Wu
  • for: 这种论文的目的是提出一种基于 SQL 的内存中 ML 系统,以避免数据移动和提供数据管理。
  • methods: 这种系统使用了纯 SQL 语句来训练树型模型,并且可以在任何 DBMS 上运行。
  • results: 实验表明,JoinBoost 比特有限的 LightGBM 快三倍(1.1倍),并且与现有的内存中 ML 系统相比,速度超过一个数量级。此外,JoinBoost 可以跨越 LightGBM 的特点,包括特性数、数据库大小和Join图复杂度。
    Abstract Although dominant for tabular data, ML libraries that train tree models over normalized databases (e.g., LightGBM, XGBoost) require the data to be denormalized as a single table, materialized, and exported. This process is not scalable, slow, and poses security risks. In-DB ML aims to train models within DBMSes to avoid data movement and provide data governance. Rather than modify a DBMS to support In-DB ML, is it possible to offer competitive tree training performance to specialized ML libraries...with only SQL? We present JoinBoost, a Python library that rewrites tree training algorithms over normalized databases into pure SQL. It is portable to any DBMS, offers performance competitive with specialized ML libraries, and scales with the underlying DBMS capabilities. JoinBoost extends prior work from both algorithmic and systems perspectives. Algorithmically, we support factorized gradient boosting, by updating the $Y$ variable to the residual in the non-materialized join result. Although this view update problem is generally ambiguous, we identify addition-to-multiplication preserving, the key property of variance semi-ring to support rmse, the most widely used criterion. System-wise, we identify residual updates as a performance bottleneck. Such overhead can be natively minimized on columnar DBMSes by creating a new column of residual values and adding it as a projection. We validate this with two implementations on DuckDB, with no or minimal modifications to its internals for portability. Our experiment shows that JoinBoost is 3x (1.1x) faster for random forests (gradient boosting) compared to LightGBM, and over an order magnitude faster than state-of-the-art In-DB ML systems. Further, JoinBoost scales well beyond LightGBM in terms of the # features, DB size (TPC-DS SF=1000), and join graph complexity (galaxy schemas).
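The residual-update idea can be sketched with plain SQL: boosting residuals live in a column, split statistics are computed as aggregates, and each round updates residuals with a single UPDATE instead of exporting a training matrix. The sqlite3 toy below uses one table and a fixed stump threshold; JoinBoost itself operates over multi-table joins and full trees.

```python
# Toy "boosting in SQL" sketch with residuals stored as a column (illustrative).
import sqlite3
import random

random.seed(0)
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE train (x REAL, y REAL, residual REAL)")
xs = [random.uniform(0, 10) for _ in range(1000)]
rows = [(x, 3.0 * (x > 5) + random.gauss(0, 0.2)) for x in xs]
con.executemany("INSERT INTO train VALUES (?, ?, ?)", [(x, y, y) for x, y in rows])

lr, model = 0.5, []
for _ in range(10):
    # Split statistics as plain aggregates over the table (no data export).
    left, right = con.execute(
        "SELECT AVG(CASE WHEN x <= 5 THEN residual END),"
        "       AVG(CASE WHEN x >  5 THEN residual END) FROM train").fetchone()
    model.append((5.0, lr * left, lr * right))
    # Residual update stays inside the database as a single UPDATE.
    con.execute(
        "UPDATE train SET residual = residual - CASE WHEN x <= ? THEN ? ELSE ? END",
        (5.0, lr * left, lr * right))

rmse, = con.execute("SELECT AVG(residual * residual) FROM train").fetchone()
print(f"RMSE of remaining residuals: {rmse ** 0.5:.3f}")

def predict(x):
    """Sum the leaf values of all boosting rounds for a new point."""
    return sum(l if x <= thr else r for thr, l, r in model)

print("prediction at x=7:", round(predict(7.0), 3))
```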

Provably Efficient UCB-type Algorithms For Learning Predictive State Representations

  • paper_url: http://arxiv.org/abs/2307.00405
  • repo_url: None
  • paper_authors: Ruiquan Huang, Yingbin Liang, Jing Yang
  • for: 本研究旨在提高累积奖励的策略选择问题,包括Markov决策过程(MDPs)和部分可见MDPs(POMDPs)为特殊情况。
  • methods: 该研究提出了首个已知的UCB类型方法,基于预测状态表示(PSRs),其中包括一个新的奖励项来Upper bound total variation distance between estimated和true模型。
  • results: 我们计算出了在线和离线PSRs的样本复杂性下界,并证明了我们的设计的UCB类型算法具有计算效率、最后一轮保证近似优策、和模型准确性的优点。
    Abstract The general sequential decision-making problem, which includes Markov decision processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at maximizing a cumulative reward by making a sequence of decisions based on a history of observations and actions over time. Recent studies have shown that the sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs). Despite these advancements, existing approaches typically involve oracles or steps that are not computationally efficient. On the other hand, the upper confidence bound (UCB) based approaches, which have served successfully as computationally efficient methods in bandits and MDPs, have not been investigated for more general PSRs, due to the difficulty of optimistic bonus design in these more challenging settings. This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models. We further characterize the sample complexity bounds for our designed UCB-type algorithms for both online and offline PSRs. In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational efficiency, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.

ProbVLM: Probabilistic Adapter for Frozen Vison-Language Models

  • paper_url: http://arxiv.org/abs/2307.00398
  • repo_url: https://github.com/ExplainableML/ProbVLM
  • paper_authors: Uddeshya Upadhyay, Shyamgopal Karthik, Massimiliano Mancini, Zeynep Akata
  • for: 本研究旨在提高大规模视觉语言模型(VLM)的表现,以实现更好的协同运算和模型选择。
  • methods: 本研究提出了一种 probabilistic adapter,可以在posts-hoc方式中对已经预训练的 VLM 进行概率调整,以估计嵌入空间中的概率分布。
  • results: 在四个挑战性 dataset 上,包括 COCO、Flickr、CUB 和 Oxford-flowers,研究人员可以通过估计嵌入空间中的概率分布,评估 VLM 的嵌入不确定性,并证明 ProbVLM 在回归任务中表现出色。此外,研究人员还提出了两个现实世界下沉浸任务,即活动学习和模型选择,并证明在这些任务中,估计嵌入空间中的概率分布具有很好的帮助作用。最后,研究人员还介绍了一种基于大规模预训练的潜在扩散模型,用于可见化嵌入分布。
    Abstract Large-scale vision-language models (VLMs) like CLIP successfully find correspondences between images and text. Through the standard deterministic mapping process, an image or a text sample is mapped to a single vector in the embedding space. This is problematic: as multiple samples (images or text) can abstract the same concept in the physical world, deterministic embeddings do not reflect the inherent ambiguity in the embedding space. We propose ProbVLM, a probabilistic adapter that estimates probability distributions for the embeddings of pre-trained VLMs via inter/intra-modal alignment in a post-hoc manner without needing large-scale datasets or computing. On four challenging datasets, i.e., COCO, Flickr, CUB, and Oxford-flowers, we estimate the multi-modal embedding uncertainties for two VLMs, i.e., CLIP and BLIP, quantify the calibration of embedding uncertainties in retrieval tasks and show that ProbVLM outperforms other methods. Furthermore, we propose active learning and model selection as two real-world downstream tasks for VLMs and show that the estimated uncertainty aids both tasks. Lastly, we present a novel technique for visualizing the embedding distributions using a large-scale pre-trained latent diffusion model.
    摘要 大规模视力语言模型(VLM)如CLIP成功地找到图像和文本之间的对应关系。通过标准排定 mapping 过程,一个图像或文本样本将映射到 embedding 空间中的单个向量上。这是一个问题:多个样本(图像或文本)可以抽象 Physical 世界中的同一个概念,因此排定 embedding 不会反映 embedding 空间中的内在含义。我们提议 ProbVLM,一种 probabilistic adapter,通过对 pre-trained VLM 的嵌入进行概率分布的估计,在后续方式中无需大规模数据或计算。在四个具有挑战性的 datasets 上,我们估算 pre-trained VLM 的嵌入不确定性,衡量嵌入不确定性的准确性在检索任务中,并示出 ProbVLM 超过其他方法。此外,我们提出了基于 VLM 的活动学习和模型选择两个实际应用任务,并证明估计不确定性可以 aid 这两个任务。最后,我们介绍了一种使用大规模预训练的潜在扩散模型来可见 embedding 分布的新技术。

MobileViG: Graph-Based Sparse Attention for Mobile Vision Applications

  • paper_url: http://arxiv.org/abs/2307.00395
  • repo_url: https://github.com/sldgroup/mobilevig
  • paper_authors: Mustafa Munir, William Avery, Radu Marculescu
  • for: This paper proposes a new graph-based sparse attention mechanism (SVGA) and a hybrid CNN-GNN architecture (MobileViG) for vision tasks on mobile devices.
  • methods: The proposed SVGA mechanism is designed to reduce the computational cost of representing images as graph structures, while the MobileViG architecture combines SVGA with a CNN backbone.
  • results: Extensive experiments show that MobileViG outperforms existing ViG models and mobile CNN and ViT architectures in terms of accuracy and/or speed on image classification, object detection, and instance segmentation tasks. The fastest model, MobileViG-Ti, achieves 75.7% top-1 accuracy with 0.78 ms inference latency on iPhone 13 Mini NPU, while the largest model, MobileViG-B, obtains 82.6% top-1 accuracy with only 2.30 ms latency.
    Abstract Traditionally, convolutional neural networks (CNN) and vision transformers (ViT) have dominated computer vision. However, recently proposed vision graph neural networks (ViG) provide a new avenue for exploration. Unfortunately, for mobile applications, ViGs are computationally expensive due to the overhead of representing images as graph structures. In this work, we propose a new graph-based sparse attention mechanism, Sparse Vision Graph Attention (SVGA), that is designed for ViGs running on mobile devices. Additionally, we propose the first hybrid CNN-GNN architecture for vision tasks on mobile devices, MobileViG, which uses SVGA. Extensive experiments show that MobileViG beats existing ViG models and existing mobile CNN and ViT architectures in terms of accuracy and/or speed on image classification, object detection, and instance segmentation tasks. Our fastest model, MobileViG-Ti, achieves 75.7% top-1 accuracy on ImageNet-1K with 0.78 ms inference latency on iPhone 13 Mini NPU (compiled with CoreML), which is faster than MobileNetV2x1.4 (1.02 ms, 74.7% top-1) and MobileNetV2x1.0 (0.81 ms, 71.8% top-1). Our largest model, MobileViG-B obtains 82.6% top-1 accuracy with only 2.30 ms latency, which is faster and more accurate than the similarly sized EfficientFormer-L3 model (2.77 ms, 82.4%). Our work proves that well designed hybrid CNN-GNN architectures can be a new avenue of exploration for designing models that are extremely fast and accurate on mobile devices. Our code is publicly available at https://github.com/SLDGroup/MobileViG.
    摘要 传统上,卷积神经网络(CNN)和视Transformer(ViT)在计算机视觉领域占据主导地位,但最近提出的视图图神经网络(ViG)提供了一个新的探索方向。然而,由于图像表示为图结构所带来的计算开销,ViG在移动设备上是计算昂贵的。在这种情况下,我们提出了一种新的图像 sparse attention机制——图像 sparse vision graph attention(SVGA),用于适应移动设备上的 ViG 运行。此外,我们还提出了首个在移动设备上使用 CNN-GNN 架构的 Hybrid CNN-GNN 模型——MobileViG,该模型使用 SVGA。我们的实验表明,MobileViG 在图像分类、物体检测和实例 segmentation 任务上比现有的 ViG 模型和现有的移动 CNN 和 ViT 架构更高的准确率和/或运行速度。我们的最快模型,MobileViG-Ti,在 ImageNet-1K 上达到了 75.7% 的顶部 1 准确率,并且在 iPhone 13 Mini NPU 上编译 CoreML 时间为 0.78 ms,比 MobileNetV2x1.4 (1.02 ms, 74.7% top-1) 和 MobileNetV2x1.0 (0.81 ms, 71.8% top-1) 更快。我们的最大模型,MobileViG-B,在 82.6% 的顶部 1 准确率下,只需 2.30 ms 的时间,这比 EfficientFormer-L3 模型 (2.77 ms, 82.4%) 更快和更准确。我们的工作证明了,通过设计合适的 Hybrid CNN-GNN 架构,可以在移动设备上设计出EXTREMELY FAST和EXTREMELY ACCURATE的模型。我们的代码可以在 上获取。

CasTGAN: Cascaded Generative Adversarial Network for Realistic Tabular Data Synthesis

  • paper_url: http://arxiv.org/abs/2307.00384
  • repo_url: https://github.com/abedshantti/castgan
  • paper_authors: Abdallah Alshantti, Damiano Varagnolo, Adil Rasheed, Aria Rahmati, Frank Westad
  • for: 本文提出了一种基于生成对抗网络(GAN)的方法,用于生成具有真实性的表格数据,特别是关注Validity问题。
  • methods: 本文提出了一种级联的表格GAN框架(CasTGAN),通过级联的扩展,使生成的数据更加真实地反映原始数据中的特征相互关系。
  • results: 实验结果表明,CasTGAN能够很好地捕捉原始数据中特征之间的相互关系和约束,尤其是高维数据集。此外,对模型进行一些扰动处理可以提高模型对特定攻击的抗性。
    Abstract Generative adversarial networks (GANs) have drawn considerable attention in recent years for their proven capability in generating synthetic data which can be utilized for multiple purposes. While GANs have demonstrated tremendous successes in producing synthetic data samples that replicate the dynamics of the original datasets, the validity of the synthetic data and the underlying privacy concerns represent major challenges which are not sufficiently addressed. In this work, we design a cascaded tabular GAN framework (CasTGAN) for generating realistic tabular data with a specific focus on the validity of the output. In this context, validity refers to the the dependency between features that can be found in the real data, but is typically misrepresented by traditional generative models. Our key idea entails that employing a cascaded architecture in which a dedicated generator samples each feature, the synthetic output becomes more representative of the real data. Our experimental results demonstrate that our model well captures the constraints and the correlations between the features of the real data, especially the high dimensional datasets. Furthermore, we evaluate the risk of white-box privacy attacks on our model and subsequently show that applying some perturbations to the auxiliary learners in CasTGAN increases the overall robustness of our model against targeted attacks.

Residual-based attention and connection to information bottleneck theory in PINNs

  • paper_url: http://arxiv.org/abs/2307.00379
  • repo_url: https://github.com/soanagno/rba-pinns
  • paper_authors: Sokratis J. Anagnostopoulos, Juan Diego Toscano, Nikolaos Stergiopulos, George Em Karniadakis
  • for: 本研究旨在提高物理学习机制中的数据集成效率和无缝性。
  • methods: 该研究提出了一种高效、不需要梯度的重量规则,用于加速物理学习机制中的动态或静态系统的收敛。该简单 yet effective 的注意力机制是基于系统的演化准确误差,并且不需要额外的计算成本或反向学习。
  • results: 该研究表明,该重量规则可以在标准优化器上实现相对 $L^{2}$ 误差在 $10^{-5}$ 水平。此外,通过分析训练过程中的权重演化,研究人员发现了两个不同的学习阶段,与信息瓶颈理论(IB)中的匹配和扩散阶段相似。
    Abstract Driven by the need for more efficient and seamless integration of physical models and data, physics-informed neural networks (PINNs) have seen a surge of interest in recent years. However, ensuring the reliability of their convergence and accuracy remains a challenge. In this work, we propose an efficient, gradient-less weighting scheme for PINNs, that accelerates the convergence of dynamic or static systems. This simple yet effective attention mechanism is a function of the evolving cumulative residuals and aims to make the optimizer aware of problematic regions at no extra computational cost or adversarial learning. We illustrate that this general method consistently achieves a relative $L^{2}$ error of the order of $10^{-5}$ using standard optimizers on typical benchmark cases of the literature. Furthermore, by investigating the evolution of weights during training, we identify two distinct learning phases reminiscent of the fitting and diffusion phases proposed by the information bottleneck (IB) theory. Subsequent gradient analysis supports this hypothesis by aligning the transition from high to low signal-to-noise ratio (SNR) with the transition from fitting to diffusion regimes of the adopted weights. This novel correlation between PINNs and IB theory could open future possibilities for understanding the underlying mechanisms behind the training and stability of PINNs and, more broadly, of neural operators.
    摘要 驱动了更高效和无缝的物理模型和数据集成的需求,物理学 informed neural networks(PINNs)在最近几年内得到了广泛的关注。然而,保证其减少和精度的可靠性仍然是一个挑战。在这种工作中,我们提出了一种高效的无梯度权重方案,用于加速动态或静态系统的PINNs的收敛。这种简单 yet effective的注意力机制是函数所处的积累差异,并且在不Extra的计算成本或对抗学习的情况下,使得优化器对问题地带出更多的注意。我们示出,这种通用方法可以在典型的文献中的测试案例中实现相对的L2误差为10^-5水平。此外,通过分析训练过程中权重的发展,我们发现了IB理论中的两个不同学习阶段,即“适应阶段”和“扩散阶段”。这种预测和权重的采用支持了这一假设,并且对PINNs和更广泛的神经运算器的稳定性和训练机制的理解带来了新的可能性。
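An update of this flavour can be sketched on a toy problem: each collocation point keeps a weight that decays and accumulates its normalized residual, and that weight multiplies the point's loss term at no extra gradient cost. The constants and the exact update rule below are illustrative assumptions, applied to u'(x) = cos(x) with u(0) = 0.

```python
# Hedged sketch of residual-based per-point weights in a tiny PINN.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(),
                    nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.linspace(0, 2 * torch.pi, 200).unsqueeze(1).requires_grad_(True)
weights = torch.ones_like(x)                       # per-point attention weights
gamma, eta = 0.99, 0.01                            # illustrative constants

for step in range(3000):
    u = net(x)
    du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
    residual = du - torch.cos(x)                   # ODE residual at each point
    # Gradient-free weight update driven by the evolving residuals.
    with torch.no_grad():
        weights = gamma * weights + eta * residual.abs() / residual.abs().max()
    loss = (weights * residual ** 2).mean() + net(torch.zeros(1, 1)).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

x_test = torch.linspace(0, 2 * torch.pi, 200).unsqueeze(1)
err = (net(x_test) - torch.sin(x_test)).norm() / torch.sin(x_test).norm()
print(f"relative L2 error against u(x) = sin(x): {err.item():.2e}")
```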