cs.LG - 2023-07-29

Monaural Multi-Speaker Speech Separation Using Efficient Transformer Model

  • paper_url: http://arxiv.org/abs/2308.00010
  • repo_url: None
  • paper_authors: S. Rijal, R. Neupane, S. P. Mainali, S. K. Regmi, S. Maharjan
  • for: This paper addresses the cocktail party problem, i.e., the difficulty of separating individual speakers from a mixture of several speakers' speech.
  • methods: The model is based on the Transformer architecture and its efficient variants for monaural multi-speaker speech separation. It is trained on the LibriMix dataset and separates two distinct speaker sources from a mixed audio input.
  • results: The model reduces the computational complexity of speech separation without a significant performance trade-off against prevalent speech-separation models. The project is expected to make an important contribution to speech-separation research with computational efficiency at its core.
    Abstract Cocktail party problem is the scenario where it is difficult to separate or distinguish individual speaker from a mixed speech from several speakers. There have been several researches going on in this field but the size and complexity of the model is being traded off with the accuracy and robustness of speech separation. "Monaural multi-speaker speech separation" presents a speech-separation model based on the Transformer architecture and its efficient forms. The model has been trained with the LibriMix dataset containing diverse speakers' utterances. The model separates 2 distinct speaker sources from a mixed audio input. The developed model approaches the reduction in computational complexity of the speech separation model, with minimum tradeoff with the performance of prevalent speech separation model and it has shown significant movement towards that goal. This project foresees, a rise in contribution towards the ongoing research in the field of speech separation with computational efficiency at its core.

A 3D deep learning classifier and its explainability when assessing coronary artery disease

  • paper_url: http://arxiv.org/abs/2308.00009
  • repo_url: None
  • paper_authors: Wing Keung Cheung, Jeremy Kalindjian, Robert Bell, Arjun Nair, Leon J. Menezes, Riyaz Patel, Simon Wan, Kacy Chou, Jiahang Chen, Ryo Torii, Rhodri H. Davies, James C. Moon, Daniel C. Alexander, Joseph Jacob
  • for: Early detection and diagnosis of coronary artery disease (CAD), to save lives and reduce healthcare costs.
  • methods: A 3D ResNet-50 deep learning model that directly classifies normal subjects and CAD patients on computed tomography coronary angiography images.
  • results: Outperforms a 2D ResNet-50 model by 23.65% while providing explainability through Grad-GAM. The 3D CAD classification is further linked to a 2D two-class semantic segmentation for improved explainability and accurate abnormality localisation.
    Abstract Early detection and diagnosis of coronary artery disease (CAD) could save lives and reduce healthcare costs. In this study, we propose a 3D Resnet-50 deep learning model to directly classify normal subjects and CAD patients on computed tomography coronary angiography images. Our proposed method outperforms a 2D Resnet-50 model by 23.65%. Explainability is also provided by using a Grad-GAM. Furthermore, we link the 3D CAD classification to a 2D two-class semantic segmentation for improved explainability and accurate abnormality localisation.

A data-centric deep learning approach to airway segmentation

  • paper_url: http://arxiv.org/abs/2308.00008
  • repo_url: None
  • paper_authors: Wing Keung Cheung, Ashkan Pakzad, Nesrin Mogulkoc, Sarah Needleman, Bojidar Rangelov, Eyjolfur Gudmundsson, An Zhao, Mariam Abbas, Davina McLaverty, Dimitrios Asimakopoulos, Robert Chapman, Recep Savas, Sam M Janes, Yipeng Hu, Daniel C. Alexander, John R Hurst, Joseph Jacob
  • for: The morphology and distribution of airway tree abnormalities enable diagnosis and characterisation of a variety of chronic respiratory conditions, and airway segmentation allows disease extent and severity to be estimated.
  • methods: A data-centric deep learning technique for segmenting the airway tree. The technique uses interpolation and image splitting to improve data usefulness and quality, followed by an ensemble learning strategy that aggregates segmented airway trees at different scales (see the sketch below).
  • results: With a combined loss, the method outperforms the baseline model's segmentation performance (Dice similarity coefficient) by 2.5% on average. The proposed technique also has low GPU usage and high flexibility, allowing deployment on any 2D deep learning model.
    Abstract The morphology and distribution of airway tree abnormalities enables diagnosis and disease characterisation across a variety of chronic respiratory conditions. In this regard, airway segmentation plays a critical role in the production of the outline of the entire airway tree to enable estimation of disease extent and severity. In this study, we propose a data-centric deep learning technique to segment the airway tree. The proposed technique utilises interpolation and image split to improve data usefulness and quality. Then, an ensemble learning strategy is implemented to aggregate the segmented airway trees at different scales. In terms of segmentation performance (dice similarity coefficient), our method outperforms the baseline model by 2.5% on average when a combined loss is used. Further, our proposed technique has a low GPU usage and high flexibility enabling it to be deployed on any 2D deep learning model.
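The multi-scale ensemble step described in the abstract can be illustrated with a small sketch. Everything below (the choice of scales, the dummy `predict_fn`, and the averaging-then-thresholding rule) is an illustrative assumption rather than the authors' implementation:

```python
import numpy as np
from scipy.ndimage import zoom

def ensemble_multiscale_segmentation(image, predict_fn, scales=(0.5, 1.0, 1.5), threshold=0.5):
    """Average airway probability maps predicted at several scales.

    `predict_fn` is assumed to map a 2D image to a per-pixel airway probability
    map of the same shape (any 2D segmentation model would do).
    """
    h, w = image.shape
    prob_sum = np.zeros((h, w), dtype=np.float64)
    for s in scales:
        scaled = zoom(image, s, order=1)                    # resample the input slice
        prob = predict_fn(scaled)                           # run the model at this scale
        prob = zoom(prob, (h / prob.shape[0], w / prob.shape[1]), order=1)
        prob_sum += np.clip(prob, 0.0, 1.0)
    mean_prob = prob_sum / len(scales)
    return (mean_prob > threshold).astype(np.uint8)         # final binary airway mask

# Example with a dummy "model" that just returns the (already [0, 1)) intensities.
rng = np.random.default_rng(0)
ct_slice = rng.random((64, 64))
print(ensemble_multiscale_segmentation(ct_slice, lambda x: x).shape)
```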

UPFL: Unsupervised Personalized Federated Learning towards New Clients

  • paper_url: http://arxiv.org/abs/2307.15994
  • repo_url: None
  • paper_authors: Tiandi Ye, Cen Chen, Yinggui Wang, Xiang Li, Ming Gao
  • for: This paper targets the problem of providing personalized models for new clients in federated learning, after the federated model has already been trained and deployed.
  • methods: The paper proposes FedTTA, a method based on the adaptive risk minimization technique, together with two optimization strategies: proxy regularization and entropy-based early stopping (see the sketch below). A knowledge distillation loss specifically designed for FedTTA is also proposed to address device heterogeneity.
  • results: Experiments on five datasets against eleven baselines show that FedTTA and its variants achieve excellent performance. Code is available at: https://github.com/anonymous-federated-learning/code.
    Abstract Personalized federated learning has gained significant attention as a promising approach to address the challenge of data heterogeneity. In this paper, we address a relatively unexplored problem in federated learning. When a federated model has been trained and deployed, and an unlabeled new client joins, providing a personalized model for the new client becomes a highly challenging task. To address this challenge, we extend the adaptive risk minimization technique into the unsupervised personalized federated learning setting and propose our method, FedTTA. We further improve FedTTA with two simple yet effective optimization strategies: enhancing the training of the adaptation model with proxy regularization and early-stopping the adaptation through entropy. Moreover, we propose a knowledge distillation loss specifically designed for FedTTA to address the device heterogeneity. Extensive experiments on five datasets against eleven baselines demonstrate the effectiveness of our proposed FedTTA and its variants. The code is available at: https://github.com/anonymous-federated-learning/code.
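As a rough illustration of the second optimization strategy (early-stopping the adaptation through entropy), the sketch below adapts a model on unlabeled client data and stops once the mean prediction entropy plateaus. The entropy-minimization objective, the optimizer, and the patience rule are assumptions for illustration and are not taken from the paper's code:

```python
import torch
import torch.nn.functional as F

def adapt_with_entropy_early_stopping(model, unlabeled_loader, steps=50, lr=1e-3, patience=3):
    """Adapt `model` on unlabeled client data; stop when mean entropy plateaus."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    best_entropy, bad_rounds = float("inf"), 0
    for _ in range(steps):
        total_entropy, n_batches = 0.0, 0
        for x in unlabeled_loader:
            probs = F.softmax(model(x), dim=1)
            # Mean prediction entropy doubles as the (assumed) adaptation
            # objective and as the early-stopping signal.
            entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
            opt.zero_grad()
            entropy.backward()
            opt.step()
            total_entropy += entropy.item()
            n_batches += 1
        mean_entropy = total_entropy / max(n_batches, 1)
        if mean_entropy < best_entropy - 1e-4:
            best_entropy, bad_rounds = mean_entropy, 0
        else:
            bad_rounds += 1
            if bad_rounds >= patience:      # stop adapting once entropy stops improving
                break
    return model

# Tiny usage example with a linear "model" and random unlabeled batches.
model = torch.nn.Linear(10, 3)
batches = [torch.randn(16, 10) for _ in range(4)]
adapt_with_entropy_early_stopping(model, batches, steps=5)
```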

Feature Reweighting for EEG-based Motor Imagery Classification

  • paper_url: http://arxiv.org/abs/2308.02515
  • repo_url: None
  • paper_authors: Taveena Lotey, Prateek Keserwani, Debi Prosad Dogra, Partha Pratim Roy
  • for: This study classifies motor imagery (MI) from non-invasive electroencephalographic (EEG) signals to predict a subject's intended limb movements.
  • methods: A convolutional neural network (CNN) approach to MI-EEG classification, extended with a noise-reducing feature reweighting method that mitigates the problems caused by noisy MI-EEG signals during training (see the sketch below).
  • results: Experimental results show that the proposed method improves classification accuracy on the Physionet EEG-MMIDB and BCI Competition IV 2a datasets by 9.34% and 3.82%, respectively, compared with existing methods.
    Abstract Classification of motor imagery (MI) using non-invasive electroencephalographic (EEG) signals is a critical objective as it is used to predict the intention of limb movements of a subject. In recent research, convolutional neural network (CNN) based methods have been widely utilized for MI-EEG classification. The challenges of training neural networks for MI-EEG signals classification include low signal-to-noise ratio, non-stationarity, non-linearity, and high complexity of EEG signals. The features computed by CNN-based networks on the highly noisy MI-EEG signals contain irrelevant information. Subsequently, the feature maps of the CNN-based network computed from the noisy and irrelevant features contain irrelevant information. Thus, many non-contributing features often mislead the neural network training and degrade the classification performance. Hence, a novel feature reweighting approach is proposed to address this issue. The proposed method gives a noise reduction mechanism named feature reweighting module that suppresses irrelevant temporal and channel feature maps. The feature reweighting module of the proposed method generates scores that reweight the feature maps to reduce the impact of irrelevant information. Experimental results show that the proposed method significantly improved the classification of MI-EEG signals of Physionet EEG-MMIDB and BCI Competition IV 2a datasets by a margin of 9.34% and 3.82%, respectively, compared to the state-of-the-art methods.
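A minimal PyTorch sketch of a feature-reweighting module that scores and rescales channel and temporal feature maps, in the spirit of the noise-reduction mechanism described above. The squeeze-and-excitation-style scoring and the layer sizes are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class FeatureReweighting(nn.Module):
    def __init__(self, n_channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(n_channels // reduction, 1)
        # Scores for the channel dimension (one weight per feature map).
        self.channel_score = nn.Sequential(
            nn.Linear(n_channels, hidden), nn.ReLU(),
            nn.Linear(hidden, n_channels), nn.Sigmoid(),
        )
        # Scores for the temporal dimension (one weight per time step).
        self.temporal_score = nn.Sequential(
            nn.Conv1d(n_channels, 1, kernel_size=1), nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) feature maps from a CNN backbone.
        c_scores = self.channel_score(x.mean(dim=2))          # (batch, channels)
        x = x * c_scores.unsqueeze(-1)                        # suppress noisy channels
        t_scores = self.temporal_score(x)                     # (batch, 1, time)
        return x * t_scores                                   # suppress noisy time steps

# Example: reweight 16 feature maps over 128 time steps.
feats = torch.randn(8, 16, 128)
print(FeatureReweighting(16)(feats).shape)   # torch.Size([8, 16, 128])
```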

RGB-D-Fusion: Image Conditioned Depth Diffusion of Humanoid Subjects

  • paper_url: http://arxiv.org/abs/2307.15988
  • repo_url: None
  • paper_authors: Sascha Kirch, Valeria Olyunina, Jan Ondřej, Rafael Pagés, Sergio Martin, Clara Pérez-Molina
  • for: Generating high-resolution depth maps from low-resolution monocular RGB images of humanoid subjects.
  • methods: An image-conditioned denoising diffusion probabilistic model first generates a low-resolution depth map; a second denoising diffusion probabilistic model, conditioned on a low-resolution RGB-D image, then upsamples the depth map (a sketch of the depth noise augmentation follows the abstract).
  • results: The proposed multi-modal conditional denoising diffusion probabilistic model efficiently generates high-resolution depth maps from low-resolution monocular RGB images.
    Abstract We present RGB-D-Fusion, a multi-modal conditional denoising diffusion probabilistic model to generate high resolution depth maps from low-resolution monocular RGB images of humanoid subjects. RGB-D-Fusion first generates a low-resolution depth map using an image conditioned denoising diffusion probabilistic model and then upsamples the depth map using a second denoising diffusion probabilistic model conditioned on a low-resolution RGB-D image. We further introduce a novel augmentation technique, depth noise augmentation, to increase the robustness of our super-resolution model.
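The depth noise augmentation mentioned in the abstract could, for instance, perturb the low-resolution depth map that conditions the super-resolution stage. The specific noise model below (additive Gaussian noise plus random pixel dropout) is an assumption for illustration:

```python
import torch

def depth_noise_augmentation(depth_lr: torch.Tensor,
                             sigma: float = 0.05,
                             dropout_prob: float = 0.02) -> torch.Tensor:
    """depth_lr: (batch, 1, H, W) low-resolution depth maps in [0, 1]."""
    noisy = depth_lr + sigma * torch.randn_like(depth_lr)        # additive Gaussian noise
    drop_mask = torch.rand_like(depth_lr) < dropout_prob         # simulate missing depth pixels
    noisy = torch.where(drop_mask, torch.zeros_like(noisy), noisy)
    return noisy.clamp(0.0, 1.0)

# The conditioning tensor for the second diffusion model would then be the
# low-resolution RGB image concatenated with the augmented depth map.
rgb_lr = torch.rand(4, 3, 64, 64)
depth_lr = torch.rand(4, 1, 64, 64)
cond = torch.cat([rgb_lr, depth_noise_augmentation(depth_lr)], dim=1)  # (4, 4, 64, 64)
print(cond.shape)
```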

Vehicle Price Prediction By Aggregating decision tree model With Boosting Model

  • paper_url: http://arxiv.org/abs/2307.15982
  • repo_url: None
  • paper_authors: Auwal Tijjani Amshi
  • for: The goal is to predict used-vehicle prices, an interesting and needed problem, since accurate prediction must account for a large number of attributes.
  • methods: Python scripts were built to normalize, standardize, and clean the data, avoiding unnecessary noise for the machine learning algorithms.
  • results: A decision tree model and a gradient boosting predictive model are combined to get closer to accurate predictions (see the sketch below); the combined model shows promising performance in predicting used-vehicle prices. Future price-prediction studies can use the same dataset with different prediction techniques.
    Abstract Predicting the price of used vehicles is a more interesting and needed problem by many users. Vehicle price prediction can be a challenging task due to the high number of attributes that should be considered for accurate prediction. The major step in the prediction process is the collection and pre-processing of the data. In this project, python scripts were built to normalize, standardize, and clean data to avoid unnecessary noise for machine learning algorithms. The data set used in this project can be very valuable in conducting similar research using different prediction techniques. Many assumptions were made on the basis of the data set. The proposed system uses a Decision tree model and Gradient boosting predictive model, which are combined in other to get closed to accurate prediction, the proposed model was evaluated and it gives a promising performance. The future price prediction of used vehicles with the help of the same data set will comprise different models.
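A hedged sketch of combining a decision tree with a gradient boosting model, as described above, using scikit-learn. The synthetic features and the simple averaging ensemble (`VotingRegressor`) stand in for the paper's actual pipeline and dataset:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
n = 500
X = np.column_stack([
    rng.integers(2000, 2023, n),        # model year (synthetic)
    rng.uniform(0, 200_000, n),         # mileage (synthetic)
    rng.uniform(1.0, 5.0, n),           # engine size (synthetic)
])
y = 30_000 + 800 * (X[:, 0] - 2000) - 0.08 * X[:, 1] + 2_000 * X[:, 2] + rng.normal(0, 1_500, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(
    StandardScaler(),                                   # normalize/standardize step
    VotingRegressor([
        ("tree", DecisionTreeRegressor(max_depth=8, random_state=0)),
        ("gbm", GradientBoostingRegressor(random_state=0)),
    ]),
)
model.fit(X_tr, y_tr)
print(f"R^2 on held-out data: {model.score(X_te, y_te):.3f}")
```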

Initial State Interventions for Deconfounded Imitation Learning

  • paper_url: http://arxiv.org/abs/2307.15980
  • repo_url: None
  • paper_authors: Samuel Pfrommer, Yatong Bai, Hyunin Lee, Somayeh Sojoudi
  • for: This paper addresses the causal confusion problem in imitation learning, where learned policies attend to features that do not causally influence the expert's actions, achieving low open-loop supervised loss but performing poorly after deployment.
  • methods: A novel masking algorithm that masks observed confounders in a disentangled representation of the observation space to avoid causal confusion. The algorithm requires no expert querying, expert reward function, or causal graph specification. Under certain assumptions, it is theoretically shown to be conservative, i.e., it does not incorrectly mask observations that causally influence the expert.
  • results: Applying the masking algorithm to behavior cloning on two illustrative control systems, CartPole and Reacher, empirically demonstrates that it effectively avoids causal confusion.
    Abstract Imitation learning suffers from causal confusion. This phenomenon occurs when learned policies attend to features that do not causally influence the expert actions but are instead spuriously correlated. Causally confused agents produce low open-loop supervised loss but poor closed-loop performance upon deployment. We consider the problem of masking observed confounders in a disentangled representation of the observation space. Our novel masking algorithm leverages the usual ability to intervene in the initial system state, avoiding any requirement involving expert querying, expert reward functions, or causal graph specification. Under certain assumptions, we theoretically prove that this algorithm is conservative in the sense that it does not incorrectly mask observations that causally influence the expert; furthermore, intervening on the initial state serves to strictly reduce excess conservatism. The masking algorithm is applied to behavior cloning for two illustrative control systems: CartPole and Reacher.

Blockchain-empowered Federated Learning for Healthcare Metaverses: User-centric Incentive Mechanism with Optimal Data Freshness

  • paper_url: http://arxiv.org/abs/2307.15975
  • repo_url: None
  • paper_authors: Jiawen Kang, Jinbo Wen, Dongdong Ye, Bingkun Lai, Tianhao Wu, Zehui Xiong, Jiangtian Nie, Dusit Niyato, Yang Zhang, Shengli Xie
  • for: This paper develops a user-centric privacy-preserving framework for healthcare metaverses, improving their security and data freshness.
  • methods: The paper proposes a user-centric privacy-preserving framework based on decentralized Federated Learning (FL), and on top of it a cross-chain empowered FL framework to enhance the security of sensing data.
  • results: Numerical results show that the proposed schemes effectively protect sensing data in healthcare metaverses and improve the data-sharing benefits for service providers.
    Abstract Given the revolutionary role of metaverses, healthcare metaverses are emerging as a transformative force, creating intelligent healthcare systems that offer immersive and personalized services. The healthcare metaverses allow for effective decision-making and data analytics for users. However, there still exist critical challenges in building healthcare metaverses, such as the risk of sensitive data leakage and issues with sensing data security and freshness, as well as concerns around incentivizing data sharing. In this paper, we first design a user-centric privacy-preserving framework based on decentralized Federated Learning (FL) for healthcare metaverses. To further improve the privacy protection of healthcare metaverses, a cross-chain empowered FL framework is utilized to enhance sensing data security. This framework utilizes a hierarchical cross-chain architecture with a main chain and multiple subchains to perform decentralized, privacy-preserving, and secure data training in both virtual and physical spaces. Moreover, we utilize Age of Information (AoI) as an effective data-freshness metric and propose an AoI-based contract theory model under Prospect Theory (PT) to motivate sensing data sharing in a user-centric manner. This model exploits PT to better capture the subjective utility of the service provider. Finally, our numerical results demonstrate the effectiveness of the proposed schemes for healthcare metaverses.

Graph Condensation for Inductive Node Representation Learning

  • paper_url: http://arxiv.org/abs/2307.15967
  • repo_url: None
  • paper_authors: Xinyi Gao, Tong Chen, Yilong Zang, Wentao Zhang, Quoc Viet Hung Nguyen, Kai Zheng, Hongzhi Yin
  • for: Improving the computational efficiency of graph neural networks (GNNs) on large graphs so they can be applied more effectively across diverse applications.
  • methods: Mapping-aware graph condensation (MCond), which explicitly learns a one-to-many mapping from original nodes to synthetic nodes so that new nodes can be integrated directly into the condensed graph for representation learning (see the sketch below).
  • results: In inductive inference, MCond reduces computational overhead and storage requirements, achieving up to a 121.5x inference speedup and a 55.9x reduction in storage on the Reddit dataset.
    Abstract Graph neural networks (GNNs) encounter significant computational challenges when handling large-scale graphs, which severely restricts their efficacy across diverse applications. To address this limitation, graph condensation has emerged as a promising technique, which constructs a small synthetic graph for efficiently training GNNs while retaining performance. However, due to the topology structure among nodes, graph condensation is limited to condensing only the observed training nodes and their corresponding structure, thus lacking the ability to effectively handle the unseen data. Consequently, the original large graph is still required in the inference stage to perform message passing to inductive nodes, resulting in substantial computational demands. To overcome this issue, we propose mapping-aware graph condensation (MCond), explicitly learning the one-to-many node mapping from original nodes to synthetic nodes to seamlessly integrate new nodes into the synthetic graph for inductive representation learning. This enables direct information propagation on the synthetic graph, which is much more efficient than on the original large graph. Specifically, MCond employs an alternating optimization scheme with innovative loss terms from transductive and inductive perspectives, facilitating the mutual promotion between graph condensation and node mapping learning. Extensive experiments demonstrate the efficacy of our approach in inductive inference. On the Reddit dataset, MCond achieves up to 121.5x inference speedup and 55.9x reduction in storage requirements compared with counterparts based on the original graph.
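The following toy sketch conveys the idea of inductive inference through a learned one-to-many node mapping onto a condensed graph: message passing runs only on the small synthetic graph, and unseen nodes read out through their mapping weights. The sizes, the softmax mapping head, and the single propagation step are assumptions, not MCond's actual architecture:

```python
import torch
import torch.nn.functional as F

n_orig_feat, n_syn, d = 32, 10, 16

# Condensed graph: synthetic node features and a (dense) toy adjacency.
syn_x = torch.randn(n_syn, d)
syn_adj = torch.rand(n_syn, n_syn)
syn_adj = (syn_adj + syn_adj.t()) / 2                      # symmetric toy graph

# Learned components (random stand-ins here): a feature encoder and a mapping
# head that assigns each new node a distribution over synthetic nodes.
encoder = torch.nn.Linear(n_orig_feat, d)
mapping_head = torch.nn.Linear(d, n_syn)

def inductive_embed(new_x: torch.Tensor) -> torch.Tensor:
    """Embed unseen nodes by attaching them to the synthetic graph."""
    h = encoder(new_x)                                     # (n_new, d)
    mapping = F.softmax(mapping_head(h), dim=1)            # (n_new, n_syn) one-to-many weights
    # Message passing happens only on the small synthetic graph ...
    syn_h = F.relu(syn_adj @ syn_x)                        # (n_syn, d)
    # ... and new nodes read out through their mapping weights.
    return mapping @ syn_h                                 # (n_new, d)

new_nodes = torch.randn(5, n_orig_feat)
print(inductive_embed(new_nodes).shape)                    # torch.Size([5, 16])
```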

Recommendation Unlearning via Matrix Correction

  • paper_url: http://arxiv.org/abs/2307.15960
  • repo_url: None
  • paper_authors: Jiahao Liu, Dongsheng Li, Hansu Gu, Tun Lu, Jiongran Wu, Peng Zhang, Li Shang, Ning Gu
  • for: Recommender systems provide personalized services, but the large amount of collected user data raises privacy, security, and utility concerns.
  • methods: Recommendation unlearning, which allows specific data and models to be forgotten, reducing the risks posed by sensitive/malicious/toxic user data.
  • results: The proposed Interaction and Mapping Matrices Correction (IMCorrect) method improves the completeness, utility, and efficiency of recommendation unlearning without retraining the model (see the sketch below). Experiments show that IMCorrect achieves superior completeness, utility, and efficiency across many unlearning scenarios and can incrementally learn from new data, further improving its practicality.
    Abstract Recommender systems are important for providing personalized services to users, but the vast amount of collected user data has raised concerns about privacy (e.g., sensitive data), security (e.g., malicious data) and utility (e.g., toxic data). To address these challenges, recommendation unlearning has emerged as a promising approach, which allows specific data and models to be forgotten, mitigating the risks of sensitive/malicious/toxic user data. However, existing methods often struggle to balance completeness, utility, and efficiency, i.e., compromising one for the other, leading to suboptimal recommendation unlearning. In this paper, we propose an Interaction and Mapping Matrices Correction (IMCorrect) method for recommendation unlearning. Firstly, we reveal that many collaborative filtering (CF) algorithms can be formulated as mapping-based approach, in which the recommendation results can be obtained by multiplying the user-item interaction matrix with a mapping matrix. Then, IMCorrect can achieve efficient recommendation unlearning by correcting the interaction matrix and enhance the completeness and utility by correcting the mapping matrix, all without costly model retraining. Unlike existing methods, IMCorrect is a whitebox model that offers greater flexibility in handling various recommendation unlearning scenarios. Additionally, it has the unique capability of incrementally learning from new data, which further enhances its practicality. We conducted comprehensive experiments to validate the effectiveness of IMCorrect and the results demonstrate that IMCorrect is superior in completeness, utility, and efficiency, and is applicable in many recommendation unlearning scenarios.
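A minimal sketch of the mapping-based view of collaborative filtering that IMCorrect builds on, where recommendation scores are the product of the user-item interaction matrix and a mapping matrix. Here the mapping is a plain item-item cosine similarity, and "unlearning" corrects the interaction matrix and only the affected parts of the mapping matrix; this selective recomputation is an illustration, not the paper's algorithm:

```python
import numpy as np

def item_similarity(R):
    """Item-item cosine similarity used as the mapping matrix."""
    norms = np.linalg.norm(R, axis=0, keepdims=True) + 1e-8
    Rn = R / norms
    return Rn.T @ Rn                                    # (items, items)

def unlearn(R, M, forget):
    """Remove (user, item) interactions and correct only the affected mapping rows/columns."""
    R = R.copy()
    affected_items = sorted({i for _, i in forget})
    for u, i in forget:
        R[u, i] = 0.0                                   # correct the interaction matrix
    M = M.copy()
    M_full = item_similarity(R)
    M[:, affected_items] = M_full[:, affected_items]    # correct the mapping matrix
    M[affected_items, :] = M_full[affected_items, :]
    return R, M

rng = np.random.default_rng(0)
R = (rng.random((100, 50)) < 0.1).astype(float)         # toy implicit-feedback matrix
M = item_similarity(R)
R2, M2 = unlearn(R, M, forget=[(3, 7), (10, 7), (5, 20)])
scores = R2 @ M2                                        # corrected recommendation scores
print(scores.shape)
```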

Towards the Visualization of Aggregated Class Activation Maps to Analyse the Global Contribution of Class Features

  • paper_url: http://arxiv.org/abs/2308.00710
  • repo_url: None
  • paper_authors: Igor Cherepanov, David Sessler, Alex Ulmer, Hendrik Lücke-Tieke, Jörn Kohlhammer
  • for: This paper aims to explain the decision-making of deep learning models in classification tasks.
  • methods: It builds on Class Activation Maps (CAMs), which visualize how much each feature of a data sample contributes to the classification decision.
  • results: By aggregating CAMs over multiple samples, the paper provides a global explanation visualization that helps analysts understand the model's decision-making (see the sketch below).
    Abstract Deep learning (DL) models achieve remarkable performance in classification tasks. However, models with high complexity can not be used in many risk-sensitive applications unless a comprehensible explanation is presented. Explainable artificial intelligence (xAI) focuses on the research to explain the decision-making of AI systems like DL. We extend a recent method of Class Activation Maps (CAMs) which visualizes the importance of each feature of a data sample contributing to the classification. In this paper, we aggregate CAMs from multiple samples to show a global explanation of the classification for semantically structured data. The aggregation allows the analyst to make sophisticated assumptions and analyze them with further drill-down visualizations. Our visual representation for the global CAM illustrates the impact of each feature with a square glyph containing two indicators. The color of the square indicates the classification impact of this feature. The size of the filled square describes the variability of the impact between single samples. For interesting features that require further analysis, a detailed view is necessary that provides the distribution of these values. We propose an interactive histogram to filter samples and refine the CAM to show relevant samples only. Our approach allows an analyst to detect important features of high-dimensional data and derive adjustments to the AI model based on our global explanation visualization.
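A small sketch of the aggregation step: per-sample CAM importances are collapsed into a mean impact per feature (driving the glyph colour) and a variability measure (driving the filled-square size). The synthetic data and the min-max normalisation are illustrative assumptions:

```python
import numpy as np

def aggregate_cams(cams):
    """cams: (n_samples, n_features) per-sample class-activation importances."""
    mean_impact = cams.mean(axis=0)                 # colour of the square glyph
    variability = cams.std(axis=0)                  # size of the filled square
    # Normalise variability to [0, 1] so it can be drawn as a relative fill.
    v_range = variability.max() - variability.min() + 1e-12
    fill = (variability - variability.min()) / v_range
    return mean_impact, fill

rng = np.random.default_rng(1)
cams = rng.normal(loc=0.0, scale=1.0, size=(200, 8))      # 200 samples, 8 features
mean_impact, fill = aggregate_cams(cams)
for j, (m, f) in enumerate(zip(mean_impact, fill)):
    print(f"feature {j}: mean impact {m:+.2f}, variability fill {f:.2f}")
```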

The effect of network topologies on fully decentralized learning: a preliminary investigation

  • paper_url: http://arxiv.org/abs/2307.15947
  • repo_url: None
  • paper_authors: Luigi Palmieri, Lorenzo Valerio, Chiara Boldrini, Andrea Passarella
  • for: This paper studies how the network topology connecting nodes in a fully decentralized machine learning system affects model performance.
  • methods: The authors consider direct collaboration between nodes and investigate how different types of topologies impact the "spreading of knowledge", i.e., how well nodes incorporate knowledge derived from data available at other nodes across the network.
  • results: Even weak connectivity among network components is sufficient for information spread, but not necessarily for knowledge spread. Hubs play a more significant role than leaves in spreading knowledge, even when hubs have only moderately more connections than leaves, and tightly knit communities severely hinder knowledge spread.
    Abstract In a decentralized machine learning system, data is typically partitioned among multiple devices or nodes, each of which trains a local model using its own data. These local models are then shared and combined to create a global model that can make accurate predictions on new data. In this paper, we start exploring the role of the network topology connecting nodes on the performance of a Machine Learning model trained through direct collaboration between nodes. We investigate how different types of topologies impact the "spreading of knowledge", i.e., the ability of nodes to incorporate in their local model the knowledge derived by learning patterns in data available in other nodes across the networks. Specifically, we highlight the different roles in this process of more or less connected nodes (hubs and leaves), as well as that of macroscopic network properties (primarily, degree distribution and modularity). Among others, we show that, while it is known that even weak connectivity among network components is sufficient for information spread, it may not be sufficient for knowledge spread. More intuitively, we also find that hubs have a more significant role than leaves in spreading knowledge, although this manifests itself not only for heavy-tailed distributions but also when "hubs" have only moderately more connections than leaves. Finally, we show that tightly knit communities severely hinder knowledge spread.

PIMbot: Policy and Incentive Manipulation for Multi-Robot Reinforcement Learning in Social Dilemmas

  • paper_url: http://arxiv.org/abs/2307.15944
  • repo_url: None
  • paper_authors: Shahab Nikkhoo, Zexin Li, Aritra Samanta, Yufei Li, Cong Liu
  • for: This paper explores how communication among multiple robots can be manipulated to steer collaboration outcomes.
  • methods: A novel manipulation approach, PIMbot, which manipulates the reward function in multi-robot collaboration through two distinct forms of manipulation: policy and incentive manipulation.
  • results: Experimental results show that PIMbot can effectively manipulate the multi-robot collaboration environment, with both positive and negative impacts on task outcomes.
    Abstract Recent research has demonstrated the potential of reinforcement learning (RL) in enabling effective multi-robot collaboration, particularly in social dilemmas where robots face a trade-off between self-interests and collective benefits. However, environmental factors such as miscommunication and adversarial robots can impact cooperation, making it crucial to explore how multi-robot communication can be manipulated to achieve different outcomes. This paper presents a novel approach, namely PIMbot, to manipulating the reward function in multi-robot collaboration through two distinct forms of manipulation: policy and incentive manipulation. Our work introduces a new angle for manipulation in recent multi-agent RL social dilemmas that utilize a unique reward function for incentivization. By utilizing our proposed PIMbot mechanisms, a robot is able to manipulate the social dilemma environment effectively. PIMbot has the potential for both positive and negative impacts on the task outcome, where positive impacts lead to faster convergence to the global optimum and maximized rewards for any chosen robot. Conversely, negative impacts can have a detrimental effect on the overall task performance. We present comprehensive experimental results that demonstrate the effectiveness of our proposed methods in the Gazebo-simulated multi-robot environment. Our work provides insights into how inter-robot communication can be manipulated and has implications for various robotic applications. %, including robotics, transportation, and manufacturing.

Continual Learning in Predictive Autoscaling

  • paper_url: http://arxiv.org/abs/2307.15941
  • repo_url: https://github.com/anonymousaccountx/DMSHM
  • paper_authors: Hongyan Hao, Zhixuan Chu, Shiyi Zhu, Gangwei Jiang, Yan Wang, Caigao Jiang, James Zhang, Wei Jiang, Siqiao Xue, Jun Zhou
  • for: Forecasting server workloads and preparing resources in advance to ensure service level objectives (SLOs) in dynamic cloud environments.
  • methods: A replay-based continual learning method, the Density-based Memory Selection and Hint-based Network Learning Model (DMSHM), which uses only a small part of the historical log to achieve accurate predictions (see the sketch below).
  • results: Experiments on public and industrial datasets show that the proposed method outperforms state-of-the-art continual learning methods in memory capacity and prediction accuracy, and demonstrates remarkable practicability in real industrial applications.
    Abstract Predictive Autoscaling is used to forecast the workloads of servers and prepare the resources in advance to ensure service level objectives (SLOs) in dynamic cloud environments. However, in practice, its prediction task often suffers from performance degradation under abnormal traffics caused by external events (such as sales promotional activities and applications re-configurations), for which a common solution is to re-train the model with data of a long historical period, but at the expense of high computational and storage costs. To better address this problem, we propose a replay-based continual learning method, i.e., Density-based Memory Selection and Hint-based Network Learning Model (DMSHM), using only a small part of the historical log to achieve accurate predictions. First, we discover the phenomenon of sample overlap when applying replay-based continual learning in prediction tasks. In order to surmount this challenge and effectively integrate new sample distribution, we propose a density-based sample selection strategy that utilizes kernel density estimation to calculate sample density as a reference to compute sample weight, and employs weight sampling to construct a new memory set. Then we implement hint-based network learning based on hint representation to optimize the parameters. Finally, we conduct experiments on public and industrial datasets to demonstrate that our proposed method outperforms state-of-the-art continual learning methods in terms of memory capacity and prediction accuracy. Furthermore, we demonstrate remarkable practicability of DMSHM in real industrial applications.
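The density-based memory selection can be sketched as follows: estimate each historical sample's density with a kernel density estimator, convert the densities into sampling weights, and draw a small replay memory by weighted sampling. The inverse-density weighting used here is one plausible choice and is not necessarily the weighting used in the paper:

```python
import numpy as np
from scipy.stats import gaussian_kde

def select_memory(history, memory_size, rng=None):
    """history: (n_samples, n_features) historical workload records."""
    if rng is None:
        rng = np.random.default_rng(0)
    kde = gaussian_kde(history.T)                   # kernel density estimate over samples
    density = kde(history.T)                        # density of each sample
    weights = 1.0 / (density + 1e-12)               # favour rarer (low-density) samples
    weights /= weights.sum()
    idx = rng.choice(len(history), size=memory_size, replace=False, p=weights)
    return history[idx]

rng = np.random.default_rng(42)
history = np.vstack([rng.normal(0, 1, (900, 4)), rng.normal(5, 1, (100, 4))])
memory = select_memory(history, memory_size=64, rng=rng)
print(memory.shape)                                  # (64, 4) replay memory
```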

A Theory for Emergence of Complex Skills in Language Models

  • paper_url: http://arxiv.org/abs/2307.15936
  • repo_url: https://github.com/dia2018/What-is-the-Difference-Between-AI-and-Machine-Learning
  • paper_authors: Sanjeev Arora, Anirudh Goyal
  • for: This work aims to explain why new skills emerge in language models as their parameter count and training corpora are scaled up.
  • methods: The analysis uses the well-known (and empirical) Scaling Laws of LLMs and a simple statistical framework to study emergence.
  • results: Scaling up the parameter count and training corpora gives rise to a strong inductive bias that lets the language model learn very efficiently; this inductive bias also yields high competence on tasks that combine multiple skills.
    Abstract A major driver of AI products today is the fact that new skills emerge in language models when their parameter set and training corpora are scaled up. This phenomenon is poorly understood, and a mechanistic explanation via mathematical analysis of gradient-based training seems difficult. The current paper takes a different approach, analysing emergence using the famous (and empirical) Scaling Laws of LLMs and a simple statistical framework. Contributions include: (a) A statistical framework that relates cross-entropy loss of LLMs to competence on the basic skills that underlie language tasks. (b) Mathematical analysis showing that the Scaling Laws imply a strong form of inductive bias that allows the pre-trained model to learn very efficiently. We informally call this {\em slingshot generalization} since naively viewed it appears to give competence levels at skills that violate usual generalization theory. (c) A key example of slingshot generalization, that competence at executing tasks involving $k$-tuples of skills emerges essentially at the same scaling and same rate as competence on the elementary skills themselves.

A Noisy-Label-Learning Formulation for Immune Repertoire Classification and Disease-Associated Immune Receptor Sequence Identification

  • paper_url: http://arxiv.org/abs/2307.15934
  • repo_url: https://github.com/tencentailabhealthcare/nll-irc
  • paper_authors: Mingcai Chen, Yu Zhao, Zhonghuang Wang, Bing He, Jianhua Yao
  • for: Immune repertoire classification, a frontier topic in computational biology with transformative contributions to new vaccines and immune therapies.
  • methods: A noisy-label-learning formulation that addresses the traditional instance-space multiple instance learning (MIL) problem of directly assigning bag-level labels to instances (see the sketch below).
  • results: Achieves accurate sequence-level and repertoire-level classification, with significant performance gains in experiments on the CMV and Cancer datasets.
    Abstract Immune repertoire classification, a typical multiple instance learning (MIL) problem, is a frontier research topic in computational biology that makes transformative contributions to new vaccines and immune therapies. However, the traditional instance-space MIL, directly assigning bag-level labels to instances, suffers from the massive amount of noisy labels and extremely low witness rate. In this work, we propose a noisy-label-learning formulation to solve the immune repertoire classification task. To remedy the inaccurate supervision of repertoire-level labels for a sequence-level classifier, we design a robust training strategy: The initial labels are smoothed to be asymmetric and are progressively corrected using the model's predictions throughout the training process. Furthermore, two models with the same architecture but different parameter initialization are co-trained simultaneously to remedy the known "confirmation bias" problem in the self-training-like schema. As a result, we obtain accurate sequence-level classification and, subsequently, repertoire-level classification. Experiments on the Cytomegalovirus (CMV) and Cancer datasets demonstrate our method's effectiveness and superior performance on sequence-level and repertoire-level tasks.
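As a rough illustration of the noisy-label strategy, the sketch below initialises sequence-level targets from asymmetrically smoothed repertoire (bag) labels and progressively blends in the model's own predictions. The smoothing values and the linear mixing schedule are assumptions for illustration:

```python
import torch

def initial_targets(bag_labels: torch.Tensor, pos_smooth=0.6, neg_smooth=0.05):
    """Asymmetric smoothing: positive bags give weak positive targets to every
    sequence (few true witnesses); negative bags give near-zero targets."""
    return torch.where(bag_labels.bool(),
                       torch.full_like(bag_labels, pos_smooth),
                       torch.full_like(bag_labels, neg_smooth))

def correct_targets(targets, model_probs, epoch, total_epochs):
    """Blend the current targets with model predictions; the model's share grows."""
    alpha = min(1.0, epoch / total_epochs)           # trust the model more over time
    return (1 - alpha) * targets + alpha * model_probs

# Toy usage: 6 sequences, the first 3 drawn from a positive repertoire.
bag_labels = torch.tensor([1., 1., 1., 0., 0., 0.])
targets = initial_targets(bag_labels)
model_probs = torch.tensor([0.9, 0.1, 0.2, 0.05, 0.02, 0.1])   # per-sequence predictions
for epoch in range(1, 4):
    targets = correct_targets(targets, model_probs, epoch, total_epochs=10)
print(targets)
```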

Language models as master equation solvers

  • paper_url: http://arxiv.org/abs/2308.02514
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Chuanbo Liu, Jin Wang
  • for: Solving the master equations of complex stochastic dynamical systems.
  • methods: A language-model-based approach in which a prompt-based network maps rate parameters, initial conditions, and time values to the joint probability distribution over states, trained with a policy gradient algorithm using feedback rewards from variational autoregressive models.
  • results: Applied to representative multi-module and high-dimensional systems, the approach shows high accuracy and extrapolation ability, suggesting that a single pretrained large model could be used to solve any master equation.
    Abstract Master equations are of fundamental importance in modeling stochastic dynamical systems.However, solving master equations is challenging due to the exponential increase in the number of possible states or trajectories with the dimension of the state space. In this study, we propose repurposing language models as a machine learning approach to solve master equations. We design a prompt-based neural network to map rate parameters, initial conditions, and time values directly to the state joint probability distribution that exactly matches the input contexts. In this way, we approximate the solution of the master equation in its most general form. We train the network using the policy gradient algorithm within the reinforcement learning framework, with feedback rewards provided by a set of variational autoregressive models. By applying this approach to representative examples, we observe high accuracy for both multi-module and high-dimensional systems. The trained network also exhibits extrapolating ability, extending its predictability to unseen data. Our findings establish the connection between language models and master equations, highlighting the possibility of using a single pretrained large model to solve any master equation.

Dynamic deep-reinforcement-learning algorithm in Partially Observed Markov Decision Processes

  • paper_url: http://arxiv.org/abs/2307.15931
  • repo_url: None
  • paper_authors: Saki Omi, Hyo-Sang Shin, Namhoon Cho, Antonios Tsourdos
  • for: Addressing the difficulty of maintaining agent performance in Partially Observable Markov Decision Processes (POMDPs).
  • methods: Several structures and approaches that extend a recent deep reinforcement learning algorithm with LSTM networks, improving the robustness of control performance against different types of external disturbances (see the sketch below).
  • results: Including the action sequence helps maintain agent performance in POMDPs; the developed algorithms show enhanced robustness of controller performance against different types of external disturbances added to the observation.
    Abstract Reinforcement learning has been greatly improved in recent studies and an increased interest in real-world implementation has emerged in recent years. In many cases, due to the non-static disturbances, it becomes challenging for the agent to keep the performance. The disturbance results in the environment called Partially Observable Markov Decision Process. In common practice, Partially Observable Markov Decision Process is handled by introducing an additional estimator, or Recurrent Neural Network is utilized in the context of reinforcement learning. Both of the cases require to process sequential information on the trajectory. However, there are only a few studies investigating the effect of information to consider and the network structure to handle them. This study shows the benefit of action sequence inclusion in order to solve Partially Observable Markov Decision Process. Several structures and approaches are proposed to extend one of the latest deep reinforcement learning algorithms with LSTM networks. The developed algorithms showed enhanced robustness of controller performance against different types of external disturbances that are added to observation.
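A minimal sketch of including the action sequence in a recurrent policy for a POMDP: the LSTM consumes the observation concatenated with the previous action at every time step. The dimensions and the simple tanh head are assumptions, not the paper's network:

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim + act_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs_seq, prev_act_seq, state=None):
        # obs_seq: (batch, T, obs_dim); prev_act_seq: (batch, T, act_dim)
        x = torch.cat([obs_seq, prev_act_seq], dim=-1)   # action history enters the input
        out, state = self.lstm(x, state)
        return torch.tanh(self.head(out)), state         # bounded continuous actions

actor = RecurrentActor(obs_dim=8, act_dim=2)
obs = torch.randn(4, 10, 8)
prev_actions = torch.zeros(4, 10, 2)
actions, hidden = actor(obs, prev_actions)
print(actions.shape)                                     # torch.Size([4, 10, 2])
```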

Opportunistic Air Quality Monitoring and Forecasting with Expandable Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.15916
  • repo_url: None
  • paper_authors: Jingwei Zuo, Wenbin Li, Michele Baldo, Hakim Hacid
  • for: This paper proposes an expandable graph attention network (EGAT) model that fuses data collected under different spatial structures to improve air quality forecasting accuracy.
  • methods: The model uses graph attention networks to fuse data from existing and newly-added infrastructures, meeting the needs of diverse personalized scenarios; it can also be embedded into existing air quality forecasting models to handle evolving spatial structures.
  • results: Validated on real air quality data, EGAT improves forecasting accuracy and adapts to changes in spatial structure.
    Abstract Air Quality Monitoring and Forecasting has been a popular research topic in recent years. Recently, data-driven approaches for air quality forecasting have garnered significant attention, owing to the availability of well-established data collection facilities in urban areas. Fixed infrastructures, typically deployed by national institutes or tech giants, often fall short in meeting the requirements of diverse personalized scenarios, e.g., forecasting in areas without any existing infrastructure. Consequently, smaller institutes or companies with limited budgets are compelled to seek tailored solutions by introducing more flexible infrastructures for data collection. In this paper, we propose an expandable graph attention network (EGAT) model, which digests data collected from existing and newly-added infrastructures, with different spatial structures. Additionally, our proposal can be embedded into any air quality forecasting models, to apply to the scenarios with evolving spatial structures. The proposal is validated over real air quality data from PurpleAir.

An Automata-Theoretic Approach to Synthesizing Binarized Neural Networks

  • paper_url: http://arxiv.org/abs/2307.15907
  • repo_url: None
  • paper_authors: Ye Tao, Wanwei Liu, Fu Song, Zhen Liang, Ji Wang, Hongxu Zhu
  • for: This paper proposes an automata-theoretic approach to synthesizing Binarized Neural Networks (BNNs) that meet designated properties.
  • methods: Properties are specified in the temporal logic BLTL and translated into automata; during synthesis, an SMT solver checks whether a conforming network exists, with a tableau-based approach used to mitigate state explosion.
  • results: The approach automates BNN synthesis and improves individual fairness and local robustness while largely maintaining accuracy.
    Abstract Deep neural networks, (DNNs, a.k.a. NNs), have been widely used in various tasks and have been proven to be successful. However, the accompanied expensive computing and storage costs make the deployments in resource-constrained devices a significant concern. To solve this issue, quantization has emerged as an effective way to reduce the costs of DNNs with little accuracy degradation by quantizing floating-point numbers to low-width fixed-point representations. Quantized neural networks (QNNs) have been developed, with binarized neural networks (BNNs) restricted to binary values as a special case. Another concern about neural networks is their vulnerability and lack of interpretability. Despite the active research on trustworthy of DNNs, few approaches have been proposed to QNNs. To this end, this paper presents an automata-theoretic approach to synthesizing BNNs that meet designated properties. More specifically, we define a temporal logic, called BLTL, as the specification language. We show that each BLTL formula can be transformed into an automaton on finite words. To deal with the state-explosion problem, we provide a tableau-based approach in real implementation. For the synthesis procedure, we utilize SMT solvers to detect the existence of a model (i.e., a BNN) in the construction process. Notably, synthesis provides a way to determine the hyper-parameters of the network before training.Moreover, we experimentally evaluate our approach and demonstrate its effectiveness in improving the individual fairness and local robustness of BNNs while maintaining accuracy to a great extent.

Multi-view Sparse Laplacian Eigenmaps for nonlinear Spectral Feature Selection

  • paper_url: http://arxiv.org/abs/2307.15905
  • repo_url: None
  • paper_authors: Gaurav Srivastava, Mahesh Jangid
  • for: Addressing the challenges of high-dimensional datasets in machine learning, such as overfitting and computational complexity, by identifying an informative subset of features.
  • methods: Multi-view Sparse Laplacian Eigenmaps (MSLE) for feature selection, combining multiple views of the data, enforcing sparsity constraints, and using a scalable optimization algorithm to identify a reduced feature set.
  • results: Reduced the feature space by 10 to 90% while maintaining an error rate of 2.72% with Support Vector Machine (SVM), and achieved an accuracy of 96.69% with an 80% reduction in the overall feature space.
    Abstract The complexity of high-dimensional datasets presents significant challenges for machine learning models, including overfitting, computational complexity, and difficulties in interpreting results. To address these challenges, it is essential to identify an informative subset of features that captures the essential structure of the data. In this study, the authors propose Multi-view Sparse Laplacian Eigenmaps (MSLE) for feature selection, which effectively combines multiple views of the data, enforces sparsity constraints, and employs a scalable optimization algorithm to identify a subset of features that capture the fundamental data structure. MSLE is a graph-based approach that leverages multiple views of the data to construct a more robust and informative representation of high-dimensional data. The method applies sparse eigendecomposition to reduce the dimensionality of the data, yielding a reduced feature set. The optimization problem is solved using an iterative algorithm alternating between updating the sparse coefficients and the Laplacian graph matrix. The sparse coefficients are updated using a soft-thresholding operator, while the graph Laplacian matrix is updated using the normalized graph Laplacian. To evaluate the performance of the MSLE technique, the authors conducted experiments on the UCI-HAR dataset, which comprises 561 features, and reduced the feature space by 10 to 90%. Our results demonstrate that even after reducing the feature space by 90%, the Support Vector Machine (SVM) maintains an error rate of 2.72%. Moreover, the authors observe that the SVM exhibits an accuracy of 96.69% with an 80% reduction in the overall feature space.
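Two building blocks of the optimisation described in the abstract can be sketched directly: the soft-thresholding operator used for the sparse coefficient update and the normalised graph Laplacian used for the graph update. The toy alternating loop below is only illustrative and does not reproduce the paper's full multi-view objective:

```python
import numpy as np

def soft_threshold(X, lam):
    """Proximal operator of the L1 norm, applied elementwise (yields exact zeros)."""
    return np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)

def normalized_laplacian(W):
    """L = I - D^{-1/2} W D^{-1/2} for a symmetric affinity matrix W."""
    d = W.sum(axis=1)
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d + 1e-12))
    return np.eye(len(W)) - d_inv_sqrt @ W @ d_inv_sqrt

def gaussian_affinity(Z, sigma=1.0):
    """Rebuild a similarity graph from the current embedding."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma ** 2))

rng = np.random.default_rng(0)
W = rng.random((20, 20)); W = (W + W.T) / 2.0            # toy affinity graph over 20 samples
coeffs = rng.standard_normal((20, 5))                    # embedding / sparse coefficients

for _ in range(10):                                      # alternate the two updates
    L = normalized_laplacian(W)
    # Gradient step on tr(C^T L C), then soft-thresholding for sparsity.
    coeffs = soft_threshold(coeffs - 0.1 * (L @ coeffs), lam=0.05)
    W = gaussian_affinity(coeffs)
print("fraction of exactly-zero coefficients:", np.mean(coeffs == 0.0))
```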

Online Matching: A Real-time Bandit System for Large-scale Recommendations

  • paper_url: http://arxiv.org/abs/2307.15893
  • repo_url: None
  • paper_authors: Xinyang Yi, Shao-Chuan Wang, Ruining He, Hariharan Chandrasekaran, Charles Wu, Lukasz Heldt, Lichan Hong, Minmin Chen, Ed H. Chi
  • for: Improving fresh content discovery and user-interest exploration in large-scale recommender systems.
  • methods: A hybrid "offline + online" learning approach; the proposed Diag-LinUCB algorithm enables distributed, timely updates of bandit parameters (see the sketch below).
  • results: Live experiments on YouTube show the scalability and timeliness of the online learning system, improving fresh content discovery and item exploration on the platform.
    Abstract The last decade has witnessed many successes of deep learning-based models for industry-scale recommender systems. These models are typically trained offline in a batch manner. While being effective in capturing users' past interactions with recommendation platforms, batch learning suffers from long model-update latency and is vulnerable to system biases, making it hard to adapt to distribution shift and explore new items or user interests. Although online learning-based approaches (e.g., multi-armed bandits) have demonstrated promising theoretical results in tackling these challenges, their practical real-time implementation in large-scale recommender systems remains limited. First, the scalability of online approaches in servicing a massive online traffic while ensuring timely updates of bandit parameters poses a significant challenge. Additionally, exploring uncertainty in recommender systems can easily result in unfavorable user experience, highlighting the need for devising intricate strategies that effectively balance the trade-off between exploitation and exploration. In this paper, we introduce Online Matching: a scalable closed-loop bandit system learning from users' direct feedback on items in real time. We present a hybrid "offline + online" approach for constructing this system, accompanied by a comprehensive exposition of the end-to-end system architecture. We propose Diag-LinUCB -- a novel extension of the LinUCB algorithm -- to enable distributed updates of bandits parameter in a scalable and timely manner. We conduct live experiments in YouTube and show that Online Matching is able to enhance the capabilities of fresh content discovery and item exploration in the present platform.
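The abstract does not spell out Diag-LinUCB, but one plausible reading (purely an assumption) keeps only diagonal per-arm statistics so that parameter updates are elementwise and therefore cheap to merge across distributed workers. The sketch below illustrates that idea; it should not be taken as the algorithm used in the paper:

```python
import numpy as np

class DiagLinUCB:
    """LinUCB variant keeping only diagonal per-arm statistics (an assumption)."""
    def __init__(self, n_arms, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha
        self.A_diag = np.full((n_arms, dim), reg)       # diagonal of X^T X + reg*I, per arm
        self.b = np.zeros((n_arms, dim))                # X^T r, per arm

    def select(self, x):
        """x: (dim,) context; pick the arm with the largest upper confidence bound."""
        theta = self.b / self.A_diag                    # elementwise "ridge" estimate
        mean = theta @ x
        bonus = self.alpha * np.sqrt((x * x / self.A_diag).sum(axis=1))
        return int(np.argmax(mean + bonus))

    def update(self, arm, x, reward):
        self.A_diag[arm] += x * x                       # elementwise, trivially mergeable
        self.b[arm] += reward * x                       # across distributed updaters

# Toy loop with non-negative contexts; arm 1 has the largest true parameters.
rng = np.random.default_rng(0)
true_theta = np.array([[0.1] * 5, [0.5] * 5, [0.1] * 5])
bandit, pulls = DiagLinUCB(n_arms=3, dim=5), np.zeros(3, dtype=int)
for _ in range(2000):
    x = rng.random(5)
    a = bandit.select(x)
    pulls[a] += 1
    bandit.update(a, x, float(true_theta[a] @ x + rng.normal(scale=0.1)))
print("pulls per arm:", pulls)                          # should concentrate on arm 1
```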

A new Gradient TD Algorithm with only One Step-size: Convergence Rate Analysis using $L$-$λ$ Smoothness

  • paper_url: http://arxiv.org/abs/2307.15892
  • repo_url: None
  • paper_authors: Hengshuai Yao
  • for: This paper studies a new gradient temporal difference (GTD) algorithm, analysed through convex optimization, for off-policy learning with linear function approximation in reinforcement learning.
  • methods: A truly single-time-scale GTD algorithm, Impression GTD, with only one step-size parameter, minimizing the Norm of Expected TD Update (NEU) objective; convergence rates are established using $L$-$\lambda$ smoothness (see the sketch below).
  • results: Experiments on Random walks, the Boyan chain, and the Baird counterexample show that Impression GTD converges much faster than existing GTD algorithms and performs well over a wide range of step-sizes.
    Abstract Gradient Temporal Difference (GTD) algorithms (Sutton et al., 2008, 2009) are the first $O(d)$ ($d$ is the number features) algorithms that have convergence guarantees for off-policy learning with linear function approximation. Liu et al. (2015) and Dalal et. al. (2018) proved the convergence rates of GTD, GTD2 and TDC are $O(t^{-\alpha/2})$ for some $\alpha \in (0,1)$. This bound is tight (Dalal et al., 2020), and slower than $O(1/\sqrt{t})$. GTD algorithms also have two step-size parameters, which are difficult to tune. In literature, there is a "single-time-scale" formulation of GTD. However, this formulation still has two step-size parameters. This paper presents a truly single-time-scale GTD algorithm for minimizing the Norm of Expected td Update (NEU) objective, and it has only one step-size parameter. We prove that the new algorithm, called Impression GTD, converges at least as fast as $O(1/t)$. Furthermore, based on a generalization of the expected smoothness (Gower et al. 2019), called $L$-$\lambda$ smoothness, we are able to prove that the new GTD converges even faster, in fact, with a linear rate. Our rate actually also improves Gower et al.'s result with a tighter bound under a weaker assumption. Besides Impression GTD, we also prove the rates of three other GTD algorithms, one by Yao and Liu (2008), another called A-transpose-TD (Sutton et al., 2008), and a counterpart of A-transpose-TD. The convergence rates of all the four GTD algorithms are proved in a single generic GTD framework to which $L$-$\lambda$ smoothness applies. Empirical results on Random walks, Boyan chain, and Baird counterexample show that Impression GTD converges much faster than existing GTD algorithms for both on-policy and off-policy learning problems, with well-performing step-sizes in a big range.
    摘要 梯度时序差分(GTD)算法(Sutton et al., 2008, 2009)是第一类 $O(d)$($d$ 为特征数)且对带线性函数逼近的离策略学习具有收敛保证的算法。Liu et al.(2015)和 Dalal et al.(2018)证明 GTD、GTD2 和 TDC 的收敛速率为 $O(t^{-\alpha/2})$,其中 $\alpha \in (0,1)$。这个界是紧的(Dalal et al., 2020),并且慢于 $O(1/\sqrt{t})$。GTD 算法还带有两个难以调节的步长参数。文献中虽有一种"单时间尺度"的 GTD 形式,但它仍含有两个步长参数。本文提出了一个真正单时间尺度的 GTD 算法,用于最小化期望 TD 更新的范数(Norm of Expected td Update,NEU)目标,且只含一个步长参数。我们证明该新算法(称为 Impression GTD)至少以 $O(1/t)$ 的速率收敛。进一步地,基于期望平滑性(Gower et al., 2019)的一种推广——$L$-$\lambda$ 平滑性,我们证明新的 GTD 收敛得更快,事实上达到线性速率;我们的结果还在更弱的假设下给出了比 Gower et al. 更紧的界。除 Impression GTD 外,我们还证明了另外三种 GTD 算法的收敛速率:Yao 和 Liu(2008)提出的算法、A-transpose-TD(Sutton et al., 2008)及其一个对应算法。四种 GTD 算法的收敛速率都在一个适用 $L$-$\lambda$ 平滑性的统一 GTD 框架中得到证明。在 Random walks、Boyan chain 和 Baird counterexample 上的实验表明,无论是同策略还是离策略学习问题,Impression GTD 都比现有 GTD 算法收敛快得多,且在很大的步长范围内表现良好。
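
The abstract does not give Impression GTD's update rule. The sketch below is only a naive baseline on the same NEU objective $J(w)=\|\mathbb{E}[\delta\phi]\|^2$: it uses double sampling (two independent transitions per update) to form an unbiased gradient, which is one standard way to end up with a single step size. It is not the paper's algorithm, and all hyperparameters are illustrative.

```python
import numpy as np

def neu_sgd_sketch(transitions, dim, gamma=0.99, alpha=0.01, epochs=10, seed=0):
    """Naive single-step-size SGD on J(w) = ||E[delta * phi]||^2 (the NEU objective).

    transitions: list of (phi, reward, phi_next) feature tuples.
    Two independent transitions are used per step so the gradient estimate
    -2 * (phi1 - gamma*phi1') * (phi1 . (delta2 * phi2)) is unbiased.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    for _ in range(epochs):
        idx = rng.permutation(len(transitions))
        for i, j in zip(idx[::2], idx[1::2]):          # two independent samples
            phi1, r1, phi1n = transitions[i]
            phi2, r2, phi2n = transitions[j]
            delta2 = r2 + gamma * phi2n @ w - phi2 @ w
            grad = -2.0 * (phi1 - gamma * phi1n) * (phi1 @ phi2) * delta2
            w -= alpha * grad                          # the single step size
    return w

# toy usage on a 2-state chain with one-hot features
phis = np.eye(2)
transitions = [(phis[0], 0.0, phis[1]), (phis[1], 1.0, phis[0])] * 50
w = neu_sgd_sketch(transitions, dim=2, gamma=0.9)
```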

First-order Policy Optimization for Robust Policy Evaluation

  • paper_url: http://arxiv.org/abs/2307.15890
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Yan Li, Guanghui Lan
  • for: 该论文针对带 $\mathrm{s}$-矩形模糊集的鲁棒马尔可夫决策过程,从策略优化的视角研究鲁棒策略评估问题。
  • methods: 该论文提出了一阶鲁棒策略评估(FRPE)方法,在确定性(离线)和随机(在线)两种设定下为鲁棒策略评估提供了统一框架,既适用于表格表示,也适用于通用函数逼近。
  • results: 该论文证明了 FRPE 在确定性设定下线性收敛,在随机设定下具有 $\tilde{\mathcal{O}}(1/\epsilon^2)$ 的样本复杂度;此外,FRPE 还可自然推广到评估带 $(\mathrm{s},\mathrm{a})$-矩形模糊集的鲁棒状态-动作价值函数。
    Abstract We adopt a policy optimization viewpoint towards policy evaluation for robust Markov decision process with $\mathrm{s}$-rectangular ambiguity sets. The developed method, named first-order policy evaluation (FRPE), provides the first unified framework for robust policy evaluation in both deterministic (offline) and stochastic (online) settings, with either tabular representation or generic function approximation. In particular, we establish linear convergence in the deterministic setting, and $\tilde{\mathcal{O}}(1/\epsilon^2)$ sample complexity in the stochastic setting. FRPE also extends naturally to evaluating the robust state-action value function with $(\mathrm{s}, \mathrm{a})$-rectangular ambiguity sets. We discuss the application of the developed results for stochastic policy optimization of large-scale robust MDPs.
    摘要 我们从策略优化的视角研究带 $\mathrm{s}$-矩形模糊集的鲁棒马尔可夫决策过程(MDP)的策略评估问题。所提出的方法称为一阶鲁棒策略评估(FRPE),它为确定性(离线)和随机(在线)两种设定下的鲁棒策略评估提供了第一个统一框架,既可使用表格表示,也可使用通用函数逼近。具体而言,我们证明了确定性设定下的线性收敛,以及随机设定下 $\tilde{\mathcal{O}}(1/\epsilon^2)$ 的样本复杂度。FRPE 还可自然推广到评估带 $(\mathrm{s},\mathrm{a})$-矩形模糊集的鲁棒状态-动作价值函数。我们还讨论了这些结果在大规模鲁棒 MDP 的随机策略优化中的应用。

Explaining Full-disk Deep Learning Model for Solar Flare Prediction using Attribution Methods

  • paper_url: http://arxiv.org/abs/2307.15878
  • repo_url: https://bitbucket.org/gsudmlab/explainfdvgg16
  • paper_authors: Chetraj Pandey, Rafal A. Angryk, Berkay Aydin
  • for: 该研究将深度学习方法用于太阳耀斑预测,尤其关注长期被忽视的近日面边缘(near-limb)耀斑,并利用归因方法对模型预测给出事后定性解释。
  • methods: 该论文使用以每小时全日面视线磁图为输入的深度学习模型,采用二分类预测模式,预测未来 24 小时内是否发生 M 级及以上耀斑;为缓解类别不平衡,结合了数据增强与类别加权技术。
  • results: 分析表明,全日面耀斑预测与活动区(ARs)相关的特征高度一致,对近日面边缘耀斑也能给出可靠预测。具体而言,该深度学习模型对 24 小时内 M 级及以上耀斑预测的 True Skill Statistic(TSS)和 Heidke Skill Score(HSS)平均分别为 0.51 和 0.35;模型解释分析还表明,模型依靠从全日面磁图中提取的 AR 相关特征来做出相应预测。
    Abstract This paper contributes to the growing body of research on deep learning methods for solar flare prediction, primarily focusing on highly overlooked near-limb flares and utilizing the attribution methods to provide a post hoc qualitative explanation of the model's predictions. We present a solar flare prediction model, which is trained using hourly full-disk line-of-sight magnetogram images and employs a binary prediction mode to forecast $\geq$M-class flares that may occur within the following 24-hour period. To address the class imbalance, we employ a fusion of data augmentation and class weighting techniques; and evaluate the overall performance of our model using the true skill statistic (TSS) and Heidke skill score (HSS). Moreover, we applied three attribution methods, namely Guided Gradient-weighted Class Activation Mapping, Integrated Gradients, and Deep Shapley Additive Explanations, to interpret and cross-validate our model's predictions with the explanations. Our analysis revealed that full-disk prediction of solar flares aligns with characteristics related to active regions (ARs). In particular, the key findings of this study are: (1) our deep learning models achieved an average TSS=0.51 and HSS=0.35, and the results further demonstrate a competent capability to predict near-limb solar flares and (2) the qualitative analysis of the model explanation indicates that our model identifies and uses features associated with ARs in central and near-limb locations from full-disk magnetograms to make corresponding predictions. In other words, our models learn the shape and texture-based characteristics of flaring ARs even at near-limb areas, which is a novel and critical capability with significant implications for operational forecasting.
    摘要 本文致力于推动太阳耀斑预测中深度学习方法的研究,重点关注长期被忽视的近日面边缘耀斑,并使用归因方法对模型预测给出事后定性解释。我们提出了一种太阳耀斑预测模型,使用每小时的全日面视线磁图进行训练,并采用二分类预测模式,预测未来 24 小时内可能发生的 M 级及以上耀斑。为解决类别不平衡问题,我们结合了数据增强与类别加权技术,并使用真实技能统计量(TSS)和 Heidke 技能评分(HSS)评估模型的整体表现。此外,我们应用了三种归因方法——Guided Gradient-weighted Class Activation Mapping、Integrated Gradients 和 Deep Shapley Additive Explanations——来解释并交叉验证模型的预测。分析表明,全日面耀斑预测与活动区(ARs)相关的特征相一致。本研究的主要发现是:(1)我们的深度学习模型取得了平均 TSS=0.51 和 HSS=0.35,并展示了预测近日面边缘耀斑的能力;(2)对模型解释的定性分析表明,模型能够从全日面磁图中识别并利用位于日面中部及边缘区域的 AR 相关特征来做出相应预测。换言之,模型即使在近边缘区域也能学习耀斑活动区的形状和纹理特征,这是一种新颖且关键的能力,对业务化预报具有重要意义。
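
Of the three attribution methods listed, Integrated Gradients is straightforward to reproduce by hand. The sketch below is a minimal PyTorch implementation applied to a placeholder CNN and a fake single-channel "magnetogram"; the model, input shape, and step count are assumptions, not the authors' pipeline.

```python
import torch

def integrated_gradients(model, x, target_class, baseline=None, steps=32):
    """Minimal Integrated Gradients: average input gradients along a straight
    path from a baseline (default: all-zero image) to the input, then scale
    by (input - baseline)."""
    model.eval()
    if baseline is None:
        baseline = torch.zeros_like(x)
    total_grad = torch.zeros_like(x)
    for k in range(1, steps + 1):
        interp = baseline + (k / steps) * (x - baseline)
        interp.requires_grad_(True)
        score = model(interp)[:, target_class].sum()
        grad, = torch.autograd.grad(score, interp)
        total_grad += grad
    return (x - baseline) * total_grad / steps

# toy usage with a placeholder CNN and a fake 1-channel "magnetogram"
cnn = torch.nn.Sequential(
    torch.nn.Conv2d(1, 8, 3, padding=1), torch.nn.ReLU(),
    torch.nn.AdaptiveAvgPool2d(1), torch.nn.Flatten(), torch.nn.Linear(8, 2))
image = torch.randn(1, 1, 64, 64)
attribution = integrated_gradients(cnn, image, target_class=1)
```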

GraphDAC: A Graph-Analytic Approach to Dynamic Airspace Configuration

  • paper_url: http://arxiv.org/abs/2307.15876
  • repo_url: https://github.com/kefenge2022/graphdac
  • paper_authors: Ke Feng, Dahai Liu, Yongxin Liu, Hong Liu, Houbing Song
  • for: 提高航空交通能力和应急响应能力
  • methods: 使用图像学算法和聚类分析生成协同机场组和均衡工作负担
  • results: 在不同交通情况下,工作负荷不均衡最多减少 50%
    Abstract The current National Airspace System (NAS) is reaching capacity due to increased air traffic, and is based on outdated pre-tactical planning. This study proposes a more dynamic airspace configuration (DAC) approach that could increase throughput and accommodate fluctuating traffic, ideal for emergencies. The proposed approach constructs the airspace as a constraints-embedded graph, compresses its dimensions, and applies a spectral clustering-enabled adaptive algorithm to generate collaborative airport groups and evenly distribute workloads among them. Under various traffic conditions, our experiments demonstrate a 50\% reduction in workload imbalances. This research could ultimately form the basis for a recommendation system for optimized airspace configuration. Code available at https://github.com/KeFenge2022/GraphDAC.git
    摘要 由于航空交通量不断增加,现有的国家空域系统(NAS)正逼近容量上限,且其运行仍依赖于过时的预战术规划。本研究提出了一种更具动态性的空域配置(DAC)方法,能够提高吞吐量并适应波动的交通需求,尤其适用于应急场景。该方法将空域构建为嵌入约束的图,压缩其维度,并应用基于谱聚类的自适应算法来生成协作机场组并在各组之间均衡工作负荷。在多种交通条件下的实验表明,工作负荷不均衡可降低 50%。这项研究有望最终构成空域配置优化推荐系统的基础。代码见 https://github.com/KeFenge2022/GraphDAC.git。
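
The abstract describes building a constraints-embedded graph and applying spectral clustering to form collaborative airport groups. A generic sketch with scikit-learn on a synthetic flow matrix is shown below; the graph construction, the number of groups, and the workload measure are assumptions rather than the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Synthetic symmetric "interaction" matrix between 12 airports. Entries could
# encode traffic flows with constraints folded into edge weights (assumption;
# the paper's exact graph construction is not spelled out in the abstract).
rng = np.random.default_rng(0)
n_airports = 12
flows = rng.integers(0, 50, size=(n_airports, n_airports))
adjacency = (flows + flows.T) / 2.0
np.fill_diagonal(adjacency, 0.0)

labels = SpectralClustering(
    n_clusters=3, affinity="precomputed", assign_labels="kmeans",
    random_state=0).fit_predict(adjacency)

# Crude workload check: total within-group flow per cluster.
for c in range(3):
    members = np.where(labels == c)[0]
    load = adjacency[np.ix_(members, members)].sum() / 2.0
    print(f"group {c}: airports {members.tolist()}, internal workload {load:.0f}")
```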

Cross-dimensional transfer learning in medical image segmentation with deep learning

  • paper_url: http://arxiv.org/abs/2307.15872
  • repo_url: https://github.com/hic-messaoudi/cross-dimensional-transfer-learning-in-medical-image-segmentation-with-deep-learning
  • paper_authors: Hicham Messaoudi, Ahror Belaid, Douraied Ben Salem, Pierre-Henri Conze
  • for: 这篇论文的目的是将2D类别网络转移到2D和3D多模式医疗影像分类中,以提高医疗影像分类的精度和效率。
  • methods: 本論文基於兩個關鍵原則:其一是權重轉移,將預訓練的2D編碼器嵌入到更高維的U-Net中;其二是維度轉移,將2D分割網絡擴展到更高維度,從而提升2D與3D單模態及多模態醫療影像分割的精度和效率。
  • results: 本论文的实验和质感结果显示,这些方法可以优化2D和3D多模式医疗影像分类的精度和效率。特别是,在CAMUS挑战中,这篇论文的2D网络排名第一,超过了现有的state-of-the-art。在CHAOS挑战中,这篇论文的2D/3D MR和CT腹部影像的分类结果优于其他2D基于方法,并在Dice、RAVD、ASSD和MSSD分类指标中跨越了前一代。在BraTS 2022比赛中,这篇论文的3D网络也获得了良好的结果,平均Dice分类指标为91.69%(91.22%)、核心部分为83.23%(84.77%)和增强部分为81.75%(83.88%)。
    Abstract Over the last decade, convolutional neural networks have emerged and advanced the state-of-the-art in various image analysis and computer vision applications. The performance of 2D image classification networks is constantly improving and being trained on databases made of millions of natural images. However, progress in medical image analysis has been hindered by limited annotated data and acquisition constraints. These limitations are even more pronounced given the volumetry of medical imaging data. In this paper, we introduce an efficient way to transfer the efficiency of a 2D classification network trained on natural images to 2D, 3D uni- and multi-modal medical image segmentation applications. In this direction, we designed novel architectures based on two key principles: weight transfer by embedding a 2D pre-trained encoder into a higher dimensional U-Net, and dimensional transfer by expanding a 2D segmentation network into a higher dimension one. The proposed networks were tested on benchmarks comprising different modalities: MR, CT, and ultrasound images. Our 2D network ranked first on the CAMUS challenge dedicated to echo-cardiographic data segmentation and surpassed the state-of-the-art. Regarding 2D/3D MR and CT abdominal images from the CHAOS challenge, our approach largely outperformed the other 2D-based methods described in the challenge paper on Dice, RAVD, ASSD, and MSSD scores and ranked third on the online evaluation platform. Our 3D network applied to the BraTS 2022 competition also achieved promising results, reaching an average Dice score of 91.69% (91.22%) for the whole tumor, 83.23% (84.77%) for the tumor core, and 81.75% (83.88%) for enhanced tumor using the approach based on weight (dimensional) transfer. Experimental and qualitative results illustrate the effectiveness of our methods for multi-dimensional medical image segmentation.
    摘要 过去十年间,卷积神经网络在各种图像分析与计算机视觉应用中不断推进最新水平。2D 图像分类网络的性能持续提升,并在包含数百万张自然图像的数据库上训练。然而,医学图像分析的进展一直受限于标注数据稀缺和采集条件的约束,而医学影像数据的体数据特性使这些限制更加突出。本文提出了一种高效的方法,将基于自然图像训练的 2D 分类网络的优势迁移到 2D、3D 单模态及多模态医学图像分割任务中。为此,我们基于两个关键原则设计了新的网络结构:一是权重迁移,即把预训练的 2D 编码器嵌入到更高维的 U-Net 中;二是维度迁移,即把 2D 分割网络扩展到更高维度。所提出的网络在包含 MR、CT 和超声等不同模态的基准数据集上进行了测试。我们的 2D 网络在针对超声心动图数据分割的 CAMUS 挑战赛中排名第一,超过了现有最优方法。在 CHAOS 挑战赛的 2D/3D MR 和 CT 腹部图像上,我们的方法在 Dice、RAVD、ASSD 和 MSSD 指标上大幅优于挑战赛论文中描述的其他基于 2D 的方法,并在在线评测平台上排名第三。应用于 BraTS 2022 竞赛的 3D 网络也取得了可观的结果:基于权重(维度)迁移的方法在全肿瘤、肿瘤核心和增强肿瘤上的平均 Dice 分数分别为 91.69%(91.22%)、83.23%(84.77%)和 81.75%(83.88%)。实验与定性结果表明,我们的方法在多维度医学图像分割中是有效的。
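
One of the two stated principles is weight transfer: embedding a pre-trained 2D encoder into a higher-dimensional network. A common way to realize this is kernel "inflation", replicating each 2D convolution kernel along the new depth axis; the sketch below illustrates that recipe in PyTorch and may differ from the authors' exact transfer scheme.

```python
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, depth: int = 3) -> nn.Conv3d:
    """Turn a pre-trained 2D conv into a 3D conv by replicating its kernel
    along the new depth axis and dividing by `depth` so activations keep a
    similar scale. A common inflation recipe, not necessarily the paper's."""
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(depth, *conv2d.kernel_size),
                       stride=(1, *conv2d.stride),
                       padding=(depth // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    with torch.no_grad():
        w2d = conv2d.weight                           # (out, in, kH, kW)
        w3d = w2d.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
        conv3d.weight.copy_(w3d)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d

# toy usage: inflate one layer and run a fake 3D volume through it
conv2d = nn.Conv2d(1, 16, kernel_size=3, padding=1)
conv3d = inflate_conv2d_to_3d(conv2d)
volume = torch.randn(1, 1, 8, 64, 64)                 # (N, C, D, H, W)
print(conv3d(volume).shape)
```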

Efficient Semi-Supervised Federated Learning for Heterogeneous Participants

  • paper_url: http://arxiv.org/abs/2307.15870
  • repo_url: None
  • paper_authors: Zhipeng Sun, Yang Xu, Hongli Xu, Zhiyuan Wang
  • for: 这篇论文提出了一种在标注数据位于服务器端的场景下训练机器学习模型的新系统,名为 Pseudo-Clustering Semi-SFL。
  • methods: 该系统利用半监督技术和聚类正则化来改善数据非独立同分布(non-IID)情形下的模型性能;此外,还设计了一种全局更新频率自适应控制算法,以缓解训练不一致问题。
  • results: 该系统在达到目标精度的同时将训练时间加速 3.3 倍、通信开销降低约 80.1%,并在 non-IID 场景下相比最新方法最高提升 6.9% 的精度。
    Abstract Federated Learning (FL) has emerged to allow multiple clients to collaboratively train machine learning models on their private data. However, training and deploying large models for broader applications is challenging in resource-constrained environments. Fortunately, Split Federated Learning (SFL) offers an excellent solution by alleviating the computation and communication burden on the clients SFL often assumes labeled data for local training on clients, however, it is not the case in practice.Prior works have adopted semi-supervised techniques for leveraging unlabeled data in FL, but data non-IIDness poses another challenge to ensure training efficiency. Herein, we propose Pseudo-Clustering Semi-SFL, a novel system for training models in scenarios where labeled data reside on the server. By introducing Clustering Regularization, model performance under data non-IIDness can be improved. Besides, our theoretical and experimental investigations into model convergence reveal that the inconsistent training processes on labeled and unlabeled data impact the effectiveness of clustering regularization. Upon this, we develop a control algorithm for global updating frequency adaptation, which dynamically adjusts the number of supervised training iterations to mitigate the training inconsistency. Extensive experiments on benchmark models and datasets show that our system provides a 3.3x speed-up in training time and reduces the communication cost by about 80.1% while reaching the target accuracy, and achieves up to 6.9% improvement in accuracy under non-IID scenarios compared to the state-of-the-art.
    摘要 联邦学习(FL)使多个客户端能够在各自的私有数据上协同训练机器学习模型。然而,在资源受限的环境中训练和部署面向更广泛应用的大型模型颇具挑战。幸运的是,拆分联邦学习(Split Federated Learning,SFL)通过减轻客户端的计算与通信负担提供了很好的解决方案。SFL 通常假设客户端拥有带标签的数据以进行本地训练,但实际情况往往并非如此。已有工作在 FL 中采用半监督技术来利用无标签数据,但数据的非独立同分布(non-IID)特性又给训练效率带来挑战。为此,我们提出 Pseudo-Clustering Semi-SFL,一种适用于标注数据位于服务器端场景的模型训练新系统。通过引入聚类正则化,可以改善数据 non-IID 情形下的模型性能。此外,我们对模型收敛性的理论与实验分析表明,有标签与无标签数据上训练过程的不一致会影响聚类正则化的效果。基于此,我们设计了一种全局更新频率自适应控制算法,动态调整监督训练迭代次数以缓解训练不一致。在基准模型和数据集上的大量实验表明,我们的系统在达到目标精度的同时将训练时间加速 3.3 倍、通信开销降低约 80.1%,并在 non-IID 场景下相比最新方法最高提升 6.9% 的精度。

Faster Stochastic Algorithms for Minimax Optimization under Polyak–Łojasiewicz Conditions

  • paper_url: http://arxiv.org/abs/2307.15868
  • repo_url: https://github.com/truenobility303/spider-gda
  • paper_authors: Lesi Chen, Boyuan Yao, Luo Luo
  • for: 该论文研究 Polyak-{\L}ojasiewicz(PL)条件下极小极大(minimax)优化问题的随机一阶算法。
  • methods: 我们提出了名为 SPIDER-GDA 的算法,用于求解有限和形式的问题 $\min_x \max_y f(x,y)\triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$,其中目标函数 $f(x,y)$ 关于 $x$ 满足 $\mu_x$-PL、关于 $y$ 满足 $\mu_y$-PL,且每个 $f_i(x,y)$ 是 $L$-平滑的。我们证明 SPIDER-GDA 可在 ${\mathcal O}\left((n + \sqrt{n}\,\kappa_x\kappa_y^2)\log (1/\epsilon)\right)$ 次随机一阶 oracle(SFO)调用内找到 $\epsilon$-最优解,优于现有最优方法的 SFO 上界。
  • results: 对于病态(ill-conditioned)情形,我们进一步给出了加速算法:当 $\kappa_y \gtrsim \sqrt{n}$ 时,其 SFO 上界为 $\tilde{\mathcal O}\big((n+\sqrt{n}\,\kappa_x\kappa_y)\log^2 (1/\epsilon)\big)$。我们的思路还可推广到目标函数仅对一个变量满足 PL 条件的更一般情形。数值实验验证了所提方法的优越性。
    Abstract This paper considers stochastic first-order algorithms for minimax optimization under Polyak--{\L}ojasiewicz (PL) conditions. We propose SPIDER-GDA for solving the finite-sum problem of the form $\min_x \max_y f(x,y)\triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$, where the objective function $f(x,y)$ is $\mu_x$-PL in $x$ and $\mu_y$-PL in $y$; and each $f_i(x,y)$ is $L$-smooth. We prove SPIDER-GDA could find an $\epsilon$-optimal solution within ${\mathcal O}\left((n + \sqrt{n}\,\kappa_x\kappa_y^2)\log (1/\epsilon)\right)$ stochastic first-order oracle (SFO) complexity, which is better than the state-of-the-art method whose SFO upper bound is ${\mathcal O}\big((n + n^{2/3}\kappa_x\kappa_y^2)\log (1/\epsilon)\big)$, where $\kappa_x\triangleq L/\mu_x$ and $\kappa_y\triangleq L/\mu_y$. For the ill-conditioned case, we provide an accelerated algorithm to reduce the computational cost further. It achieves $\tilde{\mathcal O}\big((n+\sqrt{n}\,\kappa_x\kappa_y)\log^2 (1/\epsilon)\big)$ SFO upper bound when $\kappa_y \gtrsim \sqrt{n}$. Our ideas also can be applied to the more general setting that the objective function only satisfies PL condition for one variable. Numerical experiments validate the superiority of proposed methods.
    摘要 本文研究 Polyak-{\L}ojasiewicz(PL)条件下极小极大优化问题的随机一阶算法。我们提出 SPIDER-GDA 算法,用于求解有限和形式的问题 $\min_x \max_y f(x,y) \triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$,其中目标函数 $f(x,y)$ 关于 $x$ 满足 $\mu_x$-PL、关于 $y$ 满足 $\mu_y$-PL,且每个 $f_i(x,y)$ 是 $L$-平滑的。我们证明 SPIDER-GDA 可在 ${\mathcal O}\left((n + \sqrt{n}\,\kappa_x\kappa_y^2)\log (1/\epsilon)\right)$ 次随机一阶 oracle(SFO)调用内找到 $\epsilon$-最优解,优于现有最优方法 ${\mathcal O}\big((n + n^{2/3}\kappa_x\kappa_y^2)\log (1/\epsilon)\big)$ 的 SFO 上界,其中 $\kappa_x\triangleq L/\mu_x$、$\kappa_y\triangleq L/\mu_y$。针对病态情形,我们给出了进一步降低计算代价的加速算法:当 $\kappa_y \gtrsim \sqrt{n}$ 时,其 SFO 上界为 $\tilde{\mathcal O}\big((n+\sqrt{n}\,\kappa_x\kappa_y)\log^2 (1/\epsilon)\big)$。我们的思路还可应用于目标函数仅对一个变量满足 PL 条件的更一般情形。数值实验验证了所提方法的优越性。
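
As a rough illustration of the SPIDER-style variance-reduced estimator inside gradient descent-ascent, here is a sketch on a toy finite-sum minimax problem; the step sizes, the epoch length `q`, and the toy objective are assumptions and carry none of the paper's guarantees.

```python
import numpy as np

def spider_gda_sketch(grads, n, x0, y0, q=10, eta_x=0.05, eta_y=0.05,
                      iters=400, seed=0):
    """Sketch of SPIDER-style variance-reduced gradient descent-ascent.

    grads(i, x, y) returns (grad_x f_i, grad_y f_i). Every q steps the full
    gradient is recomputed; in between, the running estimator is corrected
    with a single component's gradient difference."""
    rng = np.random.default_rng(seed)
    x, y = float(x0), float(y0)
    vx = vy = 0.0
    x_prev, y_prev = x, y
    for t in range(iters):
        if t % q == 0:                                   # periodic full gradient
            g = [grads(i, x, y) for i in range(n)]
            vx = np.mean([gi[0] for gi in g])
            vy = np.mean([gi[1] for gi in g])
        else:                                            # recursive SPIDER update
            i = rng.integers(n)
            gx_new, gy_new = grads(i, x, y)
            gx_old, gy_old = grads(i, x_prev, y_prev)
            vx += gx_new - gx_old
            vy += gy_new - gy_old
        x_prev, y_prev = x, y
        x -= eta_x * vx                                  # descent on x
        y += eta_y * vy                                  # ascent on y
    return x, y

# toy problem: f_i(x, y) = 0.5*(x - a_i)^2 + x*y - 0.5*y^2, saddle near (0, 0)
a = np.linspace(-1.0, 1.0, 20)
grads = lambda i, x, y: ((x - a[i]) + y, x - y)
print(spider_gda_sketch(grads, n=len(a), x0=2.0, y0=-2.0))
```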

Catching Elusive Depression via Facial Micro-Expression Recognition

  • paper_url: http://arxiv.org/abs/2307.15862
  • repo_url: None
  • paper_authors: Xiaohui Chen, Tie Luo
  • for: 这项研究旨在识别隐藏型抑郁症(Concealed Depression),通过识别面部微表情(Facial Micro-Expressions,FMEs)来检测和识别真正的情感表达。
  • methods: 该研究提出了一种基于面部特征点(Facial Landmarks)的感兴趣区域(Region-of-Interest,ROI)方法,以应对 FMEs 识别的挑战;此外,还给出了一种低成本、保护隐私的解决方案,允许用户在个人场景(如家中)使用便携式移动设备进行自我筛查。
  • results: 研究结果和发现表明,该方法可以有效地识别和检测隐藏型抑郁症。然而,在实际临床设置中,还需要解决一些技术挑战,以确保方法的可靠性和精度。
    Abstract Depression is a common mental health disorder that can cause consequential symptoms with continuously depressed mood that leads to emotional distress. One category of depression is Concealed Depression, where patients intentionally or unintentionally hide their genuine emotions through exterior optimism, thereby complicating and delaying diagnosis and treatment and leading to unexpected suicides. In this paper, we propose to diagnose concealed depression by using facial micro-expressions (FMEs) to detect and recognize underlying true emotions. However, the extremely low intensity and subtle nature of FMEs make their recognition a tough task. We propose a facial landmark-based Region-of-Interest (ROI) approach to address the challenge, and describe a low-cost and privacy-preserving solution that enables self-diagnosis using portable mobile devices in a personal setting (e.g., at home). We present results and findings that validate our method, and discuss other technical challenges and future directions in applying such techniques to real clinical settings.
    摘要 抑郁是一种常见的心理健康问题,可能导致严重的情感不适和情绪压力。一种类型的抑郁是隐藏型抑郁,病人通过表面上的乐观情绪隐藏真实的情感,从而复杂和延迟诊断和治疗,导致意外的自杀。在这篇论文中,我们提议使用表情微表情(FMEs)来检测和识别隐藏的真实情感。然而,表情微表情的非常低敏感和细腻性使其识别成为一项困难的任务。我们提议使用面部特征点的区域利用方法(ROI)解决这个挑战,并描述一种低成本、隐私保护的解决方案,允许自我诊断在家庭环境(如家中)使用手持式移动设备进行。我们展示了结果和发现,并讨论了其他技术挑战和未来方向在实际临床设置中应用such techniques。

Multi-output Headed Ensembles for Product Item Classification

  • paper_url: http://arxiv.org/abs/2307.15858
  • repo_url: None
  • paper_authors: Hotaka Shiokawa, Pradipto Das, Arthur Toth, Justin Chiu
  • for: The paper is written for the problem of product item classification for large-scale e-commerce catalogs, specifically addressing the issue of poor generalization performance due to the unavailability of sizable curated training sets.
  • methods: The paper proposes an extensible deep learning based classification model framework that combines multiple classifiers and uses metadata features and low-level feature engineering to boost classification performance.
  • results: The paper shows improvements in classification performance against robust industry standard baseline models using hyperparameter optimization, and also proposes a novel way to evaluate model performance using user sessions that provides better insights in addition to traditional measures of precision and recall.
    Abstract In this paper, we revisit the problem of product item classification for large-scale e-commerce catalogs. The taxonomy of e-commerce catalogs consists of thousands of genres to which are assigned items that are uploaded by merchants on a continuous basis. The genre assignments by merchants are often wrong but treated as ground truth labels in automatically generated training sets, thus creating a feedback loop that leads to poorer model quality over time. This problem of taxonomy classification becomes highly pronounced due to the unavailability of sizable curated training sets. Under such a scenario it is common to combine multiple classifiers to combat poor generalization performance from a single classifier. We propose an extensible deep learning based classification model framework that benefits from the simplicity and robustness of averaging ensembles and fusion based classifiers. We are also able to use metadata features and low-level feature engineering to boost classification performance. We show these improvements against robust industry standard baseline models that employ hyperparameter optimization. Additionally, due to continuous insertion, deletion and updates to real-world high-volume e-commerce catalogs, assessing model performance for deployment using A/B testing and/or manual annotation becomes a bottleneck. To this end, we also propose a novel way to evaluate model performance using user sessions that provides better insights in addition to traditional measures of precision and recall.
    摘要 在这篇论文中,我们重新审视大规模电商目录中的商品类目分类问题。电商目录的类目体系包含数千个类别,商家持续不断地上传商品并为其指定类别。商家给出的类别常常有误,却被自动生成的训练集当作真实标签使用,从而形成反馈循环,使模型质量随时间不断恶化。在缺乏足量精心整理的训练集的情况下,这一类目分类问题尤为突出。此时常见的做法是组合多个分类器,以弥补单一分类器泛化性能的不足。我们提出了一个可扩展的基于深度学习的分类模型框架,兼具平均集成与融合式分类器的简单性和鲁棒性,并利用元数据特征和低层特征工程进一步提升分类性能。我们在经过超参数优化的行业标准基线模型上验证了这些改进。此外,由于真实的大规模电商目录持续发生插入、删除与更新,使用 A/B 测试或人工标注来评估模型的部署效果会成为瓶颈。为此,我们还提出了一种基于用户会话的新颖模型评估方式,在传统的精确率与召回率之外提供了更深入的洞察。
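
The abstract describes averaging ensembles, fusion-based heads, and metadata features, but not a concrete architecture. The sketch below shows one plausible arrangement, a shared encoder over concatenated text and metadata features feeding several averaged heads; every dimension and the head count are assumptions.

```python
import torch
import torch.nn as nn

class MultiHeadGenreClassifier(nn.Module):
    """Illustrative sketch only: a shared encoder over (title-embedding +
    metadata) features feeding several classifier heads whose probabilities
    are averaged. The paper's exact architecture is not given in the abstract."""

    def __init__(self, text_dim=128, meta_dim=16, n_genres=500, n_heads=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(text_dim + meta_dim, 256), nn.ReLU(), nn.Dropout(0.1))
        self.heads = nn.ModuleList(
            [nn.Linear(256, n_genres) for _ in range(n_heads)])

    def forward(self, text_feat, meta_feat):
        h = self.encoder(torch.cat([text_feat, meta_feat], dim=-1))
        probs = torch.stack([head(h).softmax(-1) for head in self.heads])
        return probs.mean(dim=0)            # simple averaging ensemble

model = MultiHeadGenreClassifier()
pred = model(torch.randn(4, 128), torch.randn(4, 16))   # (4, 500) genre probabilities
```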

Improving Realistic Worst-Case Performance of NVCiM DNN Accelerators through Training with Right-Censored Gaussian Noise

  • paper_url: http://arxiv.org/abs/2307.15853
  • repo_url: None
  • paper_authors: Zheyu Yan, Yifan Qin, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi
  • for: 提高深度神经网络(DNN)加速器在器件差异影响下的最差情况性能,适用于自动驾驶等安全关键应用场景。
  • methods: 利用第 k 百分位性能(KPP)刻画 DNN 模型在存内计算(CiM)加速器上的现实最差性能,并通过形式化分析与噪声注入式训练来提升 KPP。
  • results: 提出了一种自动确定右删失高斯噪声(right-censored Gaussian noise)注入超参数的方法,相比现有方法可将 KPP 提升最多 26%。
    Abstract Compute-in-Memory (CiM), built upon non-volatile memory (NVM) devices, is promising for accelerating deep neural networks (DNNs) owing to its in-situ data processing capability and superior energy efficiency. Unfortunately, the well-trained model parameters, after being mapped to NVM devices, can often exhibit large deviations from their intended values due to device variations, resulting in notable performance degradation in these CiM-based DNN accelerators. There exists a long list of solutions to address this issue. However, they mainly focus on improving the mean performance of CiM DNN accelerators. How to guarantee the worst-case performance under the impact of device variations, which is crucial for many safety-critical applications such as self-driving cars, has been far less explored. In this work, we propose to use the k-th percentile performance (KPP) to capture the realistic worst-case performance of DNN models executing on CiM accelerators. Through a formal analysis of the properties of KPP and the noise injection-based DNN training, we demonstrate that injecting a novel right-censored Gaussian noise, as opposed to the conventional Gaussian noise, significantly improves the KPP of DNNs. We further propose an automated method to determine the optimal hyperparameters for injecting this right-censored Gaussian noise during the training process. Our method achieves up to a 26% improvement in KPP compared to the state-of-the-art methods employed to enhance DNN robustness under the impact of device variations.
    摘要 基于非易失性存储器(NVM)器件的存内计算(CiM)凭借其原位数据处理能力和优异的能效,在加速深度神经网络(DNN)方面前景广阔。然而,训练好的模型参数在映射到 NVM 器件后,常常因器件差异而显著偏离目标值,导致这类基于 CiM 的 DNN 加速器性能明显下降。针对该问题已有大量解决方案,但它们主要关注提升 CiM DNN 加速器的平均性能;而如何在器件差异影响下保证最差情况性能——这对自动驾驶等诸多安全关键应用至关重要——却鲜有研究。在这项工作中,我们提出使用第 k 百分位性能(KPP)来刻画 DNN 模型在 CiM 加速器上的现实最差性能。通过对 KPP 性质和噪声注入式 DNN 训练的形式化分析,我们表明:与传统高斯噪声相比,注入一种新颖的右删失高斯噪声能显著提升 DNN 的 KPP。我们进一步提出一种自动方法,用于确定训练过程中注入这种右删失高斯噪声的最优超参数。与现有用于增强 DNN 抗器件差异鲁棒性的最新方法相比,我们的方法可将 KPP 提升最多 26%。
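
A minimal way to mimic the proposed training is to perturb weights with right-censored Gaussian noise during the forward pass. In the sketch below, "right-censored" is taken to mean that samples above a cutoff are clamped to that cutoff, and the sigma/cutoff values are placeholders; the paper's exact noise model and automated hyperparameter selection are not reproduced here.

```python
import torch
import torch.nn as nn

def right_censored_gaussian_like(w, sigma=0.02, cutoff=0.02):
    """Gaussian noise whose right tail is censored: samples above `cutoff`
    are clamped to `cutoff`. The censoring convention and the sigma/cutoff
    values are assumptions made for illustration."""
    noise = sigma * torch.randn_like(w)
    return torch.clamp(noise, max=cutoff)

class NoisyLinear(nn.Linear):
    """Linear layer trained against right-censored weight perturbations,
    mimicking (in spirit) device-variation-aware training for CiM hardware."""
    def forward(self, x):
        if self.training:
            w = self.weight + right_censored_gaussian_like(self.weight)
        else:
            w = self.weight
        return nn.functional.linear(x, w, self.bias)

layer = NoisyLinear(32, 10)
out = layer(torch.randn(8, 32))     # noise is applied only in training mode
```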

Comprehensive Algorithm Portfolio Evaluation using Item Response Theory

  • paper_url: http://arxiv.org/abs/2307.15850
  • repo_url: https://github.com/sevvandi/airt-scripts
  • paper_authors: Sevvandi Kandanaarachchi, Kate Smith-Miles
  • for: 评估机器学习算法的表现 across a repository of datasets,同时描述算法的一般特征和异常性。
  • methods: 使用修改后的 Item Response Theory(IRT)模型,无需更多的数据特征计算,以获得更加具体的算法性能特征。
  • results: 在多个应用领域的算法组合(algorithm portfolios)上进行了测试,证明了该方法作为算法评估工具的广泛适用性和可解释性。
    Abstract Item Response Theory (IRT) has been proposed within the field of Educational Psychometrics to assess student ability as well as test question difficulty and discrimination power. More recently, IRT has been applied to evaluate machine learning algorithm performance on a single classification dataset, where the student is now an algorithm, and the test question is an observation to be classified by the algorithm. In this paper we present a modified IRT-based framework for evaluating a portfolio of algorithms across a repository of datasets, while simultaneously eliciting a richer suite of characteristics - such as algorithm consistency and anomalousness - that describe important aspects of algorithm performance. These characteristics arise from a novel inversion and reinterpretation of the traditional IRT model without requiring additional dataset feature computations. We test this framework on algorithm portfolios for a wide range of applications, demonstrating the broad applicability of this method as an insightful algorithm evaluation tool. Furthermore, the explainable nature of IRT parameters yield an increased understanding of algorithm portfolios.
    摘要 项目反应理论(Item Response Theory,IRT)最初在教育心理测量领域被提出,用于评估学生能力以及测验题目的难度和区分度。近年来,IRT 被用于评估机器学习算法在单个分类数据集上的表现:此时"学生"是算法,"测验题目"是待分类的观测。本文提出了一个改进的基于 IRT 的框架,用于在一个数据集库上评估一组算法(算法组合),同时提取更丰富的特征——例如算法的一致性与异常性——以刻画算法性能的重要方面。这些特征来自对传统 IRT 模型的一种新颖的反转与重新诠释,且无需额外计算数据集特征。我们在面向多种应用领域的算法组合上测试了该框架,表明它是一种具有洞察力、适用面广的算法评估工具。此外,IRT 参数的可解释性也加深了我们对算法组合的理解。
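
The paper inverts IRT so that algorithms play the role of respondents and datasets the role of items. As a baseline sketch, the code below fits the textbook two-parameter-logistic IRT model by maximum likelihood on a binary algorithm-by-dataset outcome matrix; the paper's modified model and its continuous performance measures are not reproduced.

```python
import torch

def fit_2pl_irt(outcomes, epochs=2000, lr=0.05):
    """Fit a basic two-parameter-logistic IRT model by maximum likelihood.
    Rows = "respondents" (here: algorithms), columns = items (here: datasets),
    entries in {0, 1} = whether the algorithm did well on that dataset."""
    n_algo, n_item = outcomes.shape
    theta = torch.zeros(n_algo, requires_grad=True)      # algorithm "ability"
    b = torch.zeros(n_item, requires_grad=True)          # item difficulty
    log_a = torch.zeros(n_item, requires_grad=True)      # log discrimination
    opt = torch.optim.Adam([theta, b, log_a], lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        logits = log_a.exp() * (theta[:, None] - b[None, :])
        loss = torch.nn.functional.binary_cross_entropy_with_logits(
            logits, outcomes)
        loss.backward()
        opt.step()
    return theta.detach(), b.detach(), log_a.exp().detach()

# toy usage: 5 algorithms x 8 datasets with random win/loss outcomes
outcomes = (torch.rand(5, 8) > 0.4).float()
ability, difficulty, discrimination = fit_2pl_irt(outcomes)
```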

Quantum Kernel Estimation With Neutral Atoms For Supervised Classification: A Gate-Based Approach

  • paper_url: http://arxiv.org/abs/2307.15840
  • repo_url: None
  • paper_authors: Marco Russo, Edoardo Giusto, Bartolomeo Montrucchio
  • for: 本文研究量子核估计(Quantum Kernel Estimation,QKE)技术,即利用量子计算机估计经典计算机难以计算的核函数,再用于训练支持向量机(SVM)。由于实现难以经典模拟的特征映射需要大量 2-局域算符,因而对量子比特的连接度要求很高,这在当前的超导器件上难以实现;为此,本文采用允许更灵活排布原子的中性原子量子计算机。
  • methods: 本文提出了一种基于门模型的通用方法:先从激光脉冲出发推导单量子比特门和双量子比特门,再构造作用于 3 个量子比特的参数化特征映射序列,并据此从数据集经验地计算核矩阵,最终用于训练 SVM。此外,借助中性原子器件更灵活的原子排布,该流程可推广到 N 个量子比特。
  • results: 实验结果表明,尽管数据集规模小、可分性低,所得到的分类精度仍然很高。这是首篇不仅显式推导出一组通用门、还给出在中性原子器件上用门模型估计量子核的通用方法的工作。
    Abstract Quantum Kernel Estimation (QKE) is a technique based on leveraging a quantum computer to estimate a kernel function that is classically difficult to calculate, which is then used by a classical computer for training a Support Vector Machine (SVM). Given the high number of 2-local operators necessary for realizing a feature mapping hard to simulate classically, a high qubit connectivity is needed, which is not currently possible on superconducting devices. For this reason, neutral atom quantum computers can be used, since they allow to arrange the atoms with more freedom. Examples of neutral-atom-based QKE can be found in the literature, but they are focused on graph learning and use the analogue approach. In this paper, a general method based on the gate model is presented. After deriving 1-qubit and 2-qubit gates starting from laser pulses, a parameterized sequence for feature mapping on 3 qubits is realized. This sequence is then used to empirically compute the kernel matrix starting from a dataset, which is finally used to train the SVM. It is also shown that this process can be generalized up to N qubits taking advantage of the more flexible arrangement of atoms that this technology allows. The accuracy is shown to be high despite the small dataset and the low separation. This is the first paper that not only proposes an algorithm for explicitly deriving a universal set of gates but also presents a method of estimating quantum kernels on neutral atom devices for general problems using the gate model.
    摘要 量子核估计(Quantum Kernel Estimation,QKE)是一种利用量子计算机估计经典上难以计算的核函数、再交由经典计算机训练支持向量机(SVM)的技术。由于实现难以经典模拟的特征映射需要大量 2-局域算符,因而需要很高的量子比特连接度,这在当前的超导器件上无法实现。为此可以采用中性原子量子计算机,因为它们允许以更大的自由度排布原子。文献中已有基于中性原子的 QKE 例子,但它们主要面向图学习,且采用模拟(analogue)方式。本文提出了一种基于门模型的通用方法:先从激光脉冲出发推导单量子比特门和双量子比特门,再构造作用于 3 个量子比特的参数化特征映射序列;随后利用该序列从数据集经验地计算核矩阵,并用于训练 SVM。文中还表明,借助该技术所允许的更灵活的原子排布,这一流程可推广到 N 个量子比特。尽管数据集规模小、可分性低,分类精度依然很高。这是首篇不仅提出显式推导一组通用门的算法、还给出在中性原子器件上用门模型为一般问题估计量子核的方法的论文。

Holistic Survey of Privacy and Fairness in Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15838
  • repo_url: None
  • paper_authors: Sina Shaham, Arash Hajisafi, Minh K Quan, Dinh C Nguyen, Bhaskar Krishnamachari, Charith Peris, Gabriel Ghinita, Cyrus Shahabi, Pubudu N. Pathirana
  • for: 本文旨在探讨负责任人工智能(AI)和可靠机器学习(ML)中的隐私和公平问题,以及这两个目标如何同时 integrate into ML 模型中。
  • methods: 本文通过对隐私和公平在 ML 中的研究,包括指导、不指导、半指导和奖励学习等多种方法,以及这些方法在应用领域的交互。
  • results: 本文结合了现有的研究成果,提出了隐私和公平在 ML 中的影响关系,以及如何同时实现这两个目标而减少功能损失。 However, the paper also identifies research challenges in achieving privacy and fairness concurrently in large language models.
    Abstract Privacy and fairness are two crucial pillars of responsible Artificial Intelligence (AI) and trustworthy Machine Learning (ML). Each objective has been independently studied in the literature with the aim of reducing utility loss in achieving them. Despite the significant interest attracted from both academia and industry, there remains an immediate demand for more in-depth research to unravel how these two objectives can be simultaneously integrated into ML models. As opposed to well-accepted trade-offs, i.e., privacy-utility and fairness-utility, the interrelation between privacy and fairness is not well-understood. While some works suggest a trade-off between the two objective functions, there are others that demonstrate the alignment of these functions in certain scenarios. To fill this research gap, we provide a thorough review of privacy and fairness in ML, including supervised, unsupervised, semi-supervised, and reinforcement learning. After examining and consolidating the literature on both objectives, we present a holistic survey on the impact of privacy on fairness, the impact of fairness on privacy, existing architectures, their interaction in application domains, and algorithms that aim to achieve both objectives while minimizing the utility sacrificed. Finally, we identify research challenges in achieving privacy and fairness concurrently in ML, particularly focusing on large language models.
    摘要 隐私与公平是负责任人工智能(AI)和可信机器学习(ML)的两大关键支柱。两者在文献中通常被分别研究,目标都是在实现的同时尽量减少效用损失。尽管学术界和工业界对此兴趣浓厚,如何将这两个目标同时融入 ML 模型之中仍亟需更深入的研究。与人们熟知的"隐私-效用"和"公平-效用"权衡不同,隐私与公平之间的相互关系尚未被充分理解:一些工作指出两者之间存在权衡,另一些工作则表明在某些场景下两者是一致的。为填补这一空白,我们对 ML 中的隐私与公平进行了全面综述,涵盖监督、无监督、半监督和强化学习。在梳理并整合两方面文献之后,我们系统地讨论了隐私对公平的影响、公平对隐私的影响、现有的架构、二者在各应用领域中的相互作用,以及在尽量减少效用损失的同时兼顾两个目标的算法。最后,我们指出了在 ML(特别是大语言模型)中同时实现隐私与公平所面临的研究挑战。

Mean Estimation with User-level Privacy under Data Heterogeneity

  • paper_url: http://arxiv.org/abs/2307.15835
  • repo_url: None
  • paper_authors: Rachel Cummings, Vitaly Feldman, Audra McMillan, Kunal Talwar
  • for: Handle heterogeneous user data with different distribution and quantity of data while preserving user-level differential privacy.
  • methods: Propose a simple model of heterogeneous user data and an estimator that achieves asymptotic optimality with proven lower bounds on error.
  • results: Demonstrate the effectiveness of the proposed method through theoretical analysis and prove the asymptotic optimality and lower bounds on error.
    Abstract A key challenge in many modern data analysis tasks is that user data are heterogeneous. Different users may possess vastly different numbers of data points. More importantly, it cannot be assumed that all users sample from the same underlying distribution. This is true, for example in language data, where different speech styles result in data heterogeneity. In this work we propose a simple model of heterogeneous user data that allows user data to differ in both distribution and quantity of data, and provide a method for estimating the population-level mean while preserving user-level differential privacy. We demonstrate asymptotic optimality of our estimator and also prove general lower bounds on the error achievable in the setting we introduce.
    摘要 现代数据分析任务中的一个关键挑战是用户数据的异质性:不同用户拥有的数据点数可能相差悬殊;更重要的是,不能假设所有用户的数据来自同一个底层分布。语言数据就是一例,不同的说话风格会导致数据异质。在这项工作中,我们提出了一个简单的异质用户数据模型,允许用户数据在分布和数据量上均不相同,并给出了一种在保持用户级差分隐私的前提下估计总体均值的方法。我们证明了该估计器在所引入的设定下具有渐近最优性,并给出了该设定下可达误差的一般下界。
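
For context, the standard user-level DP baseline is to take one bounded (clipped) summary per user and add noise calibrated to the replace-one-user sensitivity. The sketch below shows that textbook recipe with the Gaussian mechanism; the clipping bound and privacy parameters are illustrative, and this is not the heterogeneity-aware estimator proposed in the paper.

```python
import numpy as np

def user_level_dp_mean(user_data, clip=1.0, epsilon=1.0, delta=1e-5, seed=0):
    """Baseline user-level DP mean: average one clipped summary per user,
    then add Gaussian noise scaled to the sensitivity of replacing one user.
    Textbook recipe, not the paper's estimator; clip/epsilon/delta are
    illustrative values."""
    rng = np.random.default_rng(seed)
    n_users = len(user_data)
    # one bounded contribution per user, regardless of how much data they hold
    per_user = np.array([np.clip(np.mean(x), -clip, clip) for x in user_data])
    sensitivity = 2.0 * clip / n_users          # replace-one-user sensitivity
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return per_user.mean() + rng.normal(0.0, sigma)

# toy usage: users with very different amounts (and quality) of data
users = [np.random.default_rng(i).normal(0.3, 1.0, size=n)
         for i, n in enumerate([5, 50, 500, 2, 20])]
print(user_level_dp_mean(users))
```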

DeepTSF: Codeless machine learning operations for time series forecasting

  • paper_url: http://arxiv.org/abs/2308.00709
  • repo_url: None
  • paper_authors: Sotiris Pelekis, Evangelos Karakolis, Theodosios Pountridis, George Kormpakis, George Lampropoulos, Spiros Mouzakits, Dimitris Askounis
  • for: 这篇论文旨在提供一个通用的机器学习操作(MLOps)框架,以创新时间序列预测(TS)领域。
  • methods: 这篇论文使用了深度学习(DL)和机器学习(ML)方法,并自动化了运算和模型化的过程,以提高资料科学家和机器学习工程师的生产力和效率。
  • results: 这篇论文在实际应用中已经证明了 DeepTSF 的有效性,并且在电力和能源系统领域中展示了它的重要加值。
    Abstract This paper presents DeepTSF, a comprehensive machine learning operations (MLOps) framework aiming to innovate time series forecasting through workflow automation and codeless modeling. DeepTSF automates key aspects of the ML lifecycle, making it an ideal tool for data scientists and MLops engineers engaged in machine learning (ML) and deep learning (DL)-based forecasting. DeepTSF empowers users with a robust and user-friendly solution, while it is designed to seamlessly integrate with existing data analysis workflows, providing enhanced productivity and compatibility. The framework offers a front-end user interface (UI) suitable for data scientists, as well as other higher-level stakeholders, enabling comprehensive understanding through insightful visualizations and evaluation metrics. DeepTSF also prioritizes security through identity management and access authorization mechanisms. The application of DeepTSF in real-life use cases of the I-NERGY project has already proven DeepTSF's efficacy in DL-based load forecasting, showcasing its significant added value in the electrical power and energy systems domain.
    摘要 DeepTSF provides a robust and user-friendly solution that seamlessly integrates with existing data analysis workflows, enhancing productivity and compatibility. The framework offers a front-end user interface (UI) suitable for data scientists and other higher-level stakeholders, providing insightful visualizations and evaluation metrics for comprehensive understanding.In addition, DeepTSF prioritizes security through identity management and access authorization mechanisms. The application of DeepTSF in real-life use cases of the I-NERGY project has already demonstrated its efficacy in DL-based load forecasting, showcasing its significant added value in the electrical power and energy systems domain.

A Distance Correlation-Based Approach to Characterize the Effectiveness of Recurrent Neural Networks for Time Series Forecasting

  • paper_url: http://arxiv.org/abs/2307.15830
  • repo_url: None
  • paper_authors: Christopher Salazar, Ashis G. Banerjee
  • for: 这个论文主要针对时间序列预测问题,尤其是使用循环神经网络(RNN)模型来解决这个问题。
  • methods: 该论文使用距离相关度指标来链接时间序列特征和RNN活动层的组件,以便解释和解释RNN的性能。
  • results: 研究发现,RNN活动层可以良好地学习时间序列的延迟结构,但是随着层数的增加,这些信息会逐渐丢失,导致时间序列预测质量下降。此外,活动层也无法完善地模拟平均移动和不均等时间序列过程。
    Abstract Time series forecasting has received a lot of attention with recurrent neural networks (RNNs) being one of the widely used models due to their ability to handle sequential data. Prior studies of RNNs for time series forecasting yield inconsistent results with limited insights as to why the performance varies for different datasets. In this paper, we provide an approach to link the characteristics of time series with the components of RNNs via the versatile metric of distance correlation. This metric allows us to examine the information flow through the RNN activation layers to be able to interpret and explain their performance. We empirically show that the RNN activation layers learn the lag structures of time series well. However, they gradually lose this information over a span of a few consecutive layers, thereby worsening the forecast quality for series with large lag structures. We also show that the activation layers cannot adequately model moving average and heteroskedastic time series processes. Last, we generate heatmaps for visual comparisons of the activation layers for different choices of the network hyperparameters to identify which of them affect the forecast performance. Our findings can, therefore, aid practitioners in assessing the effectiveness of RNNs for given time series data without actually training and evaluating the networks.
    摘要 时间序列预测受到广泛关注,循环神经网络(RNN)因能处理序列数据而成为常用模型之一。然而,已有针对时间序列预测的 RNN 研究结果并不一致,且对于性能为何随数据集而异缺乏深入解释。本文提出一种方法,借助距离相关(distance correlation)这一灵活的度量,将时间序列的特征与 RNN 的组成部分联系起来。该度量使我们能够检查信息在 RNN 激活层中的流动,从而解释和说明其性能。我们通过实验表明,RNN 激活层能够较好地学习时间序列的滞后结构,但这一信息会在连续几层中逐渐丢失,使得滞后结构较大的序列预测质量变差。我们还表明,激活层难以充分刻画移动平均和异方差时间序列过程。最后,我们针对不同网络超参数选择生成激活层的热力图进行可视化比较,以识别哪些超参数影响预测性能。因此,我们的发现可以帮助实践者在无需实际训练和评估网络的情况下,评估 RNN 对给定时间序列数据的适用性。
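
Distance correlation, the metric used to trace information flow through the RNN layers, has a simple empirical form based on double-centred pairwise distance matrices. A standard NumPy implementation is sketched below; the toy check at the end is ours and says nothing about the paper's experiments.

```python
import numpy as np

def distance_correlation(x, y):
    """Empirical distance correlation between samples x (n, p) and y (n, q),
    following the standard Szekely-Rizzo construction: double-centre the
    pairwise distance matrices and normalise the resulting covariances."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def centred_dist(z):
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()

    a, b = centred_dist(x), centred_dist(y)
    dcov2 = (a * b).mean()
    dvar_x, dvar_y = (a * a).mean(), (b * b).mean()
    denom = np.sqrt(dvar_x * dvar_y)
    return 0.0 if denom == 0 else np.sqrt(max(dcov2, 0.0) / denom)

# toy check: a series and its own 3-step lag are strongly distance-correlated
rng = np.random.default_rng(0)
series = rng.normal(size=300).cumsum()
print(distance_correlation(series[:-3], series[3:]))
```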

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

  • paper_url: http://arxiv.org/abs/2307.15818
  • repo_url: None
  • paper_authors: Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, Brianna Zitkovich
  • for: 该论文的目标是将视觉-语言模型直接融入端到端的机器人控制,以提升泛化能力并催生涌现的语义推理。
  • methods: 作者提出了一种简单而通用的方法:将机器人动作表示为文本 token,并将其与自然语言 token 一样直接纳入模型的训练集。
  • results: 该方法得到了高性能的机器人控制策略,并使模型获得一系列涌现能力,例如对新物体的泛化、理解机器人训练数据中未出现过的指令(如把物体放到特定数字或图标上),以及对用户指令进行初步推理(如挑选最小或最大的物体,或离另一物体最近的物体)。
    Abstract We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to such category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).
    摘要 我们研究如何将互联网规模数据上训练的视力语言模型直接应用到终端控制中,以提高泛化和启动Semantic Reasoning。我们的目标是将单一的终端训练模型,能够将机器人观察到动作映射到动作,并且从互联网的视力语言数据获益。为了实现这一目标,我们提出了一个简单的普遍方法:将机器人动作表示为文本 токен,并将它们直接添加到模型的训练集中,与自然语言 токен一样。我们称这种类型的模型为视力语言动作模型(VLA),并实现了一个简单的示例,即 RT-2。我们的广泛评估(6000次评估)表明,我们的方法将带来高效的机器人政策,并允许 RT-2 获得互联网训练中的许多类型的能力。这包括对新物品的泛化、对机器人训练数据中没有的命令(如将物品放在特定数字或图示上)的理解,以及对使用者命令进行基本的推理(如选择最小或最大的物品,或者最近的物品)。我们进一步显示,将链接思维理论添加到 VLA 中,允许它执行多阶层Semantic Reasoning,例如选择哪样的物品用作做扩展钻(一个岩石),或者选择哪种饮料适合疲劳的人(一种能量饮料)。
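
The abstract's key recipe is expressing robot actions as text tokens in the model's existing vocabulary. A commonly described way to do this is to discretise each action dimension into a fixed number of bins and emit the bin indices as tokens; the sketch below shows that discretisation with an assumed 256-bin scheme, which may differ from RT-2's actual tokenizer.

```python
import numpy as np

N_BINS = 256   # assumed bin count; RT-2's exact vocabulary mapping may differ

def action_to_token_string(action, low, high, n_bins=N_BINS):
    """Discretise a continuous robot action (e.g., 7-DoF arm deltas + gripper)
    into integer bins and render it as a space-separated token string, so it
    can live in the same output vocabulary as natural-language tokens."""
    action = np.clip(action, low, high)
    bins = np.round((action - low) / (high - low) * (n_bins - 1)).astype(int)
    return " ".join(str(b) for b in bins)

def token_string_to_action(tokens, low, high, n_bins=N_BINS):
    bins = np.array([int(t) for t in tokens.split()], dtype=float)
    return low + bins / (n_bins - 1) * (high - low)

low, high = np.full(8, -1.0), np.full(8, 1.0)
a = np.array([0.1, -0.4, 0.0, 0.3, -0.9, 0.2, 0.5, 1.0])
s = action_to_token_string(a, low, high)            # space-separated bin indices
round_trip = token_string_to_action(s, low, high)
assert np.allclose(a, round_trip, atol=(high - low)[0] / (N_BINS - 1))
```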

Multi-growth stage plant recognition: a case study of Palmer amaranth (Amaranthus palmeri) in cotton (Gossypium hirsutum)

  • paper_url: http://arxiv.org/abs/2307.15816
  • repo_url: None
  • paper_authors: Guy RY Coleman, Matthew Kutugata, Michael J Walsh, Muthukumar Bagavathiannan
  • for: 这个论文旨在测试不同版本的YOLO框架在识别不同生长阶段的amaranthus palmeri中的性能。
  • methods: 该论文使用了YOLO框架的26种不同变体,并对其进行了测试和比较,以评估它们在识别不同生长阶段的表现。
  • results: 研究发现,使用最新版本的YOLO框架(v8)可以达到47.34%的识别精度,而将所有生长阶段 grouped为一个类型可以提高性能,最高的mean average precision(mAP)为67.05%。此外,使用不同的分割方法和权重也可以提高模型的性能。
    Abstract Many advanced, image-based precision agricultural technologies for plant breeding, field crop research, and site-specific crop management hinge on the reliable detection and phenotyping of plants across highly variable morphological growth stages. Convolutional neural networks (CNNs) have shown promise for image-based plant phenotyping and weed recognition, but their ability to recognize growth stages, often with stark differences in appearance, is uncertain. Amaranthus palmeri (Palmer amaranth) is a particularly challenging weed plant in cotton (Gossypium hirsutum) production, exhibiting highly variable plant morphology both across growth stages over a growing season, as well as between plants at a given growth stage due to high genetic diversity. In this paper, we investigate eight-class growth stage recognition of A. palmeri in cotton as a challenging model for You Only Look Once (YOLO) architectures. We compare 26 different architecture variants from YOLO v3, v5, v6, v6 3.0, v7, and v8 on an eight-class growth stage dataset of A. palmeri. The highest mAP@[0.5:0.95] for recognition of all growth stage classes was 47.34% achieved by v8-X, with inter-class confusion across visually similar growth stages. With all growth stages grouped as a single class, performance increased, with a maximum mean average precision (mAP@[0.5:0.95]) of 67.05% achieved by v7-Original. Single class recall of up to 81.42% was achieved by v5-X, and precision of up to 89.72% was achieved by v8-X. Class activation maps (CAM) were used to understand model attention on the complex dataset. Fewer classes, grouped by visual or size features improved performance over the ground-truth eight-class dataset. Successful growth stage detection highlights the substantial opportunity for improving plant phenotyping and weed recognition technologies with open-source object detection architectures.
    摘要 多种高级图像基于精准农业技术,如植物选择、田间考核和场景特定作物管理,都需要可靠地检测和phenotyping植物。 convolutional neural networks (CNNs) 已经在图像基于植物phenotyping和苔藿识别中展示了抢夺性,但它们在不同生长阶段之间的形态差异 recognition 的能力尚未得到证明。 Amaranthus palmeri(Palmer amaranth)是在棉花(Gossypium hirsutum)生产中 particualrly 挑战人工智能,它的植物形态具有高度变化和 между植物之间的高遗传多样性。 在这篇论文中,我们Investigate Amaranthus palmeri 在棉花中的八个生长阶段识别,作为YOLO 架构的挑战模型。我们对 YOLO v3、v5、v6、v6 3.0、v7 和 v8 中的26种不同架构variant进行比较,并获得了最高的mAP@[0.5:0.95] 值为47.34%,由 v8-X 实现。在所有生长阶段被 grouped 为一个单一类时,性能提高,最高的mAP@[0.5:0.95] 值为67.05%,由 v7-Original 实现。单个类回归率可达81.42%,由 v5-X 实现,而特征精度可达89.72%,由 v8-X 实现。通过使用类活动图(CAM)来理解模型在复杂数据集上的注意力。 fewer classes, grouped by visual or size features 可以提高性能。成功的生长阶段检测表明了开源物体检测架构在植物phenotyping和苔藿识别技术中的潜在潜力。

Anomaly Detection in Industrial Machinery using IoT Devices and Machine Learning: a Systematic Mapping

  • paper_url: http://arxiv.org/abs/2307.15807
  • repo_url: None
  • paper_authors: Sérgio F. Chevtchenko, Elisson da Silva Rocha, Monalisa Cristina Moura Dos Santos, Ricardo Lins Mota, Diego Moura Vieira, Ermeson Carneiro de Andrade, Danilo Ricardo Barbosa de Araújo
  • for: 这篇论文面向关注使用物联网(IoT)设备和机器学习(ML)算法对工业机械进行异常检测的研究者和从业者。
  • methods: 论文采用系统性映射研究方法,评估了 2016 至 2023 年间的 84 篇相关研究,对工业机械异常检测研究进行了广泛综述,涵盖最常用的算法、预处理技术和传感器类型。
  • results: 论文归纳了应用领域,并指出了使用 IoT 设备和 ML 算法进行工业机械异常检测的未来挑战与研究机会。
    Abstract Anomaly detection is critical in the smart industry for preventing equipment failure, reducing downtime, and improving safety. Internet of Things (IoT) has enabled the collection of large volumes of data from industrial machinery, providing a rich source of information for Anomaly Detection. However, the volume and complexity of data generated by the Internet of Things ecosystems make it difficult for humans to detect anomalies manually. Machine learning (ML) algorithms can automate anomaly detection in industrial machinery by analyzing generated data. Besides, each technique has specific strengths and weaknesses based on the data nature and its corresponding systems. However, the current systematic mapping studies on Anomaly Detection primarily focus on addressing network and cybersecurity-related problems, with limited attention given to the industrial sector. Additionally, these studies do not cover the challenges involved in using ML for Anomaly Detection in industrial machinery within the context of the IoT ecosystems. This paper presents a systematic mapping study on Anomaly Detection for industrial machinery using IoT devices and ML algorithms to address this gap. The study comprehensively evaluates 84 relevant studies spanning from 2016 to 2023, providing an extensive review of Anomaly Detection research. Our findings identify the most commonly used algorithms, preprocessing techniques, and sensor types. Additionally, this review identifies application areas and points to future challenges and research opportunities.
    摘要 “异常探测是智能产业中的关键任务,可以预防设备故障、减少停机时间和提高安全性。互联网物件(IoT)已经允许了对工业机械的大量数据收集,提供了丰富的数据来源供异常探测。然而,由于互联网物件生态系统所生成的数据量和复杂度,使得人类手动探测异常具有困难。机器学习(ML)算法可以自动探测工业机械中的异常,通过分析生成的数据。然而,目前的系统性映射研究主要集中在网络和预防网络攻击等方面,对于工业 сектору的关注相对较少。此外,这些研究并未考虑使用ML探测工业机械中的异常在互联网物件生态系统中的挑战。本文提出了一个系统性映射研究,涵盖2016年至2023年间84份相关的研究,提供了广泛的异常探测研究评估。我们的发现显示了最常使用的算法、处理前置技术和感应器类型。此外,这个评估还点出了应用领域和未来挑战和研究机会。”

On Single Index Models beyond Gaussian Data

  • paper_url: http://arxiv.org/abs/2307.15804
  • repo_url: None
  • paper_authors: Joan Bruna, Loucas Pillaud-Vivien, Aaron Zweig
  • for: 本文研究在非高斯数据设定下,使用随机梯度下降(SGD)恢复未知的隐藏方向 $\theta^*$。
  • methods: 本文在 \cite{arous2020online} 的框架基础上,将相关分析推广到稳定性或球对称性可能不成立的非高斯设定。
  • results: 在已知链接函数的植入(planted)设定下,本文的主要结果表明 SGD 能在高维情形下高效恢复未知方向 $\theta^*$,其假设条件推广了先前工作 \cite{yehudai2020learning,wu2022learning}。
    Abstract Sparse high-dimensional functions have arisen as a rich framework to study the behavior of gradient-descent methods using shallow neural networks, showcasing their ability to perform feature learning beyond linear models. Amongst those functions, the simplest are single-index models $f(x) = \phi( x \cdot \theta^*)$, where the labels are generated by an arbitrary non-linear scalar link function $\phi$ applied to an unknown one-dimensional projection $\theta^*$ of the input data. By focusing on Gaussian data, several recent works have built a remarkable picture, where the so-called information exponent (related to the regularity of the link function) controls the required sample complexity. In essence, these tools exploit the stability and spherical symmetry of Gaussian distributions. In this work, building from the framework of \cite{arous2020online}, we explore extensions of this picture beyond the Gaussian setting, where both stability or symmetry might be violated. Focusing on the planted setting where $\phi$ is known, our main results establish that Stochastic Gradient Descent can efficiently recover the unknown direction $\theta^*$ in the high-dimensional regime, under assumptions that extend previous works ~\cite{yehudai2020learning,wu2022learning}.
    摘要 稀疏高维函数为研究使用浅层神经网络的梯度下降方法提供了丰富的框架,展示了它们超越线性模型进行特征学习的能力。其中最简单的是单指标模型 $f(x) = \phi(x \cdot \theta^*)$:标签由任意非线性标量链接函数 $\phi$ 作用于输入数据的未知一维投影 $\theta^*$ 生成。针对高斯数据,近期多项工作构建了一幅相当完整的图景,其中所谓的信息指数(与链接函数的正则性相关)决定了所需的样本复杂度;这些工具本质上依赖于高斯分布的稳定性和球对称性。在本文中,我们基于 \cite{arous2020online} 的框架,探讨将这幅图景推广到高斯设定之外、稳定性或对称性可能不成立的情形。聚焦于链接函数 $\phi$ 已知的植入(planted)设定,我们的主要结果表明,在推广了先前工作 \cite{yehudai2020learning,wu2022learning} 假设的条件下,随机梯度下降能够在高维情形下高效地恢复未知方向 $\theta^*$。
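
In the planted setting with a known link $\phi$, online SGD on the squared loss is the natural baseline. The sketch below runs one-pass SGD with fresh samples; the data are drawn Gaussian purely for convenience (the paper's point is precisely about non-Gaussian laws, and any sampler can be swapped in), and the step-size scaling is ad hoc.

```python
import numpy as np

def online_sgd_single_index(phi, dphi, theta_star, steps=50_000, lr=0.5,
                            noise=0.0, seed=0):
    """One-pass online SGD for y = phi(x . theta*) with a known link phi,
    taking one squared-loss gradient step per fresh sample. A generic sketch
    of the planted setting; the Gaussian sampler below is only a placeholder."""
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    theta = rng.normal(size=d)
    theta /= np.linalg.norm(theta)
    for _ in range(steps):
        x = rng.normal(size=d)                       # placeholder data law
        y = phi(x @ theta_star) + noise * rng.normal()
        pred = x @ theta
        # gradient of 0.5 * (phi(x . theta) - y)^2 with respect to theta
        theta -= (lr / d) * (phi(pred) - y) * dphi(pred) * x
    return theta

d = 100
theta_star = np.ones(d) / np.sqrt(d)
phi, dphi = np.tanh, lambda z: 1.0 - np.tanh(z) ** 2
theta_hat = online_sgd_single_index(phi, dphi, theta_star)
print(abs(theta_hat @ theta_star) / np.linalg.norm(theta_hat))  # overlap with theta*
```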

SAFE: Saliency-Aware Counterfactual Explanations for DNN-based Automated Driving Systems

  • paper_url: http://arxiv.org/abs/2307.15786
  • repo_url: None
  • paper_authors: Amir Samadi, Amir Shirian, Konstantinos Koufos, Kurt Debattista, Mehrdad Dianati
  • for: 本研究的目的是提出一种新的CF解释方法,即使用saliency map来生成更有用的CF解释。
  • methods: 本研究使用了现有的深度生成CF模型,并提出了一种基于saliency map的CF解释方法,该方法可以更好地考虑黑盒模型的权重分布。
  • results: 研究发现,使用saliency map可以生成更有用的CF解释,并且可以更好地考虑黑盒模型的权重分布。此外,研究还发现了一些相关的CF特征,可以用于更好地理解黑盒模型的决策过程。
    Abstract A CF explainer identifies the minimum modifications in the input that would alter the model's output to its complement. In other words, a CF explainer computes the minimum modifications required to cross the model's decision boundary. Current deep generative CF models often work with user-selected features rather than focusing on the discriminative features of the black-box model. Consequently, such CF examples may not necessarily lie near the decision boundary, thereby contradicting the definition of CFs. To address this issue, we propose in this paper a novel approach that leverages saliency maps to generate more informative CF explanations. Source codes are available at: https://github.com/Amir-Samadi//Saliency_Aware_CF.
    摘要 反事实(CF)解释器旨在找出使模型输出翻转到相反类别所需的最小输入修改,也就是跨越模型决策边界所需的最小改动。然而,当前的深度生成式 CF 模型往往依赖用户选定的特征,而非黑盒模型真正的判别特征,因此生成的 CF 示例未必位于决策边界附近,从而与 CF 的定义相悖。为解决这一问题,本文提出了一种利用显著性图(saliency map)生成更具信息量的 CF 解释的新方法。源代码见:https://github.com/Amir-Samadi//Saliency_Aware_CF。
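
As a generic illustration of saliency-aware counterfactual search (not the paper's SAFE pipeline, which builds on deep generative CF models), the sketch below computes a vanilla gradient saliency map, keeps only the most salient locations as an edit mask, and optimises a masked perturbation toward a target class.

```python
import torch

def saliency_masked_counterfactual(model, x, target, keep_frac=0.1,
                                   steps=200, lr=0.05, l2=0.01):
    """Gradient-based counterfactual restricted to salient pixels.
    1) Compute a vanilla gradient saliency map for the current prediction.
    2) Keep only the top `keep_frac` most salient locations as an edit mask.
    3) Optimise a masked perturbation until the model prefers `target`.
    A generic sketch of saliency-aware CF search, not SAFE itself."""
    model.eval()
    x = x.clone().requires_grad_(True)
    pred = model(x)
    pred.max(dim=1).values.sum().backward()
    saliency = x.grad.abs()
    k = max(1, int(keep_frac * saliency.numel()))
    thresh = saliency.flatten().topk(k).values.min()
    mask = (saliency >= thresh).float()

    delta = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        out = model(x.detach() + mask * delta)
        loss = torch.nn.functional.cross_entropy(
            out, torch.tensor([target])) + l2 * delta.pow(2).mean()
        loss.backward()
        opt.step()
    return x.detach() + mask * delta.detach(), mask

# toy usage with a placeholder classifier over 3 classes
net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 3))
cf, edit_mask = saliency_masked_counterfactual(net, torch.randn(1, 3, 32, 32), target=2)
```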

Spherical and Hyperbolic Toric Topology-Based Codes On Graph Embedding for Ising MRF Models: Classical and Quantum Topology Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15778
  • repo_url: https://github.com/Lcrypto/Topology-Signal-Processing
  • paper_authors: Vasiliy Usatyuk, Sergey Egorov, Denis Sapozhnikov
  • for: 本研究探讨了用信息 геометрия来描述爱丁顿模型的基态。
  • methods: 该方法利用了矩阵检查法和自动同构的概念,并与机器学习和错误检查编码之间的联系。
  • results: 该研究发现了一种将深度神经网络架构与错误检查编码相关的方法,并提出了一种基于捕获集的嵌入和稀疏分解方法。此外,研究还发现了一种将量子近似优化算法与深度神经网络架构相关的方法。
    Abstract The paper introduces the application of information geometry to describe the ground states of Ising models. This is achieved by utilizing parity-check matrices of cyclic and quasi-cyclic codes on toric and spherical topologies. The approach establishes a connection between machine learning and error-correcting coding, specifically in terms of automorphism and the size of the circulant of the quasi-cyclic code. This proposed approach has implications for the development of new embedding methods based on trapping sets. Statistical physics and number geometry are utilized to optimize error-correcting codes, leading to these embedding and sparse factorization methods. The paper establishes a direct connection between DNN architecture and error-correcting coding by demonstrating how state-of-the-art DNN architectures (ChordMixer, Mega, Mega-chunk, CDIL, ...) from the long-range arena can be equivalent to specific types (Cage-graph, Repeat Accumulate) of block and convolutional LDPC codes. QC codes correspond to certain types of chemical elements, with the carbon element being represented by the mixed automorphism Shu-Lin-Fossorier QC-LDPC code. The Quantum Approximate Optimization Algorithm (QAOA) used in the Sherrington-Kirkpatrick Ising model can be seen as analogous to the back-propagation loss function landscape in training DNNs. This similarity creates a comparable problem with TS pseudo-codeword, resembling the belief propagation method. Additionally, the layer depth in QAOA correlates to the number of decoding belief propagation iterations in the Wiberg decoding tree. Overall, this work has the potential to advance multiple fields, from Information Theory, DNN architecture design (sparse and structured prior graph topology), efficient hardware design for Quantum and Classical DPU/TPU (graph, quantize and shift register architect.) to Materials Science and beyond.
    摘要 本文介绍了利用信息几何描述伊辛(Ising)模型基态的方法,其途径是利用环码与准循环码在环面和球面拓扑上的校验矩阵。该方法借助自同构以及准循环码循环块大小等概念,建立了机器学习与纠错编码之间的联系。所提出的思路对基于陷阱集的新型嵌入方法的发展具有启示意义。统计物理与数论几何被用于优化纠错码,从而得到这些嵌入与稀疏分解方法。本文进一步展示了 DNN 架构与纠错编码之间的直接对应:长序列基准(long-range arena)中的先进 DNN 架构(ChordMixer、Mega、Mega-chunk、CDIL 等)可以等价于特定类型(Cage-graph、Repeat Accumulate)的分组与卷积 LDPC 码。准循环(QC)码则对应于某些化学元素,其中碳元素由混合自同构的 Shu-Lin-Fossorier QC-LDPC 码表示。用于 Sherrington-Kirkpatrick 伊辛模型的量子近似优化算法(QAOA)可类比于训练 DNN 时反向传播的损失函数地形;这种相似性带来了与 TS 伪码字类似的问题,与置信传播方法相仿。此外,QAOA 的层数与 Wiberg 译码树中置信传播译码的迭代次数相对应。总体而言,这项工作有望推动多个领域的发展:从信息论、DNN 架构设计(稀疏且结构化的先验图拓扑)、面向量子与经典 DPU/TPU 的高效硬件设计(图、量化与移位寄存器架构),到材料科学及更广泛的领域。

Seeking the Yield Barrier: High-Dimensional SRAM Evaluation Through Optimal Manifold

  • paper_url: http://arxiv.org/abs/2307.15773
  • repo_url: None
  • paper_authors: Yanfang Liu, Guohao Dai, Wei W. Xing
  • for: 该研究旨在提高先进工艺节点下 SRAM 单元失效概率估计的效率与准确性。
  • methods: 该研究从经典的范数最小化方法出发,将其推广到无穷多分量并导出新的最优流形(optimal manifold)概念,从而把基于代理模型的估计方法与重要性采样(IS)估计方法联系起来;随后给出次优流形——最优超球面,由此得到一种感知失效边界的高效采样方法(洋葱采样,onion sampling),并使用神经耦合流(可像代理模型一样从样本中学习)作为 IS 的提议分布。
  • results: 结果表明,OPTIMIS 方法兼具代理模型方法与 IS 方法的优点,性能达到最新水平且稳健一致:在高维 SRAM 评估中,相比最优的现有方法,效率最高提升 3.5 倍,准确性最高提升 3 倍。
    Abstract Being able to efficiently obtain an accurate estimate of the failure probability of SRAM components has become a central issue as model circuits shrink their scale to submicrometer with advanced technology nodes. In this work, we revisit the classic norm minimization method. We then generalize it with infinite components and derive the novel optimal manifold concept, which bridges the surrogate-based and importance sampling (IS) yield estimation methods. We then derive a sub-optimal manifold, optimal hypersphere, which leads to an efficient sampling method being aware of the failure boundary called onion sampling. Finally, we use a neural coupling flow (which learns from samples like a surrogate model) as the IS proposal distribution. These combinations give rise to a novel yield estimation method, named Optimal Manifold Important Sampling (OPTIMIS), which keeps the advantages of the surrogate and IS methods to deliver state-of-the-art performance with robustness and consistency, with up to 3.5x in efficiency and 3x in accuracy over the best of SOTA methods in High-dimensional SRAM evaluation.
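As background for the surrogate/importance-sampling bridge described above, the block below is the generic importance-sampling estimator of a failure probability; the proposal q stands in for the paper's neural-coupling-flow proposal, and the notation is ours rather than the authors'.

```latex
% Failure probability under the process distribution p, estimated with proposal q.
P_{\mathrm{fail}} = \int \mathbf{1}_{F}(\mathbf{x})\, p(\mathbf{x})\, d\mathbf{x}
= \mathbb{E}_{q}\!\left[\mathbf{1}_{F}(\mathbf{x})\, \frac{p(\mathbf{x})}{q(\mathbf{x})}\right]
\approx \frac{1}{N}\sum_{n=1}^{N} \mathbf{1}_{F}(\mathbf{x}_{n})\, \frac{p(\mathbf{x}_{n})}{q(\mathbf{x}_{n})},
\qquad \mathbf{x}_{n} \sim q .
```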

Weighted variation spaces and approximation by shallow ReLU networks

  • paper_url: http://arxiv.org/abs/2307.15772
  • repo_url: None
  • paper_authors: Ronald DeVore, Robert D. Nowak, Rahul Parhi, Jonathan W. Siegel
  • for: The paper studies the approximation of functions $f$ on a bounded domain $\Omega \subset \mathbb{R}^d$ by outputs of single-hidden-layer ReLU neural networks of width $n$.
  • methods: New model classes on domains are defined by introducing weighted variation spaces, which are intrinsic to the domain itself, in contrast to the existing domain-independent definitions.
  • results: The new classes are strictly larger than the classical (domain-independent) classes, such as Barron classes and Radon-domain BV classes, yet they maintain the same neural network approximation rates, which avoid the curse of dimensionality.
    Abstract We investigate the approximation of functions $f$ on a bounded domain $\Omega\subset \mathbb{R}^d$ by the outputs of single-hidden-layer ReLU neural networks of width $n$. This form of nonlinear $n$-term dictionary approximation has been intensely studied since it is the simplest case of neural network approximation (NNA). There are several celebrated approximation results for this form of NNA that introduce novel model classes of functions on $\Omega$ whose approximation rates avoid the curse of dimensionality. These novel classes include Barron classes, and classes based on sparsity or variation such as the Radon-domain BV classes. The present paper is concerned with the definition of these novel model classes on domains $\Omega$. The current definition of these model classes does not depend on the domain $\Omega$. A new and more proper definition of model classes on domains is given by introducing the concept of weighted variation spaces. These new model classes are intrinsic to the domain itself. The importance of these new model classes is that they are strictly larger than the classical (domain-independent) classes. Yet, it is shown that they maintain the same NNA rates.
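For concreteness, the width-$n$ single-hidden-layer ReLU networks studied above take the standard form below (standard notation, not necessarily the paper's).

```latex
% Width-n shallow ReLU network on \Omega \subset \mathbb{R}^d.
f_{n}(x) = \sum_{j=1}^{n} a_{j}\, \mathrm{ReLU}(\langle w_{j}, x\rangle + b_{j}) + c,
\qquad a_{j}, b_{j}, c \in \mathbb{R},\; w_{j} \in \mathbb{R}^{d},
\qquad \mathrm{ReLU}(t) = \max(t, 0).
```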

The Hydra Effect: Emergent Self-repair in Language Model Computations

  • paper_url: http://arxiv.org/abs/2307.15771
  • repo_url: None
  • paper_authors: Thomas McGrath, Matthew Rahtz, Janos Kramar, Vladimir Mikulik, Shane Legg
  • for: This work uses causal analysis to study the internal structure of language model computations.
  • methods: Ablation studies of attention and MLP layers, combined with counterfactual analysis, are used to examine how layers interact.
  • results: Two motifs emerge: an adaptive computation in which ablating one attention layer causes another layer to compensate (the "Hydra effect"), and a counterbalancing role of late MLP layers that downregulate the maximum-likelihood token. These effects appear even in models trained without dropout, layers are found to be relatively loosely coupled, and the findings have implications for circuit-level attribution in language models.
    Abstract We investigate the internal structure of language model computations using causal analysis and demonstrate two motifs: (1) a form of adaptive computation where ablations of one attention layer of a language model cause another layer to compensate (which we term the Hydra effect) and (2) a counterbalancing function of late MLP layers that act to downregulate the maximum-likelihood token. Our ablation studies demonstrate that language model layers are typically relatively loosely coupled (ablations to one layer only affect a small number of downstream layers). Surprisingly, these effects occur even in language models trained without any form of dropout. We analyse these effects in the context of factual recall and consider their implications for circuit-level attribution in language models.
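The sketch below illustrates the kind of ablation probe this analysis relies on: zero out one attention block's output with a forward hook and compare the resulting logits. The module path is a placeholder for a GPT-2-style model, and this is not the authors' code.

```python
# Minimal layer-ablation probe: replace one attention block's output with zeros
# and compare logits with and without the ablation.
import torch

@torch.no_grad()
def logits_with_ablation(model, input_ids, layer_idx=None):
    """Return final logits, optionally zero-ablating one attention block."""
    handle = None
    if layer_idx is not None:
        attn = model.transformer.h[layer_idx].attn  # hypothetical module path
        def zero_out(module, inputs, output):
            # Returning a value from a forward hook replaces the module output.
            if isinstance(output, tuple):
                return (torch.zeros_like(output[0]),) + output[1:]
            return torch.zeros_like(output)
        handle = attn.register_forward_hook(zero_out)
    try:
        logits = model(input_ids).logits  # assumes a Hugging Face-style output
    finally:
        if handle is not None:
            handle.remove()
    return logits

# Compare logits_with_ablation(model, ids) against
# logits_with_ablation(model, ids, layer_idx=k) to measure how much
# downstream layers compensate for ablating layer k.
```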

Goodness-of-Fit of Attributed Probabilistic Graph Generative Models

  • paper_url: http://arxiv.org/abs/2308.03773
  • repo_url: None
  • paper_authors: Pablo Robles-Granda, Katherine Tsai, Oluwasanmi Koyejo
  • for: The paper addresses how to assess the goodness of fit of probabilistic generative models of random attributed graphs.
  • methods: Goodness of fit is defined in terms of the mean square contingency coefficient for random binary networks, together with a procedure that ensures the discrepancy of this statistic (constant or random) is minimal with high probability.
  • results: The criteria are applied to verify the representation capability of a probabilistic generative model for various popular types of graph models.
    Abstract Probabilistic generative models of graphs are important tools that enable representation and sampling. Many recent works have created probabilistic models of graphs that are capable of representing not only entity interactions but also their attributes. However, given a generative model of random attributed graph(s), the general conditions that establish goodness of fit are not clear a-priori. In this paper, we define goodness of fit in terms of the mean square contingency coefficient for random binary networks. For this statistic, we outline a procedure for assessing the quality of the structure of a learned attributed graph by ensuring that the discrepancy of the mean square contingency coefficient (constant, or random) is minimal with high probability. We apply these criteria to verify the representation capability of a probabilistic generative model for various popular types of graph models.
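For reference, the mean square contingency coefficient named above is, in its standard form, Pearson's $\phi^2 = \chi^2 / n$; the block below gives the textbook definition rather than the paper's exact statistic.

```latex
% Mean square contingency for a contingency table with cell counts n_{ij} and total n.
\phi^{2} = \frac{\chi^{2}}{n}
= \frac{1}{n}\sum_{i}\sum_{j}
\frac{\bigl(n_{ij} - n_{i\cdot}\, n_{\cdot j}/n\bigr)^{2}}{n_{i\cdot}\, n_{\cdot j}/n}.
```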

Resume Evaluation through Latent Dirichlet Allocation and Natural Language Processing for Effective Candidate Selection

  • paper_url: http://arxiv.org/abs/2307.15752
  • repo_url: None
  • paper_authors: Vidhita Jagwani, Smit Meghani, Krishna Pai, Sudhir Dhage
  • for: The paper proposes a resume-rating method based on Latent Dirichlet Allocation (LDA) and entity detection with SpaCy.
  • methods: SpaCy's Named Entity Recognition (NER) first extracts relevant entities such as education, work experience, and skills from the resume; the LDA model then rates the resume by assigning topic probabilities to each entity.
  • results: By decomposing resumes into latent topics and extracting meaningful semantic representations, the proposed system reaches 77% accuracy when only skills are considered and 82% accuracy when all attributes (college name, work experience, degree, and skills) are considered.
    Abstract In this paper, we propose a method for resume rating using Latent Dirichlet Allocation (LDA) and entity detection with SpaCy. The proposed method first extracts relevant entities such as education, experience, and skills from the resume using SpaCy's Named Entity Recognition (NER). The LDA model then uses these entities to rate the resume by assigning topic probabilities to each entity. Furthermore, we conduct a detailed analysis of the entity detection using SpaCy's NER and report its evaluation metrics. Using LDA, our proposed system breaks down resumes into latent topics and extracts meaningful semantic representations. With a vision to define our resume score to be more content-driven rather than a structure and keyword match driven, our model has achieved 77% accuracy with respect to only skills in consideration and an overall 82% accuracy with all attributes in consideration. (like college name, work experience, degree and skills)
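A rough sketch of the NER-then-LDA pipeline described above, using spaCy and gensim; the entity labels, topic count, and scoring rule are illustrative assumptions, not the authors' configuration.

```python
# Hypothetical resume-scoring pipeline: extract entity tokens with spaCy NER,
# fit an LDA topic model, and score resumes by the mass of job-relevant topics.
import spacy
from gensim import corpora
from gensim.models import LdaModel

nlp = spacy.load("en_core_web_sm")

def extract_entity_tokens(resume_text):
    """Keep tokens from entities that roughly cover education/experience/skills."""
    doc = nlp(resume_text)
    keep = {"ORG", "DATE", "GPE", "PERSON", "WORK_OF_ART"}  # placeholder labels
    return [ent.text.lower() for ent in doc.ents if ent.label_ in keep]

def topic_scores(resumes, job_keywords, num_topics=5):
    texts = [extract_entity_tokens(r) for r in resumes]
    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]
    lda = LdaModel(corpus, num_topics=num_topics, id2word=dictionary, passes=10)
    # One plausible scoring rule: sum the probability of topics whose top words
    # overlap the job-description keywords.
    relevant = set()
    for tid in range(num_topics):
        top_words = {w for w, _ in lda.show_topic(tid, topn=20)}
        if top_words & set(job_keywords):
            relevant.add(tid)
    scores = []
    for bow in corpus:
        dist = dict(lda.get_document_topics(bow))
        scores.append(sum(dist.get(t, 0.0) for t in relevant))
    return scores
```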

How regularization affects the geometry of loss functions

  • paper_url: http://arxiv.org/abs/2307.15744
  • repo_url: None
  • paper_authors: Nathaniel Bottman, Y. Cooper, Antonio Lerario
  • for: The work studies what deep neural networks learn, which depends fundamentally on the geometry of the underlying loss function.
  • methods: It examines how different regularizers, including weight decay, change the geometry of the loss function.
  • results: For nonlinear deep networks the unregularized loss $L$ is typically not Morse; the paper characterizes for which regularizers the regularized loss $L_\epsilon$ becomes Morse.
    Abstract What neural networks learn depends fundamentally on the geometry of the underlying loss function. We study how different regularizers affect the geometry of this function. One of the most basic geometric properties of a smooth function is whether it is Morse or not. For nonlinear deep neural networks, the unregularized loss function $L$ is typically not Morse. We consider several different regularizers, including weight decay, and study for which regularizers the regularized function $L_\epsilon$ becomes Morse.
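For readers unfamiliar with the terminology: a smooth function is Morse when all of its critical points are non-degenerate (the Hessian is invertible there). With weight decay, the regularized loss studied above is the familiar penalized objective below (standard definitions, not the paper's notation).

```latex
% Weight-decay-regularized loss; L_epsilon is Morse iff every critical point is non-degenerate.
L_{\epsilon}(w) = L(w) + \epsilon \lVert w \rVert^{2},
\qquad
\nabla L_{\epsilon}(w^{*}) = 0 \;\Longrightarrow\; \det \nabla^{2} L_{\epsilon}(w^{*}) \neq 0 .
```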

Quantum-noise-limited optical neural networks operating at a few quanta per activation

  • paper_url: http://arxiv.org/abs/2307.15712
  • repo_url: None
  • paper_authors: Shi-Yuan Ma, Tianyu Wang, Jérémie Laydevant, Logan G. Wright, Peter L. McMahon
  • for: The paper studies optical neural networks operated in an ultra-low-power regime in which some layers use only a single photon to trigger a neuron activation, so that quantum noise dominates.
  • methods: The networks are trained with a procedure that directly models the stochastic behavior of single-photon detection of weak optical signals.
  • results: Despite extremely high noise (SNR ~ 1), the approach reaches 98% test accuracy on MNIST using about 0.008 photons (roughly 0.003 attojoules of optical energy) per multiply-accumulate operation, more than 40x fewer photons per inference than previous state-of-the-art low-optical-energy demonstrations at comparable accuracy (>90%).
    Abstract Analog physical neural networks, which hold promise for improved energy efficiency and speed compared to digital electronic neural networks, are nevertheless typically operated in a relatively high-power regime so that the signal-to-noise ratio (SNR) is large (>10). What happens if an analog system is instead operated in an ultra-low-power regime, in which the behavior of the system becomes highly stochastic and the noise is no longer a small perturbation on the signal? In this paper, we study this question in the setting of optical neural networks operated in the limit where some layers use only a single photon to cause a neuron activation. Neuron activations in this limit are dominated by quantum noise from the fundamentally probabilistic nature of single-photon detection of weak optical signals. We show that it is possible to train stochastic optical neural networks to perform deterministic image-classification tasks with high accuracy in spite of the extremely high noise (SNR ~ 1) by using a training procedure that directly models the stochastic behavior of photodetection. We experimentally demonstrated MNIST classification with a test accuracy of 98% using an optical neural network with a hidden layer operating in the single-photon regime; the optical energy used to perform the classification corresponds to 0.008 photons per multiply-accumulate (MAC) operation, which is equivalent to 0.003 attojoules of optical energy per MAC. Our experiment used >40x fewer photons per inference than previous state-of-the-art low-optical-energy demonstrations, to achieve the same accuracy of >90%. Our work shows that some extremely stochastic analog systems, including those operating in the limit where quantum noise dominates, can nevertheless be used as layers in neural networks that deterministically perform classification tasks with high accuracy if they are appropriately trained.
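The sketch below shows one way to "directly model the stochastic behavior of photodetection" during training: treat each pre-activation as a mean photon count, Poisson-sample it, and backpropagate through the mean with a straight-through estimator. The rates, architecture, and gradient trick are assumptions, not the authors' exact procedure.

```python
# Training through simulated single-photon detection noise (illustrative only).
import torch
import torch.nn as nn

class PoissonActivation(nn.Module):
    """Map the pre-activation to a mean photon count, then Poisson-sample it."""
    def __init__(self, photons_per_activation=1.0):
        super().__init__()
        self.scale = photons_per_activation

    def forward(self, z):
        rate = torch.relu(z) * self.scale   # non-negative mean photon count
        counts = torch.poisson(rate)        # stochastic detection events
        # Straight-through estimator: use the noisy counts in the forward pass,
        # but backpropagate through the differentiable mean rate.
        return rate + (counts - rate).detach()

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 100),
    PoissonActivation(photons_per_activation=1.0),
    nn.Linear(100, 10),
)
```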

Semi-Supervised Object Detection in the Open World

  • paper_url: http://arxiv.org/abs/2307.15710
  • repo_url: None
  • paper_authors: Garvita Allabadi, Ana Lucic, Peter Pao-Huang, Yu-Xiong Wang, Vikram Adve
  • for: The work addresses open-world semi-supervised object detection, where unlabeled and test data may contain out-of-distribution (OOD) objects never seen during training, and the model must both detect OOD samples and learn from in-distribution (ID) and OOD data.
  • methods: The proposed Open World Semi-supervised Detection framework (OWSSD) combines an ensemble OOD detector built from lightweight auto-encoder networks trained only on ID data with a semi-supervised learning pipeline that learns from both ID and OOD data.
  • results: Extensive evaluation shows the method performs competitively against state-of-the-art OOD detection algorithms and significantly boosts semi-supervised learning performance in open-world scenarios.
    Abstract Existing approaches for semi-supervised object detection assume a fixed set of classes present in training and unlabeled datasets, i.e., in-distribution (ID) data. The performance of these techniques significantly degrades when these techniques are deployed in the open-world, due to the fact that the unlabeled and test data may contain objects that were not seen during training, i.e., out-of-distribution (OOD) data. The two key questions that we explore in this paper are: can we detect these OOD samples and if so, can we learn from them? With these considerations in mind, we propose the Open World Semi-supervised Detection framework (OWSSD) that effectively detects OOD data along with a semi-supervised learning pipeline that learns from both ID and OOD data. We introduce an ensemble based OOD detector consisting of lightweight auto-encoder networks trained only on ID data. Through extensive evalulation, we demonstrate that our method performs competitively against state-of-the-art OOD detection algorithms and also significantly boosts the semi-supervised learning performance in open-world scenarios.
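A minimal sketch of the reconstruction-error style of OOD detection named above: an ensemble of lightweight auto-encoders trained only on ID features flags inputs that reconstruct poorly. Dimensions, architecture, and the averaging rule are assumptions.

```python
# Reconstruction-error OOD scoring with a small auto-encoder ensemble.
import torch
import torch.nn as nn

class TinyAutoEncoder(nn.Module):
    def __init__(self, dim=256, bottleneck=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, bottleneck))
        self.dec = nn.Sequential(nn.Linear(bottleneck, 64), nn.ReLU(), nn.Linear(64, dim))

    def forward(self, x):
        return self.dec(self.enc(x))

@torch.no_grad()
def ood_score(ensemble, feats):
    """Higher mean reconstruction error across the ensemble -> more likely OOD."""
    errs = [((ae(feats) - feats) ** 2).mean(dim=-1) for ae in ensemble]
    return torch.stack(errs, dim=0).mean(dim=0)
```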

Uncertainty in Natural Language Generation: From Theory to Applications

  • paper_url: http://arxiv.org/abs/2307.15703
  • repo_url: https://github.com/Rastaman4e/-1
  • paper_authors: Joris Baan, Nico Daheim, Evgenia Ilia, Dennis Ulmer, Haau-Sing Li, Raquel Fernández, Barbara Plank, Rico Sennrich, Chrysoula Zerva, Wilker Aziz
  • for: The work aims to make natural language generation (NLG) systems more trustworthy and reliable, for example by indicating when they are likely to be wrong and by supporting multiple views, backgrounds, and writing styles.
  • methods: It presents the theory, frameworks, and vocabulary required to represent uncertainty, and characterizes the main sources of uncertainty in NLG from a linguistic perspective.
  • results: It proposes a two-dimensional taxonomy that is more informative and faithful than the popular aleatoric/epistemic dichotomy, and highlights research directions that exploit uncertainty for decoding, controllable generation, self-assessment, selective answering, active learning, and more.
    Abstract Recent advances of powerful Language Models have allowed Natural Language Generation (NLG) to emerge as an important technology that can not only perform traditional tasks like summarisation or translation, but also serve as a natural language interface to a variety of applications. As such, it is crucial that NLG systems are trustworthy and reliable, for example by indicating when they are likely to be wrong; and supporting multiple views, backgrounds and writing styles -- reflecting diverse human sub-populations. In this paper, we argue that a principled treatment of uncertainty can assist in creating systems and evaluation protocols better aligned with these goals. We first present the fundamental theory, frameworks and vocabulary required to represent uncertainty. We then characterise the main sources of uncertainty in NLG from a linguistic perspective, and propose a two-dimensional taxonomy that is more informative and faithful than the popular aleatoric/epistemic dichotomy. Finally, we move from theory to applications and highlight exciting research directions that exploit uncertainty to power decoding, controllable generation, self-assessment, selective answering, active learning and more.

Universal Recurrent Event Memories for Streaming Data

  • paper_url: http://arxiv.org/abs/2307.15694
  • repo_url: None
  • paper_authors: Ran Dou, Jose Principe
  • for: The paper proposes a new event memory architecture (MemNet) for recurrent neural networks that is universal across time series types, including scalar, multivariate, and symbolic data.
  • methods: MemNet stores key-value pairs, separating addressing from content to improve the representation and avoiding the trade-off between memory depth and resolution that affects memories built from the model state; it requires only linear adaptive mapping functions while implementing a nonlinear operation on the input data.
  • results: MemNet achieves state-of-the-art results across application domains, including chaotic time series, symbolic operation tasks, and question answering (bAbI), while requiring far fewer training parameters than other external memory networks and the transformer; its space complexity equals a single self-attention layer, which greatly improves the efficiency of the attention mechanism and opens the door to IoT applications.
    Abstract In this paper, we propose a new event memory architecture (MemNet) for recurrent neural networks, which is universal for different types of time series data such as scalar, multivariate or symbolic. Unlike other external neural memory architectures, it stores key-value pairs, which separate the information for addressing and for content to improve the representation, as in the digital archetype. Moreover, the key-value pairs also avoid the compromise between memory depth and resolution that applies to memories constructed by the model state. One of the MemNet key characteristics is that it requires only linear adaptive mapping functions while implementing a nonlinear operation on the input data. MemNet architecture can be applied without modifications to scalar time series, logic operators on strings, and also to natural language processing, providing state-of-the-art results in all application domains such as the chaotic time series, the symbolic operation tasks, and the question-answering tasks (bAbI). Finally, controlled by five linear layers, MemNet requires a much smaller number of training parameters than other external memory networks as well as the transformer network. The space complexity of MemNet equals a single self-attention layer. It greatly improves the efficiency of the attention mechanism and opens the door for IoT applications.
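The abstract does not give MemNet's update equations; as generic background, a key-value memory of the kind described is typically read by attending over the stored keys, as in the standard form below (our notation, not the paper's).

```latex
% Generic key-value memory read: query q_t attends over stored keys K and blends the values V.
r_{t} = \mathrm{softmax}\!\left(\frac{q_{t} K^{\top}}{\sqrt{d}}\right) V,
\qquad K \in \mathbb{R}^{m \times d},\; V \in \mathbb{R}^{m \times d_{v}}.
```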

ODTlearn: A Package for Learning Optimal Decision Trees for Prediction and Prescription

  • paper_url: http://arxiv.org/abs/2307.15691
  • repo_url: https://github.com/d3m-research-group/odtlearn
  • paper_authors: Patrick Vossler, Sina Aghaei, Nathan Justin, Nathanael Jo, Andrés Gómez, Phebe Vayanos
  • for: The paper presents an open-source Python package for learning optimal decision trees for high-stakes predictive and prescriptive tasks, based on the mixed-integer optimization (MIO) framework of Aghaei et al. (2019) and several of its extensions.
  • methods: The current version implements optimal classification trees, optimal fair classification trees, classification trees robust to distribution shifts, and optimal prescriptive trees learned from observational data; the package follows object-oriented design principles and supports both the commercial Gurobi solver and the open-source COIN-OR branch-and-cut solver.
  • results: The package, named ODTLearn, is designed to be easy to maintain and extend as new problem classes, reformulation strategies, and solution algorithms are introduced; documentation and an extensive user guide are available at https://d3m-research-group.github.io/odtlearn/, with source code, feature requests, and bug reports at https://github.com/D3M-Research-Group/odtlearn.
    Abstract ODTLearn is an open-source Python package that provides methods for learning optimal decision trees for high-stakes predictive and prescriptive tasks based on the mixed-integer optimization (MIO) framework proposed in Aghaei et al. (2019) and several of its extensions. The current version of the package provides implementations for learning optimal classification trees, optimal fair classification trees, optimal classification trees robust to distribution shifts, and optimal prescriptive trees from observational data. We have designed the package to be easy to maintain and extend as new optimal decision tree problem classes, reformulation strategies, and solution algorithms are introduced. To this end, the package follows object-oriented design principles and supports both commercial (Gurobi) and open source (COIN-OR branch and cut) solvers. The package documentation and an extensive user guide can be found at https://d3m-research-group.github.io/odtlearn/. Additionally, users can view the package source code and submit feature requests and bug reports by visiting https://github.com/D3M-Research-Group/odtlearn.

AI for Anticipatory Action: Moving Beyond Climate Forecasting

  • paper_url: http://arxiv.org/abs/2307.15727
  • repo_url: None
  • paper_authors: Benjamin Q. Huynh, Mathew V. Kiang
  • for: The paper examines the shift in disaster response from climate forecasting toward anticipatory action, and the role of machine learning in supporting that shift.
  • methods: It provides an overview of anticipatory action, reviews relevant applications of machine learning, and identifies common challenges.
  • results: While machine learning models have become exceptionally powerful at climate forecasting, methodological gaps remain for anticipatory action; the paper highlights areas where machine learning can uniquely contribute to advancing disaster response for the populations most vulnerable to climate change.
    Abstract Disaster response agencies have been shifting from a paradigm of climate forecasting towards one of anticipatory action: assessing not just what the climate will be, but how it will impact specific populations, thereby enabling proactive response and resource allocation. Machine learning models are becoming exceptionally powerful at climate forecasting, but methodological gaps remain in terms of facilitating anticipatory action. Here we provide an overview of anticipatory action, review relevant applications of machine learning, identify common challenges, and highlight areas where machine learning can uniquely contribute to advancing disaster response for populations most vulnerable to climate change.

Benchmarking Offline Reinforcement Learning on Real-Robot Hardware

  • paper_url: http://arxiv.org/abs/2307.15690
  • repo_url: https://github.com/rr-learning/trifinger_rl_datasets
  • paper_authors: Nico Gürtler, Sebastian Blaes, Pavel Kolev, Felix Widmaier, Manuel Wüthrich, Stefan Bauer, Bernhard Schölkopf, Georg Martius
  • for: The paper targets learning policies from previously recorded data for real-world robotics tasks, where online learning is often infeasible, with a focus on dexterous manipulation.
  • methods: It proposes a benchmark that combines a large collection of offline-learning data from a dexterous manipulation platform on two tasks (collected with capable RL agents trained in simulation) with offline reinforcement learning, plus the option to execute learned policies on a real robotic system or in simulation for efficient debugging.
  • results: Prominent open-source offline reinforcement learning algorithms are evaluated on the datasets, and a reproducible experimental setup for offline RL on real systems is provided.
    Abstract Learning policies from previously recorded data is a promising direction for real-world robotics tasks, as online learning is often infeasible. Dexterous manipulation in particular remains an open problem in its general form. The combination of offline reinforcement learning with large diverse datasets, however, has the potential to lead to a breakthrough in this challenging domain analogously to the rapid progress made in supervised learning in recent years. To coordinate the efforts of the research community toward tackling this problem, we propose a benchmark including: i) a large collection of data for offline learning from a dexterous manipulation platform on two tasks, obtained with capable RL agents trained in simulation; ii) the option to execute learned policies on a real-world robotic system and a simulation for efficient debugging. We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.

A supervised hybrid quantum machine learning solution to the emergency escape routing problem

  • paper_url: http://arxiv.org/abs/2307.15682
  • repo_url: None
  • paper_authors: Nathan Haboury, Mo Kordzanganeh, Sebastian Schmitt, Ayush Joshi, Igor Tokarev, Lukas Abdallah, Andrii Kurkin, Basil Kyriacou, Alexey Melnikov
  • for: The work explores supervised hybrid quantum machine learning for optimizing emergency evacuation plans for cars during natural disasters, modeled as a shortest-path problem on an uncertain, dynamically evolving city graph after an earthquake.
  • methods: A novel hybrid supervised learning approach runs a quantum feature-wise linear modulation (FiLM) neural network in parallel with a classical FiLM network to imitate Dijkstra's node-wise shortest-path algorithm, splitting the dataset's harmonic and non-harmonic features between the quantum and classical components.
  • results: The hybrid supervised agent improves accuracy by 7% over the purely classical approach, the quantum part contributes 45.(3)% to the prediction, and the network could be executed on an ion-based quantum computer.
    Abstract Managing the response to natural disasters effectively can considerably mitigate their devastating impact. This work explores the potential of using supervised hybrid quantum machine learning to optimize emergency evacuation plans for cars during natural disasters. The study focuses on earthquake emergencies and models the problem as a dynamic computational graph where an earthquake damages an area of a city. The residents seek to evacuate the city by reaching the exit points where traffic congestion occurs. The situation is modeled as a shortest-path problem on an uncertain and dynamically evolving map. We propose a novel hybrid supervised learning approach and test it on hypothetical situations on a concrete city graph. This approach uses a novel quantum feature-wise linear modulation (FiLM) neural network parallel to a classical FiLM network to imitate Dijkstra's node-wise shortest path algorithm on a deterministic dynamic graph. Adding the quantum neural network in parallel increases the overall model's expressivity by splitting the dataset's harmonic and non-harmonic features between the quantum and classical components. The hybrid supervised learning agent is trained on a dataset of Dijkstra's shortest paths and can successfully learn the navigation task. The hybrid quantum network improves over the purely classical supervised learning approach by 7% in accuracy. We show that the quantum part has a significant contribution of 45.(3)% to the prediction and that the network could be executed on an ion-based quantum computer. The results demonstrate the potential of supervised hybrid quantum machine learning in improving emergency evacuation planning during natural disasters.
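As background for the classical branch mentioned above, a feature-wise linear modulation (FiLM) layer scales and shifts features conditioned on a second input. The sketch below is the standard classical FiLM layer with illustrative dimensions; it does not model the quantum counterpart.

```python
# Classical FiLM layer: condition-dependent scale and shift of a feature vector.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, cond_dim, feat_dim):
        super().__init__()
        self.gamma = nn.Linear(cond_dim, feat_dim)
        self.beta = nn.Linear(cond_dim, feat_dim)

    def forward(self, features, condition):
        # Scale and shift each feature channel based on the conditioning input.
        return self.gamma(condition) * features + self.beta(condition)
```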

Benchmarking Anomaly Detection System on various Jetson Edge Devices

  • paper_url: http://arxiv.org/abs/2307.16834
  • repo_url: None
  • paper_authors: Hoang Viet Pham, Thinh Gia Tran, Chuong Dinh Le, An Dinh Le, Hien Bich Vo
  • for: The paper develops an end-to-end crime-scene anomaly detection system for surveillance cameras, using weakly supervised video anomaly detection (Robust Temporal Feature Magnitude Learning, RTFM) together with edge computing.
  • methods: The system uses NVIDIA's TensorRT software development kit for performance enhancement and is deployed and tested directly on multiple Jetson edge devices with Docker.
  • results: The anomaly detection model is competitive with other state-of-the-art algorithms on available datasets such as UCF-Crime and UIT VNAnomaly, reaching 47.56 frames per second on a Jetson edge device with only 3.11 GB of total RAM usage; the most promising Jetson device achieves 15% better performance than the previous generation while consuming 50% less power.
    Abstract Capturing the abnormal event from surveillance videos enhances the safety and well-being of the citizens. The application of EdgeAI (Edge computing-based Artificial Intelligent ) meets the strict latency requirements for security. In this paper, we apply weakly supervised video anomaly detection called Robust Temporal Feature Magnitude Learning (RTFM) to an end-to-end crime-scene anomaly detection system from the surveillance cameras with the help of edge computing technology. The system is tested directly on multiple Jetson edge devices combined with TensorRT as the software developer kit from NVIDIA for system performance enhancement. The experience of an AI-based system deployment on various Jetson Edge devices with Docker technology is also provided. The anomaly detection model yields competitive results compared to other state-of-the-art (SOTA) algorithms on available datasets such as UCF-Crime and UIT VNAnomaly. The approach system reaches 47.56 frames per second (FPS) inference speed on a Jetson edge device with only 3.11 GB RAM usage total. We also discover the promising Jetson device that the AI system achieves 15% better performance than the previous version of Jetson devices while consuming 50% less energy power.

Dynamic Analysis and an Eigen Initializer for Recurrent Neural Networks

  • paper_url: http://arxiv.org/abs/2307.15679
  • repo_url: None
  • paper_authors: Ran Dou, Jose Principe
  • for: The work studies the dynamics of the hidden state in recurrent neural networks, in particular the long-term dependency problem caused by vanishing and exploding gradients.
  • methods: The hidden state space is analyzed through an eigendecomposition of the weight matrix, starting from a linear state space model; the analysis explains how activation functions preserve information, interprets long-term dependency through the eigenvalues, and notes that eigenvalues behave differently for regression and classification tasks.
  • results: Based on observations of well-trained recurrent networks, a new initialization method is proposed that applies to vanilla RNN, LSTM, and GRU and consistently improves performance; on datasets such as Tomita grammars, pixel-by-pixel MNIST, and machine translation (Multi30k), it outperforms the Xavier and Kaiming initializers as well as RNN-specific initializers such as IRNN and sp-RNN.
    Abstract In recurrent neural networks, learning long-term dependency is the main difficulty due to the vanishing and exploding gradient problem. Many researchers are dedicated to solving this issue and they proposed many algorithms. Although these algorithms have achieved great success, understanding how the information decays remains an open problem. In this paper, we study the dynamics of the hidden state in recurrent neural networks. We propose a new perspective to analyze the hidden state space based on an eigen decomposition of the weight matrix. We start the analysis by linear state space model and explain the function of preserving information in activation functions. We provide an explanation for long-term dependency based on the eigen analysis. We also point out the different behavior of eigenvalues for regression tasks and classification tasks. From the observations on well-trained recurrent neural networks, we proposed a new initialization method for recurrent neural networks, which improves consistently performance. It can be applied to vanilla-RNN, LSTM, and GRU. We test on many datasets, such as Tomita Grammars, pixel-by-pixel MNIST datasets, and machine translation datasets (Multi30k). It outperforms the Xavier initializer and kaiming initializer as well as other RNN-only initializers like IRNN and sp-RNN in several tasks.
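A small sketch of the kind of eigen analysis described above: inspect the eigenvalue spectrum of the hidden-to-hidden weight matrix of a recurrent cell. This is generic diagnostic code, not the paper's initializer.

```python
# Inspect the eigenvalue spectrum of a recurrent cell's hidden-to-hidden matrix.
import torch

def recurrent_eigenvalues(rnn_cell):
    """Return eigenvalues of the hidden-to-hidden weight matrix of an RNNCell."""
    W_hh = rnn_cell.weight_hh.detach()   # shape: (hidden_size, hidden_size)
    return torch.linalg.eigvals(W_hh)

cell = torch.nn.RNNCell(input_size=32, hidden_size=64)
eigs = recurrent_eigenvalues(cell)
# |lambda| > 1 suggests exploding directions, |lambda| << 1 rapid decay of
# information; a spectrum near the unit circle tends to preserve information
# over longer horizons.
print(eigs.abs().max(), eigs.abs().min())
```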

Case Studies of Causal Discovery from IT Monitoring Time Series

  • paper_url: http://arxiv.org/abs/2307.15678
  • repo_url: https://github.com/Aryia-Behroziuan/References
  • paper_authors: Ali Aït-Bachir, Charles K. Assaad, Christophe de Bignicourt, Emilie Devijver, Simon Ferreira, Eric Gaussier, Hosein Mohanna, Lei Zan
  • for: The paper examines causal discovery for IT monitoring in modern businesses, where knowing causal relations between system components helps reduce downtime, improve performance, identify root causes of anomalies and incidents, and proactively predict future issues from historical data.
  • methods: Causal discovery algorithms are applied to several IT monitoring time series datasets, which pose challenges such as misaligned time series, sleeping time series, timestamp errors, and missing values.
  • results: The case studies illustrate the benefits of causal discovery on IT monitoring data while highlighting the ongoing challenges.
    Abstract Information technology (IT) systems are vital for modern businesses, handling data storage, communication, and process automation. Monitoring these systems is crucial for their proper functioning and efficiency, as it allows collecting extensive observational time series data for analysis. The interest in causal discovery is growing in IT monitoring systems as knowing causal relations between different components of the IT system helps in reducing downtime, enhancing system performance and identifying root causes of anomalies and incidents. It also allows proactive prediction of future issues through historical data analysis. Despite its potential benefits, applying causal discovery algorithms on IT monitoring data poses challenges, due to the complexity of the data. For instance, IT monitoring data often contains misaligned time series, sleeping time series, timestamp errors and missing values. This paper presents case studies on applying causal discovery algorithms to different IT monitoring datasets, highlighting benefits and ongoing challenges.

Adversarial training for tabular data with attack propagation

  • paper_url: http://arxiv.org/abs/2307.15677
  • repo_url: None
  • paper_authors: Tiago Leon Melo, João Bravo, Marco O. P. Sampaio, Paolo Romano, Hugo Ferreira, João Tiago Ascensão, Pedro Bizarro
  • For: 防止机器学习模型受到攻击,防止恶意攻击者误导模型为非法活动预测为合法,降低系统维护人员的劳动负担。* Methods: 提出了一种新的对抗训练方法,在训练循环中带动攻击在两个空间中传播。* Results: 通过实验表明,该方法可以防止约30%的性能下降,并在非常攻击性下是必要的,但是存在一定的性能损失。
    Abstract Adversarial attacks are a major concern in security-centered applications, where malicious actors continuously try to mislead Machine Learning (ML) models into wrongly classifying fraudulent activity as legitimate, whereas system maintainers try to stop them. Adversarially training ML models that are robust against such attacks can prevent business losses and reduce the work load of system maintainers. In such applications data is often tabular and the space available for attackers to manipulate undergoes complex feature engineering transformations, to provide useful signals for model training, to a space attackers cannot access. Thus, we propose a new form of adversarial training where attacks are propagated between the two spaces in the training loop. We then test this method empirically on a real world dataset in the domain of credit card fraud detection. We show that our method can prevent about 30% performance drops under moderate attacks and is essential under very aggressive attacks, with a trade-off loss in performance under no attacks smaller than 7%.

Bayesian Time-Series Classifier for Decoding Simple Visual Stimuli from Intracranial Neural Activity

  • paper_url: http://arxiv.org/abs/2307.15672
  • repo_url: None
  • paper_authors: Navid Ziaei, Reza Saadatifard, Ali Yousefi, Behzad Nazari, Sydney S. Cash, Angelique C. Paulk
  • for: This paper addresses the need for analytical tools that can handle limited data and the intrinsic stochasticity of neural recordings, with the goal of understanding how external stimuli are encoded in distributed neural activity.
  • methods: A Bayesian time series classifier (BTsC) is used to classify neural data and decode colors in a visual task; the model is deliberately straightforward so as to remain highly interpretable.
  • results: The BTsC model achieves a consistent and reliable average performance of 75.55% on a dataset of 4 patients, improving on state-of-the-art machine learning techniques by about 3.0 percent, while providing interpretable results that make it a valuable tool for studying neural activity across tasks and categories.
    Abstract Understanding how external stimuli are encoded in distributed neural activity is of significant interest in clinical and basic neuroscience. To address this need, it is essential to develop analytical tools capable of handling limited data and the intrinsic stochasticity present in neural data. In this study, we propose a straightforward Bayesian time series classifier (BTsC) model that tackles these challenges whilst maintaining a high level of interpretability. We demonstrate the classification capabilities of this approach by utilizing neural data to decode colors in a visual task. The model exhibits consistent and reliable average performance of 75.55% on 4 patients' dataset, improving upon state-of-the-art machine learning techniques by about 3.0 percent. In addition to its high classification accuracy, the proposed BTsC model provides interpretable results, making the technique a valuable tool to study neural activity in various tasks and categories. The proposed solution can be applied to neural data recorded in various tasks, where there is a need for interpretable results and accurate classification accuracy.

CoRe Optimizer: An All-in-One Solution for Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15663
  • repo_url: https://github.com/jettbrains/-L-
  • paper_authors: Marco Eckhoff, Markus Reiher
  • for: The choice of optimization algorithm and its hyperparameters can significantly affect training speed and final model accuracy in machine learning applications.
  • methods: The paper benchmarks the recently introduced continual resilient (CoRe) optimizer against nine other optimization algorithms, including the Adam optimizer and resilient backpropagation (RPROP), across diverse machine learning tasks, analyzing the influence of different hyperparameters and providing generally applicable values.
  • results: The CoRe optimizer yields the best or competitive performance in every investigated application, while only one hyperparameter needs to be changed depending on mini-batch or batch learning.
    Abstract The optimization algorithm and its hyperparameters can significantly affect the training speed and resulting model accuracy in machine learning applications. The wish list for an ideal optimizer includes fast and smooth convergence to low error, low computational demand, and general applicability. Our recently introduced continual resilient (CoRe) optimizer has shown superior performance compared to other state-of-the-art first-order gradient-based optimizers for training lifelong machine learning potentials. In this work we provide an extensive performance comparison of the CoRe optimizer and nine other optimization algorithms including the Adam optimizer and resilient backpropagation (RPROP) for diverse machine learning tasks. We analyze the influence of different hyperparameters and provide generally applicable values. The CoRe optimizer yields best or competitive performance in every investigated application, while only one hyperparameter needs to be changed depending on mini-batch or batch learning.

Multi-layer Aggregation as a key to feature-based OOD detection

  • paper_url: http://arxiv.org/abs/2307.15647
  • repo_url: https://github.com/benolmbrt/MedicOOD
  • paper_authors: Benjamin Lambert, Florence Forbes, Senan Doyle, Michel Dojat
  • for: The work concerns the robustness of deep learning models to input variations not observed during training, which is especially important in medical image analysis, where the range of possible abnormalities is extremely wide.
  • methods: Feature-based OOD detection methods that analyze the intermediate features of a trained model are compared, divided into single-layer methods (the feature map of one fixed, carefully chosen layer) and multi-layer methods (the ensemble of feature maps generated by the model).
  • results: In a large-scale comparison covering 20 types of OOD (about 7,800 3D MRIs), multi-layer methods consistently outperform single-layer approaches, whose behavior is inconsistent across anomaly types, and OOD detection performance depends strongly on the architecture of the underlying neural network.
    Abstract Deep Learning models are easily disturbed by variations in the input images that were not observed during the training stage, resulting in unpredictable predictions. Detecting such Out-of-Distribution (OOD) images is particularly crucial in the context of medical image analysis, where the range of possible abnormalities is extremely wide. Recently, a new category of methods has emerged, based on the analysis of the intermediate features of a trained model. These methods can be divided into 2 groups: single-layer methods that consider the feature map obtained at a fixed, carefully chosen layer, and multi-layer methods that consider the ensemble of the feature maps generated by the model. While promising, a proper comparison of these algorithms is still lacking. In this work, we compared various feature-based OOD detection methods on a large spectra of OOD (20 types), representing approximately 7800 3D MRIs. Our experiments shed the light on two phenomenons. First, multi-layer methods consistently outperform single-layer approaches, which tend to have inconsistent behaviour depending on the type of anomaly. Second, the OOD detection performance highly depends on the architecture of the underlying neural network.
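A generic sketch of multi-layer, feature-based OOD scoring in the spirit of the methods compared above: pool selected intermediate feature maps, compare each to statistics estimated on ID data, and aggregate per-layer distances. The layer choice, distance, and aggregation rule are assumptions, not any specific benchmarked method.

```python
# Multi-layer feature aggregation for OOD scoring (illustrative only).
import torch

def collect_pooled_features(model, x, layers):
    """Run the model once and globally average-pool the feature map of each chosen layer."""
    feats, handles = {}, []
    for name, module in layers.items():
        def hook(mod, inp, out, name=name):
            feats[name] = out.flatten(2).mean(dim=-1)  # assumes conv-style feature maps
        handles.append(module.register_forward_hook(hook))
    with torch.no_grad():
        model(x)
    for h in handles:
        h.remove()
    return feats

def multilayer_ood_score(feats, id_means, id_stds, eps=1e-6):
    """Sum of standardized L2 distances to the ID mean, one term per layer."""
    score = 0.0
    for name, f in feats.items():
        z = (f - id_means[name]) / (id_stds[name] + eps)
        score = score + z.norm(dim=-1)
    return score  # higher -> more likely out-of-distribution
```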

Scale-aware Test-time Click Adaptation for Pulmonary Nodule and Mass Segmentation

  • paper_url: http://arxiv.org/abs/2307.15645
  • repo_url: https://github.com/splinterli/sattca
  • paper_authors: Zhihao Li, Jiancheng Yang, Yongchao Xu, Li Zhang, Wenhui Dong, Bo Du
  • for: The paper targets more accurate segmentation of pulmonary nodules and masses in lung cancer screening, in particular robust handling of lesions of widely varying size.
  • methods: A multi-scale neural network with scale-aware test-time click adaptation is proposed, using easily obtainable lesion clicks as test-time cues to enhance segmentation, especially for large lesions; the method can be seamlessly integrated into existing networks.
  • results: Extensive experiments on both open-source and in-house datasets consistently demonstrate the effectiveness of the method compared with several CNN- and Transformer-based segmentation methods.
    Abstract Pulmonary nodules and masses are crucial imaging features in lung cancer screening that require careful management in clinical diagnosis. Despite the success of deep learning-based medical image segmentation, the robust performance on various sizes of lesions of nodule and mass is still challenging. In this paper, we propose a multi-scale neural network with scale-aware test-time adaptation to address this challenge. Specifically, we introduce an adaptive Scale-aware Test-time Click Adaptation method based on effortlessly obtainable lesion clicks as test-time cues to enhance segmentation performance, particularly for large lesions. The proposed method can be seamlessly integrated into existing networks. Extensive experiments on both open-source and in-house datasets consistently demonstrate the effectiveness of the proposed method over some CNN and Transformer-based segmentation methods. Our code is available at https://github.com/SplinterLi/SaTTCA

Scaling Data Generation in Vision-and-Language Navigation

  • paper_url: http://arxiv.org/abs/2307.15644
  • repo_url: https://github.com/wz0919/scalevln
  • paper_authors: Zun Wang, Jialu Li, Yicong Hong, Yi Wang, Qi Wu, Mohit Bansal, Stephen Gould, Hao Tan, Yu Qiao
  • for: The goal is to improve the generalizability of language-guided visual navigation agents by addressing the scarcity of traversable environments and supervision in existing datasets.
  • methods: Using 1200+ photo-realistic environments from the HM3D and Gibson datasets and fully accessible web resources, 4.9 million instruction-trajectory pairs are synthesized; the influence of each component of this paradigm on agent performance is analyzed, along with how best to use the augmented data for pre-training and fine-tuning.
  • results: With the scaled data, an existing agent's single-run success rate on the R2R test split rises by 11% absolute over the previous SoTA to a new best of 80% via simple imitation learning; the generalization gap between seen and unseen environments shrinks to below 1% (versus 8% for the previous best method), and the paradigm also enables different models to achieve new state-of-the-art navigation results on CVDN, REVERIE, and R2R in continuous environments.
    Abstract Recent research in language-guided visual navigation has demonstrated a significant demand for the diversity of traversable environments and the quantity of supervision for training generalizable agents. To tackle the common data scarcity issue in existing vision-and-language navigation datasets, we propose an effective paradigm for generating large-scale data for learning, which applies 1200+ photo-realistic environments from HM3D and Gibson datasets and synthesizes 4.9 million instruction trajectory pairs using fully-accessible resources on the web. Importantly, we investigate the influence of each component in this paradigm on the agent's performance and study how to adequately apply the augmented data to pre-train and fine-tune an agent. Thanks to our large-scale dataset, the performance of an existing agent can be pushed up (+11% absolute with regard to previous SoTA) to a significantly new best of 80% single-run success rate on the R2R test split by simple imitation learning. The long-lasting generalization gap between navigating in seen and unseen environments is also reduced to less than 1% (versus 8% in the previous best method). Moreover, our paradigm also facilitates different models to achieve new state-of-the-art navigation results on CVDN, REVERIE, and R2R in continuous environments.