cs.LG - 2023-07-12

Machine learning and Topological data analysis identify unique features of human papillae in 3D scans

  • paper_url: http://arxiv.org/abs/2307.06255
  • repo_url: None
  • paper_authors: Rayna Andreeva, Anwesha Sarkar, Rik Sarkar
  • for: This paper investigates whether the papillae on the human tongue surface carry individual-specific features, and how such features relate to the taste and texture perception of food.
  • methods: Computer vision and data-science techniques are applied to 3D microscopic scans of human papillae, computing geometric and topological shape features (discrete differential geometry and persistent homology) and analysing their uniqueness (see the sketch below).
  • results: The geometric and topological features of papillae are individual-specific: models trained on them classify papillae types with 85% accuracy, map the spatial arrangement of filiform and fungiform papillae, and identify an individual among 15 participants with 48% accuracy from a single papilla, suggesting that papillae could serve as a unique identifier and motivate new research on food preferences and oral diagnostics.
    Abstract The tongue surface houses a range of papillae that are integral to the mechanics and chemistry of taste and textural sensation. Although gustatory function of papillae is well investigated, the uniqueness of papillae within and across individuals remains elusive. Here, we present the first machine learning framework on 3D microscopic scans of human papillae (n = 2092), uncovering the uniqueness of geometric and topological features of papillae. The finer differences in shapes of papillae are investigated computationally based on a number of features derived from discrete differential geometry and computational topology. Interpretable machine learning techniques show that persistent homology features of the papillae shape are the most effective in predicting the biological variables. Models trained on these features with small volumes of data samples predict the type of papillae with an accuracy of 85%. The papillae type classification models can map the spatial arrangement of filiform and fungiform papillae on a surface. Remarkably, the papillae are found to be distinctive across individuals and an individual can be identified with an accuracy of 48% among the 15 participants from a single papillae. Collectively, this is the first unprecedented evidence demonstrating that tongue papillae can serve as a unique identifier inspiring new research direction for food preferences and oral diagnostics.
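As a concrete illustration of the pipeline the abstract describes — summarising the shape of each scanned papilla with persistent-homology-derived features and feeding them to an interpretable classifier — here is a minimal sketch in Python. It assumes persistence diagrams have already been computed from each 3D scan with a TDA library (e.g., ripser or giotto-tda); the summary statistics, label encoding, and classifier choice are illustrative, not the paper's exact features or model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def diagram_features(diagram):
    """Summary statistics of a persistence diagram given as an array of
    (birth, death) pairs -- a simple stand-in for the persistent-homology
    features used in the paper."""
    pers = diagram[:, 1] - diagram[:, 0]               # lifetimes
    pers = pers[np.isfinite(pers)]                     # drop the infinite H0 bar
    if len(pers) == 0:
        return np.zeros(4)
    p = pers / pers.sum()
    return np.array([pers.sum(),                       # total persistence
                     pers.max(),                       # longest-lived feature
                     len(pers),                        # number of features
                     -(p * np.log(p + 1e-12)).sum()])  # persistence entropy

def classify_papillae(diagrams, labels):
    """diagrams: list of (birth, death) arrays, one per papilla scan, computed
    beforehand with a TDA library. labels: e.g. 0 = filiform, 1 = fungiform
    (illustrative encoding). Returns mean cross-validated accuracy."""
    X = np.stack([diagram_features(d) for d in diagrams])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    return cross_val_score(clf, X, labels, cv=5).mean()
```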

Identifiability Guarantees for Causal Disentanglement from Soft Interventions

  • paper_url: http://arxiv.org/abs/2307.06250
  • repo_url: https://github.com/uhlerlab/discrepancy_vae
  • paper_authors: Jiaqi Zhang, Chandler Squires, Kristjan Greenewald, Akash Srivastava, Karthikeyan Shanmugam, Caroline Uhler
  • for: This work studies causal disentanglement, i.e., uncovering a representation of data using latent variables that are interrelated through a causal model.
  • methods: The setting assumes unpaired observational and interventional data, with each intervention changing the mechanism of a latent variable; identifiability theory is developed and implemented with an autoencoding variational Bayes algorithm.
  • results: Under a generalized notion of faithfulness, the latent causal model is identifiable up to an equivalence class, and the effects of unseen combinations of interventions can be predicted in the limit of infinite data.
    Abstract Causal disentanglement aims to uncover a representation of data using latent variables that are interrelated through a causal model. Such a representation is identifiable if the latent model that explains the data is unique. In this paper, we focus on the scenario where unpaired observational and interventional data are available, with each intervention changing the mechanism of a latent variable. When the causal variables are fully observed, statistically consistent algorithms have been developed to identify the causal model under faithfulness assumptions. We here show that identifiability can still be achieved with unobserved causal variables, given a generalized notion of faithfulness. Our results guarantee that we can recover the latent causal model up to an equivalence class and predict the effect of unseen combinations of interventions, in the limit of infinite data. We implement our causal disentanglement framework by developing an autoencoding variational Bayes algorithm and apply it to the problem of predicting combinatorial perturbation effects in genomics.

Diffusion Based Multi-Agent Adversarial Tracking

  • paper_url: http://arxiv.org/abs/2307.06244
  • repo_url: None
  • paper_authors: Sean Ye, Manisha Natarajan, Zixuan Wu, Matthew Gombolay
  • for: The goal is to improve autonomous tracking so that unmanned aerial, surface, and underwater vehicles can better assist in interdicting smugglers whose locations are only partially known.
  • methods: The paper proposes Constrained Agent-based Diffusion for Enhanced Multi-Agent Tracking (CADENCE), a cross-attention diffusion model that leverages past sparse state information and constraint-based sampling to generate multimodal predictions of adversary locations.
  • results: In single-target and multi-target pursuit environments, CADENCE outperforms all baseline methods on Average Displacement Error (ADE, sketched below) across all prediction horizons.
    Abstract Target tracking plays a crucial role in real-world scenarios, particularly in drug-trafficking interdiction, where the knowledge of an adversarial target's location is often limited. Improving autonomous tracking systems will enable unmanned aerial, surface, and underwater vehicles to better assist in interdicting smugglers that use manned surface, semi-submersible, and aerial vessels. As unmanned drones proliferate, accurate autonomous target estimation is even more crucial for security and safety. This paper presents Constrained Agent-based Diffusion for Enhanced Multi-Agent Tracking (CADENCE), an approach aimed at generating comprehensive predictions of adversary locations by leveraging past sparse state information. To assess the effectiveness of this approach, we evaluate predictions on single-target and multi-target pursuit environments, employing Monte-Carlo sampling of the diffusion model to estimate the probability associated with each generated trajectory. We propose a novel cross-attention based diffusion model that utilizes constraint-based sampling to generate multimodal track hypotheses. Our single-target model surpasses the performance of all baseline methods on Average Displacement Error (ADE) for predictions across all time horizons.
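The abstract reports performance as Average Displacement Error (ADE) over Monte-Carlo samples from the diffusion model. Below is a minimal sketch of how ADE, and a best-of-N variant over sampled trajectories, is typically computed; the best-of-N formulation and the toy data are assumptions, not CADENCE's exact evaluation protocol.

```python
import numpy as np

def ade(pred, gt):
    """Average Displacement Error: mean Euclidean distance between a
    predicted trajectory and the ground truth over all time steps.
    pred, gt: arrays of shape (T, 2)."""
    return np.linalg.norm(pred - gt, axis=-1).mean()

def min_ade(samples, gt):
    """Best-of-N ADE over Monte-Carlo samples drawn from a generative
    (e.g. diffusion) model. samples: (N, T, 2), gt: (T, 2)."""
    return min(ade(s, gt) for s in samples)

# Toy example: 10 sampled trajectories of 20 steps each.
rng = np.random.default_rng(0)
gt = np.cumsum(rng.normal(size=(20, 2)), axis=0)
samples = gt[None] + rng.normal(scale=0.5, size=(10, 20, 2))
print(min_ade(samples, gt))
```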

Reconstructing Spatiotemporal Data with C-VAEs

  • paper_url: http://arxiv.org/abs/2307.06243
  • repo_url: https://github.com/ciic-c-t-polytechnic-of-leiria/reconstr_cvae_paper
  • paper_authors: Tiago F. R. Ribeiro, Fernando Silva, Rogério Luís de C. Costa
  • for: The paper studies how Conditional Variational Autoencoder (C-VAE) models can generate smooth and realistic representations of the evolution of 2D moving regions.
  • methods: A C-VAE model (see the sketch below) and other commonly used interpolation algorithms are used to generate in-between region representations from sparse snapshots.
  • results: The C-VAE approach achieves competitive geometric similarity metrics and superior temporal consistency, suggesting that C-VAE models may be a viable alternative for modelling the spatiotemporal evolution of 2D moving regions.
    Abstract The continuous representation of spatiotemporal data commonly relies on using abstract data types, such as \textit{moving regions}, to represent entities whose shape and position continuously change over time. Creating this representation from discrete snapshots of real-world entities requires using interpolation methods to compute in-between data representations and estimate the position and shape of the object of interest at arbitrary temporal points. Existing region interpolation methods often fail to generate smooth and realistic representations of a region's evolution. However, recent advancements in deep learning techniques have revealed the potential of deep models trained on discrete observations to capture spatiotemporal dependencies through implicit feature learning. In this work, we explore the capabilities of Conditional Variational Autoencoder (C-VAE) models to generate smooth and realistic representations of the spatiotemporal evolution of moving regions. We evaluate our proposed approach on a sparsely annotated dataset on the burnt area of a forest fire. We apply compression operations to sample from the dataset and use the C-VAE model and other commonly used interpolation algorithms to generate in-between region representations. To evaluate the performance of the methods, we compare their interpolation results with manually annotated data and regions generated by a U-Net model. We also assess the quality of generated data considering temporal consistency metrics. The proposed C-VAE-based approach demonstrates competitive results in geometric similarity metrics. It also exhibits superior temporal consistency, suggesting that C-VAE models may be a viable alternative to modelling the spatiotemporal evolution of 2D moving regions.
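A minimal PyTorch sketch of the general idea behind using a conditional VAE to interpolate a moving region: the model encodes a region mask together with its timestamp and can decode a mask for any queried time. The layer sizes, the way the timestamp is injected, and the flattened binary-mask representation are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TimeConditionedVAE(nn.Module):
    """Toy C-VAE: encodes a flattened binary region mask together with its
    timestamp t in [0, 1], and decodes a mask for any queried t."""
    def __init__(self, n_pixels=64 * 64, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_pixels + 1, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim + 1, 256), nn.ReLU(),
                                 nn.Linear(256, n_pixels))

    def forward(self, x, t):
        # x: (B, n_pixels) binary masks, t: (B, 1) normalised timestamps
        h = self.enc(torch.cat([x, t], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation
        logits = self.dec(torch.cat([z, t], dim=-1))
        return logits, mu, logvar

def loss_fn(logits, x, mu, logvar):
    """Standard VAE objective: reconstruction + KL divergence."""
    rec = nn.functional.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld
```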

DSSE: a drone swarm search environment

  • paper_url: http://arxiv.org/abs/2307.06240
  • repo_url: https://github.com/pfe-embraer/drone-swarm-search
  • paper_authors: Manuel Castanares, Luis F. S. Carrete, Enrico F. Damiani, Leonardo D. M. de Abreu, José Fernando B. Brancalion, Fabrício J. Barth
  • for: The project supports research on multi-agent (or single-agent) reinforcement learning algorithms that require dynamic probabilities as inputs.
  • methods: It provides a PettingZoo-based environment in which agents (drones) must find targets (shipwrecked people) without knowing their positions; agents receive no distance-based rewards, but observe, for each cell of the map, the probability that a target is present (a generic interaction loop is sketched below).
  • results: The environment serves as a testbed for studying reinforcement learning algorithms driven by dynamic probability maps.
    Abstract The Drone Swarm Search project is an environment, based on PettingZoo, that is to be used in conjunction with multi-agent (or single-agent) reinforcement learning algorithms. It is an environment in which the agents (drones), have to find the targets (shipwrecked people). The agents do not know the position of the target and do not receive rewards related to their own distance to the target(s). However, the agents receive the probabilities of the target(s) being in a certain cell of the map. The aim of this project is to aid in the study of reinforcement learning algorithms that require dynamic probabilities as inputs.
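Since the environment follows the PettingZoo API, interaction takes the usual parallel-environment form shown below. The environment class name in the commented-out import is a placeholder assumption (see the linked repository for the actual interface); only the generic PettingZoo reset/step loop with per-agent action dictionaries is standard, and the return signature assumes a recent PettingZoo version.

```python
# Hypothetical usage sketch -- the environment class name and module path are
# assumptions; see the project repository for the actual interface.
# from DSSE import DroneSwarmSearch  # placeholder import

def run_random_episode(env, max_steps=500):
    """Run one episode with random actions in any PettingZoo parallel env."""
    observations, infos = env.reset()
    for _ in range(max_steps):
        if not env.agents:            # all drones done or target found
            break
        # Each observation is assumed to include the per-cell probability map
        # of the target's location; here the drones simply act randomly.
        actions = {agent: env.action_space(agent).sample() for agent in env.agents}
        observations, rewards, terminations, truncations, infos = env.step(actions)
    return infos
```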

Unified Molecular Modeling via Modality Blending

  • paper_url: http://arxiv.org/abs/2307.06235
  • repo_url: None
  • paper_authors: Qiying Yu, Yudi Zhang, Yuyan Ni, Shikun Feng, Yanyan Lan, Hao Zhou, Jingjing Liu
  • for: This work aims to improve self-supervised molecular representation learning for molecule-based tasks such as AI-assisted drug discovery.
  • methods: A "blend-then-predict" self-supervised method (MoleBLEND) blends atom relations from the 2D and 3D modalities into one unified relation matrix for encoding, then recovers modality-specific information for both structures, aligning them organically at the fine-grained relation level.
  • results: MoleBLEND achieves state-of-the-art performance on major 2D/3D benchmarks; a mutual-information-maximization analysis further shows that the method unifies contrastive, generative (inter-modal prediction), and mask-then-predict (intra-modal prediction) objectives in a single framework.
    Abstract Self-supervised molecular representation learning is critical for molecule-based tasks such as AI-assisted drug discovery. Recent studies consider leveraging both 2D and 3D information for representation learning, with straightforward alignment strategies that treat each modality separately. In this work, we introduce a novel "blend-then-predict" self-supervised learning method (MoleBLEND), which blends atom relations from different modalities into one unified relation matrix for encoding, then recovers modality-specific information for both 2D and 3D structures. By treating atom relationships as anchors, seemingly dissimilar 2D and 3D manifolds are aligned and integrated at fine-grained relation-level organically. Extensive experiments show that MoleBLEND achieves state-of-the-art performance across major 2D/3D benchmarks. We further provide theoretical insights from the perspective of mutual-information maximization, demonstrating that our method unifies contrastive, generative (inter-modal prediction) and mask-then-predict (intra-modal prediction) objectives into a single cohesive blend-then-predict framework.

Local Conditional Neural Fields for Versatile and Generalizable Large-Scale Reconstructions in Computational Imaging

  • paper_url: http://arxiv.org/abs/2307.06207
  • repo_url: https://github.com/bu-cisl/LCNF
  • paper_authors: Hao Wang, Jiabei Zhu, Yunzhe Li, QianWan Yang, Lei Tian
  • for: The goal is to solve large-scale inverse problems in computational imaging with deep learning.
  • methods: The Local Conditional Neural Fields (LCNF) framework uses a continuous implicit neural representation with a local conditional embedding (see the sketch below), overcoming the limitations of pixel-based representations and capturing continuous, multiscale features of objects.
  • results: LCNF achieves robust, scalable, and generalizable large-scale phase retrieval in Fourier ptychographic microscopy, reconstructing wide field-of-view, high-resolution phase images from only a few multiplexed measurements; it can be trained on a physics simulator with natural images and applied successfully to experimental measurements of biological samples.
    Abstract Deep learning has transformed computational imaging, but traditional pixel-based representations limit their ability to capture continuous, multiscale details of objects. Here we introduce a novel Local Conditional Neural Fields (LCNF) framework, leveraging a continuous implicit neural representation to address this limitation. LCNF enables flexible object representation and facilitates the reconstruction of multiscale information. We demonstrate the capabilities of LCNF in solving the highly ill-posed inverse problem in Fourier ptychographic microscopy (FPM) with multiplexed measurements, achieving robust, scalable, and generalizable large-scale phase retrieval. Unlike traditional neural fields frameworks, LCNF incorporates a local conditional representation that promotes model generalization, learning multiscale information, and efficient processing of large-scale imaging data. By combining an encoder and a decoder conditioned on a learned latent vector, LCNF achieves versatile continuous-domain super-resolution image reconstruction. We demonstrate accurate reconstruction of wide field-of-view, high-resolution phase images using only a few multiplexed measurements. LCNF robustly captures the continuous object priors and eliminates various phase artifacts, even when it is trained on imperfect datasets. The framework exhibits strong generalization, reconstructing diverse objects even with limited training data. Furthermore, LCNF can be trained on a physics simulator using natural images and successfully applied to experimental measurements on biological samples. Our results highlight the potential of LCNF for solving large-scale inverse problems in computational imaging, with broad applicability in various deep-learning-based techniques.
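A minimal sketch of the core idea behind a conditional neural field: an MLP maps continuous spatial coordinates, concatenated with a learned conditioning latent vector, to the field value (e.g., phase), so the reconstruction can be queried at arbitrary resolution. The Fourier positional encoding, layer sizes, and the way the latent is injected are assumptions, not LCNF's actual encoder-decoder architecture.

```python
import torch
import torch.nn as nn

class ConditionalNeuralField(nn.Module):
    """Maps continuous 2D coordinates plus a conditioning latent vector to a
    scalar field value (e.g. phase), queryable at arbitrary resolution."""
    def __init__(self, latent_dim=64, hidden=256, n_freq=8):
        super().__init__()
        self.n_freq = n_freq
        in_dim = 4 * n_freq + latent_dim          # sin/cos encoding of (x, y) + latent
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def positional_encoding(self, coords):
        # coords: (N, 2) -> (N, 4 * n_freq) multi-frequency Fourier features
        freqs = 2.0 ** torch.arange(self.n_freq, device=coords.device)
        ang = coords[..., None] * freqs           # (N, 2, n_freq)
        return torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1).flatten(-2)

    def forward(self, coords, latent):
        # coords: (N, 2) in [-1, 1]^2, latent: (N, latent_dim) local embedding
        feats = torch.cat([self.positional_encoding(coords), latent], dim=-1)
        return self.mlp(feats)
```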

Learning Decentralized Partially Observable Mean Field Control for Artificial Collective Behavior

  • paper_url: http://arxiv.org/abs/2307.06175
  • repo_url: None
  • paper_authors: Kai Cui, Sascha Hauck, Christian Fabian, Heinz Koeppl
  • for: This paper addresses multi-agent reinforcement learning (MARL) for collective behavior tasks under decentralization and partial observability, via decentralized partially observable mean field control (Dec-POMFC).
  • methods: Novel Dec-POMFC models with permutation-invariant agents reduce the problem to tractable single-agent Markov decision processes; policy gradient methods for MARL are derived via centralized training and decentralized execution, with policy gradient approximation guarantees, and state-of-the-art histogram-based MFC is improved by kernel methods (also of separate interest for fully observable MFC).
  • results: On representative collective behavior tasks such as adapted Kuramoto and Vicsek swarming models (the classic Kuramoto dynamics are sketched below), the method performs on par with state-of-the-art MARL.
    Abstract Recent reinforcement learning (RL) methods have achieved success in various domains. However, multi-agent RL (MARL) remains a challenge in terms of decentralization, partial observability and scalability to many agents. Meanwhile, collective behavior requires resolution of the aforementioned challenges, and remains of importance to many state-of-the-art applications such as active matter physics, self-organizing systems, opinion dynamics, and biological or robotic swarms. Here, MARL via mean field control (MFC) offers a potential solution to scalability, but fails to consider decentralized and partially observable systems. In this paper, we enable decentralized behavior of agents under partial information by proposing novel models for decentralized partially observable MFC (Dec-POMFC), a broad class of problems with permutation-invariant agents allowing for reduction to tractable single-agent Markov decision processes (MDP) with single-agent RL solution. We provide rigorous theoretical results, including a dynamic programming principle, together with optimality guarantees for Dec-POMFC solutions applied to finite swarms of interest. Algorithmically, we propose Dec-POMFC-based policy gradient methods for MARL via centralized training and decentralized execution, together with policy gradient approximation guarantees. In addition, we improve upon state-of-the-art histogram-based MFC by kernel methods, which is of separate interest also for fully observable MFC. We evaluate numerically on representative collective behavior tasks such as adapted Kuramoto and Vicsek swarming models, being on par with state-of-the-art MARL. Overall, our framework takes a step towards RL-based engineering of artificial collective behavior via MFC.
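The collective-behavior benchmarks mentioned above build on the Kuramoto model; for reference, a short simulation of the classic mean-field Kuramoto dynamics is sketched below. The coupling strength, step size, and synchronisation order parameter are the textbook choices, not the paper's adapted task.

```python
import numpy as np

def simulate_kuramoto(n=100, coupling=2.0, dt=0.01, steps=2000, seed=0):
    """Classic mean-field Kuramoto model:
    d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0, 2 * np.pi, n)          # initial phases
    omega = rng.normal(0, 1, n)                   # natural frequencies
    order = []
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]    # theta_j - theta_i
        theta = theta + dt * (omega + coupling / n * np.sin(diff).sum(axis=1))
        order.append(np.abs(np.exp(1j * theta).mean()))  # synchronisation r(t)
    return np.array(order)

print(simulate_kuramoto()[-1])  # r close to 1 => synchronised swarm
```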

Auxiliary-Tasks Learning for Physics-Informed Neural Network-Based Partial Differential Equations Solving

  • paper_url: http://arxiv.org/abs/2307.06167
  • repo_url: https://github.com/junjun-yan/atl-pinn
  • paper_authors: Junjun Yan, Xinhai Chen, Zhichao Wang, Enqiang Zhou, Jie Liu
  • for: Solving partial differential equations (PDEs) with neural surrogate models.
  • methods: Physics-informed neural networks (PINNs) are combined with auxiliary-task learning (ATL) in four different modes, and the auxiliary loss is integrated with the primary loss via gradient cosine similarity (sketched below).
  • results: Experiments on three PDE problems across different fields and scenarios show that the auxiliary-task learning modes significantly improve solution accuracy, with a maximum performance boost of 96.62% (28.23% on average) over single-task PINNs.
    Abstract Physics-informed neural networks (PINNs) have emerged as promising surrogate modes for solving partial differential equations (PDEs). Their effectiveness lies in the ability to capture solution-related features through neural networks. However, original PINNs often suffer from bottlenecks, such as low accuracy and non-convergence, limiting their applicability in complex physical contexts. To alleviate these issues, we proposed auxiliary-task learning-based physics-informed neural networks (ATL-PINNs), which provide four different auxiliary-task learning modes and investigate their performance compared with original PINNs. We also employ the gradient cosine similarity algorithm to integrate auxiliary problem loss with the primary problem loss in ATL-PINNs, which aims to enhance the effectiveness of the auxiliary-task learning modes. To the best of our knowledge, this is the first study to introduce auxiliary-task learning modes in the context of physics-informed learning. We conduct experiments on three PDE problems across different fields and scenarios. Our findings demonstrate that the proposed auxiliary-task learning modes can significantly improve solution accuracy, achieving a maximum performance boost of 96.62% (averaging 28.23%) compared to the original single-task PINNs. The code and dataset are open source at https://github.com/junjun-yan/ATL-PINN.
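The methods bullet mentions combining the auxiliary-task loss with the primary loss via gradient cosine similarity. Below is a minimal PyTorch sketch of one common realisation of that idea — keep the auxiliary gradient only when it points in a direction similar to the primary gradient. The exact weighting rule used in ATL-PINNs may differ; see the linked repository.

```python
import torch

def cosine_gated_step(model, primary_loss, aux_loss, lr=1e-3):
    """One manual gradient step where the auxiliary-task gradient is added
    only if it has positive cosine similarity with the primary-task gradient.
    Assumes both losses depend on all trainable parameters (typical for PINNs)."""
    params = [p for p in model.parameters() if p.requires_grad]
    g_main = torch.autograd.grad(primary_loss, params, retain_graph=True)
    g_aux = torch.autograd.grad(aux_loss, params, retain_graph=True)

    flat_main = torch.cat([g.flatten() for g in g_main])
    flat_aux = torch.cat([g.flatten() for g in g_aux])
    cos = torch.nn.functional.cosine_similarity(flat_main, flat_aux, dim=0)

    with torch.no_grad():
        for p, gm, ga in zip(params, g_main, g_aux):
            update = gm + (ga if cos > 0 else torch.zeros_like(ga))
            p -= lr * update                       # plain SGD step
    return cos.item()
```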

Deep Generative Models for Physiological Signals: A Systematic Literature Review

  • paper_url: http://arxiv.org/abs/2307.06162
  • repo_url: None
  • paper_authors: Nour Neifar, Afef Mdhaffar, Achraf Ben-Hamadou, Mohamed Jmaiel
  • for: This paper is a systematic literature review of deep generative models for physiological signals, specifically electrocardiogram, electroencephalogram, photoplethysmogram, and electromyogram.
  • methods: It analyses the state of the art in deep generative models, their main applications and challenges, and details the evaluation protocols and the most commonly used physiological databases.
  • results: The review provides a structured overview of these models and facilitates their assessment and benchmarking.
    Abstract In this paper, we present a systematic literature review on deep generative models for physiological signals, particularly electrocardiogram, electroencephalogram, photoplethysmogram and electromyogram. Compared to the existing review papers, we present the first review that summarizes the recent state-of-the-art deep generative models. By analysing the state-of-the-art research related to deep generative models along with their main applications and challenges, this review contributes to the overall understanding of these models applied to physiological signals. Additionally, by highlighting the employed evaluation protocol and the most used physiological databases, this review facilitates the assessment and benchmarking of deep generative models.

Detecting the Presence of COVID-19 Vaccination Hesitancy from South African Twitter Data Using Machine Learning

  • paper_url: http://arxiv.org/abs/2307.15072
  • repo_url: None
  • paper_authors: Nicholas Perikli, Srimoy Bhattacharya, Blessing Ogbuokiri, Zahra Movahedi Nia, Benjamin Lieberman, Nidhi Tripathi, Salah-Eddine Dahbi, Finn Stevenson, Nicola Bragazzi, Jude Kong, Bruce Mellado
  • for: The study performs sentiment analysis on South African tweets related to COVID-19 vaccine hesitancy and trains AI-mediated classification models to categorise user-generated content.
  • methods: LSTM, bi-LSTM, SVM (a classical baseline of this kind is sketched below), BERT-base-cased, and RoBERTa-base models were trained, with hyperparameters tuned via the WandB platform; two data pre-processing approaches (semantics-based and corpus-based) were compared.
  • results: All models achieved low F1-scores in the 45%-55% range except BERT and RoBERTa, which reached overall F1-scores of 60% and 61%, respectively. LDA topic modelling of the RoBERTa model's misclassified tweets suggests ways to further improve accuracy.
    Abstract Very few social media studies have been done on South African user-generated content during the COVID-19 pandemic and even fewer using hand-labelling over automated methods. Vaccination is a major tool in the fight against the pandemic, but vaccine hesitancy jeopardizes any public health effort. In this study, sentiment analysis on South African tweets related to vaccine hesitancy was performed, with the aim of training AI-mediated classification models and assessing their reliability in categorizing UGC. A dataset of 30000 tweets from South Africa were extracted and hand-labelled into one of three sentiment classes: positive, negative, neutral. The machine learning models used were LSTM, bi-LSTM, SVM, BERT-base-cased and the RoBERTa-base models, whereby their hyperparameters were carefully chosen and tuned using the WandB platform. We used two different approaches when we pre-processed our data for comparison: one was semantics-based, while the other was corpus-based. The pre-processing of the tweets in our dataset was performed using both methods, respectively. All models were found to have low F1-scores within a range of 45$\%$-55$\%$, except for BERT and RoBERTa which both achieved significantly better measures with overall F1-scores of 60$\%$ and 61$\%$, respectively. Topic modelling using an LDA was performed on the miss-classified tweets of the RoBERTa model to gain insight on how to further improve model accuracy.
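A minimal sketch of the classical end of the model line-up above: a TF-IDF plus linear SVM baseline for the three-class sentiment task. The n-gram range, split, and label strings are illustrative assumptions; the transformer models in the paper are fine-tuned separately and not shown here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def train_svm_baseline(tweets, labels):
    """tweets: list of pre-processed tweet strings,
    labels: 'positive' / 'negative' / 'neutral'. Returns macro F1 on a held-out split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        tweets, labels, test_size=0.2, stratify=labels, random_state=0)
    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                        LinearSVC())
    clf.fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te), average="macro")
```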

Sequential Experimental Design for X-Ray CT Using Deep Reinforcement Learning

  • paper_url: http://arxiv.org/abs/2307.06343
  • repo_url: https://github.com/tianyuan1wang/seqanglerl
  • paper_authors: Tianyuan Wang, Felix Lucka, Tristan van Leeuwen
  • for: The goal is to make X-ray Computed Tomography (CT) suitable for in-line quality control by reducing the number of projection angles while maintaining 3D reconstruction quality.
  • methods: Sparse-angle acquisition is posed as an optimal experimental design (OED) problem, formulated as a partially observable Markov decision process in a Bayesian framework and solved with deep reinforcement learning (an Actor-Critic policy trained offline).
  • results: The trained policy successfully selects the most informative scan angles online, improving the suitability of CT for quality control.
    Abstract In X-ray Computed Tomography (CT), projections from many angles are acquired and used for 3D reconstruction. To make CT suitable for in-line quality control, reducing the number of angles while maintaining reconstruction quality is necessary. Sparse-angle tomography is a popular approach for obtaining 3D reconstructions from limited data. To optimize its performance, one can adapt scan angles sequentially to select the most informative angles for each scanned object. Mathematically, this corresponds to solving and optimal experimental design (OED) problem. OED problems are high-dimensional, non-convex, bi-level optimization problems that cannot be solved online, i.e., during the scan. To address these challenges, we pose the OED problem as a partially observable Markov decision process in a Bayesian framework, and solve it through deep reinforcement learning. The approach learns efficient non-greedy policies to solve a given class of OED problems through extensive offline training rather than solving a given OED problem directly via numerical optimization. As such, the trained policy can successfully find the most informative scan angles online. We use a policy training method based on the Actor-Critic approach and evaluate its performance on 2D tomography with synthetic data.

Maneuver Decision-Making Through Automatic Curriculum Reinforcement Learning Without Handcrafted Reward functions

  • paper_url: http://arxiv.org/abs/2307.06152
  • repo_url: None
  • paper_authors: Zhang Hong-Peng
  • for: The paper addresses maneuver decision-making for unmanned combat aerial vehicles in autonomous air combat, proposing an automatic curriculum reinforcement learning method that lets agents learn effective decisions from scratch.
  • methods: Decision-making is divided into a series of sub-tasks of increasing difficulty, distinguished by the range of initial states; test results are used to switch sub-tasks (see the sketch below), so agents gradually learn to handle harder situations without handcrafted reward functions.
  • results: Simulation experiments show that, after training, agents make effective, rational, and interpretable maneuvering decisions in various states, including tracking, attacking, and escaping; ablations show that curriculum learning is essential.
    Abstract Maneuver decision-making is the core of unmanned combat aerial vehicle for autonomous air combat. To solve this problem, we propose an automatic curriculum reinforcement learning method, which enables agents to learn effective decisions in air combat from scratch. The range of initial states are used for distinguishing curricula of different difficulty levels, thereby maneuver decision is divided into a series of sub-tasks from easy to difficult, and test results are used to change sub-tasks. As sub-tasks change, agents gradually learn to complete a series of sub-tasks from easy to difficult, enabling them to make effective maneuvering decisions to cope with various states without the need to spend effort designing reward functions. The ablation studied show that the automatic curriculum learning proposed in this article is an essential component for training through reinforcement learning, namely, agents cannot complete effective decisions without curriculum learning. Simulation experiments show that, after training, agents are able to make effective decisions given different states, including tracking, attacking and escaping, which are both rational and interpretable.
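A minimal sketch of the automatic-curriculum loop described above: sub-task difficulty is defined by the range of initial states, and evaluation results decide when to move to a harder sub-task. The promotion threshold, episode budget, and example state ranges are assumptions; the paper's exact curriculum schedule may differ.

```python
def automatic_curriculum(train_one_episode, evaluate, levels,
                         promote_at=0.8, episodes_per_round=100):
    """levels: list of initial-state ranges ordered from easy to difficult,
    e.g. [(0, 1_000), (0, 5_000), (0, 20_000)] metres of initial separation.
    train_one_episode(state_range) runs one RL training episode;
    evaluate(state_range) returns the current success rate on that sub-task."""
    level = 0
    while level < len(levels):
        state_range = levels[level]
        for _ in range(episodes_per_round):
            train_one_episode(state_range)
        if evaluate(state_range) >= promote_at:   # sub-task mastered
            level += 1                            # switch to a harder sub-task
    return "all sub-tasks completed"
```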

NetGPT: A Native-AI Network Architecture Beyond Provisioning Personalized Generative Services

  • paper_url: http://arxiv.org/abs/2307.06148
  • repo_url: None
  • paper_authors: Yuxuan Chen, Rongpeng Li, Zhifeng Zhao, Chenghui Peng, Jianjun Wu, Ekram Hossain, Honggang Zhang
  • for: The goal is to provision personalized generative services by aligning large language models (LLMs) better with human intents.
  • methods: A collaborative cloud-edge methodology orchestrates heterogeneous distributed communication and computing resources.
  • results: The proposed NetGPT deploys appropriate LLMs at the edge and in the cloud according to their computing capacity, with edge LLMs using location-based information for personalized prompt completion; feasibility is demonstrated with low-rank adaptation (LoRA)-based lightweight fine-tuning (sketched below).
    Abstract Large language models (LLMs) have triggered tremendous success to empower daily life by generative information, and the personalization of LLMs could further contribute to their applications due to better alignment with human intents. Towards personalized generative services, a collaborative cloud-edge methodology sounds promising, as it facilitates the effective orchestration of heterogeneous distributed communication and computing resources. In this article, after discussing the pros and cons of several candidate cloud-edge collaboration techniques, we put forward NetGPT to capably deploy appropriate LLMs at the edge and the cloud in accordance with their computing capacity. In addition, edge LLMs could efficiently leverage location-based information for personalized prompt completion, thus benefiting the interaction with cloud LLMs. After deploying representative open-source LLMs (e.g., GPT-2-base and LLaMA model) at the edge and the cloud, we present the feasibility of NetGPT on the basis of low-rank adaptation-based light-weight fine-tuning. Subsequently, we highlight substantial essential changes required for a native artificial intelligence (AI) network architecture towards NetGPT, with special emphasis on deeper integration of communications and computing resources and careful calibration of logical AI workflow. Furthermore, we demonstrate several by-product benefits of NetGPT, given edge LLM's astonishing capability to predict trends and infer intents, which possibly leads to a unified solution for intelligent network management \& orchestration. In a nutshell, we argue that NetGPT is a promising native-AI network architecture beyond provisioning personalized generative services.
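The feasibility study above relies on low-rank adaptation (LoRA)-based lightweight fine-tuning. Below is a minimal PyTorch sketch of a LoRA-wrapped linear layer — a frozen pretrained weight plus a trainable low-rank update — which is what keeps fine-tuning cheap enough for edge deployment; rank, scaling, and initialisation follow the common convention and are not NetGPT-specific.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """y = base(x) + (alpha / r) * x A^T B^T : only A and B are trained,
    so only a small fraction of the parameters is updated."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze pretrained weight
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init => no change at start
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Example: wrap one projection layer of a pretrained transformer block.
layer = LoRALinear(nn.Linear(768, 768))
```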

Enhancing ECG Analysis of Implantable Cardiac Monitor Data: An Efficient Pipeline for Multi-Label Classification

  • paper_url: http://arxiv.org/abs/2307.07423
  • repo_url: None
  • paper_authors: Amnon Bleich, Antje Linnemann, Benjamin Jaidi, Björn H Diem, Tim OF Conrad
  • for: The study addresses the challenges of automatically analysing data from implantable cardiac monitors (ICMs).
  • methods: A new classification pipeline is proposed that accounts for the specific characteristics of ICM recordings.
  • results: The method improves classification accuracy on ICM data and outperforms existing approaches, so it could assist healthcare professionals, e.g., by suggesting a rhythm type.
    Abstract Implantable Cardiac Monitor (ICM) devices are demonstrating as of today, the fastest-growing market for implantable cardiac devices. As such, they are becoming increasingly common in patients for measuring heart electrical activity. ICMs constantly monitor and record a patient's heart rhythm and when triggered - send it to a secure server where health care professionals (denote HCPs from here on) can review it. These devices employ a relatively simplistic rule-based algorithm (due to energy consumption constraints) to alert for abnormal heart rhythms. This algorithm is usually parameterized to an over-sensitive mode in order to not miss a case (resulting in relatively high false-positive rate) and this, combined with the device's nature of constantly monitoring the heart rhythm and its growing popularity, results in HCPs having to analyze and diagnose an increasingly growing amount of data. In order to reduce the load on the latter, automated methods for ECG analysis are nowadays becoming a great tool to assist HCPs in their analysis. While state-of-the-art algorithms are data-driven rather than rule-based, training data for ICMs often consist of specific characteristics which make its analysis unique and particularly challenging. This study presents the challenges and solutions in automatically analyzing ICM data and introduces a method for its classification that outperforms existing methods on such data. As such, it could be used in numerous ways such as aiding HCPs in the analysis of ECGs originating from ICMs by e.g. suggesting a rhythm type.

Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation

  • paper_url: http://arxiv.org/abs/2307.06125
  • repo_url: https://github.com/robot-learning-freiburg/HIMOS
  • paper_authors: Fabian Schmalstieg, Daniel Honerkamp, Tim Welschehold, Abhinav Valada
  • for: The paper tackles interactive multi-object search tasks for robots operating in unstructured, human-centered environments, where doors, cabinets, and drawers must be opened during the search.
  • methods: HIMOS, a hierarchical reinforcement learning approach, learns to compose exploration, navigation, and manipulation skills using an abstract high-level action space built around a semantic map memory, with explored locations serving as instance navigation points.
  • results: Experiments in simulation and the real world show that HIMOS transfers zero-shot to new environments and is robust to unseen subpolicies, failures in their execution, and different robot kinematics, enabling a wide range of downstream embodied-AI tasks.
    Abstract Existing object-search approaches enable robots to search through free pathways, however, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real-world that demonstrate that HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.

SoK: Comparing Different Membership Inference Attacks with a Comprehensive Benchmark

  • paper_url: http://arxiv.org/abs/2307.06123
  • repo_url: https://github.com/mibench/mibench.github.io
  • paper_authors: Jun Niu, Xiaoyan Zhu, Moxuan Zeng, Ge Zhang, Qingyang Zhao, Chunhui Huang, Yangming Zhang, Suyu An, Yangzhong Wang, Xinghui Yue, Zhipeng He, Weihao Guo, Kuo Shen, Peng Liu, Yulong Shen, Xiaohong Jiang, Jianfeng Ma, Yuqing Zhang
  • for: The work provides a comprehensive benchmark for fairly comparing different membership inference (MI) attacks (a toy attack is sketched below), since the comparison methodology used in existing works has serious limitations.
  • methods: The MIBench benchmark combines ten typical evaluation metrics with evaluation scenarios designed from four perspectives, and is used to compare 15 state-of-the-art MI attack algorithms on 7 widely used datasets and 7 representative model types, across 588 evaluation scenarios (84 per dataset).
  • results: The evaluation shows that some comparison results previously reported in the literature are misleading; the authors also identify three principles for the "comparing different MI attacks" methodology.
    Abstract Membership inference (MI) attacks threaten user privacy through determining if a given data example has been used to train a target model. However, it has been increasingly recognized that the "comparing different MI attacks" methodology used in the existing works has serious limitations. Due to these limitations, we found (through the experiments in this work) that some comparison results reported in the literature are quite misleading. In this paper, we seek to develop a comprehensive benchmark for comparing different MI attacks, called MIBench, which consists not only the evaluation metrics, but also the evaluation scenarios. And we design the evaluation scenarios from four perspectives: the distance distribution of data samples in the target dataset, the distance between data samples of the target dataset, the differential distance between two datasets (i.e., the target dataset and a generated dataset with only nonmembers), and the ratio of the samples that are made no inferences by an MI attack. The evaluation metrics consist of ten typical evaluation metrics. We have identified three principles for the proposed "comparing different MI attacks" methodology, and we have designed and implemented the MIBench benchmark with 84 evaluation scenarios for each dataset. In total, we have used our benchmark to fairly and systematically compare 15 state-of-the-art MI attack algorithms across 588 evaluation scenarios, and these evaluation scenarios cover 7 widely used datasets and 7 representative types of models. All codes and evaluations of MIBench are publicly available at https://github.com/MIBench/MIBench.github.io/blob/main/README.md.
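For orientation, the simplest family of attacks such a benchmark covers is loss thresholding: training-set members tend to incur lower loss than non-members. The sketch below illustrates only this toy attack with synthetic losses and a naive median threshold (both assumptions); MIBench itself evaluates 15 far more sophisticated attack algorithms.

```python
import numpy as np

def loss_threshold_attack(member_losses, nonmember_losses, threshold=None):
    """Predict 'member' when the target model's loss on an example falls below
    a threshold. member_losses / nonmember_losses: per-example losses computed
    on known members (training data) and known non-members."""
    losses = np.concatenate([member_losses, nonmember_losses])
    labels = np.concatenate([np.ones(len(member_losses), dtype=int),
                             np.zeros(len(nonmember_losses), dtype=int)])
    if threshold is None:                          # naive calibration
        threshold = np.median(losses)
    preds = (losses < threshold).astype(int)
    return (preds == labels).mean()                # attack accuracy

# Synthetic example: members have systematically lower losses.
rng = np.random.default_rng(0)
members = rng.gamma(shape=1.0, scale=0.3, size=1000)
nonmembers = rng.gamma(shape=2.0, scale=0.5, size=1000)
print(loss_threshold_attack(members, nonmembers))
```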

Deep learning for dynamic graphs: models and benchmarks

  • paper_url: http://arxiv.org/abs/2307.06104
  • repo_url: https://github.com/gravins/dynamic_graph_benchmark
  • paper_authors: Alessio Gravina, Davide Bacciu
  • for: The survey prepares deep graph networks (DGNs) for predictive tasks on real-world systems of interconnected entities that evolve over time.
  • methods: It reviews recent advances in learning both temporal and spatial information, providing a comprehensive overview of the state of the art in representation learning for dynamic graphs.
  • results: A fair performance comparison of the most popular approaches, with rigorous model selection and assessment, establishes a sound baseline for evaluating new architectures and methods.
    Abstract Recent progress in research on Deep Graph Networks (DGNs) has led to a maturation of the domain of learning on graphs. Despite the growth of this research field, there are still important challenges that are yet unsolved. Specifically, there is an urge of making DGNs suitable for predictive tasks on realworld systems of interconnected entities, which evolve over time. With the aim of fostering research in the domain of dynamic graphs, at first, we survey recent advantages in learning both temporal and spatial information, providing a comprehensive overview of the current state-of-the-art in the domain of representation learning for dynamic graphs. Secondly, we conduct a fair performance comparison among the most popular proposed approaches, leveraging rigorous model selection and assessment for all the methods, thus establishing a sound baseline for evaluating new architectures and approaches

CLAIMED – the open source framework for building coarse-grained operators for accelerated discovery in science

  • paper_url: http://arxiv.org/abs/2307.06824
  • repo_url: https://github.com/claimed-framework/component-library
  • paper_authors: Romeo Kienzler, Rafflesia Khan, Jerome Nilmeier, Ivan Nesic, Ibrahim Haddad
  • for: Addressing the repeatability and reusability issues in modern data-driven science.
  • methods: Introducing CLAIMED, a framework for building reusable operators and scalable scientific workflows by supporting the scientist to draw from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators.
  • results: CLAIMED is programming language, scientific library, and execution environment agnostic, and has a proven track record in scientific research.
    Abstract In modern data-driven science, reproducibility and reusability are key challenges. Scientists are well skilled in the process from data to publication. Although some publication channels require source code and data to be made accessible, rerunning and verifying experiments is usually hard due to a lack of standards. Therefore, reusing existing scientific data processing code from state-of-the-art research is hard as well. This is why we introduce CLAIMED, which has a proven track record in scientific research for addressing the repeatability and reusability issues in modern data-driven science. CLAIMED is a framework to build reusable operators and scalable scientific workflows by supporting the scientist to draw from previous work by re-composing workflows from existing libraries of coarse-grained scientific operators. Although various implementations exist, CLAIMED is programming language, scientific library, and execution environment agnostic.

Learning Stochastic Dynamical Systems as an Implicit Regularization with Graph Neural Networks

  • paper_url: http://arxiv.org/abs/2307.06097
  • repo_url: None
  • paper_authors: Jin Guo, Ting Gao, Yufu Lan, Peng Zhang, Sikun Yang, Jinqiao Duan
  • for: Learning high-dimensional time series in which the observed dimensions are spatially correlated.
  • methods: Stochastic Gumbel graph networks (S-GGNs) capture the observed randomness and spatial correlations by learning the drift and diffusion terms of a stochastic differential equation with a Gumbel matrix embedding (simulating such an SDE is sketched below), which also allows studying the implicit regularization effect of the noise terms.
  • results: A theoretical guarantee is given by bounding the difference between the two corresponding loss functions in a small neighborhood of the weights; experiments on data generated with Kuramoto's model and on real-world data show that S-GGNs exhibit superior convergence, robustness, and generalization compared with the state of the art.
    Abstract Stochastic Gumbel graph networks are proposed to learn high-dimensional time series, where the observed dimensions are often spatially correlated. To that end, the observed randomness and spatial-correlations are captured by learning the drift and diffusion terms of the stochastic differential equation with a Gumble matrix embedding, respectively. In particular, this novel framework enables us to investigate the implicit regularization effect of the noise terms in S-GGNs. We provide a theoretical guarantee for the proposed S-GGNs by deriving the difference between the two corresponding loss functions in a small neighborhood of weight. Then, we employ Kuramoto's model to generate data for comparing the spectral density from the Hessian Matrix of the two loss functions. Experimental results on real-world data, demonstrate that S-GGNs exhibit superior convergence, robustness, and generalization, compared with state-of-the-arts.
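The method above learns the drift and diffusion terms of a stochastic differential equation. Once those terms are available (e.g., as small neural networks), sample paths can be generated with the standard Euler-Maruyama scheme, sketched below; the Ornstein-Uhlenbeck stand-ins for the learned terms and the step size are illustrative assumptions.

```python
import numpy as np

def euler_maruyama(drift, diffusion, x0, dt=0.01, steps=1000, seed=0):
    """Simulate dX_t = drift(X_t) dt + diffusion(X_t) dW_t with the
    Euler-Maruyama scheme. drift/diffusion: callables mapping a state vector
    to a vector of the same shape (e.g. learned networks)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)  # Brownian increment
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x.copy())
    return np.stack(path)

# Toy example: Ornstein-Uhlenbeck dynamics standing in for learned terms.
path = euler_maruyama(drift=lambda x: -x,
                      diffusion=lambda x: 0.3 * np.ones_like(x),
                      x0=np.ones(5))
```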

Online Laplace Model Selection Revisited

  • paper_url: http://arxiv.org/abs/2307.06093
  • repo_url: None
  • paper_authors: Jihao Andreas Lin, Javier Antorán, José Miguel Hernández-Lobato
  • for: The paper revisits the Laplace approximation as a closed-form model selection objective for neural networks (NNs), and its online variants that optimise NN parameters jointly with hyperparameters such as weight decay strength.
  • methods: Online Laplace methods are re-derived as targeting a variational bound on a mode-corrected variant of the Laplace evidence (the standard form of which is recalled below) that does not make stationarity assumptions.
  • results: On UCI regression datasets, online algorithms using full-batch gradient descent roughly attain the shared stationary points in practice; the optimised hyperparameters prevent overfitting and outperform validation-based early stopping.
    Abstract The Laplace approximation provides a closed-form model selection objective for neural networks (NN). Online variants, which optimise NN parameters jointly with hyperparameters, like weight decay strength, have seen renewed interest in the Bayesian deep learning community. However, these methods violate Laplace's method's critical assumption that the approximation is performed around a mode of the loss, calling into question their soundness. This work re-derives online Laplace methods, showing them to target a variational bound on a mode-corrected variant of the Laplace evidence which does not make stationarity assumptions. Online Laplace and its mode-corrected counterpart share stationary points where 1. the NN parameters are a maximum a posteriori, satisfying the Laplace method's assumption, and 2. the hyperparameters maximise the Laplace evidence, motivating online methods. We demonstrate that these optima are roughly attained in practise by online algorithms using full-batch gradient descent on UCI regression datasets. The optimised hyperparameters prevent overfitting and outperform validation-based early stopping.
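For reference, online Laplace objectives build on the standard Laplace approximation to the log marginal likelihood (evidence). In its textbook form — written here for a MAP estimate θ* and the Hessian H of the negative log-joint, not the paper's mode-corrected variant — it reads:

$$
\log p(\mathcal{D}\mid\eta) \;\approx\; \log p(\mathcal{D}\mid\theta^{*},\eta) \;+\; \log p(\theta^{*}\mid\eta) \;-\; \tfrac{1}{2}\log\det\!\Big(\tfrac{H}{2\pi}\Big),
\qquad
H = -\nabla_{\theta}^{2}\,\log p(\mathcal{D},\theta\mid\eta)\Big|_{\theta=\theta^{*}},
$$

where η collects the hyperparameters (e.g., weight decay strength); online methods increase this quantity in η while the network parameters are trained, rather than waiting for a fully converged mode.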

Quantitative CLTs in Deep Neural Networks

  • paper_url: http://arxiv.org/abs/2307.06092
  • repo_url: None
  • paper_authors: Stefano Favaro, Boris Hanin, Domenico Marinucci, Ivan Nourdin, Giovanni Peccati
  • for: The paper studies the distribution of a fully connected neural network.
  • methods: The network has random Gaussian weights and biases, with hidden layer widths proportional to a large constant n, under mild assumptions on the non-linearity.
  • results: Quantitative bounds on normal approximations show that the distance between the network (and its derivatives) and the corresponding infinite-width Gaussian process scales like n^{-γ} for some γ > 0 depending on the metric; the bounds are strictly stronger in their width dependence than previously available ones, and in the one-dimensional case they are optimal, with matching lower bounds.
    Abstract We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.

Assessment of the suitability of degradation models for the planning of CCTV inspections of sewer pipes

  • paper_url: http://arxiv.org/abs/2307.06341
  • repo_url: https://github.com/fidaeic/sewer-pred
  • paper_authors: Fidae El Morer, Stefan Wittek, Andreas Rausch
  • for: The aim is to plan CCTV inspections of sewer pipes more effectively, mitigating the economic, environmental, and health concerns caused by pipe degradation.
  • methods: Statistical and machine learning degradation models are assessed along three dimensions: accuracy metrics, the ability to produce long-term degradation curves, and explainability.
  • results: Ensemble models yield the highest accuracy but cannot infer the long-term degradation of the pipes, whereas logistic regression is slightly less accurate yet produces consistent, highly explainable degradation curves (see the sketch below).
    Abstract The degradation of sewer pipes poses significant economical, environmental and health concerns. The maintenance of such assets requires structured plans to perform inspections, which are more efficient when structural and environmental features are considered along with the results of previous inspection reports. The development of such plans requires degradation models that can be based on statistical and machine learning methods. This work proposes a methodology to assess their suitability to plan inspections considering three dimensions: accuracy metrics, ability to produce long-term degradation curves and explainability. Results suggest that although ensemble models yield the highest accuracy, they are unable to infer the long-term degradation of the pipes, whereas the Logistic Regression offers a slightly less accurate model that is able to produce consistent degradation curves with a high explainability. A use case is presented to demonstrate this methodology and the efficiency of model-based planning compared to the current inspection plan.
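A minimal sketch of the logistic-regression behaviour highlighted in the results above: fit the probability that a pipe is in a degraded condition class as a function of its age, then read the fitted curve over a long horizon as the degradation curve. The synthetic inspection records, the binary condition encoding, and the single age feature are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy inspection records: pipe age in years and whether the inspection found
# the pipe in a degraded condition class (1) or not (0).
rng = np.random.default_rng(0)
age = rng.uniform(0, 80, 500)
degraded = (rng.random(500) < 1 / (1 + np.exp(-(age - 40) / 8))).astype(int)

model = LogisticRegression().fit(age.reshape(-1, 1), degraded)

# Long-term degradation curve: probability of degradation vs. pipe age.
horizon = np.arange(0, 101).reshape(-1, 1)
curve = model.predict_proba(horizon)[:, 1]
print(f"P(degraded | age=50) = {curve[50]:.2f}")
```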

Efficient and Joint Hyperparameter and Architecture Search for Collaborative Filtering

  • paper_url: http://arxiv.org/abs/2307.11004
  • repo_url: https://github.com/overwenyan/joint-search
  • paper_authors: Yan Wen, Chen Gao, Lingling Yi, Liwei Qiu, Yaqing Wang, Yong Li
  • for: 本研究旨在透过自动机器学习(AutoML)技术设计协同推荐(CF)模型,并将架构和参数搜索视为一体化进行。
  • methods: 本研究先通过对各个超参数的全面理解筛除无用的取值以缩小搜索空间,再提出两阶段搜索算法:第一阶段利用子采样数据降低评估成本,第二阶段在完整数据集上高效微调排名靠前的候选模型。
  • results: 实验结果显示, compared with手动设计和先前搜索的模型,本研究的搜索架构可以更好地运行,并且可以更好地适应实际应用中的挑战。
    Abstract Automated Machine Learning (AutoML) techniques have recently been introduced to design Collaborative Filtering (CF) models in a data-specific manner. However, existing works either search architectures or hyperparameters while ignoring the fact they are intrinsically related and should be considered together. This motivates us to consider a joint hyperparameter and architecture search method to design CF models. However, this is not easy because of the large search space and high evaluation cost. To solve these challenges, we reduce the space by screening out useless hyperparameter choices through a comprehensive understanding of individual hyperparameters. Next, we propose a two-stage search algorithm to find proper configurations from the reduced space. In the first stage, we leverage knowledge from subsampled datasets to reduce evaluation costs; in the second stage, we efficiently fine-tune top candidate models on the whole dataset. Extensive experiments on real-world datasets show better performance can be achieved compared with both hand-designed and previously searched models. Besides, ablation and case studies demonstrate the effectiveness of our search framework.
    摘要 自动机器学习(AutoML)技术最近被引入设计共同推荐(CF)模型的设计中。然而,现有的工作都是搜索 архитектуры或超参数而忽略了它们是内在相关的,应该一起考虑。这种情况引发我们考虑一种共同搜索超参数和architecture的方法来设计CF模型。然而,这并不容易,因为搜索空间很大,评估成本高。为解决这些挑战,我们将搜索空间减少,通过对各个超参数的含义进行全面理解,排除无用的超参数选择。然后,我们提议一种两阶段搜索算法,在第一阶段,利用子样本数据来减少评估成本,在第二阶段,高效地精化top候选模型。广泛的实验表明,我们的搜索框架可以比手动设计和先前搜索的模型表现更好。此外,剖除和案例研究表明我们的搜索框架的有效性。
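
A minimal sketch of the two-stage idea described in the abstract, using a generic scikit-learn model as a stand-in for the paper's CF architectures and hyperparameters: stage 1 screens every configuration cheaply on a subsample, stage 2 re-evaluates only the top candidates on the full dataset. The search space and model are illustrative assumptions.

```python
# Sketch of a two-stage configuration search: cheap screening on a subsample,
# then full evaluation of the top candidates. Model and space are stand-ins.
import numpy as np
from itertools import product
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=30, random_state=0)
rng = np.random.default_rng(0)
sub = rng.choice(len(X), size=500, replace=False)     # stage-1 subsample

space = list(product([50, 100, 200],        # n_estimators
                     [None, 5, 10],         # max_depth
                     [2, 10]))              # min_samples_split

def evaluate(cfg, idx):
    n_est, depth, min_split = cfg
    clf = RandomForestClassifier(n_estimators=n_est, max_depth=depth,
                                 min_samples_split=min_split, random_state=0)
    return cross_val_score(clf, X[idx], y[idx], cv=3).mean()

# Stage 1: screen every configuration on the subsample.
scores = [(evaluate(cfg, sub), cfg) for cfg in space]
top = [cfg for _, cfg in sorted(scores, key=lambda s: s[0], reverse=True)[:3]]

# Stage 2: re-evaluate only the survivors on the full dataset.
best = max(top, key=lambda cfg: evaluate(cfg, np.arange(len(X))))
print("selected configuration:", best)
```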

Interpreting deep embeddings for disease progression clustering

  • paper_url: http://arxiv.org/abs/2307.06060
  • repo_url: None
  • paper_authors: Anna Munoz-Farre, Antonios Poulakakis-Daktylidis, Dilini Mahesha Kothalawala, Andrea Rodriguez-Martinez
  • for: 这个论文是为了解释深度嵌入的应用在患者划分中。
  • methods: 这个论文使用了一种新的方法来解释深度嵌入,并在UK Biobank数据集上进行了评估。
  • results: 研究发现,使用这种方法可以提供有价值的医学意义的疾病进程 patrern。
    Abstract We propose a novel approach for interpreting deep embeddings in the context of patient clustering. We evaluate our approach on a dataset of participants with type 2 diabetes from the UK Biobank, and demonstrate clinically meaningful insights into disease progression patterns.
    摘要 我们提出了一种在患者聚类场景下解释深度嵌入的新方法。我们在 UK Biobank 中 2 型糖尿病参与者的数据集上对该方法进行了评估,得到了具有临床意义的疾病进展模式洞见。

Machine Learning for Autonomous Vehicle’s Trajectory Prediction: A comprehensive survey, Challenges, and Future Research Directions

  • paper_url: http://arxiv.org/abs/2307.07527
  • repo_url: None
  • paper_authors: Vibha Bharilya, Neetesh Kumar
  • for: 本文旨在探讨自动驾驶车辆(AV)的轨迹预测方法,尤其是基于机器学习技术的深度学习和奖励学习方法。
  • methods: 本文综述了许多关于AV轨迹预测的研究,包括深度学习和奖励学习等机器学习技术。
  • results: 本文对许多研究进行了详细的分析和评价,并提出了未来研究方向。
    Abstract Autonomous Vehicles (AVs) have emerged as a promising solution by replacing human drivers with advanced computer-aided decision-making systems. However, for AVs to effectively navigate the road, they must possess the capability to predict the future behavior of nearby traffic participants, similar to the predictive driving abilities of human drivers. Building upon existing literature is crucial to advance the field and develop a comprehensive understanding of trajectory prediction methods in the context of automated driving. To address this need, we have undertaken a comprehensive review that focuses on trajectory prediction methods for AVs, with a particular emphasis on machine learning techniques including deep learning and reinforcement learning-based approaches. We have extensively examined over two hundred studies related to trajectory prediction in the context of AVs. The paper begins with an introduction to the general problem of predicting vehicle trajectories and provides an overview of the key concepts and terminology used throughout. After providing a brief overview of conventional methods, this review conducts a comprehensive evaluation of several deep learning-based techniques. Each method is summarized briefly, accompanied by a detailed analysis of its strengths and weaknesses. The discussion further extends to reinforcement learning-based methods. This article also examines the various datasets and evaluation metrics that are commonly used in trajectory prediction tasks. Encouraging an unbiased and objective discussion, we compare two major learning processes, considering specific functional features. By identifying challenges in the existing literature and outlining potential research directions, this review significantly contributes to the advancement of knowledge in the domain of AV trajectory prediction.
    摘要 自动驾驶车辆(AV)通过以先进的计算机辅助决策系统取代人类驾驶员,成为一种有前景的解决方案。然而,AV 要在道路上有效行驶,必须能够预测周围交通参与者的未来行为,类似于人类驾驶员的预判能力。为推进该领域的发展,本文对 AV 轨迹预测方法进行了全面综述,重点关注包括深度学习与强化学习在内的机器学习技术,共考察了两百余篇相关研究。文章首先介绍车辆轨迹预测的一般问题以及全文使用的关键概念与术语;在简要回顾传统方法之后,对多种基于深度学习的技术逐一进行总结,并分析其优缺点,随后讨论基于强化学习的方法。文中还梳理了轨迹预测任务常用的数据集与评估指标,并在具体功能特征层面对两类主要学习范式进行客观比较。通过指出现有文献中的挑战并展望潜在研究方向,本综述为 AV 轨迹预测领域的知识进展做出了贡献。

Function-Space Regularization for Deep Bayesian Classification

  • paper_url: http://arxiv.org/abs/2307.06055
  • repo_url: https://github.com/Aryia-Behroziuan/neurons
  • paper_authors: Jihao Andreas Lin, Joe Watson, Pascal Klink, Jan Peters
  • for: 这个研究旨在提高深度学习模型中的uncertainty量化和信任度,并避免过度自信和难以预测的行为。
  • methods: 这个研究使用Dirichlet几何统计学来做假设空间的扩散推断,并将其应用到不同的深度学习模型上。
  • results: 这个研究的结果显示了这种方法可以提高图像识别 tasks的uncertainty量化和防火墙性能,并且可以与不同的深度学习模型搭配使用。
    Abstract Bayesian deep learning approaches assume model parameters to be latent random variables and infer posterior distributions to quantify uncertainty, increase safety and trust, and prevent overconfident and unpredictable behavior. However, weight-space priors are model-specific, can be difficult to interpret and are hard to specify. Instead, we apply a Dirichlet prior in predictive space and perform approximate function-space variational inference. To this end, we interpret conventional categorical predictions from stochastic neural network classifiers as samples from an implicit Dirichlet distribution. By adapting the inference, the same function-space prior can be combined with different models without affecting model architecture or size. We illustrate the flexibility and efficacy of such a prior with toy experiments and demonstrate scalability, improved uncertainty quantification and adversarial robustness with large-scale image classification experiments.
    摘要 贝叶斯深度学习方法将模型参数视为潜在随机变量,并通过推断后验分布来量化不确定性、提升安全性与可信度,避免过度自信和不可预测的行为。然而,权重空间先验依赖于具体模型,难以解释也难以指定。我们转而在预测空间中使用 Dirichlet 先验,并进行近似的函数空间变分推断:将随机神经网络分类器的常规类别预测解释为来自一个隐式 Dirichlet 分布的样本。通过相应调整推断过程,同一个函数空间先验可以与不同模型结合,而无需改变模型结构或规模。我们通过玩具实验展示了该先验的灵活性与有效性,并在大规模图像分类实验中验证了其可扩展性、更好的不确定性量化以及对抗鲁棒性。

Online Inventory Problems: Beyond the i.i.d. Setting with Online Convex Optimization

  • paper_url: http://arxiv.org/abs/2307.06048
  • repo_url: None
  • paper_authors: Massil Hihat, Stéphane Gaïffas, Guillaume Garrigos, Simon Bussy
  • for: 管理者面临多产品存储控制问题,以尽量减少累累损失。
  • methods: 提出MaxCOSD算法,具有可证明的保证,能够应对非新闻式损失、状态变化和非同异常变量。
  • results: 提出非特征假设,以便学习。
    Abstract We study multi-product inventory control problems where a manager makes sequential replenishment decisions based on partial historical information in order to minimize its cumulative losses. Our motivation is to consider general demands, losses and dynamics to go beyond standard models which usually rely on newsvendor-type losses, fixed dynamics, and unrealistic i.i.d. demand assumptions. We propose MaxCOSD, an online algorithm that has provable guarantees even for problems with non-i.i.d. demands and stateful dynamics, including for instance perishability. We consider what we call non-degeneracy assumptions on the demand process, and argue that they are necessary to allow learning.
    摘要 我们研究多产品库存控制问题:管理者基于部分历史信息做出顺序补货决策,以最小化累计损失。我们的动机是考虑更一般的需求、损失和动态,从而超越通常依赖报童式损失、固定动态和不切实际的 i.i.d. 需求假设的标准模型。我们提出 MaxCOSD,一种在线算法,即使在需求非 i.i.d.、动态有状态(例如商品易腐)的问题中也具有可证明的保证。我们还提出并论证了需求过程的非退化假设,认为这是使学习成为可能的必要条件。
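
The MaxCOSD algorithm itself is not reproduced here; the sketch below only illustrates the underlying online-convex-optimization mechanism on a newsvendor-style loss with non-i.i.d. demand, which is the setting the abstract describes. Costs, demand process, and step size are illustrative assumptions.

```python
# Minimal online (sub)gradient descent on a newsvendor-style inventory loss.
# This shows the online-convex-optimization mechanism only; it is not the
# MaxCOSD algorithm, which additionally handles stateful dynamics.
import numpy as np

rng = np.random.default_rng(1)
T = 2000
h, b = 1.0, 4.0          # holding cost and lost-sales cost per unit
demand = np.clip(10 + 3 * np.sin(np.arange(T) / 50) + rng.normal(0, 2, T), 0, None)

q = 5.0                  # current order quantity
eta = 0.5                # base step size
losses = []
for t in range(T):
    d = demand[t]
    loss = h * max(q - d, 0) + b * max(d - q, 0)
    losses.append(loss)
    # Subgradient of the piecewise-linear newsvendor loss w.r.t. q.
    g = h if q >= d else -b
    q = max(q - eta / np.sqrt(t + 1) * g, 0.0)

print(f"average per-round loss: {np.mean(losses):.2f}, final order level: {q:.2f}")
```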

  • paper_url: http://arxiv.org/abs/2307.06046
  • repo_url: None
  • paper_authors: Jincheng Zhou, Beatrice Bevilacqua, Bruno Ribeiro
  • for: 预测扩展到新的测试图表中缺失的链接(关系),尤其是在面对新的节点和关系类型的外部数据(OOD)时。
  • methods: 基于双交换性(节点与关系类型)的理论概念,与传统的关系学习方法不同,我们提出了一种OOD链接预测方法。
  • results: 我们的方法可以有效地泛化到完全新的关系类型,无需访问额外信息,在实际数据集上实现了显著的性能提升。
    Abstract The task of inductive link prediction in (discrete) attributed multigraphs infers missing attributed links (relations) between nodes in new test multigraphs. Traditional relational learning methods face the challenge of limited generalization to OOD test multigraphs containing both novel nodes and novel relation types not seen in training. Recently, under the only assumption that all relation types share the same structural predictive patterns (single task), Gao et al. (2023) proposed an OOD link prediction method using the theoretical concept of double exchangeability (for nodes & relation types), in contrast to the (single) exchangeability (only for nodes) used to design Graph Neural Networks (GNNs). In this work we further extend the double exchangeability concept to multi-task double exchangeability, where we define link prediction in attributed multigraphs that can have distinct and potentially conflicting predictive patterns for different sets of relation types (multiple tasks). Our empirical results on real-world datasets demonstrate that our approach can effectively generalize to entirely new relation types in test, without access to additional information, yielding significant performance improvements over existing methods.
    摘要 带属性多重图中的归纳式链接预测任务,旨在推断新的测试多重图中节点之间缺失的带属性链接(关系)。传统关系学习方法难以泛化到同时包含训练中未见过的新节点和新关系类型的分布外(OOD)测试多重图。最近,Gao 等人(2023)在所有关系类型共享相同结构预测模式(单任务)的假设下,利用节点与关系类型的双重可交换性(区别于图神经网络仅针对节点的可交换性)提出了一种 OOD 链接预测方法。本文将双重可交换性进一步扩展为多任务双重可交换性,用以刻画不同关系类型集合可能具有不同甚至相互冲突预测模式的带属性多重图链接预测。真实数据集上的实验表明,我们的方法无需额外信息即可有效泛化到测试中全新的关系类型,并显著优于现有方法。

Rhythm Modeling for Voice Conversion

  • paper_url: http://arxiv.org/abs/2307.06040
  • repo_url: https://github.com/bshall/urhythmic
  • paper_authors: Benjamin van Niekerk, Marc-André Carbonneau, Herman Kamper
  • for: 这篇论文的目的是提出一种不需要平行数据或文本译写的无监督语音变换方法,以改善语音识别的感知。
  • methods: 该方法首先将源语音分成不同类型的段落,包括声门声、塞音声和空格声。然后,它使用自我监督表示来模型语音的节奏,并将目标语音的节奏与源语音的节奏匹配。
  • results: 实验结果表明,Urhythmic 方法在质量和韵律方面优于现有的无监督方法。代码和检查点:https://github.com/bshall/urhythmic。音频 demo 页面:https://ubisoft-laforge.github.io/speech/urhythmic。
    Abstract Voice conversion aims to transform source speech into a different target voice. However, typical voice conversion systems do not account for rhythm, which is an important factor in the perception of speaker identity. To bridge this gap, we introduce Urhythmic-an unsupervised method for rhythm conversion that does not require parallel data or text transcriptions. Using self-supervised representations, we first divide source audio into segments approximating sonorants, obstruents, and silences. Then we model rhythm by estimating speaking rate or the duration distribution of each segment type. Finally, we match the target speaking rate or rhythm by time-stretching the speech segments. Experiments show that Urhythmic outperforms existing unsupervised methods in terms of quality and prosody. Code and checkpoints: https://github.com/bshall/urhythmic. Audio demo page: https://ubisoft-laforge.github.io/speech/urhythmic.
    摘要 语音转换的目标是将源语音变换为另一目标说话人的声音。然而,典型的语音转换系统没有考虑节奏,而节奏是感知说话人身份的重要因素。为弥补这一差距,我们提出 Urhythmic,一种不需要平行数据或文本转写的无监督节奏转换方法。我们首先利用自监督表示将源音频划分为近似的响音、阻音和静音段;然后通过估计语速或各类段落的时长分布来建模节奏;最后通过对语音段进行时间伸缩来匹配目标语速或节奏。实验表明,Urhythmic 在质量和韵律方面优于现有的无监督方法。代码和检查点:https://github.com/bshall/urhythmic。音频 demo 页面:https://ubisoft-laforge.github.io/speech/urhythmic。
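
A minimal sketch of the final step described in the abstract: time-stretching a speech segment so its duration matches a target speaking rate. Linear interpolation stands in for the paper's resampling, and the segment, sample rate, and rates are illustrative.

```python
# Sketch of the time-stretching step: resample a speech segment so that its
# duration matches a target speaking rate. Linear interpolation is used here
# for brevity; it is a simplification of the paper's pipeline.
import numpy as np

def stretch(segment: np.ndarray, factor: float) -> np.ndarray:
    """Stretch a 1-D signal by `factor` (>1 slows it down, <1 speeds it up)."""
    n_out = max(int(round(len(segment) * factor)), 1)
    old_t = np.linspace(0.0, 1.0, num=len(segment))
    new_t = np.linspace(0.0, 1.0, num=n_out)
    return np.interp(new_t, old_t, segment)

sr = 16000
segment = np.sin(2 * np.pi * 120 * np.arange(0.3 * sr) / sr)   # 0.3 s dummy "sonorant"
source_rate, target_rate = 5.2, 4.0                            # e.g. syllables per second
stretched = stretch(segment, factor=source_rate / target_rate)
print(len(segment), "->", len(stretched), "samples")
```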

Learning from Exemplary Explanations

  • paper_url: http://arxiv.org/abs/2307.06026
  • repo_url: https://github.com/Sfedfcv/redesigned-pancake
  • paper_authors: Misgina Tsighe Hagos, Kathleen M. Curran, Brian Mac Namee
  • for: 这篇论文旨在提高 Interactive Machine Learning(IML)中基于解释的学习方法的效果,使模型更加透明和可解释。
  • methods: 该论文使用两个输入实例及其对应的 Gradient Weighted Class Activation Mapping(GradCAM)模型解释作为示例解释来实现 XBL。
  • results: 在医学图像分类任务上,仅需极少的人工输入,该方法即可获得更好的解释 (+0.02, +3%),同时分类性能略有下降 (-0.04, -4%),对比对象为不使用交互训练的模型。
    Abstract eXplanation Based Learning (XBL) is a form of Interactive Machine Learning (IML) that provides a model refining approach via user feedback collected on model explanations. Although the interactivity of XBL promotes model transparency, XBL requires a huge amount of user interaction and can become expensive as feedback is in the form of detailed annotation rather than simple category labelling which is more common in IML. This expense is exacerbated in high stakes domains such as medical image classification. To reduce the effort and expense of XBL we introduce a new approach that uses two input instances and their corresponding Gradient Weighted Class Activation Mapping (GradCAM) model explanations as exemplary explanations to implement XBL. Using a medical image classification task, we demonstrate that, using minimal human input, our approach produces improved explanations (+0.02, +3%) and achieves reduced classification performance (-0.04, -4%) when compared against a model trained without interactions.
    摘要 基于解释的学习(XBL)是一种交互式机器学习(IML)方法,它利用用户对模型解释的反馈来改进模型。虽然 XBL 的交互性提升了模型透明度,但它需要大量的用户交互,而且反馈形式是细致的标注而非 IML 中更常见的简单类别标签,因此成本高昂;在医学图像分类等高风险领域,这一开销更为突出。为降低 XBL 的工作量和成本,我们提出一种新方法:仅使用两个输入实例及其对应的 Gradient Weighted Class Activation Mapping(GradCAM)模型解释作为示例解释来实现 XBL。在医学图像分类任务上,我们证明,仅需极少的人工输入,该方法即可生成更好的解释 (+0.02, +3%),分类性能仅略有下降 (-0.04, -4%),对比对象为不使用交互训练的模型。

An Effective and Efficient Time-aware Entity Alignment Framework via Two-aspect Three-view Label Propagation

  • paper_url: http://arxiv.org/abs/2307.06013
  • repo_url: None
  • paper_authors: Li Cai, Xin Mao, Youshao Xiao, Changxu Wu, Man Lan
  • for: 提高知识融合的实体对应关系检索
  • methods: 非神经网络方法 LightTEA,包含两方面三视图标签传播、带时间约束的稀疏相似度、Sinkhorn 算子和时间迭代学习四个组件
  • results: 与最先进(SOTA)方法相比显著提升了 TKG 之间实体对齐的性能,且耗时最多只需几十秒,不超过最高效 TEA 方法耗时的 10%
    Abstract Entity alignment (EA) aims to find the equivalent entity pairs between different knowledge graphs (KGs), which is crucial to promote knowledge fusion. With the wide use of temporal knowledge graphs (TKGs), time-aware EA (TEA) methods appear to enhance EA. Existing TEA models are based on Graph Neural Networks (GNN) and achieve state-of-the-art (SOTA) performance, but it is difficult to transfer them to large-scale TKGs due to the scalability issue of GNN. In this paper, we propose an effective and efficient non-neural EA framework between TKGs, namely LightTEA, which consists of four essential components: (1) Two-aspect Three-view Label Propagation, (2) Sparse Similarity with Temporal Constraints, (3) Sinkhorn Operator, and (4) Temporal Iterative Learning. All of these modules work together to improve the performance of EA while reducing the time consumption of the model. Extensive experiments on public datasets indicate that our proposed model significantly outperforms the SOTA methods for EA between TKGs, and the time consumed by LightTEA is only dozens of seconds at most, no more than 10% of the most efficient TEA method.
    摘要 实体对齐(EA)旨在找出不同知识图谱(KG)之间等价的实体对,这对促进知识融合至关重要。随着时态知识图谱(TKG)的广泛使用,时间感知的实体对齐(TEA)方法被提出以增强 EA。现有 TEA 模型基于图神经网络(GNN)并取得了最先进(SOTA)性能,但受限于 GNN 的可扩展性问题,难以推广到大规模 TKG。本文提出一种高效且有效的非神经网络 TKG 实体对齐框架 LightTEA,它由四个核心组件构成:(1) 两方面三视图标签传播;(2) 带时间约束的稀疏相似度;(3) Sinkhorn 算子;(4) 时间迭代学习。这些模块协同工作,在提升 EA 性能的同时降低模型耗时。在公开数据集上的大量实验表明,所提模型在 TKG 实体对齐上显著优于 SOTA 方法,且 LightTEA 的耗时最多只需几十秒,不超过最高效 TEA 方法耗时的 10%。
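
One of LightTEA's four components is a Sinkhorn operator; the sketch below shows the generic Sinkhorn normalization of an entity-similarity matrix toward a doubly stochastic alignment matrix, from which alignments are read off. The matrix, temperature, and iteration count are illustrative, not taken from the paper.

```python
# Minimal Sinkhorn normalization of an entity-similarity matrix: repeated
# row/column normalization pushes it toward a doubly stochastic matrix,
# which is then used to read off alignments. Illustrative only.
import numpy as np

def sinkhorn(sim: np.ndarray, n_iters: int = 20, tau: float = 0.1) -> np.ndarray:
    P = np.exp(sim / tau)                 # temperature-scaled similarities
    for _ in range(n_iters):
        P /= P.sum(axis=1, keepdims=True) # normalize rows
        P /= P.sum(axis=0, keepdims=True) # normalize columns
    return P

rng = np.random.default_rng(0)
sim = rng.normal(size=(5, 5)) + 3 * np.eye(5)   # true alignment on the diagonal
P = sinkhorn(sim)
print("predicted alignment per source entity:", P.argmax(axis=1))
```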

What Happens During Finetuning of Vision Transformers: An Invariance Based Investigation

  • paper_url: http://arxiv.org/abs/2307.06006
  • repo_url: None
  • paper_authors: Gabriele Merlin, Vedant Nanda, Ruchit Rawal, Mariya Toneva
  • for: 这篇论文探讨了预训练-精度调整模式下的模型性能提升问题,并提出了新的度量来评估预训练模型中吸收的特征是否被细化或忘记。
  • methods: 作者使用了多个 benchmark 数据集和任务来研究预训练视Transformers 和其精度调整版本之间的关系。他们还提出了一些新的度量来评估预训练模型中吸收的特征是否被细化或忘记。
  • results: 研究发现,预训练可以带来跨任务的特征转移,且这种特征转移主要发生在预训练模型的浅层。此外,预训练模型的深层特征会在精度调整过程中压缩到浅层。这些发现可以帮助我们更好地理解预训练模型的成功原因和精度调整过程中的变化。
    Abstract The pretrain-finetune paradigm usually improves downstream performance over training a model from scratch on the same task, becoming commonplace across many areas of machine learning. While pretraining is empirically observed to be beneficial for a range of tasks, there is not a clear understanding yet of the reasons for this effect. In this work, we examine the relationship between pretrained vision transformers and the corresponding finetuned versions on several benchmark datasets and tasks. We present new metrics that specifically investigate the degree to which invariances learned by a pretrained model are retained or forgotten during finetuning. Using these metrics, we present a suite of empirical findings, including that pretraining induces transferable invariances in shallow layers and that invariances from deeper pretrained layers are compressed towards shallower layers during finetuning. Together, these findings contribute to understanding some of the reasons for the successes of pretrained models and the changes that a pretrained model undergoes when finetuned on a downstream task.
    摘要 与从头训练相比,预训练-微调范式通常能提升下游任务性能,已成为机器学习多个领域的常见做法。尽管经验上预训练对多种任务有益,但其背后原因尚缺乏清晰的理解。本工作在多个基准数据集和任务上,研究了预训练视觉 Transformer 与其微调版本之间的关系。我们提出了新的度量,专门考察预训练模型学到的不变性在微调过程中被保留或遗忘的程度。基于这些度量,我们给出了一系列实证发现,包括:预训练在浅层诱导出可迁移的不变性;而更深的预训练层中的不变性在微调过程中会被压缩到更浅的层。这些发现有助于理解预训练模型成功的部分原因,以及预训练模型在下游任务微调时所发生的变化。

DDNAS: Discretized Differentiable Neural Architecture Search for Text Classification

  • paper_url: http://arxiv.org/abs/2307.06005
  • repo_url: https://github.com/ddnas/ddnas
  • paper_authors: Kuan-Chun Chen, Cheng-Te Li, Kuo-Jung Lee
  • for: 文本表示学习中的Neural Architecture Search(NAS)方法可以提供更好的表示能力。
  • methods: 本文提出了一种新的NAS方法,即Discretized Differentiable Neural Architecture Search(DDNAS),可以用于文本表示学习和分类。DDNAS使用了连续的权重下降来优化搜索,同时通过最大化相互信息来增加搜索节点的拓扑结构,以模型文本输入的层次分类。
  • results: 在八种真实数据集上进行了广泛的实验,DDNAS可以一致性地超越现有的NAS方法。尽管DDNAS只使用了三种基本操作(即卷积、聚合和none)作为NAS建构块的候选者,但其表现良好并可以进一步提高通过添加更多不同的操作。
    Abstract Neural Architecture Search (NAS) has shown promising capability in learning text representation. However, existing text-based NAS neither performs a learnable fusion of neural operations to optimize the architecture, nor encodes the latent hierarchical categorization behind text input. This paper presents a novel NAS method, Discretized Differentiable Neural Architecture Search (DDNAS), for text representation learning and classification. With the continuous relaxation of architecture representation, DDNAS can use gradient descent to optimize the search. We also propose a novel discretization layer via mutual information maximization, which is imposed on every search node to model the latent hierarchical categorization in text representation. Extensive experiments conducted on eight diverse real datasets exhibit that DDNAS can consistently outperform the state-of-the-art NAS methods. While DDNAS relies on only three basic operations, i.e., convolution, pooling, and none, to be the candidates of NAS building blocks, its promising performance is noticeable and extensible to obtain further improvement by adding more different operations.
    摘要 神经架构搜索(NAS)在文本表示学习中展现了可观的能力。然而,现有基于文本的 NAS 既没有对神经操作进行可学习的融合以优化架构,也没有对文本输入背后潜在的层级类别结构进行编码。本文提出一种新的 NAS 方法,即离散化可微神经架构搜索(DDNAS),用于文本表示学习与分类。通过对架构表示进行连续松弛,DDNAS 可以使用梯度下降来优化搜索;我们还提出一种基于互信息最大化的新型离散化层,施加在每个搜索节点上,以建模文本表示中潜在的层级类别结构。在八个多样化的真实数据集上的大量实验表明,DDNAS 能够稳定地优于最先进的 NAS 方法。尽管 DDNAS 仅使用卷积、池化和 none 三种基本操作作为 NAS 构建块的候选,其表现已十分突出,并且可以通过加入更多不同的操作进一步提升。

Pathway: a fast and flexible unified stream data processing framework for analytical and Machine Learning applications

  • paper_url: http://arxiv.org/abs/2307.13116
  • repo_url: None
  • paper_authors: Michal Bartoszkiewicz, Jan Chorowski, Adrian Kosowski, Jakub Kowalski, Sergey Kulik, Mateusz Lewandowski, Krzysztof Nowicki, Kamil Piechowiak, Olivier Ruas, Zuzanna Stamirowska, Przemyslaw Uznanski
  • for: 这个论文是为了解决物理经济数据流处理中的挑战,包括互联网物联网和企业系统生成的数据流。
  • methods: 这个论文使用了一种新的统一数据处理框架,叫做Pathway,可以在 bounded和unbounded数据流中运行工作负荷。Pathway使用了Python和Python/SQL工作流程的表格API,并由分布式增量数据流程在Rust中实现。
  • results: 作者们present了Pathway的系统和benchmarking结果,表明它在批处理和流处理上能够超过现有的行业框架。此外,Pathway还可以处理一些现有框架无法解决的流处理用例,如流式迭代图算法(PageRank等)。
    Abstract We present Pathway, a new unified data processing framework that can run workloads on both bounded and unbounded data streams. The framework was created with the original motivation of resolving challenges faced when analyzing and processing data from the physical economy, including streams of data generated by IoT and enterprise systems. These required rapid reaction while calling for the application of advanced computation paradigms (machinelearning-powered analytics, contextual analysis, and other elements of complex event processing). Pathway is equipped with a Table API tailored for Python and Python/SQL workflows, and is powered by a distributed incremental dataflow in Rust. We describe the system and present benchmarking results which demonstrate its capabilities in both batch and streaming contexts, where it is able to surpass state-of-the-art industry frameworks in both scenarios. We also discuss streaming use cases handled by Pathway which cannot be easily resolved with state-of-the-art industry frameworks, such as streaming iterative graph algorithms (PageRank, etc.).
    摘要 我们介绍 Pathway,一个新的统一数据处理框架,可以在有界和无界数据流上运行工作负载。该框架最初的动机是解决分析和处理实体经济数据时面临的挑战,包括物联网(IoT)和企业系统产生的数据流:这些场景要求快速响应,同时需要应用高级计算范式(机器学习驱动的分析、上下文分析以及复杂事件处理的其他要素)。Pathway 提供了面向 Python 和 Python/SQL 工作流的表格 API,底层由 Rust 实现的分布式增量数据流驱动。我们描述了该系统,并给出基准测试结果,表明其在批处理和流处理两种场景下均能超越当前业界最先进的框架。我们还讨论了 Pathway 能够处理、而现有业界框架难以解决的流处理用例,例如流式迭代图算法(PageRank 等)。

A Comprehensive Review of Automated Data Annotation Techniques in Human Activity Recognition

  • paper_url: http://arxiv.org/abs/2307.05988
  • repo_url: None
  • paper_authors: Florenc Demrozi, Cristian Turetta, Fadi Al Machot, Graziano Pravadelli, Philipp H. Kindt
  • for: 这篇论文的目的是为了提供关于人体活动识别(HAR)数据注释技术的系统性回顾。
  • methods: 本论文使用分类法将现有的方法分为不同的类别,并提供了一个分类体系,以帮助选择适用于给定场景的技术。
  • results: 本论文提供了关于 HAR 数据注释技术的系统性回顾,并将现有的方法分为不同的类别,以便在不同的场景中选择适用的技术。
    Abstract Human Activity Recognition (HAR) has become one of the leading research topics of the last decade. As sensing technologies have matured and their economic costs have declined, a host of novel applications, e.g., in healthcare, industry, sports, and daily life activities have become popular. The design of HAR systems requires different time-consuming processing steps, such as data collection, annotation, and model training and optimization. In particular, data annotation represents the most labor-intensive and cumbersome step in HAR, since it requires extensive and detailed manual work from human annotators. Therefore, different methodologies concerning the automation of the annotation procedure in HAR have been proposed. The annotation problem occurs in different notions and scenarios, which all require individual solutions. In this paper, we provide the first systematic review on data annotation techniques for HAR. By grouping existing approaches into classes and providing a taxonomy, our goal is to support the decision on which techniques can be beneficially used in a given scenario.
    摘要 人类活动识别(HAR)在过去一个 décennial 内成为了研究领域的主导话题之一。随着感知技术的成熔和经济成本的下降,一系列的新应用,如医疗、工业、运动和日常生活活动,在人类活动识别领域得到了广泛的应用。人类活动识别系统的设计需要不同的时间consuming 的处理步骤,如数据收集、注释、模型训练和优化。特别是数据注释是人类活动识别中最劳力占用和繁琐的步骤,因为它需要大量的人工注释员进行详细的手动工作。因此,不同的方法和技术在人类活动识别中自动注释的问题上提出了多种方法。这些问题在不同的概念和场景下都需要具体的解决方案。本文是人类活动识别领域的首次系统性的文献评论。我们将现有的方法分类并提供了一个分类法,以支持在给定的场景下选择合适的技术。

Transformers in Reinforcement Learning: A Survey

  • paper_url: http://arxiv.org/abs/2307.05979
  • repo_url: None
  • paper_authors: Pranav Agarwal, Aamer Abdul Rahman, Pierre-Luc St-Charles, Simon J. D. Prince, Samira Ebrahimi Kahou
  • for: 这篇论文探讨了如何使用 transformers 来解决 reinforcement learning 中的挑战,包括不稳定的训练、归因问题、不可解释性和部分可见性。
  • methods: 这篇论文详细介绍了 transformers 的性质和其变体,并讲解了它们在 reinforcement learning 中的应用,包括表示学习、过程和奖励函数模型化以及策略优化。
  • results: 这篇论文总结了在不同应用中使用 transformers 的研究,包括机器人、医学、自然语言处理和云计算等。它们还讨论了如何使用可视化技术和高效的训练策略来提高 transformers 的可解释性和效率。
    Abstract Transformers have significantly impacted domains like natural language processing, computer vision, and robotics, where they improve performance compared to other neural networks. This survey explores how transformers are used in reinforcement learning (RL), where they are seen as a promising solution for addressing challenges such as unstable training, credit assignment, lack of interpretability, and partial observability. We begin by providing a brief domain overview of RL, followed by a discussion on the challenges of classical RL algorithms. Next, we delve into the properties of the transformer and its variants and discuss the characteristics that make them well-suited to address the challenges inherent in RL. We examine the application of transformers to various aspects of RL, including representation learning, transition and reward function modeling, and policy optimization. We also discuss recent research that aims to enhance the interpretability and efficiency of transformers in RL, using visualization techniques and efficient training strategies. Often, the transformer architecture must be tailored to the specific needs of a given application. We present a broad overview of how transformers have been adapted for several applications, including robotics, medicine, language modeling, cloud computing, and combinatorial optimization. We conclude by discussing the limitations of using transformers in RL and assess their potential for catalyzing future breakthroughs in this field.
    摘要 Transformer 已显著影响自然语言处理、计算机视觉和机器人等领域,其性能优于其他神经网络。本综述探讨 Transformer 在强化学习(RL)中的应用:它被视为解决训练不稳定、信用分配、缺乏可解释性和部分可观测性等挑战的有前景的方案。我们首先简要概述 RL 领域,然后讨论经典 RL 算法面临的挑战;接着介绍 Transformer 及其变体的性质,并说明它们为何适合应对 RL 中固有的难题。我们考察了 Transformer 在 RL 各个方面的应用,包括表示学习、转移与奖励函数建模以及策略优化,并讨论了旨在借助可视化技术和高效训练策略提升 Transformer 在 RL 中可解释性与效率的近期研究。Transformer 架构通常需要针对具体应用进行定制,我们概述了它在机器人、医学、语言建模、云计算和组合优化等多个应用中的适配方式。最后,我们讨论了在 RL 中使用 Transformer 的局限性,并评估其推动该领域未来突破的潜力。

Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.05977
  • repo_url: https://github.com/nannullna/safe-diffusion
  • paper_authors: Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee
  • for: 防止文本到图像扩散模型中的危险或版权内容生成
  • methods: 提出了一种名为 SDD 的方法,通过自蒸馏引导以目标移除概念为条件的噪声估计去匹配无条件的噪声估计
  • results: 与此前方法相比,在不降低图像质量的情况下去除了更大比例的有害内容,并且支持一次同时移除多个概念,而先前工作每次只能移除一个概念
    Abstract Large-scale image generation models, with impressive quality made possible by the vast amount of data available on the Internet, raise social concerns that these models may generate harmful or copyrighted content. The biases and harmfulness arise throughout the entire training process and are hard to completely remove, which have become significant hurdles to the safe deployment of these models. In this paper, we propose a method called SDD to prevent problematic content generation in text-to-image diffusion models. We self-distill the diffusion model to guide the noise estimate conditioned on the target removal concept to match the unconditional one. Compared to the previous methods, our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality. Furthermore, our method allows the removal of multiple concepts at once, whereas previous works are limited to removing a single concept at a time.
    摘要 大规模图像生成模型凭借互联网上的海量数据获得了令人印象深刻的生成质量,但也引发了社会担忧:这些模型可能生成有害或受版权保护的内容。此类偏见与有害性贯穿整个训练过程、难以完全消除,已成为安全部署这些模型的重要障碍。本文提出一种名为 SDD 的方法,用于防止文本到图像扩散模型生成问题内容:我们对扩散模型进行自蒸馏,引导以目标移除概念为条件的噪声估计去匹配无条件的噪声估计。与此前的方法相比,我们的方法能在不损害整体图像质量的情况下去除更大比例的有害内容;此外,它支持一次同时移除多个概念,而先前工作每次只能移除一个概念。

Outlier detection in regression: conic quadratic formulations

  • paper_url: http://arxiv.org/abs/2307.05975
  • repo_url: None
  • paper_authors: Andrés Gómez, José Neto
  • for: Linear regression model building with outlier detection
  • methods: Second-order conic relaxations without big-M constraints
  • results: Faster computational performance compared to existing big-M formulations
    Abstract In many applications, when building linear regression models, it is important to account for the presence of outliers, i.e., corrupted input data points. Such problems can be formulated as mixed-integer optimization problems involving cubic terms, each given by the product of a binary variable and a quadratic term of the continuous variables. Existing approaches in the literature, typically relying on the linearization of the cubic terms using big-M constraints, suffer from weak relaxation and poor performance in practice. In this work we derive stronger second-order conic relaxations that do not involve big-M constraints. Our computational experiments indicate that the proposed formulations are several orders-of-magnitude faster than existing big-M formulations in the literature for this problem.
    摘要 在许多应用中,建立线性回归模型时需要考虑异常值(即被污染的输入数据点)的存在。此类问题可以表述为混合整数优化问题,其中包含三次项,每个三次项由一个二元变量与连续变量的二次项相乘得到。文献中的现有方法通常依靠大 M 约束对三次项进行线性化,其松弛较弱,实际性能不佳。本工作推导出更强的二阶锥松弛,且不需要大 M 约束。计算实验表明,所提出的形式化比文献中现有的大 M 形式化快若干个数量级。

Contrastive Learning for Conversion Rate Prediction

  • paper_url: http://arxiv.org/abs/2307.05974
  • repo_url: https://github.com/dongruihust/cl4cvr
  • paper_authors: Wentao Ouyang, Rui Dong, Xiuwu Zhang, Chaofeng Guo, Jinmei Luo, Xiangzheng Liu, Yanlong Du
  • for: 预测广告点击率 (CVR) 在广告系统中扮演着重要的角色,现今的深度神经网络模型在 CVR 预测方面已经显示出了可观的表现。但是,这些深度模型需要巨量数据进行训练,在在线广告系统中,尽管有数以百万到数以亿的广告,但用户往往只会点击一小部分的广告,并且转化的部分更加罕见。这种数据稀缺问题限制了深度模型的应用。
  • methods: 本文提出了一种名为 Contrastive Learning for CVR prediction (CL4CVR) 的框架,它将 CVR 预测任务与对比学习任务相联系起来,可以通过利用丰富的无标注数据来提取更好的数据表示,提高 CVR 预测性能。为了适应 CVR 预测问题,我们提出了嵌入屏蔽 (EM),而不是特征屏蔽,来创建两个视图的扩展样本。我们还提出了一个假值排除 (FNE) 组件,用于消除具有同样特征的样本,以考虑用户行为数据中的自然特性。此外,我们还提出了一个监督正常包含 (SPI) 组件,用于包含每个拥有样本的额外正确样本,以便充分利用稀缺 yet 珍贵的用户转化事件。
  • results: 实验结果表明,CL4CVR 在两个真实的转化数据集上显示出了更高的性能。源代码可以在 https://github.com/DongRuiHust/CL4CVR 上获取。
    Abstract Conversion rate (CVR) prediction plays an important role in advertising systems. Recently, supervised deep neural network-based models have shown promising performance in CVR prediction. However, they are data hungry and require an enormous amount of training data. In online advertising systems, although there are millions to billions of ads, users tend to click only a small set of them and to convert on an even smaller set. This data sparsity issue restricts the power of these deep models. In this paper, we propose the Contrastive Learning for CVR prediction (CL4CVR) framework. It associates the supervised CVR prediction task with a contrastive learning task, which can learn better data representations exploiting abundant unlabeled data and improve the CVR prediction performance. To tailor the contrastive learning task to the CVR prediction problem, we propose embedding masking (EM), rather than feature masking, to create two views of augmented samples. We also propose a false negative elimination (FNE) component to eliminate samples with the same feature as the anchor sample, to account for the natural property in user behavior data. We further propose a supervised positive inclusion (SPI) component to include additional positive samples for each anchor sample, in order to make full use of sparse but precious user conversion events. Experimental results on two real-world conversion datasets demonstrate the superior performance of CL4CVR. The source code is available at https://github.com/DongRuiHust/CL4CVR.
    摘要 转化率(CVR)预测在广告系统中扮演重要角色。近来,有监督的深度神经网络模型在 CVR 预测上展现了可观的性能,但这类模型对数据需求极大。在在线广告系统中,尽管广告数以百万计甚至数十亿计,用户往往只点击其中一小部分,发生转化的更是少之又少,这一数据稀疏问题限制了深度模型的能力。本文提出 CVR 预测的对比学习框架(CL4CVR):它将有监督的 CVR 预测任务与一个对比学习任务相结合,利用海量无标注数据学习更好的数据表示,从而提升 CVR 预测性能。为使对比学习任务贴合 CVR 预测问题,我们提出嵌入掩码(EM)而非特征掩码来构造两个增强视图;提出假负样本剔除(FNE)组件,剔除与锚样本特征相同的样本,以符合用户行为数据的天然性质;并提出有监督正样本纳入(SPI)组件,为每个锚样本补充额外的正样本,以充分利用稀疏而宝贵的用户转化事件。两个真实转化数据集上的实验结果表明 CL4CVR 性能更优。源代码见 https://github.com/DongRuiHust/CL4CVR。
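
A hedged sketch of the two ingredients named in the abstract that are easy to isolate: embedding masking to build two augmented views, and an InfoNCE-style contrastive loss between them. The encoder, dimensions, and loss form are assumptions; the FNE and SPI components are omitted.

```python
# Sketch of embedding masking plus an InfoNCE-style contrastive loss, the
# core of the auxiliary task; the FNE and SPI components are omitted.
import torch
import torch.nn.functional as F

def embedding_mask(field_emb: torch.Tensor, drop_rate: float = 0.2) -> torch.Tensor:
    """Zero out whole field embeddings. field_emb: (batch, n_fields, dim)."""
    keep = torch.rand(field_emb.shape[:2], device=field_emb.device) > drop_rate
    return field_emb * keep.unsqueeze(-1)

def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.2) -> torch.Tensor:
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature            # (batch, batch) similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)        # positives on the diagonal

batch, n_fields, dim = 32, 10, 16
field_emb = torch.randn(batch, n_fields, dim)
encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(n_fields * dim, 64))

view1 = encoder(embedding_mask(field_emb))        # two independently masked views
view2 = encoder(embedding_mask(field_emb))
loss = info_nce(view1, view2)
loss.backward()                                   # would be added to the CVR loss
print("contrastive loss:", float(loss))
```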

VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models

  • paper_url: http://arxiv.org/abs/2307.05973
  • repo_url: None
  • paper_authors: Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li, Jiajun Wu, Li Fei-Fei
  • for: This paper aims to synthesize robot trajectories for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects.
  • methods: The proposed method leverages large language models (LLMs) to infer affordances and constraints from free-form language instructions, and then composes 3D value maps with a visual-language model (VLM) to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations.
  • results: The proposed method is demonstrated to be effective in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. The method also benefits from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions.
    Abstract Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to carry out the physical interactions with the environment, which remains a major bottleneck. In this work, we aim to synthesize robot trajectories, i.e., a dense sequence of 6-DoF end-effector waypoints, for a large variety of manipulation tasks given an open-set of instructions and an open-set of objects. We achieve this by first observing that LLMs excel at inferring affordances and constraints given a free-form language instruction. More importantly, by leveraging their code-writing capabilities, they can interact with a visual-language model (VLM) to compose 3D value maps to ground the knowledge into the observation space of the agent. The composed value maps are then used in a model-based planning framework to zero-shot synthesize closed-loop robot trajectories with robustness to dynamic perturbations. We further demonstrate how the proposed framework can benefit from online experiences by efficiently learning a dynamics model for scenes that involve contact-rich interactions. We present a large-scale study of the proposed method in both simulated and real-robot environments, showcasing the ability to perform a large variety of everyday manipulation tasks specified in free-form natural language. Project website: https://voxposer.github.io
    摘要 大型语言模型(LLM)已被证明蕴含丰富的可执行知识,可以以推理和规划的形式提取出来用于机器人操作。然而,目前多数方法仍依赖预先定义的运动基元来与环境进行物理交互,这是一个主要瓶颈。本工作旨在针对开放集合的指令和物体,为大量操作任务合成机器人轨迹,即由 6 自由度末端执行器路点构成的稠密序列。我们首先观察到,LLM 擅长从自由形式的语言指令中推断可供性(affordance)与约束;更重要的是,借助其代码生成能力,LLM 可以与视觉语言模型(VLM)交互,组合出三维价值地图,将知识落地到智能体的观测空间中。随后,组合得到的价值地图被用于基于模型的规划框架,零样本地合成对动态扰动具有鲁棒性的闭环机器人轨迹。我们还展示了该框架如何从在线经验中获益,高效地学习涉及丰富接触交互场景的动力学模型。我们在仿真和真实机器人环境中对所提方法进行了大规模研究,展示了其执行各种以自由形式自然语言描述的日常操作任务的能力。项目网站:https://voxposer.github.io
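
A toy numpy illustration of the value-map idea only: an affordance term pulls the end-effector toward a target voxel, a constraint term penalizes a region near an obstacle, and greedy descent over the composed map yields waypoints. The LLM/VLM composition itself is not modeled, and all quantities here are made up.

```python
# Toy planning over a composed 3-D value map (not the paper's pipeline).
import numpy as np

grid = 20
xs = np.stack(np.meshgrid(*[np.arange(grid)] * 3, indexing="ij"), axis=-1)

target = np.array([15, 15, 4])
obstacle = np.array([5, 14, 10])                               # off the straight path
affordance = np.linalg.norm(xs - target, axis=-1)              # lower is better
constraint = 5.0 * np.exp(-np.linalg.norm(xs - obstacle, axis=-1) ** 2 / 8.0)
value = affordance + constraint                                 # composed value map

pos = np.array([2, 2, 2])
waypoints = [pos.copy()]
for _ in range(100):
    # Look at the 26-neighbourhood and move to the lowest-value voxel.
    neighbours = pos + np.array([(i, j, k) for i in (-1, 0, 1)
                                 for j in (-1, 0, 1) for k in (-1, 0, 1)])
    neighbours = np.clip(neighbours, 0, grid - 1)
    best = neighbours[np.argmin(value[tuple(neighbours.T)])]
    if np.array_equal(best, pos):
        break
    pos = best
    waypoints.append(pos.copy())
print("reached", waypoints[-1], "in", len(waypoints) - 1, "steps")
```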

Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models

  • paper_url: http://arxiv.org/abs/2307.05972
  • repo_url: None
  • paper_authors: James O’ Neill, Sourav Dutta
  • for: investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models, and propose a new method called self-distilled quantization (SDQ) to minimize accumulative quantization errors.
  • methods: post-training quantization, quantization-aware training, self-distilled quantization (SDQ)
  • results: both multilingual models XLM-R-Base and InfoXLM-Base can be reduced from 32-bit floating point weights to 8-bit integer weights while maintaining a high level of performance on the XGLUE benchmark, but multilingual models have challenges in generalizing to languages they were not fine-tuned on.
    Abstract We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models. We present a new method called self-distilled quantization (SDQ) that minimizes accumulative quantization errors and outperforms baselines. We apply SDQ to multilingual models XLM-R-Base and InfoXLM-Base and demonstrate that both models can be reduced from 32-bit floating point weights to 8-bit integer weights while maintaining a high level of performance on the XGLUE benchmark. Our results also highlight the challenges of quantizing multilingual models, which must generalize to languages they were not fine-tuned on.
    摘要 我们研究训练后量化与量化感知训练对 Transformer 语言模型泛化能力的影响,并提出一种新的方法,即自蒸馏量化(SDQ),以最小化累积量化误差,其效果优于基线方法。我们将 SDQ 应用于多语言模型 XLM-R-Base 和 InfoXLM-Base,证明这两个模型的权重可以从 32 位浮点数压缩为 8 位整数,同时在 XGLUE 基准上保持较高的性能。我们的结果也凸显了量化多语言模型所面临的挑战:它们必须泛化到未经微调的语言上。
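
Self-distilled quantization itself is not reproduced here; the sketch below only shows the basic post-training step the paper builds on, symmetric per-tensor quantization of a weight matrix to int8, and the round-trip error such a step introduces (the error SDQ then tries to keep from accumulating).

```python
# Symmetric per-tensor post-training quantization of a weight matrix to int8,
# plus the reconstruction error it introduces. Illustrative sizes only.
import numpy as np

def quantize_int8(w: np.ndarray):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=(768, 768)).astype(np.float32)   # a Transformer-sized weight
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
err = np.mean((w - w_hat) ** 2)
print(f"scale={scale:.6f}, mean squared quantization error={err:.2e}")
```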

Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations

  • paper_url: http://arxiv.org/abs/2307.05959
  • repo_url: None
  • paper_authors: Moo Jin Kim, Jiajun Wu, Chelsea Finn
  • for: 本研究旨在增强视觉控制策略的通用性,使用人类视频示例来增强眼手控制策略的泛化能力。
  • methods: 我们使用人类视频示例和眼手摄像头来增强眼手控制策略的泛化能力。我们不需要使用显式领域适应方法,而是利用眼手摄像头的部分可见性和简单的固定图像屏蔽 schemes。
  • results: 我们在八个真实世界任务中,包括3DoF和6DoF机器人控制任务,实现了通过眼手控制策略的成功率提高58%(绝对)的平均提升。这些结果表明我们的方法可以帮助机器人在新的环境配置和新任务中泛化。请参考视频结果:https://giving-robots-a-hand.github.io/。
    Abstract Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation. However, for robotic imitation, it is still expensive to have a human teleoperator collect large amounts of expert demonstrations with a real robot. Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation and can be quickly captured in a wide range of scenarios. Therefore, human video demonstrations are a promising data source for learning generalizable robotic manipulation policies at scale. In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies. Although a clear visual domain gap exists between human and robot data, our framework does not need to employ any explicit domain adaptation method, as we leverage the partial observability of eye-in-hand cameras as well as a simple fixed image masking scheme. On a suite of eight real-world tasks involving both 3-DoF and 6-DoF robot arm control, our method improves the success rates of eye-in-hand manipulation policies by 58% (absolute) on average, enabling robots to generalize to both new environment configurations and new tasks that are unseen in the robot demonstration data. See video results at https://giving-robots-a-hand.github.io/ .
    摘要 眼在手(eye-in-hand)相机有望提升基于视觉的机器人操作的样本效率和泛化能力。然而,对机器人模仿学习来说,让人类遥操作员用真实机器人采集大量专家演示仍然代价高昂;相比之下,人类执行任务的视频便宜得多,因为它们不需要机器人遥操作的专业技能,并且可以在各种场景下快速采集。因此,人类视频演示是大规模学习可泛化机器人操作策略的一个有前景的数据来源。在本工作中,我们用大量无标注的人类视频演示来扩充规模有限的机器人模仿数据集,从而大幅提升眼在手视觉运动策略的泛化能力。尽管人类数据与机器人数据之间存在明显的视觉域差异,我们的框架不需要任何显式的域自适应方法,而是利用眼在手相机的部分可观测性以及一种简单的固定图像掩码方案。在涵盖 3 自由度和 6 自由度机械臂控制的八个真实任务上,我们的方法将眼在手操作策略的成功率平均绝对提升了 58%,使机器人能够泛化到机器人演示数据中未出现的新环境配置和新任务。视频结果见 https://giving-robots-a-hand.github.io/ 。

Newell’s theory based feature transformations for spatio-temporal traffic prediction

  • paper_url: http://arxiv.org/abs/2307.05949
  • repo_url: None
  • paper_authors: Agnimitra Sengupta, S. Ilgin Guler
  • for: 这种研究是为了提高深度学习模型在空间和时间流行预测中的表现,以及使这些模型更容易转移到新的位置。
  • methods: 这种方法使用了卷积或图像卷积filter,并结合回归神经网络来捕捉空间和时间相关性。
  • results: 研究表明,通过physics-based feature transformation,可以提高深度学习模型在不同预测距离和不同位置上的表现,并且这些模型可以更好地适应新的位置。
    Abstract Deep learning (DL) models for spatio-temporal traffic flow forecasting employ convolutional or graph-convolutional filters along with recurrent neural networks to capture spatial and temporal dependencies in traffic data. These models, such as CNN-LSTM, utilize traffic flows from neighboring detector stations to predict flows at a specific location of interest. However, these models are limited in their ability to capture the broader dynamics of the traffic system, as they primarily learn features specific to the detector configuration and traffic characteristics at the target location. Hence, the transferability of these models to different locations becomes challenging, particularly when data is unavailable at the new location for model training. To address this limitation, we propose a traffic flow physics-based feature transformation for spatio-temporal DL models. This transformation incorporates Newell's uncongested and congested-state estimators of traffic flows at the target locations, enabling the models to learn broader dynamics of the system. Our methodology is empirically validated using traffic data from two different locations. The results demonstrate that the proposed feature transformation improves the models' performance in predicting traffic flows over different prediction horizons, as indicated by better goodness-of-fit statistics. An important advantage of our framework is its ability to be transferred to new locations where data is unavailable. This is achieved by appropriately accounting for spatial dependencies based on station distances and various traffic parameters. In contrast, regular DL models are not easily transferable as their inputs remain fixed. It should be noted that due to data limitations, we were unable to perform spatial sensitivity analysis, which calls for further research using simulated data.
    摘要 用于时空交通流预测的深度学习(DL)模型通常将卷积或图卷积滤波器与循环神经网络结合,以捕捉交通数据中的空间和时间相关性。这类模型(如 CNN-LSTM)利用邻近检测站的流量来预测目标位置的流量,但它们主要学习与目标位置的检测器配置和交通特性相关的特征,难以刻画交通系统更宏观的动态,因此向其他位置迁移较为困难,尤其是当新位置缺乏可用于训练的数据时。为解决这一局限,我们为时空 DL 模型提出一种基于交通流物理的特征变换:它引入 Newell 的非拥堵与拥堵状态下目标位置流量估计量,使模型能够学习系统更宏观的动态。我们用两个不同位置的交通数据对该方法进行了实证验证,结果表明所提特征变换在不同预测时域上都提升了模型预测交通流的性能(拟合优度指标更好)。该框架的一个重要优点是可以迁移到没有数据的新位置:这是通过基于站点间距和多种交通参数恰当地考虑空间依赖关系实现的;相比之下,输入固定的常规 DL 模型难以迁移。需要指出的是,受数据所限,我们未能进行空间敏感性分析,这有待利用仿真数据开展进一步研究。
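
The paper's exact feature transformation is not given here; the sketch below shows the kind of Newell three-detector estimators the description refers to: an uncongested estimate from the free-flow-shifted upstream cumulative count and a congested estimate from the wave-shifted downstream count plus jam storage. All parameter values and the synthetic flows are illustrative assumptions.

```python
# Sketch of Newell's three-detector estimators: the cumulative count at an
# intermediate location is bounded by a free-flow-shifted upstream count
# (uncongested state) and a wave-shifted downstream count plus jam storage
# (congested state). Parameter values are illustrative.
import numpy as np

v_f = 30.0        # free-flow speed (m/s)
w   = 5.0         # backward wave speed (m/s)
k_j = 0.15        # jam density (veh/m)
L   = 1000.0      # spacing between upstream and downstream detectors (m)
x   = 400.0       # target location measured from the upstream detector (m)

dt = 1.0                                   # s
t = np.arange(0, 600, dt)
rng = np.random.default_rng(0)
q_up = np.clip(0.4 + 0.1 * np.sin(t / 60) + rng.normal(0, 0.05, t.size), 0, None)
q_dn = np.clip(q_up - 0.05, 0, None)       # slightly lower downstream flow
N_up = np.cumsum(q_up) * dt                # cumulative counts (veh)
N_dn = np.cumsum(q_dn) * dt

def shifted(N, t, delay):
    return np.interp(t - delay, t, N, left=0.0)

N_uncongested = shifted(N_up, t, delay=x / v_f)
N_congested   = shifted(N_dn, t, delay=(L - x) / w) + k_j * (L - x)
N_target = np.minimum(N_uncongested, N_congested)   # three-detector estimate
print("estimated count at x after 10 min:", round(float(N_target[-1]), 1))
```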

Diversity-enhancing Generative Network for Few-shot Hypothesis Adaptation

  • paper_url: http://arxiv.org/abs/2307.05948
  • repo_url: None
  • paper_authors: Ruijiang Dong, Feng Liu, Haoang Chi, Tongliang Liu, Mingming Gong, Gang Niu, Masashi Sugiyama, Bo Han
  • for: addressing the few-shot hypothesis adaptation (FHA) problem
  • methods: 使用 diversity-enhancing generative network (DEG-Net),通过最小化 Hilbert-Schmidt independence criterion (HSIC) 值来生成多元的无标示数据
  • results: 比对 existed FHA baselines 表现更好,并证明生成多元数据对解决 FHA 问题具有重要作用
    Abstract Generating unlabeled data has been recently shown to help address the few-shot hypothesis adaptation (FHA) problem, where we aim to train a classifier for the target domain with a few labeled target-domain data and a well-trained source-domain classifier (i.e., a source hypothesis), for the additional information of the highly-compatible unlabeled data. However, the generated data of the existing methods are extremely similar or even the same. The strong dependency among the generated data will lead the learning to fail. In this paper, we propose a diversity-enhancing generative network (DEG-Net) for the FHA problem, which can generate diverse unlabeled data with the help of a kernel independence measure: the Hilbert-Schmidt independence criterion (HSIC). Specifically, DEG-Net will generate data via minimizing the HSIC value (i.e., maximizing the independence) among the semantic features of the generated data. By DEG-Net, the generated unlabeled data are more diverse and more effective for addressing the FHA problem. Experimental results show that the DEG-Net outperforms existing FHA baselines and further verifies that generating diverse data plays a vital role in addressing the FHA problem
    摘要 很近期,生成无标示数据已经被证明可以帮助解决几个难点假设适应(FHA)问题,我们希望通过几个标注目标领域数据和一个已经训练好的源领域分类器(即源假设)来训练目标领域分类器。然而,现有方法生成的数据很相似或甚至是完全相同的。这强大的数据生成相依关系会导致学习失败。在这篇论文中,我们提出了一种多样化提升生成网络(DEG-Net),用于解决FHA问题。DEG-Net使用希尔伯特- Schmidt独立度量(HSIC)来生成多样化的无标示数据。具体来说,DEG-Net通过最小化HSIC值(即最大化独立度)来生成数据。由于DEG-Net可以生成更多样化的无标示数据,因此它可以更好地解决FHA问题。实验结果表明,DEG-Net在FHA基线上表现出色,并证明了生成多样化数据在解决FHA问题中的重要性。
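
A minimal sketch of the biased empirical HSIC estimator with Gaussian kernels, which is the kind of quantity DEG-Net minimizes among features of generated samples to keep them mutually independent. The bandwidth heuristic and the synthetic data are illustrative assumptions.

```python
# Biased empirical HSIC with Gaussian kernels and the median heuristic.
import numpy as np

def gaussian_kernel(X: np.ndarray) -> np.ndarray:
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    sigma2 = np.median(d2[d2 > 0])        # median heuristic bandwidth
    return np.exp(-d2 / (2 * sigma2))

def hsic(X: np.ndarray, Y: np.ndarray) -> float:
    n = X.shape[0]
    K, L = gaussian_kernel(X), gaussian_kernel(Y)
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    return float(np.trace(K @ H @ L @ H) / (n - 1) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y_dep = X[:, :3] + 0.1 * rng.normal(size=(200, 3))   # dependent on X
Y_ind = rng.normal(size=(200, 3))                     # independent of X
print("HSIC(dependent)  =", round(hsic(X, Y_dep), 4))
print("HSIC(independent)=", round(hsic(X, Y_ind), 4))
```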

A Bayesian approach to quantifying uncertainties and improving generalizability in traffic prediction models

  • paper_url: http://arxiv.org/abs/2307.05946
  • repo_url: None
  • paper_authors: Agnimitra Sengupta, Sudeepta Mondal, Adway Das, S. Ilgin Guler
  • for: 预测交通数据的深度学习模型可以提供更高的性能,但是它们通常不提供不确定性估计,这是交通运营和控制中不可或缺的。
  • methods: 我们提出了一种 bayesian 反复神经网络框架,用于交通预测中的不确定性量化。我们引入了spectral normalization来控制神经网络的复杂性,从而改善模型的泛化性能。
  • results: 我们的结果表明,spectral normalization可以更好地地方化特征空间,并且在单步预测历史中显著超过了layer normalization和没有normalization的模型。这表明,spectral normalization可以更好地捕捉交通数据的下变换特征。
    Abstract Deep-learning models for traffic data prediction can have superior performance in modeling complex functions using a multi-layer architecture. However, a major drawback of these approaches is that most of these approaches do not offer forecasts with uncertainty estimates, which are essential for traffic operations and control. Without uncertainty estimates, it is difficult to place any level of trust to the model predictions, and operational strategies relying on overconfident predictions can lead to worsening traffic conditions. In this study, we propose a Bayesian recurrent neural network framework for uncertainty quantification in traffic prediction with higher generalizability by introducing spectral normalization to its hidden layers. In our paper, we have shown that normalization alters the training process of deep neural networks by controlling the model's complexity and reducing the risk of overfitting to the training data. This, in turn, helps improve the generalization performance of the model on out-of-distribution datasets. Results demonstrate that spectral normalization improves uncertainty estimates and significantly outperforms both the layer normalization and model without normalization in single-step prediction horizons. This improved performance can be attributed to the ability of spectral normalization to better localize the feature space of the data under perturbations. Our findings are especially relevant to traffic management applications, where predicting traffic conditions across multiple locations is the goal, but the availability of training data from multiple locations is limited. Spectral normalization, therefore, provides a more generalizable approach that can effectively capture the underlying patterns in traffic data without requiring location-specific models.
    摘要 深度学习模型可以在交通数据预测中表现出优秀的性能,因为它们可以模型复杂的函数使用多层架构。然而,这些方法的主要缺点是不提供预测结果的不确定性估计,这是交通运营和控制中非常重要的。 Without uncertainty estimates, it is difficult to trust the model predictions, and operational strategies relying on overconfident predictions can lead to worsening traffic conditions.在这种情况下,我们提出了一种 bayesian 循环神经网络框架,用于交通预测中的不确定性量化。我们在论文中示出,normalization 控制了深度神经网络的复杂性,从而降低了模型在训练数据上的风险欠拟合。这种方法可以提高模型在不同数据集上的总体性能。我们的结果表明,spectral normalization 可以提高不确定性估计,并在单步预测征 horizon 中显著超过层 normalization 和没有normalization的情况。这种改进的性能可以归因于spectral normalization 更好地localize 数据的特征空间下的干扰。我们的发现对交通管理应用非常重要,因为需要预测多个位置的交通条件,但是具体的训练数据受限。spectral normalization 因此提供了一种更通用的方法,可以更好地捕捉交通数据的下面特征,而无需建立具体的位置特定的模型。
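
A small sketch of how spectral normalization can be attached to the dense layers of a recurrent traffic model using the standard PyTorch utility; the paper's Bayesian architecture, uncertainty quantification, and training loop are not reproduced, and the layer sizes are assumptions.

```python
# Spectral normalization on the dense layers of a small recurrent
# traffic-prediction model, via the standard PyTorch utility.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class TrafficRNN(nn.Module):
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        # Constrain the Lipschitz constant of the output head.
        self.head = nn.Sequential(
            spectral_norm(nn.Linear(hidden, hidden)),
            nn.ReLU(),
            spectral_norm(nn.Linear(hidden, 1)),
        )

    def forward(self, x):                 # x: (batch, time, n_features)
        out, _ = self.rnn(x)
        return self.head(out[:, -1])      # predict the next flow value

model = TrafficRNN()
x = torch.randn(8, 24, 1)                 # 8 sequences of 24 past observations
print(model(x).shape)                      # torch.Size([8, 1])
```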

YOGA: Deep Object Detection in the Wild with Lightweight Feature Learning and Multiscale Attention

  • paper_url: http://arxiv.org/abs/2307.05945
  • repo_url: https://github.com/LabSAINT/YOGA
  • paper_authors: Raja Sunkara, Tie Luo
  • for: 这个论文是为了开发一种基于深度学习的轻量级物体检测模型,可以在低端边缘设备上运行,并且可以达到竞争性的准确率。
  • methods: 该模型采用了一个两阶段特征学习管道,包括一个便宜的线性变换,可以使用只有半数的卷积核来学习特征图。此外,它使用了一种注意机制来实现多 scales特征融合,而不是 conventinal检测器中的笼性 concatenation。
  • results: 我们评估了YOGA模型在COCO-val和COCO-testdev数据集上,与其他10个状态对照检测器进行比较。结果表明,YOGA能够占据最佳的平衡点,即同时具有高准确率和轻量级模型(相比 conventinal检测器,YOGA可以提高AP值22%,参数和FLOPs减少23-34%),因此适合在低端边缘设备上部署。此外,我们还对YOGA模型进行了硬件实现和NVIDIA Jetson Nano上的评估,结果表明YOGA在硬件上也表现出了优秀的性能。
    Abstract We introduce YOGA, a deep learning based yet lightweight object detection model that can operate on low-end edge devices while still achieving competitive accuracy. The YOGA architecture consists of a two-phase feature learning pipeline with a cheap linear transformation, which learns feature maps using only half of the convolution filters required by conventional convolutional neural networks. In addition, it performs multi-scale feature fusion in its neck using an attention mechanism instead of the naive concatenation used by conventional detectors. YOGA is a flexible model that can be easily scaled up or down by several orders of magnitude to fit a broad range of hardware constraints. We evaluate YOGA on COCO-val and COCO-testdev datasets with other over 10 state-of-the-art object detectors. The results show that YOGA strikes the best trade-off between model size and accuracy (up to 22% increase of AP and 23-34% reduction of parameters and FLOPs), making it an ideal choice for deployment in the wild on low-end edge devices. This is further affirmed by our hardware implementation and evaluation on NVIDIA Jetson Nano.
    摘要 我们介绍YOGA,一种基于深度学习的轻量级对象检测模型,可以在低端边缘设备上运行而仍然达到竞争性的准确率。 YOGA架构包括两个阶段特征学习管道,使用便宜的线性变换学习特征地图,只需半数的卷积核数量相对于常见卷积神经网络来学习特征地图。此外,它使用注意机制进行多scale特征融合,而不是常见检测器中的简单 concatenation。YOGA是一种灵活的模型,可以轻松地缩放到适应各种硬件限制。我们对COCO-val和COCO-testdev数据集进行评估,与其他10个状态对照检测器进行比较。结果表明,YOGA在准确率和模型大小之间达到了最佳平衡(增加AP22%,减少参数和FLOPs23-34%),使其成为在野外部署的理想选择。此外,我们对NVIDIA Jetson Nano硬件实现和评估也得到了证明。
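
The abstract's "half of the convolution filters plus a cheap linear transformation" matches the GhostNet-style pattern sketched below: half of the output channels come from a standard convolution and the rest from a cheap depthwise convolution over those primary maps. This is an assumption about the mechanism, not YOGA's exact block.

```python
# Ghost-style "half the filters + cheap linear transformation" block.
import torch
import torch.nn as nn

class CheapFeatureBlock(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        primary_ch = out_ch // 2
        self.primary = nn.Conv2d(in_ch, primary_ch, kernel_size=3, padding=1)
        # Depthwise 3x3 conv: a cheap linear map generating the remaining maps.
        self.cheap = nn.Conv2d(primary_ch, out_ch - primary_ch, kernel_size=3,
                               padding=1, groups=primary_ch)

    def forward(self, x):
        p = self.primary(x)
        return torch.cat([p, self.cheap(p)], dim=1)

block = CheapFeatureBlock(in_ch=16, out_ch=64)
x = torch.randn(1, 16, 32, 32)
print(block(x).shape)                            # torch.Size([1, 64, 32, 32])
print(sum(t.numel() for t in block.parameters()), "parameters")
```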

Towards the Better Ranking Consistency: A Multi-task Learning Framework for Early Stage Ads Ranking

  • paper_url: http://arxiv.org/abs/2307.11096
  • repo_url: None
  • paper_authors: Xuewei Wang, Qiang Jin, Shengyu Huang, Min Zhang, Xi Liu, Zhengli Zhao, Yukun Chen, Zhengyu Zhang, Jiyan Yang, Ellie Wen, Sagar Chordia, Wenlin Chen, Qin Huang
  • for: 大规模广告推荐系统通常将广告排序划分为三个阶段,以平衡效率和准确性。
  • methods: 我们提出了一种用于早期阶段排序的多任务学习框架,以捕捉多个最终阶段排序组件(例如广告点击和广告质量事件)及其任务间关系。
  • results: 在大规模工业系统的在线 A/B 测试中,该框架取得了显著更高的点击率(CTR)、转化率(CVR)、总价值和更好的广告质量(例如更低的广告划除率)。
    Abstract Dividing ads ranking system into retrieval, early, and final stages is a common practice in large scale ads recommendation to balance the efficiency and accuracy. The early stage ranking often uses efficient models to generate candidates out of a set of retrieved ads. The candidates are then fed into a more computationally intensive but accurate final stage ranking system to produce the final ads recommendation. As the early and final stage ranking use different features and model architectures because of system constraints, a serious ranking consistency issue arises where the early stage has a low ads recall, i.e., top ads in the final stage are ranked low in the early stage. In order to pass better ads from the early to the final stage ranking, we propose a multi-task learning framework for early stage ranking to capture multiple final stage ranking components (i.e. ads clicks and ads quality events) and their task relations. With our multi-task learning framework, we can not only achieve serving cost saving from the model consolidation, but also improve the ads recall and ranking consistency. In the online A/B testing, our framework achieves significantly higher click-through rate (CTR), conversion rate (CVR), total value and better ads-quality (e.g. reduced ads cross-out rate) in a large scale industrial ads ranking system.

Filling time-series gaps using image techniques: Multidimensional context autoencoder approach for building energy data imputation

  • paper_url: http://arxiv.org/abs/2307.05926
  • repo_url: None
  • paper_authors: Chun Fu, Matias Quintana, Zoltan Nagy, Clayton Miller
  • for: The goal is more accurate building energy prediction and management, enabled by Internet of Things (IoT) devices and growing volumes of energy data. Because energy data often come from multiple sources and can be incomplete or inconsistent, which hinders accurate prediction and management, prior work has focused on imputing missing gaps (both random and continuous) in energy data.
  • methods: The study applies modern deep-learning imputation methods, including Partial Convolution (PConv), an image-inpainting technique widely used in computer vision that can handle complex missing patterns, to energy data reshaped into two dimensions.
  • results: Compared with a CNN on the raw time series (1D-CNN) and a weekly persistence baseline, neural networks operating on the two-dimensional reshaped data reduce the Mean Squared Error (MSE) by 10% to 30%; Partial Convolution (PConv) reduces the MSE by a further 20-30% over the 2D-CNN and stands out among all models.
    Abstract Building energy prediction and management has become increasingly important in recent decades, driven by the growth of Internet of Things (IoT) devices and the availability of more energy data. However, energy data is often collected from multiple sources and can be incomplete or inconsistent, which can hinder accurate predictions and management of energy systems and limit the usefulness of the data for decision-making and research. To address this issue, past studies have focused on imputing missing gaps in energy data, including random and continuous gaps. One of the main challenges in this area is the lack of validation on a benchmark dataset with various building and meter types, making it difficult to accurately evaluate the performance of different imputation methods. Another challenge is the lack of application of state-of-the-art imputation methods for missing gaps in energy data. Contemporary image-inpainting methods, such as Partial Convolution (PConv), have been widely used in the computer vision domain and have demonstrated their effectiveness in dealing with complex missing patterns. To study whether energy data imputation can benefit from the image-based deep learning method, this study compared PConv, Convolutional neural networks (CNNs), and weekly persistence method using one of the biggest publicly available whole building energy datasets, consisting of 1479 power meters worldwide, as the benchmark. The results show that, compared to the CNN with the raw time series (1D-CNN) and the weekly persistence method, neural network models with reshaped energy data with two dimensions reduced the Mean Squared Error (MSE) by 10% to 30%. The advanced deep learning method, Partial convolution (PConv), has further reduced the MSE by 20-30% than 2D-CNN and stands out among all models.
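The core trick, treating a long meter-reading series as a two-dimensional image (e.g. days x hours) so that image-inpainting machinery can fill the gaps, is easy to prototype. The sketch below is a simplified, assumed implementation of a partial-convolution layer: the convolution aggregates only observed entries, re-normalizes by the number of valid pixels under each kernel window, and propagates the validity mask forward. It is not the paper's code.

```python
# Simplified sketch of partial convolution (PConv) for gap filling in energy
# data reshaped as a (days x hours) image. Not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PartialConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=True)
        # Fixed all-ones kernel used to count valid (observed) pixels per window.
        self.register_buffer("ones", torch.ones(1, in_ch, k, k))
        self.window = in_ch * k * k

    def forward(self, x, mask):                  # mask: 1 = observed, 0 = missing
        valid = F.conv2d(mask, self.ones, padding=self.conv.padding[0])
        out = self.conv(x * mask)
        bias = self.conv.bias.view(1, -1, 1, 1)
        # Re-normalize by the fraction of observed pixels under each window.
        scale = self.window / valid.clamp(min=1.0)
        out = (out - bias) * scale + bias
        out = out * (valid > 0).float()          # nothing observed -> output 0
        new_mask = (valid > 0).float().expand_as(out)
        return out, new_mask


# Reshape one month of hourly meter readings into a 2D "image" and run PConv.
readings = torch.randn(1, 1, 30, 24)              # (batch, channel, days, hours)
mask = (torch.rand_like(readings) > 0.2).float()  # ~20% of entries missing
layer = PartialConv2d(1, 8)
filled, new_mask = layer(readings, mask)
print(filled.shape, new_mask.shape)
```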

Unified Medical Image-Text-Label Contrastive Learning With Continuous Prompt

  • paper_url: http://arxiv.org/abs/2307.05920
  • repo_url: None
  • paper_authors: Yuhao Wang
  • for: The paper is written for the task of medical image-text pre-training, specifically addressing the challenges of using large-scale medical image and radiology report datasets.
  • methods: The proposed method uses a unified Image-Text-Label contrastive learning framework based on continuous prompts, which includes three main contributions: unifying image, text, and label data; introducing continuous implicit prompts; and proposing an Image-Text-Label contrastive training method to mitigate the problem of too many false-negative samples.
  • results: The proposed UMCL framework exhibits excellent performance on several downstream tasks, demonstrating the effectiveness of the unified Image-Text-Label contrastive learning framework and the benefits of using continuous prompts.
    Abstract Contrastive language-image Pre-training (CLIP) [13] can leverage large datasets of unlabeled Image-Text pairs, which have demonstrated impressive performance in various downstream tasks. Given that annotating medical data is time-consuming and laborious, Image-Text Pre-training has promising applications in exploiting large-scale medical image and radiology report datasets. However, medical Image-Text Pre-training faces several challenges, as follows: (1) Due to privacy concerns, the amount of available medical data is relatively small compared to natural data, leading to weaker generalization ability of the model. (2) Medical images are highly similar with only fine-grained differences in subtleties, resulting in a large number of false-negative sample pairs in comparison learning. (3) The hand-crafted Prompt usually differs from the natural medical image report, Subtle changes in wording can lead to significant differences in performance. In this paper, we propose a unified Image-Text-Label contrastive learning framework based on continuous prompts, with three main contributions. First, We unified the data of images, text, and labels, which greatly expanded the training data that the model could utilize. Second, we address the issue of data diversity and the impact of hand-crafted prompts on model performance by introducing continuous implicit prompts. Lastly, we propose a ImageText-Label contrastive Training to mitigate the problem of too many false-negative samples. We demonstrate through sufficient experiments that the Unified Medical Contrastive Learning (UMCL) framework exhibits excellent performance on several downstream tasks.

Prompt Generate Train (PGT): Few-shot Domain Adaption of Retrieval Augmented Generation Models for Open Book Question-Answering

  • paper_url: http://arxiv.org/abs/2307.05915
  • repo_url: None
  • paper_authors: C. S. Krishna
  • for: The paper proposes a framework, Prompt, Generate, Train (PGT), for efficiently developing a generative question-answering model for open-book QA over a proprietary collection of text documents.
  • methods: The framework adapts a retriever-augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning with synthetic feedback in a few-shot setting.
  • results: The resulting model is expected to produce relevant, uncertainty-calibrated answers that are competitive with GPT-4-based in-context retrieval-augmented generation at lower serving costs.
    Abstract We propose a framework - Prompt, Generate, Train (PGT) - to efficiently develop a generative question-answering model for open-book question-answering over a proprietary collection of text documents. The framework adapts a retriever augmented generation (RAG) model to the target domain using supervised fine-tuning and reinforcement learning with synthetic feedback in a few-shot setting. This, we hypothesize, will yield an aligned, uncertainty calibrated model that is competitive with GPT-4 based in-context retrieval augmented generation in generating relevant answers at lower serving costs. The framework's synthetic generation pipeline will generate synthetic training data comprising tuples using an open-source LLM and a novel consistency filtering scheme. The pipeline will be designed to generate both abstractive and extractive questions that span the entire corpus. The framework proposes to fine-tune a smaller RAG model comprising a dense retriever (ColBERTv2) and a smaller sized LLM on the synthetic dataset. In parallel, the framework will train a Reward model to score domain grounded answers higher than hallucinated answers using an a priori relevance ordering of synthetically assembled samples. In the next phase, the framework will align the RAG model with the target domain using reinforcement learning (Proximal Policy Optimization). This step may improve the RAG model's ability to generate grounded answers and ignore out of domain questions. In the final phase, the framework will calibrate the model's uncertainty for extractive question-answers.

FIS-ONE: Floor Identification System with One Label for Crowdsourced RF Signals

  • paper_url: http://arxiv.org/abs/2307.05914
  • repo_url: https://github.com/stevezhuo/fis-one
  • paper_authors: Weipeng Zhuo, Ka Ho Chiu, Jierun Chen, Ziqi Zhao, S. -H. Gary Chan, Sangtae Ha, Chul-Ho Lee
  • for: The paper proposes a floor identification method that needs only a single floor-labeled signal sample, enabling smart-city applications such as multi-floor indoor localization, geofencing, and robot surveillance.
  • methods: It introduces a two-step pipeline of signal clustering and cluster indexing: a bipartite graph models the RF signal samples, an attention-based graph neural network learns a latent representation of each sample so that signals can be clustered more accurately, and the clusters are then indexed with floor labels by exploiting signal spillover across floors, formulated as a combinatorial optimization problem.
  • results: Experiments on the Microsoft dataset and in three large shopping malls show that the method is effective and significantly outperforms baseline algorithms, with up to a 23% improvement in adjusted Rand index and a 25% improvement in normalized mutual information using only one floor-labeled sample.
    Abstract Floor labels of crowdsourced RF signals are crucial for many smart-city applications, such as multi-floor indoor localization, geofencing, and robot surveillance. To build a prediction model to identify the floor number of a new RF signal upon its measurement, conventional approaches using the crowdsourced RF signals assume that at least few labeled signal samples are available on each floor. In this work, we push the envelope further and demonstrate that it is technically feasible to enable such floor identification with only one floor-labeled signal sample on the bottom floor while having the rest of signal samples unlabeled. We propose FIS-ONE, a novel floor identification system with only one labeled sample. FIS-ONE consists of two steps, namely signal clustering and cluster indexing. We first build a bipartite graph to model the RF signal samples and obtain a latent representation of each node (each signal sample) using our attention-based graph neural network model so that the RF signal samples can be clustered more accurately. Then, we tackle the problem of indexing the clusters with proper floor labels, by leveraging the observation that signals from an access point can be detected on different floors, i.e., signal spillover. Specifically, we formulate a cluster indexing problem as a combinatorial optimization problem and show that it is equivalent to solving a traveling salesman problem, whose (near-)optimal solution can be found efficiently. We have implemented FIS-ONE and validated its effectiveness on the Microsoft dataset and in three large shopping malls. Our results show that FIS-ONE outperforms other baseline algorithms significantly, with up to 23% improvement in adjusted rand index and 25% improvement in normalized mutual information using only one floor-labeled signal sample.

Grain and Grain Boundary Segmentation using Machine Learning with Real and Generated Datasets

  • paper_url: http://arxiv.org/abs/2307.05911
  • repo_url: None
  • paper_authors: Peter Warren, Nandhini Raju, Abhilash Prasad, Shajahan Hossain, Ramesh Subramanian, Jayanta Kapat, Navin Manjooran, Ranajay Ghosh
  • for: This paper aims to improve the accuracy of grain boundary segmentation in stainless steel microstructure images using Convolutional Neural Networks (CNN) trained on a combination of real and generated data.
  • methods: The paper uses a combination of real and generated data to train a CNN model for grain boundary segmentation, and employs a novel artificial grain image fabrication method based on Voronoi tessellation patterns and random synthetic noise.
  • results: The paper reports significantly improved accuracy of grain boundary segmentation using the proposed method, with the CNN model achieving an accuracy of 95.6% on a test set of images. The results also show that the proposed method outperforms existing computational methods and manual segmentation in terms of accuracy and efficiency.
    Abstract We report significantly improved accuracy of grain boundary segmentation using Convolutional Neural Networks (CNN) trained on a combination of real and generated data. Manual segmentation is accurate but time-consuming, and existing computational methods are faster but often inaccurate. To combat this dilemma, machine learning models can be used to achieve the accuracy of manual segmentation with the efficiency of a computational method. An extensive dataset of 316L stainless steel samples is additively manufactured, prepared, polished, and etched, and microstructure grain images are then systematically collected. Grain segmentation via existing computational methods and by hand is conducted to create "real" training data. A Voronoi tessellation pattern combined with random synthetic noise and simulated defects is developed as a novel artificial grain image fabrication method. This provides training data supplementation for data-intensive machine learning methods. The accuracy of the grain measurements from microstructure images segmented via the computational methods and the machine learning methods proposed in this work is calculated and compared to provide benchmarks for grain segmentation. Over 400 images of the microstructure of stainless steel samples were manually segmented for machine learning training applications. This data and the artificial data are available on Kaggle.
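The synthetic-data idea, fabricating artificial grain micrographs from a Voronoi tessellation plus random noise, can be reproduced in a few lines with SciPy. The snippet below is a simplified illustration of that fabrication step (grain labels from nearest seed points, boundaries where labels change, Gaussian noise on top); the paper's simulated defects are not modeled here, and all sizes are arbitrary.

```python
# Simplified sketch of Voronoi-based synthetic grain image fabrication.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
H, W, n_grains = 256, 256, 60

# Random seed points define the grains; each pixel belongs to its nearest seed.
seeds = rng.uniform(0, [H, W], size=(n_grains, 2))
yy, xx = np.mgrid[0:H, 0:W]
pixels = np.stack([yy.ravel(), xx.ravel()], axis=1)
labels = cKDTree(seeds).query(pixels)[1].reshape(H, W)   # Voronoi label map

# Grain boundaries = pixels whose label differs from a neighbour.
boundary = np.zeros((H, W), dtype=bool)
boundary[:-1, :] |= labels[:-1, :] != labels[1:, :]
boundary[:, :-1] |= labels[:, :-1] != labels[:, 1:]

# Render a grayscale micrograph: per-grain intensity + noise, dark boundaries.
intensity = rng.uniform(0.5, 0.9, size=n_grains)[labels]
image = np.clip(intensity + rng.normal(0, 0.05, size=(H, W)), 0, 1)
image[boundary] = 0.1

# `image` is the synthetic micrograph, `boundary` the ground-truth segmentation mask.
print(image.shape, boundary.mean())
```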

Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding

  • paper_url: http://arxiv.org/abs/2307.05908
  • repo_url: None
  • paper_authors: Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee
  • for: The paper introduces Predictive Pipelined Decoding (PPD), a method that accelerates greedy decoding in large language models (LLMs) while keeping the output exactly the same as the original decoding.
  • methods: PPD uses additional compute resources to initiate subsequent token decoding in parallel with the current token decoding.
  • results: The results show that extra compute can reduce decoding latency and reshapes the understanding of trade-offs in LLM decoding strategies; a theoretical framework analyzes the compute-latency trade-off and analytically estimates the potential latency reduction via the match rate p_correct.
    Abstract This paper presents "Predictive Pipelined Decoding (PPD)," an approach that speeds up greedy decoding in Large Language Models (LLMs) while maintaining the exact same output as the original decoding. Unlike conventional strategies, PPD employs additional compute resources to parallelize the initiation of subsequent token decoding during the current token decoding. This innovative method reduces decoding latency and reshapes the understanding of trade-offs in LLM decoding strategies. We have developed a theoretical framework that allows us to analyze the trade-off between computation and latency. Using this framework, we can analytically estimate the potential reduction in latency associated with our proposed method, achieved through the assessment of the match rate, represented as p_correct. The results demonstrate that the use of extra computational resources has the potential to accelerate LLM greedy decoding.
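The compute-latency trade-off is easy to see in a toy simulation: while the exact model finishes the current token, extra workers speculatively start the next step for each of the top-k candidates proposed by a cheap early prediction; whenever the exact token lands in the candidate set (with probability p_correct), the next step's latency has already been partly paid. The functions `exact_next` and `cheap_topk` below are illustrative stand-ins, not a real LLM, and the scheme shown is only an interpretation of the pipelining idea.

```python
# Toy simulation of Predictive Pipelined Decoding: speculative next-step work
# is launched in parallel with the exact current-step computation.
# `cheap_topk` and `exact_next` are hypothetical stand-ins for an LLM.
import time
from concurrent.futures import ThreadPoolExecutor

VOCAB = list("abcde")

def exact_next(prefix: str) -> str:
    time.sleep(0.05)                        # pretend this is a full forward pass
    return VOCAB[hash(prefix) % len(VOCAB)]

def cheap_topk(prefix: str, k: int = 2) -> list:
    guess = VOCAB[hash(prefix) % len(VOCAB)]           # early / draft prediction
    return list(dict.fromkeys([guess] + VOCAB))[:k]    # guess first, then fillers

def ppd_decode(prefix: str, n_steps: int, k: int = 2) -> str:
    with ThreadPoolExecutor(max_workers=k + 1) as pool:
        for _ in range(n_steps):
            current = pool.submit(exact_next, prefix)
            # Speculatively start the *next* token for each candidate continuation.
            spec = {c: pool.submit(exact_next, prefix + c) for c in cheap_topk(prefix, k)}
            tok = current.result()
            prefix += tok
            if tok in spec:                  # match: the next step is already underway
                prefix += spec[tok].result()
            # else: speculative work is discarded; tokens remain those of greedy decoding
    return prefix

start = time.time()
print(ppd_decode("seed-", n_steps=10), f"{time.time() - start:.2f}s")
```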

Mini-Batch Optimization of Contrastive Loss

  • paper_url: http://arxiv.org/abs/2307.05906
  • repo_url: https://github.com/krafton-ai/mini-batch-cl
  • paper_authors: Jaewoong Cho, Kartik Sreenivasan, Keon Lee, Kyunghoo Mun, Soheun Yi, Jeong-Gwan Lee, Anna Lee, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee
  • for: The paper studies mini-batch optimization of the contrastive loss, which is used in practice because memory constraints make it infeasible to consider all positive and negative pairs.
  • methods: It analyzes the theoretical properties of mini-batch optimization in contrastive learning (equivalence to full-batch optimization only when all $\binom{N}{B}$ mini-batches are selected) and proposes a spectral clustering-based approach for identifying high-loss mini-batches that speed up SGD convergence.
  • results: Experiments validate the theoretical findings and show that the proposed algorithm outperforms vanilla SGD in practically relevant settings.
    Abstract Contrastive learning has gained significant attention as a method for self-supervised learning. The contrastive loss function ensures that embeddings of positive sample pairs (e.g., different samples from the same class or different views of the same object) are similar, while embeddings of negative pairs are dissimilar. Practical constraints such as large memory requirements make it challenging to consider all possible positive and negative pairs, leading to the use of mini-batch optimization. In this paper, we investigate the theoretical aspects of mini-batch optimization in contrastive learning. We show that mini-batch optimization is equivalent to full-batch optimization if and only if all $\binom{N}{B}$ mini-batches are selected, while sub-optimality may arise when examining only a subset. We then demonstrate that utilizing high-loss mini-batches can speed up SGD convergence and propose a spectral clustering-based approach for identifying these high-loss mini-batches. Our experimental results validate our theoretical findings and demonstrate that our proposed algorithm outperforms vanilla SGD in practically relevant settings, providing a better understanding of mini-batch optimization in contrastive learning.
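For reference, a standard in-batch contrastive objective of the kind studied here (an NT-Xent/InfoNCE-style loss, shown as an assumed concrete instance) pulls each embedding toward its positive view and pushes it from the other in-batch negatives. The high-loss-batch selection via spectral clustering is not shown; this is only the per-batch loss being optimized.

```python
# Standard in-batch (NT-Xent style) contrastive loss over B positive pairs.
import torch
import torch.nn.functional as F

def minibatch_contrastive_loss(z1, z2, temperature: float = 0.5):
    """z1, z2: (B, d) embeddings of two views of the same B samples."""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)        # (2B, d)
    sim = z @ z.t() / temperature                              # cosine similarities
    sim.fill_diagonal_(float("-inf"))                          # drop self-pairs
    B = z1.shape[0]
    # The positive of sample i is its other view: i <-> i + B.
    targets = torch.cat([torch.arange(B) + B, torch.arange(B)])
    return F.cross_entropy(sim, targets.to(sim.device))

z1, z2 = torch.randn(32, 128), torch.randn(32, 128)
print(minibatch_contrastive_loss(z1, z2).item())
```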

Stability Guarantees for Feature Attributions with Multiplicative Smoothing

  • paper_url: http://arxiv.org/abs/2307.05902
  • repo_url: None
  • paper_authors: Anton Xue, Rajeev Alur, Eric Wong
  • for: The paper aims to provide feature attribution methods with formal stability guarantees, so that a model's explanations can be relied upon.
  • methods: It develops a smoothing method called Multiplicative Smoothing (MuS), which guarantees relaxed variants of stability when the model is sufficiently Lipschitz with respect to the masking of features, and which can be combined with any classifier and feature attribution method.
  • results: Evaluations on vision and language models with attribution methods such as LIME and SHAP show that MuS endows feature attributions with non-trivial stability guarantees.
    Abstract Explanation methods for machine learning models tend to not provide any formal guarantees and may not reflect the underlying decision-making process. In this work, we analyze stability as a property for reliable feature attribution methods. We prove that relaxed variants of stability are guaranteed if the model is sufficiently Lipschitz with respect to the masking of features. To achieve such a model, we develop a smoothing method called Multiplicative Smoothing (MuS). We show that MuS overcomes theoretical limitations of standard smoothing techniques and can be integrated with any classifier and feature attribution method. We evaluate MuS on vision and language models with a variety of feature attribution methods, such as LIME and SHAP, and demonstrate that MuS endows feature attributions with non-trivial stability guarantees.
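A masking-based smoothing operator in the spirit of MuS can be sketched as follows: the smoothed classifier averages the base model's predictions over random multiplicative binary keep/drop masks on the input features, which is what makes the smoothed model well behaved under feature masking. This is an assumed simplification for illustration, not the paper's exact construction or its certified bound.

```python
# Sketch of a masking-based smoothed classifier: average predictions over
# random multiplicative binary feature masks. An assumed simplification of MuS.
import torch
import torch.nn as nn

class MaskSmoothedClassifier(nn.Module):
    def __init__(self, base: nn.Module, keep_prob: float = 0.8, n_samples: int = 64):
        super().__init__()
        self.base, self.keep_prob, self.n_samples = base, keep_prob, n_samples

    @torch.no_grad()
    def forward(self, x):                        # x: (batch, n_features)
        probs = 0.0
        for _ in range(self.n_samples):
            mask = (torch.rand_like(x) < self.keep_prob).float()
            probs = probs + self.base(x * mask).softmax(dim=-1)
        return probs / self.n_samples            # smoothed class probabilities

base = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
smoothed = MaskSmoothedClassifier(base)
print(smoothed(torch.randn(4, 20)).sum(dim=-1))  # rows sum to 1
```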

Deep Unrolling for Nonconvex Robust Principal Component Analysis

  • paper_url: http://arxiv.org/abs/2307.05893
  • repo_url: None
  • paper_authors: Elizabeth Z. C. Tan, Caroline Chaux, Emmanuel Soubies, Vincent Y. F. Tan
  • for: The paper develops a deep-learning-based algorithm for Robust Principal Component Analysis (RPCA), i.e., decomposing a matrix into the sum of a low-rank matrix and a sparse matrix.
  • methods: It proposes a deep unrolled algorithm based on an accelerated alternating projection method for the nonconvex formulation of RPCA; the approach combines the benefits of deep neural networks with the interpretability of the original algorithm and learns the hyperparameters automatically.
  • results: On synthetic datasets and a face modeling problem, the unrolled algorithm achieves better numerical and visual performance.
    Abstract We design algorithms for Robust Principal Component Analysis (RPCA) which consists in decomposing a matrix into the sum of a low rank matrix and a sparse matrix. We propose a deep unrolled algorithm based on an accelerated alternating projection algorithm which aims to solve RPCA in its nonconvex form. The proposed procedure combines benefits of deep neural networks and the interpretability of the original algorithm and it automatically learns hyperparameters. We demonstrate the unrolled algorithm's effectiveness on synthetic datasets and also on a face modeling problem, where it leads to both better numerical and visual performances.
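One generic unrolled iteration of the nonconvex RPCA splitting, a truncated-SVD projection for the low-rank part and soft-thresholding for the sparse part with the threshold promoted to a learnable parameter, might look like the sketch below. It illustrates the unrolling idea only and is not the authors' exact architecture or their accelerated projection scheme.

```python
# Generic sketch of one unrolled RPCA layer: M ~ L (low rank) + S (sparse).
import torch
import torch.nn as nn

class UnrolledRPCALayer(nn.Module):
    def __init__(self, rank: int, init_thresh: float = 0.1):
        super().__init__()
        self.rank = rank
        # Learnable soft-threshold for the sparse component (trained end to end).
        self.thresh = nn.Parameter(torch.tensor(init_thresh))

    def forward(self, M, L, S):
        # Low-rank update: rank-r truncated SVD of the residual M - S.
        U, sig, Vh = torch.linalg.svd(M - S, full_matrices=False)
        r = self.rank
        L = U[:, :r] @ torch.diag(sig[:r]) @ Vh[:r, :]
        # Sparse update: soft-threshold the residual M - L.
        R = M - L
        S = torch.sign(R) * torch.clamp(R.abs() - self.thresh, min=0.0)
        return L, S

M = torch.randn(50, 40)
L, S = torch.zeros_like(M), torch.zeros_like(M)
layers = nn.ModuleList([UnrolledRPCALayer(rank=5) for _ in range(10)])
for layer in layers:
    L, S = layer(M, L, S)
print(torch.norm(M - L - S))
```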

PID-Inspired Inductive Biases for Deep Reinforcement Learning in Partially Observable Control Tasks

  • paper_url: http://arxiv.org/abs/2307.05891
  • repo_url: https://github.com/ianchar/gpide
  • paper_authors: Ian Char, Jeff Schneider
  • for: The paper studies how deep reinforcement learning (RL) can learn to control systems from data alone when the full state is not observable.
  • methods: Inspired by the success of PID controllers, it proposes history encoders based on summing and differencing: one that directly uses PID features and another that extends these core ideas to arbitrary control tasks.
  • results: Compared with prior approaches, the proposed encoders produce more robust policies with better performance on a variety of tracking tasks, and achieve 1.7x better performance on average over previous state-of-the-art methods on a suite of high-dimensional control tasks.
    Abstract Deep reinforcement learning (RL) has shown immense potential for learning to control systems through data alone. However, one challenge deep RL faces is that the full state of the system is often not observable. When this is the case, the policy needs to leverage the history of observations to infer the current state. At the same time, differences between the training and testing environments makes it critical for the policy not to overfit to the sequence of observations it sees at training time. As such, there is an important balancing act between having the history encoder be flexible enough to extract relevant information, yet be robust to changes in the environment. To strike this balance, we look to the PID controller for inspiration. We assert the PID controller's success shows that only summing and differencing are needed to accumulate information over time for many control tasks. Following this principle, we propose two architectures for encoding history: one that directly uses PID features and another that extends these core ideas and can be used in arbitrary control tasks. When compared with prior approaches, our encoders produce policies that are often more robust and achieve better performance on a variety of tracking tasks. Going beyond tracking tasks, our policies achieve 1.7x better performance on average over previous state-of-the-art methods on a suite of high dimensional control tasks.
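The PID-inspired encoding amounts to summarizing the history of tracking errors with exactly the quantities a PID controller uses: the current error (P), its running sum (I), and its most recent difference (D). A minimal version of that first encoder is sketched below; the paper's second, more general encoder extends these summing and differencing operations and is not shown.

```python
# Minimal PID-style history encoder: summarize an error history by its
# proportional, integral, and derivative features.
import numpy as np

def pid_features(errors: np.ndarray, dt: float = 1.0) -> np.ndarray:
    """errors: (T, d) history of tracking errors; returns (3*d,) features."""
    p = errors[-1]                                    # proportional: current error
    i = errors.sum(axis=0) * dt                       # integral: accumulated error
    d = (errors[-1] - errors[-2]) / dt if len(errors) > 1 else np.zeros_like(p)
    return np.concatenate([p, i, d])

history = np.cumsum(np.random.randn(20, 2), axis=0)   # fake 2-D error signal
obs = pid_features(history)                           # 6-D summary fed to the policy
print(obs)
```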

Efficient Task Offloading Algorithm for Digital Twin in Edge/Cloud Computing Environment

  • paper_url: http://arxiv.org/abs/2307.05888
  • repo_url: None
  • paper_authors: Ziru Zhang, Xuling Zhang, Guangzhi Zhu, Yuyang Wang, Pan Hui
  • for: The paper proposes a Digital Twin (DT) system model that accounts for multiple data resources in a heterogeneous MEC/MCC environment, together with an offloading decision algorithm based on Distributed Deep Learning (DDL), to improve responsiveness and energy efficiency.
  • methods: Virtualization and simulation techniques are combined with Mobile Cloud Computing (MCC) and Mobile Edge Computing (MEC); each DT is maintained on one of the servers via multiple data collection devices, and a DDL-based offloading scheme decides where tasks are executed.
  • results: Simulation results show that the proposed algorithm effectively and efficiently reduces the system's average latency and energy consumption, with significant improvement over the baselines in a dynamic DT environment.
    Abstract In the era of Internet of Things (IoT), Digital Twin (DT) is envisioned to empower various areas as a bridge between physical objects and the digital world. Through virtualization and simulation techniques, multiple functions can be achieved by leveraging computing resources. In this process, Mobile Cloud Computing (MCC) and Mobile Edge Computing (MEC) have become two of the key factors to achieve real-time feedback. However, current works consider only edge servers or cloud servers in their DT system models, and these models ignore DTs that rely on more than one data resource. In this paper, we propose a new DT system model considering a heterogeneous MEC/MCC environment. Each DT in the model is maintained in one of the servers via multiple data collection devices. The offloading decision-making problem is also considered and a new offloading scheme is proposed based on Distributed Deep Learning (DDL). Simulation results demonstrate that our proposed algorithm can effectively and efficiently decrease the system's average latency and energy consumption. A significant improvement is achieved compared with the baselines under the dynamic environment of DTs.

Dynamic Prediction using Time-Dependent Cox Survival Neural Network

  • paper_url: http://arxiv.org/abs/2307.05881
  • repo_url: None
  • paper_authors: Lang Zeng, Jipeng Zhang, Wei Chen, Ying Ding
  • for: The goal is individualized, continuously updatable risk prediction for the progression of age-related macular degeneration (AMD) as new data become available.
  • methods: Building on the time-dependent Cox model, the paper proposes a time-dependent Cox survival neural network (tdCoxSNN) that predicts progression on a continuous time scale from longitudinal fundus images; by incorporating a convolutional neural network, the model takes raw images as input and captures non-linear effects of time-dependent covariates on the survival outcome.
  • results: On two real datasets, the large Age-Related Eye Disease Study (AREDS) and a public primary biliary cirrhosis (PBC) dataset, the method achieves satisfactory prediction performance.
    Abstract The target of dynamic prediction is to provide individualized risk predictions over time which can be updated as new data become available. Motivated by establishing a dynamic prediction model for the progressive eye disease, age-related macular degeneration (AMD), we proposed a time-dependent Cox model-based survival neural network (tdCoxSNN) to predict its progression on a continuous time scale using longitudinal fundus images. tdCoxSNN extends the time-dependent Cox model by utilizing a neural network to model the non-linear effect of the time-dependent covariates on the survival outcome. Additionally, by incorporating the convolutional neural network (CNN), tdCoxSNN can take the longitudinal raw images as input. We evaluate and compare our proposed method with joint modeling and landmarking approaches through comprehensive simulations using two time-dependent accuracy metrics, the Brier Score and dynamic AUC. We applied the proposed approach to two real datasets. One is a large AMD study, the Age-Related Eye Disease Study (AREDS), in which more than 50,000 fundus images were captured over a period of 12 years for more than 4,000 participants. Another is a public dataset of the primary biliary cirrhosis (PBC) disease, in which multiple lab tests were longitudinally collected to predict the time-to-liver transplant. Our approach achieves satisfactory prediction performance in both simulation studies and the two real data analyses. tdCoxSNN was implemented in PyTorch, Tensorflow, and R-Tensorflow.

Ecosystem-level Analysis of Deployed Machine Learning Reveals Homogeneous Outcomes

  • paper_url: http://arxiv.org/abs/2307.05862
  • repo_url: None
  • paper_authors: Connor Toups, Rishi Bommasani, Kathleen A. Creel, Sarah H. Bana, Dan Jurafsky, Percy Liang
  • for: The study examines the societal impact of machine learning as determined by the context in which models are deployed, rather than by individual models in isolation.
  • methods: It introduces ecosystem-level analysis: instead of analyzing a single model, the collection of models deployed in a given context is considered, across three modalities (text, images, speech) and 11 datasets.
  • results: Deployed machine learning exhibits systemic failure: some users are misclassified by every available model, and population-level improvements of individual models rarely reduce this. The analysis also reveals new forms of racial disparity in model predictions (illustrated in medical imaging for dermatology) that do not appear in human predictions, demonstrating the unique strengths of ecosystem-level analysis.
    Abstract Machine learning is traditionally studied at the model level: researchers measure and improve the accuracy, robustness, bias, efficiency, and other dimensions of specific models. In practice, the societal impact of machine learning is determined by the surrounding context of machine learning deployments. To capture this, we introduce ecosystem-level analysis: rather than analyzing a single model, we consider the collection of models that are deployed in a given context. For example, ecosystem-level analysis in hiring recognizes that a job candidate's outcomes are not only determined by a single hiring algorithm or firm but instead by the collective decisions of all the firms they applied to. Across three modalities (text, images, speech) and 11 datasets, we establish a clear trend: deployed machine learning is prone to systemic failure, meaning some users are exclusively misclassified by all models available. Even when individual models improve at the population level over time, we find these improvements rarely reduce the prevalence of systemic failure. Instead, the benefits of these improvements predominantly accrue to individuals who are already correctly classified by other models. In light of these trends, we consider medical imaging for dermatology where the costs of systemic failure are especially high. While traditional analyses reveal racial performance disparities for both models and humans, ecosystem-level analysis reveals new forms of racial disparity in model predictions that do not present in human predictions. These examples demonstrate ecosystem-level analysis has unique strengths for characterizing the societal impact of machine learning.

FAIRO: Fairness-aware Adaptation in Sequential-Decision Making for Human-in-the-Loop Systems

  • paper_url: http://arxiv.org/abs/2307.05857
  • repo_url: None
  • paper_authors: Tianyu Zhao, Mojtaba Taherisadr, Salma Elmalaki
  • for: The paper addresses fairness in sequential decision-making for human-in-the-loop (HITL) systems, especially when multiple humans with different behaviors and expectations are affected by the same adaptation decisions and preferences change over time.
  • methods: It proposes FAIRO, a fairness-aware algorithm that decomposes the fairness problem into adaptive sub-tasks based on individual human preferences using the Options reinforcement learning framework; FAIRO generalizes to three types of HITL application setups, and a fairness-utility trade-off lets system designers balance the two objectives.
  • results: Evaluations on three HITL applications demonstrate FAIRO's generalizability and effectiveness in promoting fairness while accounting for human variability, improving fairness over other methods by 35.36% on average across the three applications.
    Abstract Achieving fairness in sequential-decision making systems within Human-in-the-Loop (HITL) environments is a critical concern, especially when multiple humans with different behavior and expectations are affected by the same adaptation decisions in the system. This human variability factor adds more complexity since policies deemed fair at one point in time may become discriminatory over time due to variations in human preferences resulting from inter- and intra-human variability. This paper addresses the fairness problem from an equity lens, considering human behavior variability, and the changes in human preferences over time. We propose FAIRO, a novel algorithm for fairness-aware sequential-decision making in HITL adaptation, which incorporates these notions into the decision-making process. In particular, FAIRO decomposes this complex fairness task into adaptive sub-tasks based on individual human preferences through leveraging the Options reinforcement learning framework. We design FAIRO to generalize to three types of HITL application setups that have the shared adaptation decision problem. Furthermore, we recognize that fairness-aware policies can sometimes conflict with the application's utility. To address this challenge, we provide a fairness-utility tradeoff in FAIRO, allowing system designers to balance the objectives of fairness and utility based on specific application requirements. Extensive evaluations of FAIRO on the three HITL applications demonstrate its generalizability and effectiveness in promoting fairness while accounting for human variability. On average, FAIRO can improve fairness compared with other methods across all three applications by 35.36%.

PIGEON: Predicting Image Geolocations

  • paper_url: http://arxiv.org/abs/2307.05845
  • repo_url: None
  • paper_authors: Lukas Haas, Michal Skreta, Silas Alberti
  • for: The paper presents a multi-task end-to-end system for planet-scale image geolocalization.
  • methods: The system combines semantic geocell creation and splitting with label smoothing, geographic pretraining of a vision transformer, and refinement of location predictions with ProtoNets across a candidate set of geocells.
  • results: It achieves state-of-the-art performance on external benchmarks and in human evaluation, and the pretrained CLIP transformer, StreetCLIP, is released for use in adjacent domains with applications to fighting climate change and urban and rural scene understanding.
    Abstract We introduce PIGEON, a multi-task end-to-end system for planet-scale image geolocalization that achieves state-of-the-art performance on both external benchmarks and in human evaluation. Our work incorporates semantic geocell creation with label smoothing, conducts pretraining of a vision transformer on images with geographic information, and refines location predictions with ProtoNets across a candidate set of geocells. The contributions of PIGEON are three-fold: first, we design a semantic geocells creation and splitting algorithm based on open-source data which can be adapted to any geospatial dataset. Second, we show the effectiveness of intra-geocell refinement and the applicability of unsupervised clustering and ProtNets to the task. Finally, we make our pre-trained CLIP transformer model, StreetCLIP, publicly available for use in adjacent domains with applications to fighting climate change and urban and rural scene understanding.

Scaling Distributed Multi-task Reinforcement Learning with Experience Sharing

  • paper_url: http://arxiv.org/abs/2307.05834
  • repo_url: None
  • paper_authors: Sanae Amani, Khushbu Pahwa, Vladimir Braverman, Lin F. Yang
  • for: Distributed multi-task reinforcement learning (RL) is explored to benefit distributed lifelong learning agents in adapting to new challenges, specifically in the context of the ShELL program launched by DARPA.
  • methods: The paper uses both theoretical and empirical research to address the problem of distributed multi-task RL, where a group of $N$ agents collaboratively solve $M$ tasks without prior knowledge of their identities. The problem is formulated as linearly parameterized contextual Markov decision processes (MDPs), and the proposed algorithm is called DistMT-LSVI.
  • results: The paper shows that a single agent using DistMT-LSVI needs to run a total of at most $\tilde{\mathcal{O}}(d^3 H^6 (\epsilon^{-2}+c_{\rm sep}^{-2}) \cdot M/N)$ episodes to achieve $\epsilon$-optimal policies for all $M$ tasks, improving the sample complexity of non-distributed settings by a factor of $1/N$. Numerical experiments conducted on OpenAI Gym Atari environments validate the theoretical findings.
    Abstract Recently, DARPA launched the ShELL program, which aims to explore how experience sharing can benefit distributed lifelong learning agents in adapting to new challenges. In this paper, we address this issue by conducting both theoretical and empirical research on distributed multi-task reinforcement learning (RL), where a group of $N$ agents collaboratively solves $M$ tasks without prior knowledge of their identities. We approach the problem by formulating it as linearly parameterized contextual Markov decision processes (MDPs), where each task is represented by a context that specifies the transition dynamics and rewards. To tackle this problem, we propose an algorithm called DistMT-LSVI. First, the agents identify the tasks, and then they exchange information through a central server to derive $\epsilon$-optimal policies for the tasks. Our research demonstrates that to achieve $\epsilon$-optimal policies for all $M$ tasks, a single agent using DistMT-LSVI needs to run a total number of episodes that is at most $\tilde{\mathcal{O}}(d^3 H^6 (\epsilon^{-2}+c_{\rm sep}^{-2}) \cdot M/N)$, where $c_{\rm sep}>0$ is a constant representing task separability, $H$ is the horizon of each episode, and $d$ is the feature dimension of the dynamics and rewards. Notably, DistMT-LSVI improves the sample complexity of non-distributed settings by a factor of $1/N$, as each agent independently learns $\epsilon$-optimal policies for all $M$ tasks using $\tilde{\mathcal{O}}(d^3 H^6 M \epsilon^{-2})$ episodes. Additionally, we provide numerical experiments conducted on OpenAI Gym Atari environments that validate our theoretical findings.

Memorization Through the Lens of Curvature of Loss Function Around Samples

  • paper_url: http://arxiv.org/abs/2307.05831
  • repo_url: None
  • paper_authors: Isha Garg, Kaushik Roy
  • for: The paper studies memorization versus generalization of training samples in neural networks.
  • methods: It proposes the curvature of the loss function around each training sample, averaged over all training epochs, as a measure of memorization, and uses it to analyze samples in popular image datasets.
  • results: Samples with the highest loss curvature visually correspond to long-tailed, mislabeled, or conflicting examples; the analysis uncovers a previously unreported failure mode on CIFAR100 (duplicated images with different labels), and when a portion of labels is synthetically corrupted, sorting by curvature identifies the mislabeled samples with high AUROC.
    Abstract Neural networks are overparametrized and easily overfit the datasets they train on. In the extreme case, it is shown that they can memorize a training set with fully randomized labels. We propose using the curvature of loss function around the training sample as a measure of its memorization, averaged over all training epochs. We use this to study the generalization versus memorization properties of different samples in popular image datasets. We visualize samples with the highest curvature of loss around them, and show that these visually correspond to long-tailed, mislabeled or conflicting samples. This analysis helps us find a, to the best of our knowledge, novel failure model on the CIFAR100 dataset, that of duplicated images with different labels. We also synthetically mislabel a proportion of the dataset by randomly corrupting the labels of a few samples, and show that sorting by curvature yields high AUROC values for identifying the mislabeled samples.
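The per-sample score is the curvature of the loss around the input, which can be approximated with a Hutchinson-style estimator: perturb the input along random directions and measure how much the input gradient changes. The sketch below is one such finite-difference approximation of E_v[v^T H v] averaged over random directions; it is an assumed stand-in for the paper's estimator, and in the paper's setting the score would additionally be averaged over training epochs.

```python
# Hutchinson-style estimate of loss curvature around an input sample:
# curvature ~ E_v[ v^T H v ], with H the input Hessian, via finite differences.
import torch
import torch.nn as nn
import torch.nn.functional as F

def input_grad(model, x, y):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    return torch.autograd.grad(loss, x)[0]

def curvature_score(model, x, y, n_dirs: int = 8, h: float = 1e-2) -> float:
    score = 0.0
    g0 = input_grad(model, x, y)
    for _ in range(n_dirs):
        v = torch.randn_like(x)
        v = v / v.norm()
        g1 = input_grad(model, x + h * v, y)
        score += torch.dot((g1 - g0).flatten(), v.flatten()).item() / h
    return score / n_dirs

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(1, 3, 32, 32), torch.tensor([3])
print(curvature_score(model, x, y))   # higher values flag atypical/mislabeled samples
```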

Relational Extraction on Wikipedia Tables using Convolutional and Memory Networks

  • paper_url: http://arxiv.org/abs/2307.05827
  • repo_url: https://github.com/simpleparadox/re_656
  • paper_authors: Arif Shahriar, Rohan Saha, Denilson Barbosa
  • for: The paper targets relation extraction from tabular data rather than free-form running text.
  • methods: It introduces a model that combines a convolutional neural network and a bidirectional long short-term memory (BiLSTM) network to encode entities and learn the dependencies among them.
  • results: On a large and recent dataset, the model consistently outperforms previous neural approaches to relation extraction on tabular data.
    Abstract Relation extraction (RE) is the task of extracting relations between entities in text. Most RE methods extract relations from free-form running text and leave out other rich data sources, such as tables. We explore RE from the perspective of applying neural methods on tabularly organized data. We introduce a new model consisting of Convolutional Neural Network (CNN) and Bidirectional-Long Short Term Memory (BiLSTM) network to encode entities and learn dependencies among them, respectively. We evaluate our model on a large and recent dataset and compare results with previous neural methods. Experimental results show that our model consistently outperforms the previous model for the task of relation extraction on tabular data. We perform comprehensive error analyses and ablation study to show the contribution of various components of our model. Finally, we discuss the usefulness and trade-offs of our approach, and provide suggestions for fostering further research.
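The described architecture, a CNN to encode each entity mention and a BiLSTM to model dependencies between the encoded entities, can be sketched roughly as below. Dimensions, the pooling choices, and the classification head are illustrative assumptions rather than the authors' configuration.

```python
# Rough sketch of a CNN (entity encoder) + BiLSTM (entity-dependency) model
# for relation classification over tabular data; dimensions are illustrative.
import torch
import torch.nn as nn

class TableRelationModel(nn.Module):
    def __init__(self, vocab_size=10000, emb=100, conv_ch=64, hidden=128, n_rel=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        # CNN over the tokens of each entity mention.
        self.conv = nn.Conv1d(emb, conv_ch, kernel_size=3, padding=1)
        # BiLSTM over the sequence of encoded entities (e.g., cells in a table row).
        self.bilstm = nn.LSTM(conv_ch, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_rel)

    def forward(self, tokens):                   # tokens: (batch, n_entities, n_tokens)
        b, e, t = tokens.shape
        x = self.embed(tokens.view(b * e, t)).transpose(1, 2)   # (b*e, emb, t)
        x = torch.relu(self.conv(x)).max(dim=2).values          # (b*e, conv_ch)
        x = x.view(b, e, -1)
        h, _ = self.bilstm(x)                                    # (b, e, 2*hidden)
        return self.classifier(h.mean(dim=1))                    # relation logits

model = TableRelationModel()
logits = model(torch.randint(0, 10000, (4, 2, 8)))  # 4 rows, 2 entities, 8 tokens each
print(logits.shape)                                  # torch.Size([4, 20])
```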

AnuraSet: A dataset for benchmarking Neotropical anuran calls identification in passive acoustic monitoring

  • paper_url: http://arxiv.org/abs/2307.06860
  • repo_url: https://github.com/soundclim/anuraset
  • paper_authors: Juan Sebastián Cañas, Maria Paula Toro-Gómez, Larissa Sayuri Moreira Sugai, Hernán Darío Benítez Restrepo, Jorge Rudas, Breyner Posso Bautista, Luís Felipe Toledo, Simone Dena, Adão Henrique Rosa Domingos, Franco Leandro de Souza, Selvino Neckel-Oliveira, Anderson da Rosa, Vítor Carvalho-Rocha, José Vinícius Bernardy, José Luiz Massao Moreira Sugai, Carolina Emília dos Santos, Rogério Pereira Bastos, Diego Llusia, Juan Sebastián Ulloa
  • for: The paper supports the study of anuran calling behavior so that passive acoustic monitoring (PAM) can be used to understand how global change affects anurans.
  • methods: It introduces a large-scale multi-species dataset of anuran calls recorded by PAM, comprising 27 hours of expert annotations for 42 species from two Brazilian biomes.
  • results: The dataset is released openly, including the raw recordings, experimental setup code, and a benchmark with a baseline model for the fine-grained categorization problem; the authors highlight its challenges to encourage machine learning research on anuran call identification in support of conservation policy.
    Abstract Global change is predicted to induce shifts in anuran acoustic behavior, which can be studied through passive acoustic monitoring (PAM). Understanding changes in calling behavior requires the identification of anuran species, which is challenging due to the particular characteristics of neotropical soundscapes. In this paper, we introduce a large-scale multi-species dataset of anuran amphibians calls recorded by PAM, that comprises 27 hours of expert annotations for 42 different species from two Brazilian biomes. We provide open access to the dataset, including the raw recordings, experimental setup code, and a benchmark with a baseline model of the fine-grained categorization problem. Additionally, we highlight the challenges of the dataset to encourage machine learning researchers to solve the problem of anuran call identification towards conservation policy. All our experiments and resources can be found on our GitHub repository https://github.com/soundclim/anuraset.

Bayesian taut splines for estimating the number of modes

  • paper_url: http://arxiv.org/abs/2307.05825
  • repo_url: None
  • paper_authors: José E. Chacón, Javier Fernández Serrano
  • for: The work targets estimation of the number of modes in a probability density function, which is representative of the model's complexity and the number of existing subpopulations.
  • methods: A novel approach is proposed, motivated by some overlooked aspects of the problem; it combines flexible kernel estimators with parsimonious compositional splines, and carries out feature exploration, model selection, and mode testing within the Bayesian inference paradigm, allowing expert judgement to be incorporated.
  • results: The method is illustrated through a case study in sports analytics with multiple companion visualisation tools, and a thorough simulation study shows that traditional modality-driven approaches paradoxically struggle to provide accurate results, while the proposed method emerges as a top-tier alternative offering innovative solutions for analysts.
    Abstract The number of modes in a probability density function is representative of the model's complexity and can also be viewed as the number of existing subpopulations. Despite its relevance, little research has been devoted to its estimation. Focusing on the univariate setting, we propose a novel approach targeting prediction accuracy inspired by some overlooked aspects of the problem. We argue for the need for structure in the solutions, the subjective and uncertain nature of modes, and the convenience of a holistic view blending global and local density properties. Our method builds upon a combination of flexible kernel estimators and parsimonious compositional splines. Feature exploration, model selection and mode testing are implemented in the Bayesian inference paradigm, providing soft solutions and allowing to incorporate expert judgement in the process. The usefulness of our proposal is illustrated through a case study in sports analytics, showcasing multiple companion visualisation tools. A thorough simulation study demonstrates that traditional modality-driven approaches paradoxically struggle to provide accurate results. In this context, our method emerges as a top-tier alternative offering innovative solutions for analysts.

Safe Reinforcement Learning for Strategic Bidding of Virtual Power Plants in Day-Ahead Markets

  • paper_url: http://arxiv.org/abs/2307.05812
  • repo_url: None
  • paper_authors: Ognjen Stanojev, Lesia Mitridati, Riccardo de Nardis di Prata, Gabriela Hug
  • for: The paper proposes a safe reinforcement learning algorithm for strategic bidding of Virtual Power Plants (VPPs) in day-ahead electricity markets.
  • methods: The algorithm learns competitive bidding policies with the Deep Deterministic Policy Gradient (DDPG) method, without requiring an accurate market model. To account for the complex internal physical constraints of VPPs, two enhancements are introduced: a projection-based safety shield that restricts the agent's actions to the feasible space defined by the nonlinear power flow equations and the operating constraints of distributed energy resources, and a penalty for shield activation in the reward function that incentivizes the agent to learn a safer policy.
  • results: A case study on the IEEE 13-bus network shows that the approach enables the agent to learn a highly competitive, safe strategic policy.
    Abstract This paper presents a novel safe reinforcement learning algorithm for strategic bidding of Virtual Power Plants (VPPs) in day-ahead electricity markets. The proposed algorithm utilizes the Deep Deterministic Policy Gradient (DDPG) method to learn competitive bidding policies without requiring an accurate market model. Furthermore, to account for the complex internal physical constraints of VPPs we introduce two enhancements to the DDPG method. Firstly, a projection-based safety shield that restricts the agent's actions to the feasible space defined by the non-linear power flow equations and operating constraints of distributed energy resources is derived. Secondly, a penalty for the shield activation in the reward function that incentivizes the agent to learn a safer policy is introduced. A case study based on the IEEE 13-bus network demonstrates the effectiveness of the proposed approach in enabling the agent to learn a highly competitive, safe strategic policy.

Differentiable Forward Projector for X-ray Computed Tomography

  • paper_url: http://arxiv.org/abs/2307.05801
  • repo_url: https://github.com/llnl/leap
  • paper_authors: Hyojin Kim, Kyle Champley
  • for: The paper addresses computed tomography (CT) reconstruction.
  • methods: Data-driven deep learning can outperform existing analytical and iterative algorithms, especially for ill-posed CT reconstruction, but such models often predict images that disagree with the measured projection data.
  • results: The paper presents an accurate differentiable forward- and back-projection software library that ensures consistency between the predicted images and the original measurements; it supports various projection geometry types while minimizing the GPU memory footprint, enabling seamless integration with existing deep learning training and inference pipelines, and is released as open source.
    Abstract Data-driven deep learning has been successfully applied to various computed tomographic reconstruction problems. The deep inference models may outperform existing analytical and iterative algorithms, especially in ill-posed CT reconstruction. However, those methods often predict images that do not agree with the measured projection data. This paper presents an accurate differentiable forward and back projection software library to ensure the consistency between the predicted images and the original measurements. The software library efficiently supports various projection geometry types while minimizing the GPU memory footprint requirement, which facilitates seamless integration with existing deep learning training and inference pipelines. The proposed software is available as open source: https://github.com/LLNL/LEAP.

  • paper_url: http://arxiv.org/abs/2307.05794
  • repo_url: https://github.com/weilabmsu/oud-ppi
  • paper_authors: Long Chen, Jian Jiang, Bozheng Dou, Hongsong Feng, Jie Liu, Yueying Zhu, Bengong Zhang, Tianshou Zhou, Guo-Wei Wei
  • for: The study aims to develop new pain treatments with better efficacy and fewer side effects than current options.
  • methods: Protein-protein interaction (PPI) networks around the pain-related sodium channels NaV1.3, NaV1.7, NaV1.8, and NaV1.9 are combined with a drug-target interaction (DTI) network, machine learning models, and NLP-based embeddings to identify potential lead compounds for pain management.
  • results: Through a systematic screening process, the side effects and repurposing potential of more than 150,000 drug candidates targeting NaV1.7 and NaV1.8 are evaluated, together with their ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties, to identify leads with near-optimal characteristics.
    Abstract Pain is a significant global health issue, and the current treatment options for pain management have limitations in terms of effectiveness, side effects, and potential for addiction. There is a pressing need for improved pain treatments and the development of new drugs. Voltage-gated sodium channels, particularly Nav1.3, Nav1.7, Nav1.8, and Nav1.9, play a crucial role in neuronal excitability and are predominantly expressed in the peripheral nervous system. Targeting these channels may provide a means to treat pain while minimizing central and cardiac adverse effects. In this study, we construct protein-protein interaction (PPI) networks based on pain-related sodium channels and develop a corresponding drug-target interaction (DTI) network to identify potential lead compounds for pain management. To ensure reliable machine learning predictions, we carefully select 111 inhibitor datasets from a pool of over 1,000 targets in the PPI network. We employ three distinct machine learning algorithms combined with advanced natural language processing (NLP)-based embeddings, specifically pre-trained transformer and autoencoder representations. Through a systematic screening process, we evaluate the side effects and repurposing potential of over 150,000 drug candidates targeting Nav1.7 and Nav1.8 sodium channels. Additionally, we assess the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of these candidates to identify leads with near-optimal characteristics. Our strategy provides an innovative platform for the pharmacological development of pain treatments, offering the potential for improved efficacy and reduced side effects.

Implicit regularisation in stochastic gradient descent: from single-objective to two-player games

  • paper_url: http://arxiv.org/abs/2307.05789
  • repo_url: None
  • paper_authors: Mihaela Rosca, Marc Peter Deisenroth
  • for: 这个论文的目的是研究深度学习优化中的隐式正则化效果,以及如何使用这些效果来改进性能和稳定性。
  • methods: 这个论文使用了后向误差分析（BEA）来量化离散优化器的离散化误差，并借助连续时间流来揭示隐式正则化效果。
  • results: 这个论文发现了若干此前未知的隐式正则化效果，包括在考虑更新所用具体数据批次的情况下由多步随机梯度下降引入的正则化效果，以及在一般可微的双人博弈中产生的正则化效果。
    Abstract Recent years have seen many insights on deep learning optimisation being brought forward by finding implicit regularisation effects of commonly used gradient-based optimisers. Understanding implicit regularisation can not only shed light on optimisation dynamics, but it can also be used to improve performance and stability across problem domains, from supervised learning to two-player games such as Generative Adversarial Networks. An avenue for finding such implicit regularisation effects has been quantifying the discretisation errors of discrete optimisers via continuous-time flows constructed by backward error analysis (BEA). The current usage of BEA is not without limitations, since not all the vector fields of continuous-time flows obtained using BEA can be written as a gradient, hindering the construction of modified losses revealing implicit regularisers. In this work, we provide a novel approach to use BEA, and show how our approach can be used to construct continuous-time flows with vector fields that can be written as gradients. We then use this to find previously unknown implicit regularisation effects, such as those induced by multiple stochastic gradient descent steps while accounting for the exact data batches used in the updates, and in generally differentiable two-player games.
    摘要 近年来，通过发现常用基于梯度的优化器所带来的隐式正则化效果，人们对深度学习优化有了许多新的认识。理解隐式正则化不仅有助于揭示优化动力学，还可以用于在从监督学习到生成对抗网络等双人博弈的各类问题中提升性能和稳定性。寻找此类隐式正则化效果的一条途径，是利用后向误差分析（BEA）构造的连续时间流来量化离散优化器的离散化误差。然而，目前BEA的使用并非没有局限：并非所有由BEA得到的连续时间流的向量场都能写成梯度形式，这阻碍了构造能够揭示隐式正则化项的修正损失。在这项工作中，我们提出了一种使用BEA的新方法，并展示了如何用它构造向量场可写成梯度的连续时间流。我们进而利用这一方法发现了此前未知的隐式正则化效果，例如在考虑更新中所用具体数据批次的情况下由多步随机梯度下降引入的效果，以及在一般可微的双人博弈中产生的效果。
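For context, backward error analysis applied to the simplest case, full-batch gradient descent, yields a modified loss of the following known form (this is the previously established single-objective result, not the paper's new contribution, which extends BEA to settings where the corrected vector field is not readily a gradient):

```latex
% First-order BEA for full-batch gradient descent with step size h:
% the iterates \theta_{t+1} = \theta_t - h\,\nabla L(\theta_t) follow, up to O(h^2),
% the continuous-time flow of a modified loss \tilde{L}
\dot{\theta} = -\nabla_{\theta} \tilde{L}(\theta),
\qquad
\tilde{L}(\theta) = L(\theta) + \frac{h}{4}\,\bigl\lVert \nabla_{\theta} L(\theta) \bigr\rVert^{2}.
```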

Making the Nyström method highly accurate for low-rank approximations

  • paper_url: http://arxiv.org/abs/2307.05785
  • repo_url: None
  • paper_authors: Jianlin Xia
  • for: 本研究提出一系列启发式策略，使Nyström方法在非对称和/或矩形矩阵上也能达到高精度。
  • methods: 将Nyström方法与瘦长的秩揭示分解结合，作为渐进交替方向精化过程中的快速选主元策略；并采用两种精化机制：从少量随机选取的列出发交替进行行、列选主元，以及自适应地增加采样数量直到达到所需的秩或精度。
  • results: 实验表明，高精度Nyström方法在实践中只需少量渐进采样步骤即可快速达到预设的高精度，在某些情况下质量接近SVD。
    Abstract The Nystr\"om method is a convenient heuristic method to obtain low-rank approximations to kernel matrices in nearly linear complexity. Existing studies typically use the method to approximate positive semidefinite matrices with low or modest accuracies. In this work, we propose a series of heuristic strategies to make the Nystr\"om method reach high accuracies for nonsymmetric and/or rectangular matrices. The resulting methods (called high-accuracy Nystr\"om methods) treat the Nystr\"om method and a skinny rank-revealing factorization as a fast pivoting strategy in a progressive alternating direction refinement process. Two refinement mechanisms are used: alternating the row and column pivoting starting from a small set of randomly chosen columns, and adaptively increasing the number of samples until a desired rank or accuracy is reached. A fast subset update strategy based on the progressive sampling of Schur complements is further proposed to accelerate the refinement process. Efficient randomized accuracy control is also provided. Relevant accuracy and singular value analysis is given to support some of the heuristics. Extensive tests with various kernel functions and data sets show how the methods can quickly reach prespecified high accuracies in practice, sometimes with quality close to SVDs, using only small numbers of progressive sampling steps.
    摘要 Nyström方法是一种便捷的启发式方法，可在接近线性的复杂度下获得核矩阵的低秩近似。现有研究通常用它以较低或中等精度来近似半正定矩阵。在这项工作中，我们提出了一系列启发式策略，使Nyström方法在非对称和/或矩形矩阵上也能达到高精度。所得到的方法（称为高精度Nyström方法）将Nyström方法与瘦长的秩揭示分解视为渐进交替方向精化过程中的快速选主元策略。其中采用两种精化机制：从少量随机选取的列出发交替进行行、列选主元，以及自适应地增加采样数量直到达到所需的秩或精度。我们还提出了一种基于Schur补渐进采样的快速子集更新策略来加速精化过程，并提供了高效的随机化精度控制。文中给出了相关的精度与奇异值分析来支持部分启发式策略。对多种核函数和数据集的大量测试表明，这些方法在实践中只需少量渐进采样步骤即可快速达到预设的高精度，有时质量接近SVD。
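The modest-accuracy baseline that the paper refines can be written in a few lines: sample rows and columns, then combine the panels through the pseudo-inverse of the core block. The sketch below shows only this starting point, not the paper's alternating pivoting or progressive refinement.

```python
# Baseline Nystrom-style low-rank approximation via row/column sampling:
# A ~= A[:, J] @ pinv(A[I, J]) @ A[I, :]
import numpy as np

def nystrom_lowrank(A, sample_size, rng=None):
    rng = np.random.default_rng(rng)
    m, n = A.shape
    rows = rng.choice(m, size=min(sample_size, m), replace=False)
    cols = rng.choice(n, size=min(sample_size, n), replace=False)
    C = A[:, cols]                        # tall column panel
    R = A[rows, :]                        # wide row panel
    W = A[np.ix_(rows, cols)]             # core intersection block
    return C @ np.linalg.pinv(W) @ R      # rank <= sample_size approximation

# Rectangular kernel-like test matrix (exponential kernel on a grid).
A = np.exp(-np.abs(np.subtract.outer(np.linspace(0, 1, 300), np.linspace(0, 1, 200))))
A_hat = nystrom_lowrank(A, sample_size=20, rng=0)
print("relative error:", np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```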

Weisfeiler and Lehman Go Measurement Modeling: Probing the Validity of the WL Test

  • paper_url: http://arxiv.org/abs/2307.05775
  • repo_url: https://github.com/arjunsubramonian/wl-test-exploration
  • paper_authors: Arjun Subramonian, Adina Williams, Maximilian Nickel, Yizhou Sun, Levent Sagun
  • for: 本研究旨在探讨图 neural network的表达能力是如何量化的,以及$k$-dimensional Weisfeiler-Lehman ($k$-WL) 测试是否能够准确地评估图 neural network的表达能力。
  • methods: 本研究采用系统性分析和评估$k$-WL 测试的可靠性和有效性,以及一份问卷调查(n = 18)来探讨实践者对表达能力的概念和$k$-WL 测试的假设。
  • results: 分析发现$k$-WL 测试并不能保证同构,可能与实际的图任务无关,并且可能不会提高通用性或可靠性。作者提议使用外部定义和测试表达能力基于标准套件,并提供了指导问题来构建这些套件,以促进图机器学习的进步。
    Abstract The expressive power of graph neural networks is usually measured by comparing how many pairs of graphs or nodes an architecture can possibly distinguish as non-isomorphic to those distinguishable by the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test. In this paper, we uncover misalignments between practitioners' conceptualizations of expressive power and $k$-WL through a systematic analysis of the reliability and validity of $k$-WL. We further conduct a survey ($n = 18$) of practitioners to surface their conceptualizations of expressive power and their assumptions about $k$-WL. In contrast to practitioners' opinions, our analysis (which draws from graph theory and benchmark auditing) reveals that $k$-WL does not guarantee isometry, can be irrelevant to real-world graph tasks, and may not promote generalization or trustworthiness. We argue for extensional definitions and measurement of expressive power based on benchmarks; we further contribute guiding questions for constructing such benchmarks, which is critical for progress in graph machine learning.
    摘要 通常来说,图 neural network 的表达力是通过比较它们可以分辨的图或节点数量与 $k $-dimensional Weisfeiler-Lehman ($k $-WL) 测试的结果进行比较来度量的。在这篇论文中,我们发现了实践者们对表达力的概念和 $k $-WL 测试之间的不一致,并通过系统性的分析和survey(n = 18)来揭示实践者们对表达力的概念和 $k $-WL 测试的假设。与实践者们的意见相比,我们的分析发现了 $k $-WL 测试不能保证同构,可能与实际图任务无关,并且可能不会提高泛化性或可靠性。我们建议使用外在定义和测试表达力基于标准 benchmark,并提供了指导问题来构建这些标准 benchmark,这对图机器学习进程的进步是非常重要。
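Since the whole paper revolves around what the $k$-WL test does and does not measure, a minimal 1-WL colour-refinement sketch may help fix ideas: two graphs that receive different colour histograms are certainly non-isomorphic, while equal histograms are inconclusive.

```python
# Minimal 1-WL colour refinement: hash each node's colour together with the
# multiset of its neighbours' colours, then relabel; compare histograms.
from collections import Counter

def wl_colors(adj, iterations=3):
    """adj: dict node -> list of neighbours."""
    colors = {v: 0 for v in adj}                      # uniform initial colouring
    for _ in range(iterations):
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
        }
        relabel = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        colors = {v: relabel[signatures[v]] for v in adj}
    return Counter(colors.values())

triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
print(wl_colors(triangle) != wl_colors(path))   # True: 1-WL separates these two graphs
```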

Random-Set Convolutional Neural Network (RS-CNN) for Epistemic Deep Learning

  • paper_url: http://arxiv.org/abs/2307.05772
  • repo_url: None
  • paper_authors: Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang, Keivan Shariatmadar, Fabio Cuzzolin
  • for: 本研究旨在提出一种基于随机集的卷积神经网络（RS-CNN），用于在分类任务中评估模型预测的置信度与认知不确定性。
  • methods: 本研究使用随机集模型，通过样本空间幂集上的分布来预测信念函数，并利用credal集的大小来估计认知不确定性。
  • results: 与其他不确定性感知方法相比，RS-CNN在分布外（OOD）样本上表现出色，能够捕捉真实的预测结果。
    Abstract Machine learning is increasingly deployed in safety-critical domains where robustness against adversarial attacks is crucial and erroneous predictions could lead to potentially catastrophic consequences. This highlights the need for learning systems to be equipped with the means to determine a model's confidence in its prediction and the epistemic uncertainty associated with it, 'to know when a model does not know'. In this paper, we propose a novel Random-Set Convolutional Neural Network (RS-CNN) for classification which predicts belief functions rather than probability vectors over the set of classes, using the mathematics of random sets, i.e., distributions over the power set of the sample space. Based on the epistemic deep learning approach, random-set models are capable of representing the 'epistemic' uncertainty induced in machine learning by limited training sets. We estimate epistemic uncertainty by approximating the size of credal sets associated with the predicted belief functions, and experimentally demonstrate how our approach outperforms competing uncertainty-aware approaches in a classical evaluation setting. The performance of RS-CNN is best demonstrated on OOD samples where it manages to capture the true prediction while standard CNNs fail.
    摘要 机器学习正越来越多地部署在安全关键领域，在这些领域中，对抗攻击下的鲁棒性至关重要，错误的预测可能导致灾难性后果。这凸显了学习系统需要具备判断模型对其预测的置信度以及相应认知不确定性的能力，即“知道模型何时不知道”。本文提出了一种新颖的随机集卷积神经网络（RS-CNN）用于分类，它利用随机集（即样本空间幂集上的分布）的数学工具，预测类别集合上的信念函数而非概率向量。基于认知深度学习的思路，随机集模型能够表示由有限训练集在机器学习中引入的“认知”不确定性。我们通过近似预测信念函数所对应的credal集的大小来估计认知不确定性，并在经典评估设置下实验证明我们的方法优于其他不确定性感知方法。RS-CNN的优势在分布外（OOD）样本上表现得最为明显：它能够捕捉真实的预测结果，而标准CNN则失败。
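The random-set bookkeeping behind belief functions is standard Dempster-Shafer theory and is easy to illustrate. The mass values below are made up, and the mapping from network logits to subset masses used by RS-CNN is not reproduced; the sketch only shows how belief, plausibility, and a credal-interval width (an epistemic-uncertainty proxy) follow from a mass function.

```python
# Belief and plausibility from a mass function over subsets of the class set.
classes = ("cat", "dog", "bird")

# Hypothetical predicted mass function (non-negative, sums to 1).
mass = {
    frozenset({"cat"}): 0.5,
    frozenset({"cat", "dog"}): 0.3,
    frozenset(classes): 0.2,
}

def belief(A):        # total mass committed to subsets of A
    return sum(m for B, m in mass.items() if B <= A)

def plausibility(A):  # total mass not contradicting A
    return sum(m for B, m in mass.items() if B & A)

A = frozenset({"cat"})
print(belief(A), plausibility(A))      # 0.5, 1.0
print(plausibility(A) - belief(A))     # width of the credal interval: epistemic uncertainty
```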

Unsupervised Learning in Complex Systems

  • paper_url: http://arxiv.org/abs/2307.10993
  • repo_url: https://github.com/hugcis/evolving-structures-in-complex-systems
  • paper_authors: Hugo Cisneros
  • for: 研究复杂系统在自然与人工系统的学习和适应中的应用，以开发无需监督、能够自主发展并随时间不断增加复杂度的学习系统。
  • methods: 以复杂系统为框架研究学习和适应，开发一个用于搜索呈现复杂度增长的复杂系统的通用复杂度指标，提出一种粗粒化方法来研究大规模复杂系统中的计算，并构建学习效率指标和用于评估学习算法速度的基准数据集。
  • results: 这些结果显著加深了我们对自然和人工系统中学习与适应的理解，并为该领域的研究提供了一个有前景的新方向，有望启发未来更高效的学习算法的开发。
    Abstract In this thesis, we explore the use of complex systems to study learning and adaptation in natural and artificial systems. The goal is to develop autonomous systems that can learn without supervision, develop on their own, and become increasingly complex over time. Complex systems are identified as a suitable framework for understanding these phenomena due to their ability to exhibit growth of complexity. Being able to build learning algorithms that require limited to no supervision would enable greater flexibility and adaptability in various applications. By understanding the fundamental principles of learning in complex systems, we hope to advance our ability to design and implement practical learning algorithms in the future. This thesis makes the following key contributions: the development of a general complexity metric that we apply to search for complex systems that exhibit growth of complexity, the introduction of a coarse-graining method to study computations in large-scale complex systems, and the development of a metric for learning efficiency as well as a benchmark dataset for evaluating the speed of learning algorithms. Our findings add substantially to our understanding of learning and adaptation in natural and artificial systems. Moreover, our approach contributes to a promising new direction for research in this area. We hope these findings will inspire the development of more effective and efficient learning algorithms in the future.
    摘要 在这个论文中,我们探讨使用复杂系统来研究学习和适应自然和人工系统的问题。我们的目标是开发无监督学习的自动化系统,能够自主发展、不断增加复杂性,并在不同应用中实现更大的灵活性和适应能力。由于复杂系统的能力表现增长复杂性,因此我们认为这种框架是研究这些现象的适当选择。通过理解复杂系统学习的基本原理,我们希望能够在未来设计和实现更加实用的学习算法。这个论文的主要贡献包括:开发一个通用复杂度度量,用于搜索展示增长复杂性的复杂系统,介绍一种大规模复杂系统的粗化方法,以及开发一个学习效率度量和一个评估学习算法速度的 benchmark 数据集。我们的发现对自然和人工系统的学习和适应有很大的贡献,同时,我们的方法也对研究这个领域的未来发展做出了重要贡献。我们希望这些发现能够激励未来的研究人员开发更有效率的学习算法。

Realtime Spectrum Monitoring via Reinforcement Learning – A Comparison Between Q-Learning and Heuristic Methods

  • paper_url: http://arxiv.org/abs/2307.05763
  • repo_url: None
  • paper_authors: Tobias Braun, Tobias Korzyzkowske, Larissa Putzar, Jan Mietzner, Peter A. Hoeher
  • for: 本研究旨在比较两种不同的接收器资源管理方法在频谱监测中的性能。
  • methods: 研究比较了两种控制可用接收机资源的方法：作为启发式方法的线性频率调谐，以及来自强化学习领域的Q学习算法。
  • results: 研究发现,使用Q学习算法可以在检测率和探索率之间取得一个适当的平衡,而启发式方法的检测率较低。
    Abstract Due to technological advances in the field of radio technology and its availability, the number of interference signals in the radio spectrum is continuously increasing. Interference signals must be detected in a timely fashion, in order to maintain standards and keep emergency frequencies open. To this end, specialized (multi-channel) receivers are used for spectrum monitoring. In this paper, the performances of two different approaches for controlling the available receiver resources are compared. The methods used for resource management (ReMa) are linear frequency tuning as a heuristic approach and a Q-learning algorithm from the field of reinforcement learning. To test the methods to be investigated, a simplified scenario was designed with two receiver channels monitoring ten non-overlapping frequency bands with non-uniform signal activity. For this setting, it is shown that the Q-learning algorithm used has a significantly higher detection rate than the heuristic approach at the expense of a smaller exploration rate. In particular, the Q-learning approach can be parameterized to allow for a suitable trade-off between detection and exploration rate.
    摘要 由于无线电技术的进步与普及，频谱中的干扰信号数量持续增加。为了维护相关标准并保持应急频率畅通，必须及时检测干扰信号，为此通常使用专用的（多通道）接收机进行频谱监测。本文比较了两种控制可用接收机资源的方法的性能：作为启发式方法的线性频率调谐，以及来自强化学习领域的Q学习算法。为测试这些方法，我们设计了一个简化场景：两个接收通道监测十个互不重叠、信号活动不均匀的频段。结果表明，所用的Q学习算法的检测率显著高于启发式方法，但代价是较低的探索率；特别地，Q学习方法可以通过参数化在检测率和探索率之间取得合适的折中。
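A stripped-down sketch of the learning rule being compared is given below. The paper's actual state, action, and reward design is richer (two channels, ten bands), so this single-receiver, bandit-style simplification is only an illustration of how a Q-value per band tracks signal activity; the activity probabilities are assumed.

```python
# Simplified tabular Q-learning for steering one receiver across frequency bands.
import random

N_BANDS = 10
activity = [0.05, 0.1, 0.02, 0.3, 0.02, 0.15, 0.05, 0.2, 0.02, 0.09]  # assumed signal probabilities

Q = [0.0] * N_BANDS            # single-state problem: one Q-value per band
alpha, epsilon = 0.1, 0.2

for step in range(20_000):
    if random.random() < epsilon:                          # explore
        band = random.randrange(N_BANDS)
    else:                                                   # exploit
        band = max(range(N_BANDS), key=lambda b: Q[b])
    reward = 1.0 if random.random() < activity[band] else 0.0   # detected a signal?
    Q[band] += alpha * (reward - Q[band])                   # bandit-style update (no next state)

print([round(q, 2) for q in Q])   # roughly tracks the per-band activity rates
```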

GOKU-UI: Ubiquitous Inference through Attention and Multiple Shooting for Continuous-time Generative Models

  • paper_url: http://arxiv.org/abs/2307.05735
  • repo_url: None
  • paper_authors: Germán Abrevaya, Mahta Ramezanian-Panahi, Jean-Christophe Gagnon-Audet, Irina Rish, Pablo Polosecki, Silvina Ponce Dawson, Guillermo Cecchi, Guillaume Dumas
  • for: 该研究旨在推动科学机器学习领域的发展，通过结合具备领域知识的可解释模型与领域无关的机器学习技术来提升模型表现。
  • methods: 该研究提出了基于GOKU-nets的生成模型GOKU-UI，通过注意力机制和潜在空间中的多重打靶（multiple shooting）训练策略实现分布式（无处不在的）推理，并将随机微分方程（SDEs）等更多类型的微分方程纳入模型范畴。
  • results: 在合成数据和实验数据上的测试中，GOKU-UI表现出色，即使训练数据量小得多也优于所有基线模型。此外，在应用于真实人脑数据时，GOKU-UI不仅在重建任务上超越了现有基线方法，还能够更好地预测最长提前12秒的未来脑活动。
    Abstract Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnostic machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. The GOKU-UI broadens the original model's spectrum to incorporate other classes of differential equations, such as Stochastic Differential Equations (SDEs), and integrates a distributed, i.e. ubiquitous, inference through attention mechanisms and a novel multiple shooting training strategy in the latent space. These enhancements have led to a significant increase in its performance in both reconstruction and forecast tasks, as demonstrated by our evaluation of simulated and empirical data. Specifically, GOKU-UI outperformed all baseline models on synthetic datasets even with a training set 32-fold smaller, underscoring its remarkable data efficiency. Furthermore, when applied to empirical human brain data, while incorporating stochastic Stuart-Landau oscillators into its dynamical core, it not only surpassed state-of-the-art baseline methods in the reconstruction task, but also demonstrated better prediction of future brain activity up to 12 seconds ahead. By training GOKU-UI on resting-state fMRI data, we encoded whole-brain dynamics into a latent representation, learning an effective low-dimensional dynamical system model that could offer insights into brain functionality and open avenues for practical applications such as mental state or psychiatric condition classification. Ultimately, our research provides further impetus for the field of Scientific Machine Learning, showcasing the potential for advancements when established scientific insights are interwoven with modern machine learning.
    摘要 科学机器学习（SciML）是一个快速发展的领域，它将具备领域知识、可解释的模型与领域无关的机器学习技术相结合。在这项工作中，我们介绍了GOKU-UI，它是SciML生成模型GOKU-nets的演化版本。GOKU-UI扩展了原始模型的适用范围，纳入了随机微分方程（SDEs）等其他类型的微分方程，并通过注意力机制和一种新颖的潜在空间多重打靶训练策略实现分布式的、无处不在的推理。这些改进使其在重建和预测任务中的性能显著提升，我们在模拟数据和实验数据上的评估都证明了这一点。具体来说，即使训练集小32倍，GOKU-UI在合成数据集上仍优于所有基线模型，显示出卓越的数据效率。此外，在将随机Stuart-Landau振子纳入其动力核心并应用于真实人脑数据时，它不仅在重建任务中超越了最先进的基线方法，还能够更好地预测最长提前12秒的未来脑活动。通过在静息态fMRI数据上训练GOKU-UI，我们将全脑动力学编码为潜在表示，学习到一个有效的低维动力系统模型，这有望为理解脑功能提供新的视角，并为精神状态或精神疾病分类等实际应用开辟道路。最终，我们的研究为科学机器学习领域提供了进一步的动力，展示了将既有科学认识与现代机器学习相结合所能带来的进步。
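The multiple-shooting ingredient can be shown in miniature: split a long trajectory into segments with independently inferred initial latent states, roll the segments out in parallel, and penalize mismatches at segment boundaries. The Euler integrator and placeholder dynamics network below are assumptions for illustration, not GOKU-UI's learned core or its encoder.

```python
# Multiple shooting in a latent space, in miniature.
import torch

def rollout(z0, dynamics, n_steps, dt=0.1):
    zs, z = [z0], z0
    for _ in range(n_steps):
        z = z + dt * dynamics(z)           # simple Euler integration
        zs.append(z)
    return torch.stack(zs, dim=1)          # (n_segments, n_steps + 1, latent_dim)

latent_dim, n_segments, seg_len = 4, 8, 25
dynamics = torch.nn.Sequential(torch.nn.Linear(latent_dim, 32), torch.nn.Tanh(),
                               torch.nn.Linear(32, latent_dim))
z0 = torch.nn.Parameter(torch.randn(n_segments, latent_dim))   # one initial state per segment

traj = rollout(z0, dynamics, seg_len)
# Continuity penalty: end of segment i should match the start of segment i+1.
continuity = ((traj[:-1, -1, :] - z0[1:]) ** 2).mean()
# A full training loss would add a reconstruction term comparing decoded traj to data.
print(continuity)
```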

Towards A Scalable Solution for Improving Multi-Group Fairness in Compositional Classification

  • paper_url: http://arxiv.org/abs/2307.05728
  • repo_url: None
  • paper_authors: James Atwood, Tina Tian, Ben Packer, Meghana Deodhar, Jilin Chen, Alex Beutel, Flavien Prost, Ahmad Beirami
  • for: 提高复杂系统中的机器学习公平性
  • methods: 提出两种简单的技术:任务上下文和组层排序
  • results: 实验结果在学术和实际环境中证明提案的有效性
    Abstract Despite the rich literature on machine learning fairness, relatively little attention has been paid to remediating complex systems, where the final prediction is the combination of multiple classifiers and where multiple groups are present. In this paper, we first show that natural baseline approaches for improving equal opportunity fairness scale linearly with the product of the number of remediated groups and the number of remediated prediction labels, rendering them impractical. We then introduce two simple techniques, called {\em task-overconditioning} and {\em group-interleaving}, to achieve a constant scaling in this multi-group multi-label setup. Our experimental results in academic and real-world environments demonstrate the effectiveness of our proposal at mitigation within this environment.
    摘要 尽管机器学习公平性的文献十分丰富，但对复杂系统（其最终预测由多个分类器组合而成，且涉及多个群体）的公平性修复却相对较少受到关注。在这篇论文中，我们首先表明，用于提升机会均等公平性的自然基线方法，其开销随需要修复的群体数与预测标签数的乘积线性增长，因而不切实际。随后，我们提出两种简单的技术，即任务过条件化（task-overconditioning）和群体交错（group-interleaving），在这种多群体多标签设置下实现常数级的扩展。我们在学术和真实环境中的实验结果证明了该方案在此环境下进行公平性修复的有效性。

MoP-CLIP: A Mixture of Prompt-Tuned CLIP Models for Domain Incremental Learning

  • paper_url: http://arxiv.org/abs/2307.05707
  • repo_url: None
  • paper_authors: Julien Nicolas, Florent Chiaroni, Imtiaz Ziko, Ola Ahmad, Christian Desrosiers, Jose Dolz
  • for: 解决域增量学习中分布漂移带来的灾难性遗忘问题。
  • methods: 使用提示调优的CLIP模型混合（MoP-CLIP），在训练阶段学习每个域中各类别的特征分布，并为每个域学习相应的文本和视觉提示；在推理阶段利用这些学习到的分布判断测试样本属于已知域还是未见域，从而选择正确的提示或提示调优CLIP模型的混合来完成分类。
  • results: 在标准DIL设置下与现有方法性能相当，而在OOD场景下超越现有最优方法。这些结果表明MoP-CLIP的优越性，为域增量学习问题提供了一种稳健且通用的解决方案。
    Abstract Despite the recent progress in incremental learning, addressing catastrophic forgetting under distributional drift is still an open and important problem. Indeed, while state-of-the-art domain incremental learning (DIL) methods perform satisfactorily within known domains, their performance largely degrades in the presence of novel domains. This limitation hampers their generalizability, and restricts their scalability to more realistic settings where train and test data are drawn from different distributions. To address these limitations, we present a novel DIL approach based on a mixture of prompt-tuned CLIP models (MoP-CLIP), which generalizes the paradigm of S-Prompting to handle both in-distribution and out-of-distribution data at inference. In particular, at the training stage we model the features distribution of every class in each domain, learning individual text and visual prompts to adapt to a given domain. At inference, the learned distributions allow us to identify whether a given test sample belongs to a known domain, selecting the correct prompt for the classification task, or from an unseen domain, leveraging a mixture of the prompt-tuned CLIP models. Our empirical evaluation reveals the poor performance of existing DIL methods under domain shift, and suggests that the proposed MoP-CLIP performs competitively in the standard DIL settings while outperforming state-of-the-art methods in OOD scenarios. These results demonstrate the superiority of MoP-CLIP, offering a robust and general solution to the problem of domain incremental learning.
    摘要 尽管最近的增量学习进步有所,但 catastrophic forgetting 下 distributional drift 问题仍然是一个打开的和重要的问题。实际上,当前的领域增量学习(DIL)方法在已知的领域中表现良好,但在新的领域出现时,其性能很差。这限制了它们的普遍性和可扩展性,使得它们在更真实的设置中无法扩展。为解决这些限制,我们提出了一种基于混合提示 CLIP 模型(MoP-CLIP)的新的 DIL 方法。在训练阶段,我们模型每个领域中的类别特征分布,学习具体的文本和视觉提示来适应给定的领域。在推理阶段,学习的分布使我们能够判断一个测试样本是否属于已知的领域,选择相应的提示进行分类任务,或者是从未看过的领域,利用混合的提示 tuned CLIP 模型。我们的实验表明,现有的 DIL 方法在领域变化下表现很差,而我们的 MoP-CLIP 方法在标准 DIL 设置中表现竞争力强,而且在 OOD 情况下表现出色。这些结果表明 MoP-CLIP 的优越性,提供一种可靠和通用的领域增量学习解决方案。

A Causal Ordering Prior for Unsupervised Representation Learning

  • paper_url: http://arxiv.org/abs/2307.05704
  • repo_url: None
  • paper_authors: Avinash Kori, Pedro Sanchez, Konstantinos Vilouras, Ben Glocker, Sotirios A. Tsaftaris
  • for: 这篇论文主要是为了解决无监督表示学习中的独立假设问题。
  • methods: 该论文提出了一种基于函数式因果模型的完全无监督表示学习方法，通过基于潜在分布Hessian的损失函数，促使潜在空间遵循因果顺序。
  • results: 该方法在完全无监督的情况下，通过考虑带有潜在加性噪声模型（ANM）的数据生成过程，学习出有效的表示。
    Abstract Unsupervised representation learning with variational inference relies heavily on independence assumptions over latent variables. Causal representation learning (CRL), however, argues that factors of variation in a dataset are, in fact, causally related. Allowing latent variables to be correlated, as a consequence of causal relationships, is more realistic and generalisable. So far, provably identifiable methods rely on: auxiliary information, weak labels, and interventional or even counterfactual data. Inspired by causal discovery with functional causal models, we propose a fully unsupervised representation learning method that considers a data generation process with a latent additive noise model (ANM). We encourage the latent space to follow a causal ordering via loss function based on the Hessian of the latent distribution.
    摘要 基于变分推断的无监督表示学习在很大程度上依赖潜变量之间的独立性假设，而因果表示学习（CRL）则认为数据中的变化因素实际上存在因果关联。由于因果关系的存在而允许潜变量之间相关，是更符合现实、也更具泛化性的假设。迄今为止，可证明可识别的方法依赖于辅助信息、弱标签以及干预甚至反事实数据。受基于函数式因果模型的因果发现的启发，我们提出了一种完全无监督的表示学习方法，其考虑带有潜在加性噪声模型（ANM）的数据生成过程，并通过基于潜在分布Hessian的损失函数促使潜在空间遵循因果顺序。

Stack More Layers Differently: High-Rank Training Through Low-Rank Updates

  • paper_url: http://arxiv.org/abs/2307.05695
  • repo_url: https://github.com/guitaricet/peft_pretraining
  • paper_authors: Vladislav Lialin, Namrata Shivagunde, Sherin Muckatira, Anna Rumshisky
  • for: 本文旨在探讨训练大型神经网络时,低级别训练技术的可行性和效果。
  • methods: 本文提出了一种名为ReLoRA的新方法,它利用低级别更新来训练高级别网络。
  • results: 作者将ReLoRA应用于最多3.5亿参数的Transformer语言模型预训练，观察到与常规神经网络训练相当的性能，并发现ReLoRA的效率随模型规模增大而提高。
    Abstract Despite the dominance and effectiveness of scaling, resulting in large networks with hundreds of billions of parameters, the necessity to train overparametrized models remains poorly understood, and alternative approaches do not necessarily make it cheaper to train high-performance models. In this paper, we explore low-rank training techniques as an alternative approach to training large neural networks. We introduce a novel method called ReLoRA, which utilizes low-rank updates to train high-rank networks. We apply ReLoRA to pre-training transformer language models with up to 350M parameters and demonstrate comparable performance to regular neural network training. Furthermore, we observe that the efficiency of ReLoRA increases with model size, making it a promising approach for training multi-billion-parameter networks efficiently. Our findings shed light on the potential of low-rank training techniques and their implications for scaling laws.
    摘要 尽管扩展规模的做法占据主导地位且卓有成效，并催生了拥有数千亿参数的大型网络，但训练过参数化模型的必要性仍未得到充分理解，而替代方法也未必能降低训练高性能模型的成本。在这篇论文中，我们探索以低秩训练技术作为训练大型神经网络的替代途径。我们提出了一种名为ReLoRA的新方法，它利用低秩更新来训练高秩网络。我们将ReLoRA应用于最多3.5亿参数的Transformer语言模型预训练，并观察到与常规神经网络训练相当的性能。此外，我们发现ReLoRA的效率随模型规模增大而提高，这使其成为高效训练数十亿参数网络的一条有前景的途径。我们的发现揭示了低秩训练技术的潜力及其对扩展规律的意义。
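The central mechanism, low-rank updates that accumulate into a high-rank change by periodically folding a factorized delta into the base weights and restarting the factors, can be sketched as follows. Details the paper also handles (partial optimizer-state resets, learning-rate restarts, a jagged schedule) are omitted here, so treat this purely as an illustration of the merge-and-restart idea.

```python
# Train a LoRA-style factorized delta W + (B @ A); periodically fold it into W
# and re-initialize the factors so successive low-rank updates accumulate.
import torch

class LowRankLinear(torch.nn.Module):
    def __init__(self, d_in, d_out, rank=8):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.02, requires_grad=False)
        self.A = torch.nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(d_out, rank))

    def forward(self, x):
        return x @ (self.weight + self.B @ self.A).T

    @torch.no_grad()
    def merge_and_restart(self):
        self.weight += self.B @ self.A       # fold the low-rank delta into the base weight
        self.A.normal_(std=0.01)             # fresh factors for the next low-rank phase
        self.B.zero_()

layer = LowRankLinear(64, 64)
opt = torch.optim.Adam([layer.A, layer.B], lr=1e-3)
for step in range(1, 3001):
    x = torch.randn(32, 64)
    loss = (layer(x) - x).pow(2).mean()      # toy objective
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 1000 == 0:
        layer.merge_and_restart()
```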

Self-consistency for open-ended generations

  • paper_url: http://arxiv.org/abs/2307.06857
  • repo_url: None
  • paper_authors: Siddhartha Jain, Xiaofei Ma, Anoop Deoras, Bing Xiang
  • for: 提高语言模型生成质量
  • methods: 利用生成序列之间易于计算的成对统计量进行重排序，选出最佳生成
  • results: 在代码生成任务中选择最佳k个生成时取得显著提升，在自动形式化（autoformalization）和摘要生成任务中选择最佳生成时也获得稳健改进
    Abstract Large Language Models (LLMs) can exhibit considerable variation in the quality of their sampled outputs. Reranking and selecting the best generation from the sampled set is a popular way of obtaining strong gains in generation quality. In this paper, we present a novel approach for reranking LLM generations. Unlike other techniques that might involve additional inferences or training a specialized reranker, our approach relies on easy to compute pairwise statistics between the generations that have minimal compute overhead. We show that our approach can be formalized as an extension of self-consistency and analyze its performance in that framework, theoretically as well as via simulations. We show strong improvements for selecting the best $k$ generations for code generation tasks as well as robust improvements for best generation for the tasks of autoformalization, and summarization. While our approach only assumes black-box access to LLMs, we show that additional access to token probabilities can improve performance even further.
    摘要 大型语言模型（LLM）的采样输出质量可能差异很大。对采样集合进行重排序并从中选择最佳生成，是提升生成质量的常用方法。在这篇论文中，我们提出了一种对LLM生成结果重排序的新方法。与其他可能需要额外推理或训练专门重排序器的技术不同，我们的方法依赖于生成结果之间易于计算的成对统计量，计算开销极小。我们证明了该方法可以被形式化为自一致性（self-consistency）的扩展，并在该框架下从理论和模拟两方面分析其性能。在代码生成任务中选择最佳的k个生成时我们取得了显著提升，在自动形式化和摘要任务中选择最佳生成时也获得了稳健的改进。虽然我们的方法只假设对LLM的黑盒访问，但我们表明，若能额外获得token概率，性能还可以进一步提升。
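The generalized self-consistency idea, score each sampled generation by its average pairwise similarity to the others and return the most central one, is easy to sketch. The unigram-overlap similarity below is one cheap pairwise statistic chosen for illustration; it is not necessarily the statistic used in the paper.

```python
# Pick the most self-consistent generation: highest mean pairwise similarity.
from collections import Counter

def unigram_overlap(a, b):
    ca, cb = Counter(a.split()), Counter(b.split())
    inter = sum((ca & cb).values())
    return 2 * inter / max(1, sum(ca.values()) + sum(cb.values()))

def select_most_consistent(generations):
    def centrality(i):
        return sum(unigram_overlap(generations[i], g)
                   for j, g in enumerate(generations) if j != i)
    return max(range(len(generations)), key=centrality)

samples = [
    "def add(a, b): return a + b",
    "def add(a, b): return a + b  # sum",
    "def add(a, b): return a - b",
]
print(samples[select_most_consistent(samples)])
```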

Empowering Cross-lingual Behavioral Testing of NLP Models with Typological Features

  • paper_url: http://arxiv.org/abs/2307.05454
  • repo_url: https://github.com/google-research/multi-morph-checklist
  • paper_authors: Ester Hlavnova, Sebastian Ruder
  • for: 这个论文的目的是探讨如何为世界各语言的自然语言处理(NLP)系统进行普适性测试。
  • methods: 这篇论文提出了一种基于形态意识的测试框架,可以评测NLP模型在不同语言特征下的行为。
  • results: 通过使用这种测试框架, authors发现了一些现代语言模型在某些语言特征下的普适性问题,如在斯瓦希利语中的时间表达和芬兰语中的复合possessive表达。这些发现鼓励了开发更加擅长这些特征的NLP模型。
    Abstract A challenge towards developing NLP systems for the world's languages is understanding how they generalize to typological differences relevant for real-world applications. To this end, we propose M2C, a morphologically-aware framework for behavioral testing of NLP models. We use M2C to generate tests that probe models' behavior in light of specific linguistic features in 12 typologically diverse languages. We evaluate state-of-the-art language models on the generated tests. While models excel at most tests in English, we highlight generalization failures to specific typological characteristics such as temporal expressions in Swahili and compounding possessives in Finish. Our findings motivate the development of models that address these blind spots.
    摘要 世界各语言的自然语言处理(NLP)系统的发展受到了语言类型学上的挑战。我们提出了M2C框架,它能够考虑语言的Typological differences,用于测试NLP模型的行为。我们使用M2C生成了12种语言中的特点特征的测试集,并评估了当前的语言模型。虽然模型在英语上表现出色,但我们发现了特定的语言特征,如斯瓦希利语的时间表达和芬兰语的复合所有格,导致模型的扩展缺陷。我们的发现激励了开发更加全面的模型。

ISLTranslate: Dataset for Translating Indian Sign Language

  • paper_url: http://arxiv.org/abs/2307.05440
  • repo_url: https://github.com/exploration-lab/isltranslate
  • paper_authors: Abhinav Joshi, Susmit Agrawal, Ashutosh Modi
  • for: bridge the communication gap between the hard-of-hearing community and the rest of the population
  • methods: using a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs
  • results: the largest translation dataset for continuous Indian Sign Language, and a detailed analysis of the dataset
    Abstract Sign languages are the primary means of communication for many hard-of-hearing people worldwide. Recently, to bridge the communication gap between the hard-of-hearing community and the rest of the population, several sign language translation datasets have been proposed to enable the development of statistical sign language translation systems. However, there is a dearth of sign language resources for the Indian sign language. This resource paper introduces ISLTranslate, a translation dataset for continuous Indian Sign Language (ISL) consisting of 31k ISL-English sentence/phrase pairs. To the best of our knowledge, it is the largest translation dataset for continuous Indian Sign Language. We provide a detailed analysis of the dataset. To validate the performance of existing end-to-end Sign language to spoken language translation systems, we benchmark the created dataset with a transformer-based model for ISL translation.
    摘要 现在,手语翻译数据集已经在全球各地为听力异常的人群提供了一种桥梁,以便开发统计手语翻译系统。然而,印度手语资源匮乏,这篇资源文章介绍了一个名为ISLTranslate的翻译集,该集包含31,000个连续印度手语(ISL)-英语句子/短语对。据我们所知,这是最大的连续印度手语翻译集。我们对该集进行了详细分析。为验证现有的端到端手语到语言翻译系统的性能,我们对创建的dataset进行了基于转换器的ISL翻译模型的测试。

Metropolis Sampling for Constrained Diffusion Models

  • paper_url: http://arxiv.org/abs/2307.05439
  • repo_url: None
  • paper_authors: Nic Fishman, Leo Klarner, Emile Mathieu, Michael Hutchinson, Valentin de Bortoli
  • for: 这篇论文的主要目标是提出一种新的加噪方案，以提高受约束流形上生成（扩散）模型的计算效率和实验性能。
  • methods: 该论文提出了一种基于Metropolis采样的简单加噪方案，并证明了该新过程是反射布朗运动的一种有效离散化。
  • results: 该论文通过应用在多种具有凸和非凸约束的问题上,包括地理模型、机器人和蛋白质设计等领域,并取得了较高的计算效率和实验性能。
    Abstract Denoising diffusion models have recently emerged as the predominant paradigm for generative modelling. Their extension to Riemannian manifolds has facilitated their application to an array of problems in the natural sciences. Yet, in many practical settings, such manifolds are defined by a set of constraints and are not covered by the existing (Riemannian) diffusion model methodology. Recent work has attempted to address this issue by employing novel noising processes based on logarithmic barrier methods or reflected Brownian motions. However, the associated samplers are computationally burdensome as the complexity of the constraints increases. In this paper, we introduce an alternative simple noising scheme based on Metropolis sampling that affords substantial gains in computational efficiency and empirical performance compared to the earlier samplers. Of independent interest, we prove that this new process corresponds to a valid discretisation of the reflected Brownian motion. We demonstrate the scalability and flexibility of our approach on a range of problem settings with convex and non-convex constraints, including applications from geospatial modelling, robotics and protein design.
    摘要 近年来，去噪扩散模型已在生成建模中占据主导地位。它们向黎曼流形的扩展，使其得以应用于自然科学中的一系列问题。然而，在许多实际场景中，这类流形是由一组约束定义的，并不在现有（黎曼）扩散模型方法的适用范围内。近期的工作尝试通过基于对数障碍法或反射布朗运动的新型加噪过程来解决这一问题，但随着约束复杂度的增加，相应的采样器计算负担很重。在本文中，我们提出了一种基于Metropolis采样的替代性简单加噪方案，与先前的采样器相比，它在计算效率和实验性能上都有显著提升。另一个独立的贡献是，我们证明了这一新过程对应于反射布朗运动的一种有效离散化。我们在一系列具有凸和非凸约束的问题设置中展示了该方法的可扩展性和灵活性，应用涵盖地理空间建模、机器人和蛋白质设计等领域。
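A minimal sketch of constraint-respecting noising by rejection is given below: propose a Gaussian increment and keep the current point whenever the proposal leaves the constraint set. The paper's full Metropolis scheme, and its proof that the process discretizes reflected Brownian motion, is not reproduced; the unit-box constraint here is an assumed example.

```python
# Constraint-respecting noising by Metropolis-style rejection on an example constraint set.
import numpy as np

def in_constraint_set(x):
    return np.all(x >= 0.0) and np.all(x <= 1.0)       # example: the unit box

def constrained_noising(x0, n_steps=500, step_std=0.05, rng=None):
    rng = np.random.default_rng(rng)
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        proposal = x + step_std * rng.standard_normal(x.shape)
        if in_constraint_set(proposal):                 # symmetric proposal; reject moves outside
            x = proposal
        path.append(x.copy())
    return np.stack(path)

path = constrained_noising([0.5, 0.5], rng=0)
print(path.min(), path.max())                           # stays inside [0, 1]^2
```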

Improving the Security of Smartwatch Payment with Deep Learning

  • paper_url: http://arxiv.org/abs/2307.05437
  • repo_url: None
  • paper_authors: George Webber
  • for: 这篇论文旨在利用深度学习减少用户注册认证系统所需提供的手势数量，从而提高智能手表支付的安全性与可用性。
  • methods: 本论文使用深度学习构建了一个性能超越现有方法的认证系统，并训练正则化自编码器模型来生成用户特定的合成手势。
  • results: 结果显示，将这些生成的手势用于训练可以提升认证系统的分类能力，从而在不增加错误率的情况下减少用户注册类WatchAuth系统所需的手势数量。
    Abstract Making contactless payments using a smartwatch is increasingly popular, but this payment medium lacks traditional biometric security measures such as facial or fingerprint recognition. In 2022, Sturgess et al. proposed WatchAuth, a system for authenticating smartwatch payments using the physical gesture of reaching towards a payment terminal. While effective, the system requires the user to undergo a burdensome enrolment period to achieve acceptable error levels. In this dissertation, we explore whether applications of deep learning can reduce the number of gestures a user must provide to enrol into an authentication system for smartwatch payment. We firstly construct a deep-learned authentication system that outperforms the current state-of-the-art, including in a scenario where the target user has provided a limited number of gestures. We then develop a regularised autoencoder model for generating synthetic user-specific gestures. We show that using these gestures in training improves classification ability for an authentication system. Through this technique we can reduce the number of gestures required to enrol a user into a WatchAuth-like system without negatively impacting its error rates.
    摘要 使用智能手表进行非接触支付日益流行，但这种支付方式缺乏面部或指纹识别等传统生物特征安全措施。2022年，Sturgess等人提出了WatchAuth系统，通过伸手靠近支付终端这一物理手势来认证智能手表支付。该系统虽然有效，但要求用户经历繁琐的注册过程才能达到可接受的错误率。在本论文中，我们探究深度学习的应用能否减少用户注册此类智能手表支付认证系统所需提供的手势数量。我们首先构建了一个基于深度学习的认证系统，其性能超越当前最优方法，包括在目标用户仅提供有限手势的情形下。随后，我们开发了一个正则化自编码器模型，用于生成用户特定的合成手势，并证明在训练中使用这些手势可以提升认证系统的分类能力。通过这一技术，我们可以在不影响错误率的前提下减少用户注册类WatchAuth系统所需的手势数量。

One-Versus-Others Attention: Scalable Multimodal Integration

  • paper_url: http://arxiv.org/abs/2307.05435
  • repo_url: https://github.com/rsinghlab/ovo
  • paper_authors: Michal Golovanevsky, Eva Schiller, Akira Nair, Ritambhara Singh, Carsten Eickhoff
  • for: 本研究旨在为多模态学习模型提出一种与具体领域无关的注意力机制，以解决现有模型在模态数量增多时跨模态注意力计算复杂度过高的问题。
  • methods: 我们提出了一种名为"一对其余"（One-Versus-Others，OvO）的注意力机制，它只需n次注意力操作即可在多个模态之间进行融合，与传统的成对跨模态注意力相比，复杂度随模态数线性增长。
  • results: 我们在三个真实世界数据集和一个额外的模拟实验上证明，与流行的融合技术相比，我们的方法在提升性能的同时降低了计算成本。
    Abstract Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.
    摘要 多模态学习模型已在从问答到自动驾驶等各类任务上超越单模态方法，变得日益重要。尽管多模态学习十分重要，现有工作主要集中于自然语言处理应用，其中模态数量通常不超过四种（音频、视频、文本、图像）。然而在医疗等其他领域，数据输入可能包括X射线、PET扫描、MRI、基因筛查、临床笔记等，因此需要既高效又准确的信息融合。许多最先进的模型依赖成对的跨模态注意力，当模态数超过三种时难以扩展：对n个模态计算注意力需要n(n-1)/2次操作，可能消耗大量计算资源。为此，我们提出了一种新的领域无关注意力机制，即一对其余（One-Versus-Others，OvO）注意力，其计算量随模态数线性增长，只需n次注意力操作，相比现有跨模态注意力算法显著降低了计算复杂度。我们在三个多样化的真实世界数据集以及一个额外的模拟实验上表明，与流行的融合技术相比，我们的方法在降低计算成本的同时提升了性能。
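One possible reading of "one-versus-others" fusion is that each modality attends once to a summary of all the other modalities, giving n attention calls instead of n-choose-2 pairwise ones. The exact OvO formulation is defined in the paper and repo; the sketch below only illustrates that linear scaling, and the shared attention module is an assumption.

```python
# One attention call per modality: query = modality i, keys/values = the others.
import torch

n_modalities, d_model = 5, 128
attn = torch.nn.MultiheadAttention(embed_dim=d_model, num_heads=4, batch_first=True)

# One embedded token sequence per modality: (batch, seq_len, d_model).
modalities = [torch.randn(8, 16, d_model) for _ in range(n_modalities)]

fused = []
for i, query in enumerate(modalities):
    others = torch.cat([m for j, m in enumerate(modalities) if j != i], dim=1)
    out, _ = attn(query, others, others)      # n attention operations in total
    fused.append(out.mean(dim=1))             # pool each modality's attended features
prediction_input = torch.cat(fused, dim=-1)   # (batch, n_modalities * d_model)
print(prediction_input.shape)
```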

Self-Supervised Learning with Lie Symmetries for Partial Differential Equations

  • paper_url: http://arxiv.org/abs/2307.05432
  • repo_url: None
  • paper_authors: Grégoire Mialon, Quentin Garrido, Hannah Lawrence, Danyal Rehman, Yann LeCun, Bobak T. Kiani
  • for: 这篇论文旨在从异构数据中为偏微分方程学习通用表示，以获得比数值求解器计算上更高效的替代方案，并可能对科学和工程领域产生广泛影响。
  • methods: 这篇论文将自监督学习（SSL）中的联合嵌入方法应用于异构数据，学习偏微分方程（PDEs）的通用表示。
  • results: 所学表示在不变量任务（如回归PDE系数）上优于基线方法，同时提升了神经求解器的时间推进（time-stepping）性能。
    Abstract Machine learning for differential equations paves the way for computationally efficient alternatives to numerical solvers, with potentially broad impacts in science and engineering. Though current algorithms typically require simulated training data tailored to a given setting, one may instead wish to learn useful information from heterogeneous sources, or from real dynamical systems observations that are messy or incomplete. In this work, we learn general-purpose representations of PDEs from heterogeneous data by implementing joint embedding methods for self-supervised learning (SSL), a framework for unsupervised representation learning that has had notable success in computer vision. Our representation outperforms baseline approaches to invariant tasks, such as regressing the coefficients of a PDE, while also improving the time-stepping performance of neural solvers. We hope that our proposed methodology will prove useful in the eventual development of general-purpose foundation models for PDEs.
    摘要 面向微分方程的机器学习为数值求解器提供了计算上更高效的替代途径，有望对科学和工程产生广泛影响。尽管当前算法通常需要针对特定场景定制的仿真训练数据，人们也希望能够从异构数据源，或从杂乱、不完整的真实动力系统观测中学习有用信息。在这项工作中，我们借助自监督学习（SSL）中的联合嵌入方法（一种在计算机视觉中取得显著成功的无监督表示学习框架），从异构数据中学习偏微分方程（PDE）的通用表示。我们的表示在诸如回归PDE系数等不变量任务上优于基线方法，同时提升了神经求解器的时间推进性能。我们希望所提出的方法能为最终构建面向PDE的通用基础模型贡献力量。

Geometric Neural Diffusion Processes

  • paper_url: http://arxiv.org/abs/2307.05431
  • repo_url: https://github.com/cambridge-mlg/neural_diffusion_processes
  • paper_authors: Emile Mathieu, Vincent Dutordoir, Michael J. Hutchinson, Valentin De Bortoli, Yee Whye Teh, Richard E. Turner
  • for: 用于建模自然科学中涉及对称性且数据位于非欧几里得空间的问题。
  • methods: 在去噪扩散模型中引入一系列几何先验：构造一个以在目标对称群作用下按规律变换的几何高斯过程为极限分布的加噪过程，并用对该群等变的神经网络来近似得分函数。
  • results: 在这些条件下，生成的函数模型具有同样的对称性。借助一种新颖的基于朗之万的条件采样器，该模型能够拟合具有欧氏和球面值域的复杂标量场与向量场，并在合成数据和真实天气数据上得到验证。
    Abstract Denoising diffusion models have proven to be a flexible and effective paradigm for generative modelling. Their recent extension to infinite dimensional Euclidean spaces has allowed for the modelling of stochastic processes. However, many problems in the natural sciences incorporate symmetries and involve data living in non-Euclidean spaces. In this work, we extend the framework of diffusion models to incorporate a series of geometric priors in infinite-dimension modelling. We do so by a) constructing a noising process which admits, as limiting distribution, a geometric Gaussian process that transforms under the symmetry group of interest, and b) approximating the score with a neural network that is equivariant w.r.t. this group. We show that with these conditions, the generative functional model admits the same symmetry. We demonstrate scalability and capacity of the model, using a novel Langevin-based conditional sampler, to fit complex scalar and vector fields, with Euclidean and spherical codomain, on synthetic and real-world weather data.
    摘要 去噪扩散模型已被证明是一种灵活而有效的生成建模范式，其向无穷维欧氏空间的最新扩展使随机过程的建模成为可能。然而，自然科学中的许多问题涉及对称性，且数据位于非欧几里得空间。在这项工作中，我们将扩散模型框架扩展为在无穷维建模中纳入一系列几何先验。具体做法是：a) 构造一个加噪过程，其极限分布为在目标对称群作用下按规律变换的几何高斯过程；b) 使用对该群等变的神经网络来近似得分。我们证明，在这些条件下，生成的函数模型具有同样的对称性。我们还通过一种新颖的基于朗之万的条件采样器，展示了该模型的可扩展性与容量，能在合成数据和真实天气数据上拟合具有欧氏和球面值域的复杂标量场与向量场。

Differential Analysis of Triggers and Benign Features for Black-Box DNN Backdoor Detection

  • paper_url: http://arxiv.org/abs/2307.05422
  • repo_url: https://github.com/fu1001hao/five-metrics-detector
  • paper_authors: Hao Fu, Prashanth Krishnamurthy, Siddharth Garg, Farshad Khorrami
  • for: 该论文提出了一种数据高效的检测方法，用于在黑盒场景下检测深度神经网络中的后门攻击。
  • methods: 该方法基于如下直觉：与任何其他良性特征相比，触发器特征对被植入后门的网络输出具有更大的影响。为定量衡量触发器特征和良性特征对后门网络输出的影响，我们引入了五个指标。
  • results: 我们的方法在广泛的后门攻击下表现出色，包括消融研究和与现有方法的比较；我们还展示了该方法在在线测试中识别被投毒样本的能力。
    Abstract This paper proposes a data-efficient detection method for deep neural networks against backdoor attacks under a black-box scenario. The proposed approach is motivated by the intuition that features corresponding to triggers have a higher influence in determining the backdoored network output than any other benign features. To quantitatively measure the effects of triggers and benign features on determining the backdoored network output, we introduce five metrics. To calculate the five-metric values for a given input, we first generate several synthetic samples by injecting the input's partial contents into clean validation samples. Then, the five metrics are computed by using the output labels of the corresponding synthetic samples. One contribution of this work is the use of a tiny clean validation dataset. Having the computed five metrics, five novelty detectors are trained from the validation dataset. A meta novelty detector fuses the output of the five trained novelty detectors to generate a meta confidence score. During online testing, our method determines if online samples are poisoned or not via assessing their meta confidence scores output by the meta novelty detector. We show the efficacy of our methodology through a broad range of backdoor attacks, including ablation studies and comparison to existing approaches. Our methodology is promising since the proposed five metrics quantify the inherent differences between clean and poisoned samples. Additionally, our detection method can be incrementally improved by appending more metrics that may be proposed to address future advanced attacks.
    摘要 这篇论文提出了一种在黑盒场景下针对深度神经网络后门攻击的数据高效检测方法。该方法的出发点是：与任何其他良性特征相比，与触发器对应的特征对被植入后门的网络输出具有更大的影响。为定量衡量触发器特征和良性特征对后门网络输出的影响，我们引入了五个指标。为计算给定输入的五个指标值，我们首先将该输入的部分内容注入干净验证样本中生成若干合成样本，然后利用相应合成样本的输出标签计算这五个指标。本工作的一个贡献是仅使用一个很小的干净验证数据集。在得到五个指标后，我们从验证集训练五个新颖性检测器，并由一个元新颖性检测器融合它们的输出，生成元置信度分数。在线测试时，方法通过评估元新颖性检测器输出的元置信度分数来判断在线样本是否被投毒。我们通过广泛的后门攻击实验（包括消融研究以及与现有方法的比较）证明了方法的有效性。该方法前景良好，因为所提出的五个指标量化了干净样本与投毒样本之间的内在差异；此外，检测方法还可以通过追加更多指标来渐进改进，以应对未来更高级的攻击。

Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

  • paper_url: http://arxiv.org/abs/2307.05405
  • repo_url: https://github.com/sskkai/interactive-scoring-irl
  • paper_authors: Shukai Liu, Chenming Wu, Ying Li, Liangjun Zhang
  • for: 本论文旨在使用人类提供的分数取代成对偏好，以提高交互式强化学习的反馈效率。
  • methods: 本论文提出了一种自适应学习方案，利用分数在稀疏奖励环境中训练行为策略，并使学习过程对不完美或不可靠的分数不敏感，以避免其对训练造成不良影响。
  • results: 所提方法能够通过从分数中自适应学习，高效地学到接近最优的策略，且相比成对偏好学习方法需要更少的反馈。
    Abstract Interactive reinforcement learning has shown promise in learning complex robotic tasks. However, the process can be human-intensive due to the requirement of a large amount of interactive feedback. This paper presents a new method that uses scores provided by humans instead of pairwise preferences to improve the feedback efficiency of interactive reinforcement learning. Our key insight is that scores can yield significantly more data than pairwise preferences. Specifically, we require a teacher to interactively score the full trajectories of an agent to train a behavioral policy in a sparse reward environment. To avoid unstable scores given by humans negatively impacting the training process, we propose an adaptive learning scheme. This enables the learning paradigm to be insensitive to imperfect or unreliable scores. We extensively evaluate our method for robotic locomotion and manipulation tasks. The results show that the proposed method can efficiently learn near-optimal policies by adaptive learning from scores while requiring less feedback compared to pairwise preference learning methods. The source codes are publicly available at https://github.com/SSKKai/Interactive-Scoring-IRL.
    摘要 交互式强化学习在学习复杂机器人任务方面已展现出潜力。然而，由于需要大量交互式反馈，该过程可能非常耗费人力。本文提出了一种新方法，使用人类提供的分数取代成对偏好，以提高交互式强化学习的反馈效率。我们的关键见解是，分数能够比成对偏好产生多得多的数据。具体而言，我们让教师对智能体的完整轨迹进行交互式打分，从而在稀疏奖励环境中训练行为策略。为避免人类给出的不稳定分数对训练过程产生负面影响，我们提出了一种自适应学习方案，使学习范式对不完美或不可靠的分数不敏感。我们在机器人运动和操作任务上进行了广泛评估，结果表明，所提方法能够通过从分数中自适应学习，高效地学到接近最优的策略，且相比成对偏好学习方法需要更少的反馈。源代码公开于 https://github.com/SSKKai/Interactive-Scoring-IRL。
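One natural way to turn whole-trajectory scores into a training signal is to fit a reward model whose per-step predictions sum to the human score for each trajectory; that dense reward can then drive a standard RL algorithm. The sketch below illustrates only this generic pattern with toy data; the paper's adaptive scheme for handling noisy or unreliable scores is not reproduced.

```python
# Learn per-step rewards whose sum matches a scalar human score per trajectory.
import torch

state_dim = 6
reward_net = torch.nn.Sequential(torch.nn.Linear(state_dim, 64), torch.nn.ReLU(),
                                 torch.nn.Linear(64, 1))
opt = torch.optim.Adam(reward_net.parameters(), lr=1e-3)

# Toy data: trajectories of states plus one scalar score per trajectory.
trajectories = [torch.randn(50, state_dim) for _ in range(32)]
scores = torch.randn(32)                     # placeholder for human-provided scores

for epoch in range(200):
    for traj, score in zip(trajectories, scores):
        predicted_return = reward_net(traj).sum()     # sum of predicted per-step rewards
        loss = (predicted_return - score) ** 2
        opt.zero_grad(); loss.backward(); opt.step()
# The learned reward_net would then supply dense rewards for a standard RL algorithm.
```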

Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform

  • paper_url: http://arxiv.org/abs/2307.05399
  • repo_url: https://github.com/mateusz-wojcik-97/domain-agnostic-architecture
  • paper_authors: Mateusz Wójcik, Witold Kościukiewicz, Mateusz Baran, Tomasz Kajdanowicz, Adam Gonczarek
  • for: 这篇论文是用于描述一种基于混合专家模型的完全可微分架构,用于在流动资料中进行分类问题。
  • methods: 本论文使用了可微分学习的混合专家模型,并且不需要内存缓冲。
  • results: 实验结果显示,该架构可以在多个领域中取得最佳性能,并且在生产环境中进行线上学习。该方法与参考方法相比,有着明显的性能优势。
    Abstract Production deployments in complex systems require ML architectures to be highly efficient and usable against multiple tasks. Particularly demanding are classification problems in which data arrives in a streaming fashion and each class is presented separately. Recent methods with stochastic gradient learning have been shown to struggle in such setups or have limitations like memory buffers, and being restricted to specific domains that disable its usage in real-world scenarios. For this reason, we present a fully differentiable architecture based on the Mixture of Experts model, that enables the training of high-performance classifiers when examples from each class are presented separately. We conducted exhaustive experiments that proved its applicability in various domains and ability to learn online in production environments. The proposed technique achieves SOTA results without a memory buffer and clearly outperforms the reference methods.
    摘要 在复杂系统中进行生产部署要求机器学习架构高效且能应对多种任务。其中尤其具有挑战性的是数据以流式方式到达、且各类别分别出现的分类问题。近期基于随机梯度学习的方法已被证明在此类设置中表现不佳，或存在诸如需要记忆缓冲区、局限于特定领域等限制，使其难以用于真实场景。为此，我们提出了一种基于混合专家（Mixture of Experts）模型的完全可微分架构，即使各类别的样本分别出现，也能训练出高性能的分类器。我们进行了详尽的实验，证明了它在多个领域的适用性以及在生产环境中进行在线学习的能力。所提出的技术在不使用记忆缓冲区的情况下取得了最先进的结果，并明显优于参考方法。
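A generic fully differentiable mixture-of-experts classifier, a softmax gate weighting the experts' outputs so that everything trains end-to-end with SGD, can be sketched as below. The paper's continual-learning specifics (how experts are grown and assigned as classes arrive one at a time without a memory buffer) are not reproduced; the layer sizes are assumed.

```python
# Soft-gated mixture-of-experts classifier: differentiable end to end.
import torch

class SoftMoEClassifier(torch.nn.Module):
    def __init__(self, d_in, n_classes, n_experts=4, d_hidden=128):
        super().__init__()
        self.gate = torch.nn.Linear(d_in, n_experts)
        self.experts = torch.nn.ModuleList(
            torch.nn.Sequential(torch.nn.Linear(d_in, d_hidden), torch.nn.ReLU(),
                                torch.nn.Linear(d_hidden, n_classes))
            for _ in range(n_experts)
        )

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)               # (batch, n_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, n_experts, n_classes)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)         # gated combination

model = SoftMoEClassifier(d_in=768, n_classes=10)
logits = model(torch.randn(16, 768))
print(logits.shape)   # torch.Size([16, 10])
```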